//
// The standard library is becoming increasingly important in Zig.
// On the one hand, it is helpful to look at how its individual
// functions are implemented, because they make wonderful templates
// for your own functions. On the other hand, these standard
// functions are part of Zig's basic equipment.
//
// This means that they are always available on every system.
// It is therefore worthwhile to deal with them in Ziglings, too.
// It's a great way to learn important skills. For example, it is
// often necessary to process large amounts of data from files.
// For this kind of sequential reading and processing, Zig provides
// some useful functions, which we will take a closer look at in the
// coming exercises.
//
// A nice example of this has been published on the Zig homepage,
// replacing the somewhat dusty 'Hello world!'.
//
// Nothing against 'Hello world!', but it just doesn't do justice
// to the elegance of Zig, and that's a pity if someone takes a short
// first look at the homepage and doesn't get 'enchanted'. The
// present example is simply better suited for that, so we will
// use it as an introduction to tokenizing, because it illustrates
// the basic principles wonderfully.
//
// In the following exercises we will also read and process data from
// large files, and by then at the latest it will be clear to everyone
// how useful all this is.
//
// Let's start by analyzing the example from the Zig homepage
// and explaining the most important things.
//
//     const std = @import("std");
//
//     // Here a function from the standard library is given a
//     // short local name. It converts numbers in a string into
//     // the respective integer values.
//     const parseInt = std.fmt.parseInt;
//
//     // Defining a test case
//     test "parse integers" {
//
//         // Four numbers are passed in a string.
//         // Please note that the individual values are separated
//         // either by a space or a comma.
//         const input = "123 67 89,99";
//
//         // In order to be able to process the input values,
//         // memory is required. An allocator is defined here for
//         // this purpose.
//         const ally = std.testing.allocator;
//
//         // The allocator is used to initialize a growable list
//         // (ArrayList) in which the numbers are stored.
//         var list = std.ArrayList(u32).init(ally);
//
//         // The cleanup is deferred right away, so freeing the list
//         // can never be forgotten and the testing allocator has
//         // no memory leak to complain about.
//         defer list.deinit();
//
//         // Now it gets exciting:
//         // A standard tokenizer is called (Zig has several). It
//         // treats each of the given separators (we remember, space
//         // and comma) as a boundary and returns an iterator over
//         // the pieces in between.
//         var it = std.mem.tokenizeAny(u8, input, " ,");
//
//         // The iterator can now be processed in a loop and the
//         // individual numbers can be transferred.
//         while (it.next()) |num| {
//             // But be careful: The numbers are still only available
//             // as strings. This is where the integer parser comes
//             // into play, converting them into real integer values.
//             const n = try parseInt(u32, num, 10);
//
//             // Finally the individual values are stored in the list.
//             try list.append(n);
//         }
//
//         // For the subsequent test, a second, static array is created,
//         // which is directly filled with the expected values.
//         const expected = [_]u32{ 123, 67, 89, 99 };
//
//         // Now the numbers converted from the string can be compared
//         // with the expected ones, so that the test completes
//         // successfully.
//         for (expected, list.items) |exp, actual| {
//             try std.testing.expectEqual(exp, actual);
//         }
//     }
//
// So much for the example from the homepage.
// Let's summarize the basic steps again:
//
// - We have a set of data in sequential order, separated from each other
//   by various characters.
//
// - For further processing, for example in an array, this data must be
//   read in, separated and, if necessary, converted into the target format.
//
// - We need a buffer that is large enough to hold the data.
//
// - This buffer can be created either statically at compile time, if the
//   amount of data is already known, or dynamically at runtime by using
//   a memory allocator.
//
// - The data is split at the respective separators by a tokenizer and
//   stored in the reserved memory. This usually also includes conversion
//   to the target format.
//
// - Now the data can be conveniently processed further in the correct format.
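//
// The homepage example uses the dynamic variant. As a minimal sketch
// of the static variant, assuming we know in advance that the input
// holds at most four numbers, the same steps could look like this.
// The names 'numbers' and 'i' and the error 'TooManyNumbers' are
// made up for this illustration and are not part of the original.
//
//     const std = @import("std");
//
//     test "parse integers into a fixed-size array" {
//         const input = "123 67 89,99";
//
//         // Assumption: four slots are enough. The buffer size is
//         // fixed at compile time, so no allocator is needed.
//         var numbers: [4]u32 = undefined;
//         var i: usize = 0;
//
//         var it = std.mem.tokenizeAny(u8, input, " ,");
//         while (it.next()) |num| {
//             // Refuse input that would overflow the buffer.
//             if (i == numbers.len) return error.TooManyNumbers;
//             numbers[i] = try std.fmt.parseInt(u32, num, 10);
//             i += 1;
//         }
//
//         // Only the first 'i' slots have been filled.
//         const expected = [_]u32{ 123, 67, 89, 99 };
//         try std.testing.expectEqualSlices(u32, &expected, numbers[0..i]);
//     }
//
// The price of the static variant is the fixed upper limit: longer
// input is rejected instead of making the buffer grow.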
//
// These steps are basically always the same.
// Whether the data is read from a file or entered by the user via the
// keyboard, for example, is irrelevant. Only the details differ,
// and that's why Zig has different tokenizers. But more about this in
// later exercises.
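//
// As a small foretaste, here is a rough sketch of how three of the
// tokenizers in std.mem treat the same input. The input string and
// the test name are made up for this illustration, and the function
// names assume a Zig version that also provides tokenizeAny.
//
//     const std = @import("std");
//
//     test "three tokenizers, one input" {
//         const text = "a, b,c";
//
//         // tokenizeAny: every single byte of ", " is a separator.
//         var any = std.mem.tokenizeAny(u8, text, ", ");
//         try std.testing.expectEqualStrings("a", any.next().?);
//         try std.testing.expectEqualStrings("b", any.next().?);
//         try std.testing.expectEqualStrings("c", any.next().?);
//         try std.testing.expect(any.next() == null);
//
//         // tokenizeSequence: only the whole sequence ", " separates.
//         var seq = std.mem.tokenizeSequence(u8, text, ", ");
//         try std.testing.expectEqualStrings("a", seq.next().?);
//         try std.testing.expectEqualStrings("b,c", seq.next().?);
//         try std.testing.expect(seq.next() == null);
//
//         // tokenizeScalar: exactly one separator value, here ','.
//         var sc = std.mem.tokenizeScalar(u8, text, ',');
//         try std.testing.expectEqualStrings("a", sc.next().?);
//         try std.testing.expectEqualStrings(" b", sc.next().?);
//         try std.testing.expectEqualStrings("c", sc.next().?);
//         try std.testing.expect(sc.next() == null);
//     }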
//
// Now we also want to write a small program to tokenize some data;
// after all, we need some practice. Suppose we want to count the words
// of this little poem:
//
//     My name is Ozymandias, King of Kings;
//     Look on my Works, ye Mighty, and despair!
//                                  by Percy Bysshe Shelley
//
//
const std = @import("std");
const print = std.debug.print;

pub fn main() !void {
    // our input
    const poem =
        \\My name is Ozymandias, King of Kings;
        \\Look on my Works, ye Mighty, and despair!
    ;
    // now the tokenizer, but what do we need here?
    var it = std.mem.tokenizeAny(u8, poem, ???);
    // print all words and count them
    var cnt: usize = 0;
    while (it.next()) |word| {
        cnt += 1;
        print("{s}\n", .{word});
    }
    // print the result
    print("This little poem has {d} words!\n", .{cnt});
}