More Types

This section adds simple types, including boolean, integer, and float. Other types such as Table and UserData will be implemented in subsequent chapters.

We first extend the lexical analysis to support the tokens for these types, then generate the corresponding bytecodes in the syntax analysis, and add support for these bytecodes in the virtual machine. Finally, we modify the function call to support printing these types.

Improve Lexical Analysis

The lexical analysis in the previous chapter supports only 2 tokens. From now on, whatever feature is added, the lexical analysis must be changed first to add the corresponding tokens. To avoid adding tokens piecemeal in each later chapter, we add them all here in one go.

The official Lua website lists the complete lexical conventions. They include:

  • Name, which has been implemented before, is used for variables, etc.

  • Constants, including string, integer, and floating-point constants.

  • Keywords:

    and break do else elseif end
    false for function goto if in
    local nil not or repeat return
    then true until while
  • Symbols:

    + - * / % ^ #
    & ~ | << >> //
    == ~= <= >= < > =
    ( ) { } [ ] ::
    ; : , . .. ...

The corresponding Token is defined as:

```rust, ignore
{{#include ../listing/ch02.variables/src/lex.rs:token}}
```

The implementation itself is nothing more than tedious string parsing, so it is skipped here. For simplicity, it supports only the simple forms and leaves out the more complex ones, such as long strings, long comments, string escapes, and hexadecimal numbers; floating-point numbers only support scientific notation. None of this affects the main features to be added later.
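
To make the token categories more concrete, here is a minimal sketch of how a lexer might tell a keyword from a name, and an integer from a float. It is not the `lex.rs` of this project: the `Token` enum below is a small hypothetical subset, and the helper names `classify_word` and `read_number` are made up for illustration. It only handles plain decimal digits with an optional fractional part.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    // a few keywords
    Nil,
    True,
    False,
    Local,
    // constants and names
    Integer(i64),
    Float(f64),
    Name(String),
}

/// Map a word to its keyword token, or fall back to a plain name.
fn classify_word(word: &str) -> Token {
    match word {
        "nil" => Token::Nil,
        "true" => Token::True,
        "false" => Token::False,
        "local" => Token::Local,
        _ => Token::Name(word.to_string()),
    }
}

/// Parse plain decimal text: digits with at most one '.'.
/// Anything else (hex prefix, exponent, letters) yields None.
fn read_number(text: &str) -> Option<Token> {
    let digits = text.chars().filter(|c| c.is_ascii_digit()).count();
    let dots = text.chars().filter(|c| *c == '.').count();
    if digits == 0 || dots > 1 || digits + dots != text.chars().count() {
        return None;
    }
    if dots == 1 {
        // a '.' makes it a floating-point constant
        text.parse::<f64>().ok().map(Token::Float)
    } else {
        text.parse::<i64>().ok().map(Token::Integer)
    }
}
```

With this sketch, `read_number("123")` gives `Integer(123)` and `read_number("123456.0")` gives `Float(123456.0)`, so the distinction between the two numeric types is already made in the lexer.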

Type of Values

After lexical analysis supports more types, we add these types to `Value`:

```rust, ignore
{{#include ../listing/ch02.variables/src/value.rs:value}}
```

One special point is that floating-point numbers are output in debug mode, `{:?}`. Rust's ordinary output format `{}` always writes floating-point numbers in "integer.fraction" form, whereas a more reasonable approach is to choose the more suitable of "integer.fraction" and scientific notation, as `%g` does in C's `printf`. For example, printing the number 1e-10 in a fixed decimal format such as "0.000000" is unreasonable. This seems to be a historical issue in Rust: for compatibility and other reasons, only the debug mode `{:?}` behaves like `%g`. I will not go into it here.

In addition, to make it easy to tell an integer from a floating-point number without a fractional part, Lua's official implementation appends `.0` to the latter. For example, the floating-point number 2 is output as `2.0`. Conveniently, this is also the default behavior of Rust's `{:?}` mode, so we need no special handling for it. The code in the official implementation is as follows:

    if (buff[strspn(buff, "-0123456789")] == '\0') {  /* looks like an int? */
      buff[len++] = lua_getlocaledecpoint();
      buff[len++] = '0';  /* adds '.0' to result */
    }
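
The difference between the two Rust format modes can be checked directly. The following is a small sketch; the helper names `show_debug` and `show_display` are made up for illustration and are not part of this project's code.

```rust
/// Format a float with the debug formatter `{:?}`,
/// the mode used for Lua floats in this chapter.
fn show_debug(f: f64) -> String {
    format!("{:?}", f)
}

/// Format a float with the ordinary display formatter `{}`, for comparison.
fn show_display(f: f64) -> String {
    format!("{}", f)
}
```

Here `show_debug(2.0)` yields `2.0` while `show_display(2.0)` yields `2`, and `show_debug(1e-10)` yields `1e-10` while `show_display(1e-10)` yields `0.0000000001`.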

Before Lua 5.3, Lua had only one numeric type, which was floating point by default. My understanding is that this is because Lua was originally intended for configuration files, aimed at users rather than programmers. Ordinary users do not distinguish between integers and floating-point numbers, so configuring 10 seconds or 10.0 seconds makes no difference to them; moreover, for a calculation such as 7/2, the result is obviously 3.5 and not 3. However, as Lua's use expanded, for example as a glue language between many large programs, the demand for integers grew stronger, so integers and floating-point numbers came to be distinguished at the language level.

Syntax Analysis

Now we add support for these types in the parser. Currently only the function call statement is supported, in the form `function argument`, and the function can only be a global variable, so this time only the argument part needs to support the new types. In the Lua language, if the argument of a function call is a string constant or a table constructor, the parentheses `()` can be omitted, as in the "hello, world!" example in the previous chapter. In all other cases, including the new types added this time, the parentheses `()` are required. So the argument part is modified as follows:

```rust, ignore
Token::Name(name) => {
    // function, global variable only
    let ic = add_const(&mut constants, Value::String(name));
    byte_codes.push(ByteCode::GetGlobal(0, ic as u8));

    // argument, (var) or "string"
    match lex.next() {
        Token::ParL => { // '('
            let code = match lex.next() {
                Token::Nil => ByteCode::LoadNil(1),
                Token::True => ByteCode::LoadBool(1, true),
                Token::False => ByteCode::LoadBool(1, false),
                Token::Integer(i) =>
                    if let Ok(ii) = i16::try_from(i) {
                        ByteCode::LoadInt(1, ii)
                    } else {
                        load_const(&mut constants, 1, Value::Integer(i))
                    }
                Token::Float(f) => load_const(&mut constants, 1, Value::Float(f)),
                Token::String(s) => load_const(&mut constants, 1, Value::String(s)),
                _ => panic!("invalid argument"),
            };
            byte_codes.push(code);

            if lex.next() != Token::ParR { // ')'
                panic!("expected `)`");
            }
        }
        Token::String(s) => {
            let code = load_const(&mut constants, 1, Value::String(s));
            byte_codes.push(code);
        }
        _ => panic!("expected string"),
    }
}
```

This code first parses the function. As in the previous chapter, only global variables are supported. It then parses the argument. Besides string constants, the more general parenthesized form `()` is now supported, which handles constants of various types:

- Floating-point constants are handled like string constants: `load_const()` puts the value in the constant table at compile time, and the `LoadConst` bytecode loads it during execution.

- Nil and Boolean values do not need to go into the constant table. It is more convenient to encode nil, true, and false directly into the bytecode, and it is also faster at execution time (one less memory read). So the `LoadNil` and `LoadBool` bytecodes are added.

- Integer constants combine the two approaches above. A bytecode has 4 bytes: the opcode occupies 1 byte and the destination address occupies 1 byte, which leaves 2 bytes, enough to store an `i16` integer. Therefore a number within the range of `i16` (which is also the most likely case) is encoded directly into the bytecode, and the `LoadInt` bytecode is added for this purpose; a number outside the range of `i16` is stored in the constant table. This follows the official Lua implementation. It shows Lua's pursuit of performance: it adds a bytecode and the code to handle it just to save one memory access. We will see many such cases in the future.

Since only the function call statement is currently supported, the function is fixed at position `0` of the stack during execution, and the argument is fixed at position `1`. The destination addresses of the bytecodes above are therefore also fixed at `1`.

That covers the main code. The functions `add_const()` and `load_const()`, the latter of which generates the `LoadConst` bytecode, are defined as follows:

```rust, ignore
fn add_const(constants: &mut Vec<Value>, c: Value) -> usize {
    constants.push(c);
    constants.len() - 1 // index of the constant just added
}

fn load_const(constants: &mut Vec<Value>, dst: usize, c: Value) -> ByteCode {
    ByteCode::LoadConst(dst as u8, add_const(constants, c) as u8)
}
```
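
The choice between `LoadInt` and `LoadConst` for integers can be isolated into a small self-contained sketch. The `Value` and `ByteCode` enums below are reduced stand-ins with only the variants needed here, and the helper name `load_integer` is hypothetical; in the parser above the same decision is made inline in the `Token::Integer` arm.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
enum Value {
    Integer(i64),
}

#[derive(Debug, PartialEq)]
enum ByteCode {
    LoadInt(u8, i16),  // destination, immediate value
    LoadConst(u8, u8), // destination, index into the constant table
}

fn add_const(constants: &mut Vec<Value>, c: Value) -> usize {
    constants.push(c);
    constants.len() - 1
}

/// Hypothetical helper: pick the cheaper encoding for an integer constant.
fn load_integer(constants: &mut Vec<Value>, dst: u8, i: i64) -> ByteCode {
    match i16::try_from(i) {
        // fits into the 2 spare bytes of the instruction
        Ok(small) => ByteCode::LoadInt(dst, small),
        // out of range for i16: store it in the constant table instead
        Err(_) => {
            let index = add_const(constants, Value::Integer(i));
            ByteCode::LoadConst(dst, index as u8)
        }
    }
}
```

Under these assumptions, `load_integer(&mut constants, 1, 123)` produces `LoadInt(1, 123)` and leaves the constant table untouched, while `load_integer(&mut constants, 1, 123456)` appends `Integer(123456)` to the table and produces a `LoadConst` that refers to it.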

Test

So far, the parser supports the new types. What remains is for the virtual machine to support the newly added bytecodes `LoadInt`, `LoadBool`, and `LoadNil`, which is skipped here.
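
Although the virtual machine part is not listed, a rough idea of what executing the load bytecodes involves can be sketched as follows. This is an illustration under simplifying assumptions, not the project's actual virtual machine: `Value` and `ByteCode` are reduced stand-ins, function calls and globals are left out, and the names `set_stack` and `execute` are hypothetical.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Boolean(bool),
    Integer(i64),
    Float(f64),
}

#[derive(Clone, Copy)]
enum ByteCode {
    LoadNil(u8),
    LoadBool(u8, bool),
    LoadInt(u8, i16),
    LoadConst(u8, u8),
}

/// Store `v` at stack slot `dst`, growing the stack with Nil if needed.
fn set_stack(stack: &mut Vec<Value>, dst: u8, v: Value) {
    let dst = dst as usize;
    if dst >= stack.len() {
        stack.resize(dst + 1, Value::Nil);
    }
    stack[dst] = v;
}

/// Run the load instructions in order and return the resulting stack.
fn execute(byte_codes: &[ByteCode], constants: &[Value]) -> Vec<Value> {
    let mut stack = Vec::new();
    for code in byte_codes {
        match *code {
            // the value is encoded in the instruction itself
            ByteCode::LoadNil(dst) => set_stack(&mut stack, dst, Value::Nil),
            ByteCode::LoadBool(dst, b) => set_stack(&mut stack, dst, Value::Boolean(b)),
            ByteCode::LoadInt(dst, i) => set_stack(&mut stack, dst, Value::Integer(i64::from(i))),
            // the value is read from the constant table
            ByteCode::LoadConst(dst, c) => {
                set_stack(&mut stack, dst, constants[c as usize].clone())
            }
        }
    }
    stack
}
```

The first three bytecodes build the value from the instruction alone, while only `LoadConst` has to read the constant table, which is the extra memory access that the immediate forms avoid.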

Then you can test the following code:

    {{#include ../listing/ch02.variables/test_lua/types.lua}}

The output is as follows:

    [src/parse.rs:64] &constants = [
        print,
        print,
        print,
        print,
        123456,
        print,
        123456.0,
    ]
    byte_codes:
      GetGlobal(0, 0)
      LoadNil(1)
      Call(0, 1)
      GetGlobal(0, 0)
      LoadBool(1, false)
      Call(0, 1)
      GetGlobal(0, 0)
      LoadInt(1, 123)
      Call(0, 1)
      GetGlobal(0, 0)
      LoadConst(1, 1)
      Call(0, 1)
      GetGlobal(0, 0)
      LoadConst(1, 2)
      Call(0, 1)
    nil
    false
    123
    123456
    123456.0

A small problem is left over from the last chapter: `print` appears many times in the constant table. To fix this, every time a constant is added we need to check whether it already exists.

Add Constants

Modify the add_const() function above as follows:

```rust, ignore
fn add_const(constants: &mut Vec<Value>, c: Value) -> usize {
    constants.iter().position(|v| v == &c)
        .unwrap_or_else(|| {
            constants.push(c);
            constants.len() - 1
        })
}
```

`constants.iter().position()` finds the index of the constant. Its parameter is a [closure](https://doc.rust-lang.org/stable/book/ch13-01-closures.html), which compares two `Value`s, so `Value` needs to implement the `PartialEq` trait:

```rust, ignore
{{#include ../listing/ch02.variables/src/value.rs:peq}}
```

Here we treat an integer and a floating-point number that are numerically equal, such as `Integer(123456)` and `Float(123456.0)`, as different, because they are indeed two distinct values. They cannot be merged into one entry in the constant table; otherwise, in the test code in the previous section, the last line would also load the integer 123456.

But during Lua execution, these two values are equal, that is, the result of 123 == 123.0 is true. We will deal with this issue in a later chapter.
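
The constant-table notion of equality described above can be sketched as follows. This is an illustrative guess at the shape of such an implementation, not a copy of the project's `value.rs`; the `Value` enum here is a reduced stand-in with only the simple types.

```rust
#[derive(Debug)]
enum Value {
    Nil,
    Boolean(bool),
    Integer(i64),
    Float(f64),
    String(String),
}

impl PartialEq for Value {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Value::Nil, Value::Nil) => true,
            (Value::Boolean(a), Value::Boolean(b)) => a == b,
            (Value::Integer(a), Value::Integer(b)) => a == b,
            (Value::Float(a), Value::Float(b)) => a == b,
            (Value::String(a), Value::String(b)) => a == b,
            // different variants are never equal here,
            // so Integer(123456) != Float(123456.0)
            _ => false,
        }
    }
}
```

For these variants a derived `PartialEq` would behave the same way; writing it by hand simply makes the rule "different types are different constants" explicit.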

Going back to the `position()` function: its return value is `Option<usize>`. `Some(i)` means the constant was found, and the index is returned directly; `None` means it was not found, so the constant must be added first and then its index returned. Following C programming habits, this would be written as the if-else below, but here we try a more functional style. Personally, I do not find the functional style any clearer, but since we are learning Rust, let's try the Rust way first.

```rust, ignore
if let Some(i) = constants.iter().position(|v| v == &c) {
    i
} else {
    constants.push(c);
    constants.len() - 1
}
```

After the `add_const()` function is modified, duplicate values no longer appear in the constant table. The relevant part of the output is:

```
[src/parse.rs:64] &constants = [
    print,
    123456,
    123456.0,
]
```

Although duplicates are now checked when a constant is added, the check traverses the array, so adding all N constants takes O(N^2) time. If a piece of Lua code contains a great many constants, say 1 million, parsing will be too slow. For this we need a hash table to provide fast lookups. TODO.
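
One possible direction for that hash table is sketched below. It is only an assumption about how the TODO might be approached, not this project's design. Because `f64` implements neither `Eq` nor `Hash`, the sketch introduces a hypothetical `ConstKey` type that keys floats by their bit pattern, and a hypothetical `ConstTable` struct that keeps the constant array and a `HashMap` index side by side. The `Value` enum is again a reduced stand-in.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    String(String),
}

/// Hashable stand-in for a Value. Floats are keyed by their bit pattern,
/// so Integer(1) and Float(1.0) still get different keys.
#[derive(PartialEq, Eq, Hash)]
enum ConstKey {
    Integer(i64),
    Float(u64),
    String(String),
}

fn key_of(v: &Value) -> ConstKey {
    match v {
        Value::Integer(i) => ConstKey::Integer(*i),
        Value::Float(f) => ConstKey::Float(f.to_bits()),
        Value::String(s) => ConstKey::String(s.clone()),
    }
}

struct ConstTable {
    constants: Vec<Value>,           // what the bytecode indexes into
    index: HashMap<ConstKey, usize>, // constant -> position in `constants`
}

impl ConstTable {
    fn new() -> Self {
        ConstTable { constants: Vec::new(), index: HashMap::new() }
    }

    /// Return the index of `c`, adding it only if it is not present yet.
    /// The lookup is a hash probe instead of a scan over the array.
    fn add_const(&mut self, c: Value) -> usize {
        let constants = &mut self.constants;
        *self.index.entry(key_of(&c)).or_insert_with(|| {
            constants.push(c);
            constants.len() - 1
        })
    }
}
```

Each insertion then costs a hash lookup on average rather than a scan of all existing constants. The price is extra memory for the index and, in this sketch, a cloned string per key; note also that keying floats by bit pattern treats `0.0` and `-0.0` as different constants.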