Binary operations

Compared with the unary operation in the previous section, although the binary operation only has one more operand, it introduces many problems, mainly including BNF left recursion, priority, operand type, and evaluation order, etc.

BNF Left Recursive

The complete syntax of the binary operation statement in Lua is as follows:

  1. exp ::= nil | false | true | Numeral | LiteralString | '...' | functiondef |
  2. prefixexp | tableconstructor | exp binop exp | unop exp

For simplicity, the other parts are simplified to OTHERS, then we get:

  1. exp ::= exp binop exp | OTHERS

It is a left recursion rule, we need to eliminate left recursion according to the method introduced before, and get:

  1. exp ::= OTHERS A'
  2. A' := binop exp A' | Epsilon

The previous exp() function only implemented the OTHERS part of the first line above, and now we need to add the A' part of the second line, which is also a recursive reference, which is implemented using a loop. Modify the exp() function structure as follows:

```rust, ignore fn exp(&mut self) -> ExpDesc { // OTHERS let mut desc = match self. lex. next() { // The original various OTHERS type processing is omitted here };

  1. // A' := binop exp A' | Epsilon
  2. while is_binop(self. lex. peek()) {
  3. let binop = self.lex.next(); // operator
  4. let right_desc = self.exp(); // second operand
  5. desc = self. process_binop(binop, desc, right_desc);
  6. }
  7. desc
  8. }
  1. Among them, the second operand right_desc is also recursively called `exp()` function to read, which leads to a problem: priority.
  2. ## Priority
  3. In the unary operation statement in the previous section, the `exp()` function is also called recursively to read the operand, but because there is only one operand, so no need for priority. Or we can say that all unary operators have the same priority. And unary operators are right associative. For example, the following two examples of consecutive unary operations are executed in order from right to left, regardless of the specific operator:
  4. - `~ -10`, take negative first, then invert bit by bit,
  5. - `- ~10`, first bitwise invert, then negative.
  6. But for the binary operation statement, it is necessary to consider the priority. For example, the following two statements:
  7. - `a + b - c`, perform the previous addition first, and then perform the subsequent subtraction,
  8. - `a + b * c`, perform the subsequent multiplication first, and then perform the previous addition.
  9. Corresponding to the `exp()` function code above, the `OTHERS` part at the beginning reads the first operand `a`; then reads the operator `+` in the `while` loop; and then calls the `exp()` function recursively to read the right operand, so it needs to be calculated at this time. Also take the above two sentences as an example:
  10. - `a + b - c`, end after reading `b` and use it as the right operand; then perform addition `a + b`; and then loop through the following `- c` part again;
  11. - `a + b * c`, after reading `b`, continue down, read and execute the entire `b * c` and use the execution result as the right operand; then perform addition; and end the loop.
  1. - +

/ \ / \

  • c a * / \ / \ a b b c ```

So in syntax analysis, how to judge which of the above situations is the case? After reading b, should we stop parsing and calculate addition first, or continue parsing? It depends on the priorities of the next operator and the current operator:

  • When the priority of the next operator is not greater than the current operator, it is the first case, stop parsing and complete the current operation first;
  • When the priority of the next operator is greater than the current operator, it is the second case and needs to continue parsing.

For this, refer to the list of all operator precedence in the Lua language:

  1. or
  2. and
  3. < > <= >= ~= ==
  4. |
  5. ~
  6. &
  7. << >>
  8. ..
  9. + -
  10. * / // %
  11. unary operators (not # - ~)
  12. ^

From top to bottom, the priority becomes higher. The connectors .. and exponentiation ^ are right associative, and other operators are left associative. In the judging rules listed above, parsing is stopped (instead of continuing parsing) for cases of equal priority, so the default is left associative. Therefore, special treatment is required for two right-associated operators, that is, different priorities are defined for them to the left and to the right, and the one to the left is higher, which will become a right-association.

In summary, define the priority function:

```rust, ignore {{#include ../listing/ch05.arithmetic/src/parse.rs:binop_pri}}

  1. For Tokens that are not binary operators, `-1` is returned, which is the lowest priority, and parsing can be stopped no matter what the current operator is. According to Rust's customary practice, this function should return `Option<(i32, i32)>` type, and then return `None` for tokens that are not binary operators. But it is simpler to return `-1` at the calling place, and there is no need to process Option one more time.
  2. This function appears to be a property of the `Token` type, so it seems to be a suitable method defined as `Token`. But `Token` type is defined in `lex.rs`; while priority is a concept of syntax, it should be implemented in `parse.rs`. The Rust language does not allow methods to be added to a type's non-defining file. So the above function is defined as an ordinary function in the `parse.rs` file (rather than the method of `ParseProto` like other functions).
  3. Now, according to the priority, modify the `exp()` function again:
  4. ```rust, ignore
  5. fn exp(&mut self) -> ExpDesc {
  6. self.exp_limit(0)
  7. }
  8. fn exp_limit(&mut self, limit: i32) -> ExpDesc {
  9. // OTHERS
  10. let mut desc = match self. lex. next() {
  11. // The original various OTHERS type processing is omitted here
  12. };
  13. // A' := binop exp A' | Epsilon
  14. loop {
  15. let (left_pri, right_pri) = binop_pri(self. lex. peek());
  16. if left_pri <= limit {
  17. return desc; // stop parsing
  18. }
  19. // continue parsing
  20. let binop = self. lex. next();
  21. let right_desc = self.exp_limit(right_pri);
  22. desc = self. process_binop(binop, desc, right_desc);
  23. }
  24. }

First, add a limit parameter to exp(), as the priority of the current operator, and limit the subsequent parsing range. However, this parameter belongs to the internal concept of the statement, and the caller of this function does not need to know this parameter; therefore, the actual processing function exp_limit() is added, and exp() is turned into an outer encapsulation function, using limit=0 to call the former. The reason why the initial call uses limit=0 is that 0 is less than any binary operator priority defined in the binop_pri() function, so the first operator will continue to be parsed (rather than return to exit the loop ); but 0 is greater than the priority -1 of the non-operator, so if it is followed by the non-operator, it will also exit normally.

The above parsing code combines loops and recursive calls, which is very difficult for those who are not familiar with the algorithm (like me), and it is difficult to write the complete code directly. However, according to the BNF specification after eliminating left recursion, the loop and recursion can be completed, and then the function can be easily completed according to the priority and conditional exit.

In addition, it should be noted that unary operators are also listed in the operator precedence table above, so when parsing unary operation statements in the previous section, the exp() function cannot be used when reading the operand expression (initial Priority 0), instead specify an initial priority of 12:

```rust, ignore {{#include ../listing/ch05.arithmetic/src/parse.rs:exp_unop}}

  1. The priority of the exponentiation operation `^` is actually higher than that of the unary operator, so the execution order of the statement `-a^10` is: first exponentiation, and then negation.
  2. ## Evaluation Order
  3. There is a very subtle bug in the parsing code above, which concerns the order in which the operands are evaluated.
  4. The processing of each operand requires 2 steps: first call the `exp()` function to read the operand and return ExpDesc, and then call the `discharge()` function to discharge the operand to the stack for bytecode operation. The binary operation has 2 operands, so a total of 4 steps are required. Now discuss the sequence of these 4 steps.
  5. According to the processing logic of the binary operation in the `exp()` function of the current version:
  6. - read the first operand first, `desc`;
  7. - After judging that it is a binary operation, call `exp_limit()` recursively, and read the second operand, `right_desc`;
  8. - Then discharge the ExpDesc of the above two operands to the stack in the `process_binop()` function.
  9. Simplified is:
  10. - parse the first operand;
  11. - parse the second operand;
  12. - discharge the first operand;
  13. - discharge the second operand.
  14. During the parsing and discharge stages, bytecode may be generated. So in this order, the bytecodes related to the two operands may be interspersed. Like the following example:
  15. ```lua
  16. local a = -g1 + -g2

Ignoring the previous local variable definition, and ignoring the operation of undefined global variables will throw an exception. Here, the focus is only on the subsequent addition statement. Generates the following bytecode sequence with the current version of the interpreter:

  1. constants: ['g1', 'g2']
  2. byte_codes:
  3. GetGlobal(0, 0) # parse the first operand
  4. GetGlobal(1, 1) # parse the second operand
  5. Neg(2, 0) # discharge the first operand
  6. Neg(3, 1) # discharge the second operand
  7. Add(0, 2, 3)

It can be seen that the bytecodes related to the two operands are interspersed here. In this example, interleaving is fine. But in some cases, parsing the second operand will affect the evaluation of the first operand, and interleaving will cause problems at this time. Like the following example:

  1. local t = { k = 1 }
  2. local function f(t) t.k = 100; return 2 end -- modify the value of t.k
  3. local r = t.k + f(t)*3

For the last sentence, we expected 1 + 2*3, but if we follow the current order of evaluation:

  1. First parse the left operand t.k to generate ExpDesc::IndexField, but not discharge;
  2. Then parse the right operand f(t)*2, and execute f(t) during the parsing process, thus modifying the value of t.k to 100;
  3. Then discharge the left operationNumber, generate GetField bytecode, but at this time t.k has been modified by the previous step! Here comes the error. What is actually executed is 100 + 2*3.

In summary, we need to ensure that the bytecodes of the two operands cannot be interspersed! Then modify the exp_limit() function as follows:

```rust, ignore fn exp_limit(&mut self, limit: i32) -> ExpDesc { // The original various OTHERS type processing is omitted here

  1. loop {
  2. // Omit the processing of judging the priority
  3. // discharge the first operand! ! !
  4. if !matches!(desc, ExpDesc::Integer(_) | ExpDesc::Float(_) | ExpDesc::String(_)) {
  5. desc = ExpDesc::Local(self. discharge_top(desc));
  6. }
  7. // continue parsing
  8. let binop = self. lex. next();
  9. let right_desc = self.exp_limit(right_pri); // parse the second operand
  10. desc = self. process_binop(binop, desc, right_desc);
  11. }
  12. }
  1. Discharge the first operand onto the stack before parsing the second operand. However, this is not necessary for constant types, because:
  2. - the constant will not be affected by the second operand as in the above example;
  3. - Constants are also to be directly folded in subsequent attempts.
  4. So far, the transformation of `exp_limit()` function for binary operation syntax analysis has been completed. As for the specific processing `process_binop()` function of the binary operation, it is introduced below.
  5. ## Bytecode
  6. The unary operation introduced in the previous section has only one operand, which can be divided into two cases: constants and variables. Constants are evaluated directly, and variables generate bytecodes. So each unary operation has only one bytecode. Binary operations are more complicated because they involve 2 operands.
  7. First of all, although binary operators are mostly numerical calculations, because Lua's metatable is similar to operator overloading, other types of constants (such as strings, bools, etc.) may be legal operands. When parsing unary operations, these types of constants will directly report an error, but for binary operations, it needs to be executed at the execution stage to determine whether it is legal.
  8. Secondly, if both operands are constants of numeric type (integer and floating point), then the result can be directly calculated during syntax analysis, which is called constant folding.
  9. Otherwise, bytecode is generated and executed by the virtual machine. Similar to [Read Global Variables](./ch02-00.variables.md) and [Read Table](./ch04-05.table_rw_and_bnf.md) operations that have been supported before, each binary operator is also set to 3 types of right operands: variables on the stack, constants, and small integers.
  10. The left operand is uniformly discharged to the stack, because it is rare for the left operand to be a constant. If we also add corresponding bytecodes for constants and small integer types, such as `10-a`, then there are too many bytecode types.
  11. Finally, for addition and multiplication that satisfy the commutative law, if the left operation is a constant, then it can be exchanged. For example, `10+a` can be converted to `a+10` first. Since the right operand `10` is a small integer, it can be use `AddInt` bytecode then.
  12. ## ExpDesc
  13. Similar to the new ExpDesc type introduced by the unary operation introduced in the previous section, the binary operation also needs a new type because it has one more operand:
  14. ```rust, ignore
  15. enum ExpDesc {
  16. UnaryOp(fn(u8,u8)->ByteCode, usize), // (opcode, operand)
  17. BinaryOp(fn(u8,u8,u8)->ByteCode, usize, usize), // (opcode, left-operand, right-operand)

Syntax analysis

So far, the basic requirements of the binary operation statement have been introduced. Let’s look at the code implementation, that is, the process_binop() function called in the exp() function:

```rust, ignore fn process_binop(&mut self, binop: Token, left: ExpDesc, right: ExpDesc) -> ExpDesc { if let Some(r) = fold_const(&binop, &left, &right) { // constant fold return r; }

  1. match binop {
  2. Token::Add => self.do_binop(left, right, ByteCode::Add, ByteCode::AddInt, ByteCode::AddConst),
  3. Token::Sub => self.do_binop(left, right, ByteCode::Sub, ByteCode::SubInt, ByteCode::SubConst),
  4. Token::Mul => self.do_binop(left, right, ByteCode::Mul, ByteCode::MulInt, ByteCode::MulConst),
  5. // omit more types
  6. }
  7. }
  1. Try constant folding first. This part of the function is introduced in the next section because it involves the processing of integer and floating point types. Because the two operands are not necessarily constants, they may not be able to be folded. If the fold is not successful, then the operator and the two operands will be used later, so the `fold_const()` function here can only pass in references.
  2. If it is not a constant and cannot be folded, then call the `do_binop()` function to return ExpDesc. Here, the enum tag is used as a function, which has been introduced [before](./ch04-04.expdesc_rewrite.md#table_constructor), and will not be introduced here.
  3. Let's look at the `do_binop()` function:
  4. ```rust, ignore
  5. {{#include ../listing/ch05.arithmetic/src/parse.rs:do_binop}}

First, judge if it is addition or multiplication, and the left operand is a numeric constant, then exchange the two operands, so that the bytecode of xxCoust or xxInt can be generated later.

Then, discharge the left operand onto the stack;

Then, judge whether the type of the right operand is a numeric constant, or discharge it to the stack.

Finally, ExpDesc::BinaryOp is generated.

So far, the grammatical analysis of the binary operation statement is basically completed.

Integer and Float

So far, we have introduced the general analysis process of binary operations, but there is still a detail, that is, the different processing rules for integer and floating point types. Since there is a lot of content in this aspect, and it is relatively independent from the above-mentioned main analysis process, it will be introduced separately in the next section.