Define and Call
Our interpreter only supported sequential execution at first, and later added control structures to support conditional jumps, and blocks also make the scope of variables. Functions, on the other hand, exist more independently in terms of resolution, execution, or scope. To do this, the current framework for parsing and virtual machine execution needs to be modified.
Transform ParseProto
The definition of a function can be nested, that is, the function can be defined again inside another function. If the entire code is regarded as the main function, then our current syntax analysis is equivalent to only supporting this one function. In order to support nested definitions of functions, the parsing needs to be modified. First transform the data structure.
Currently, the context structure of the parsing process is ParseProto
, and this is also the structure returned to the virtual machine for execution. It is defined as follows:
```rust, ignore
pub struct ParseProto
sp: usize,
locals: Vec<String>,
break_blocks: Vec<Vec<usize>>,
continue_blocks: Vec<Vec<(usize, usize)>>,
gotos: Vec<GotoLabel>,
labels: Vec<GotoLabel>,
lex: Lex<R>,
}
The specific meaning of each field has been introduced in detail before, and will be ignored here. Here only the fields are distinguished according to the independence of the function:
- the final `lex` field is parsed throughout the code;
- All remaining fields are data inside the function.
In order to support nested definitions of functions, the global part (`lex` field) and the function part (other fields) need to be disassembled. The newly defined data structure `PerFuncProto_` parsed by the function (because we will not adopt this solution in the end, `_` is added to the name of the structure), including other fields left after removing `lex` from the original `ParseProto`:
```rust, ignore
struct PerFuncProto_ {
pub constants: Vec<Value>,
pub byte_codes: Vec<ByteCode>,
sp: usize,
... // omit more fields
}
In order to support the nesting of functions, it is necessary to support multiple function analysis bodies at the same time. The most intuitive idea is to define a list of function bodies:
```rust, ignore
struct ParseProto
Each time a new layer of functions is nested, a new member is pushed into the `funcs` field; it pops up after the function is parsed. The last member of `funcs` represents the current function. This definition is very intuitive, but there is a problem. It is very troublesome to access all the fields of the current function. For example, to access the `constants` field, you need `self.funcs.last().unwrap().constants` to read or `self .funcs.last_mut().unwrap().constants` writes. It's too inconvenient, and the execution efficiency should also be affected.
If it is C language, then this problem is easy to solve: add a pointer member of type `PerFuncProto_` in `ParseProto`, such as `current`, which points to the last member of `funcs`. This pointer is updated every time the function body is pushed or popped. Then we can directly use this pointer to access the current function, such as `self.current.constants`. This approach is very convenient but Rust thinks it is not "safe", because the validity of this pointer cannot be guaranteed at the Rust syntax level. Although there are only two places to update this pointer, which is relatively safe, but since you use Rust, you must follow the rules of Rust.
For Rust, a feasible solution is to add an index (rather than a pointer), such as `icurrent`, pointing to the last member of `funcs`. This index is also updated every time the function body is pushed or popped. When accessing the current function information, we can use `self.funcs[icurrent].constants`. While the Rust language allows this, it's really just a variant of the pointer scheme above, and can still cause bugs due to incorrect updates of the index. For example, if the index exceeds the length of `funcs`, it will panic, and if it is smaller than expected, there will be code logic bugs that are more difficult to debug. In addition, during execution, Rust's list index will be compared with the length of the list, which will also slightly affect performance.
There is also a less intuitive solution that doesn't have the problems above: use recursion. When parsing nested functions, the most natural way is to recursively call the code of the parsing function, then each call will have an independent stack (Rust's call stack), so we can create a function parsing body every time you call it and use it Parse the current Lua function, and return the parsing body for the outer function to process after the call ends. In this solution, only the information of the current function can be accessed during the parsing process, and the information of the outer function cannot be accessed. Naturally, the problem of inconvenient access to the information of the current function just mentioned does not exist. For example, accessing constants still uses `self.constants`, even without modifying existing code. The only thing to solve is the global data `Lex`, which can be passed on as a parameter of the analysis function.
In this solution, there is no need to define a new data structure, just change the `lex` field in the original `ParseProto` from `Lex` type to `&mut Lex`. The syntax analysis function definition for parsing Lua functions is originally the method of `ParseProto`, which is defined as:
```rust, ignore
impl<'a, R: Read> ParseProto<'a, R> {
fn chunk(&mut self) {
...
}
Now change to a normal function, defined as:
```rust, ignore
fn chunk(lex: &mut Lex
The parameter `lex` is global data, and each recursive call is directly passed to the next layer. The return value is the parsed information of the current Lua function created inside `chunk()`.
In addition, the `chunk()` function internally calls the `block()` function to parse the code, and the latter returns the end Token of the block. Previously, the `chunk()` function was only used to process the entire code block, so the end Token could only be `Token::Eos`; but now it may also be used to parse other internal functions, and the expected end Token is `Token ::End`. Therefore, the `chunk()` function needs to add a new parameter, indicating the expected end Token. So the definition is changed to:
```rust, ignore
fn chunk(lex: &mut Lex<impl Read>, end_token: Token) -> ParseProto {
...
}
Add FuncProto
We just modified ParseProto
and the type of lex
. Now let’s do a small optimization by the way. The first two pub
modified fields in ParseProto
are also returned to the virtual machine for execution; most of the latter fields are only used for syntax analysis, which are internal data and do not need to be returned to the virtual machine. These two parts can be disassembled so that only the part needed by the virtual machine is returned. To do this, add the FuncProto
data structure:
```rust, ignore
// Return information to the virtual machine to execute
pub struct FuncProto {
pub constants: Vec
[derive(Debug)]
struct ParseProto<’a, R: Read> { // Return information to the virtual machine to execute fp: FuncProto,
// syntax analysis internal data
sp: usize,
locals: Vec<String>,
break_blocks: Vec<Vec<usize>>,
continue_blocks: Vec<Vec<(usize, usize)>>,
gotos: Vec<GotoLabel>,
labels: Vec<GotoLabel>,
lex: Lex<R>,
// global data
lex: &'a mut Lex<R>,
}
So the return value of the `chunk()` function is changed from `ParseProto` to `FuncProto`. Its full definition is as follows:
```rust, ignore
fn chunk(lex: &mut Lex<impl Read>, end_token: Token) -> FuncProto {
// Generate a new ParseProto to parse the current new Lua function
let mut proto = ParseProto::new(lex);
// call block() parsing function
assert_eq!(proto.block(), end_token);
if let Some(goto) = proto. gotos. first() {
panic!("goto {} no destination", &goto.name);
}
// only returns the FuncProto part
proto.fp
}
In this way, when syntactically analyzing Lua built-in functions, just recursively call chunk(self.lex, Token::End)
. The specific syntax analysis is introduced below.
Syntax Analysis
The general process of parsing Lua functions is introduced above, now let’s look at the specific syntax analysis. By now, we should be familiar with syntax analysis already, and it can be executed according to BNF. Lua’s function definition has 3 places:
- Global functions;
- Local functions:
- An anonymous function is a case of the expression
exp
statement.
The BNF rules are as follows:
stat :=
`function` funcname funcbody | # 1. Global function
`local` `function` Name funcbody | # 2. Local function
# omit other cases
exp := functiondef | omit other cases
functiondef := `function` funcbody # 3. Anonymous function
funcbody ::= '(' [parlist] ')' block end # Function definition
It can be seen from the above rules that the difference between these three definitions is only at the beginning, and at the end they all belong to funcbody
. Here only the simplest second case, the local function, is introduced.
``rust, ignore
fn local_function(&mut self) {
self.lex.next(); // skip keyword
function`
let name = self.read_name(); // function name, or local variable name
println!(“== function: {name}”);
// currently does not support parameters, skip `()`
self.lex.expect(Token::ParL);
self.lex.expect(Token::ParR);
// Call the chunk() parsing function
let proto = chunk(self.lex, Token::End);
// Put the parsed result FuncProto into the constant table
let i = self.add_const(Value::LuaFunction(Rc::new(proto)));
// load function through LoadConst bytecode
self.fp.byte_codes.push(ByteCode::LoadConst(self.sp as u8, i as u16));
// create local variable
self. locals. push(name);
}
The parsing process is simple. It should be noted that the processing method of the function prototype FuncProto returned by the `chunk()` function is to put it in the constant table as a constant. It can be compared that a string is a constant composed of a series of character sequences; and the function prototype FuncProto is a constant composed of a series of constant tables and bytecode sequences. It also exists in the constant table, and it is also loaded with `LoadConst` bytecode.
To this end, it is necessary to add a new Value type `LuaFunction` to represent the Rust function, and change the type that originally represented the Lua function from `Function` to `RustFunction`:
```rust, ignore
pub enum Value {
LongStr(Rc<Vec<u8>>),
LuaFunction(Rc<FuncProto>),
RustFunction(fn (&mut ExeState) -> i32),
The data type associated with LuaFunction
is Rc<FuncProto>
, and it can also be seen from here that it is similar to a string constant.
The syntax analysis of “defining a function” is completed above, and the syntax analysis of “calling a function” is related to functions. But when “calling a function”, the Lua function and the Rust function are treated equally, and the Lua programmer does not even know what the function is implemented when calling the function; since the Rust function print()
has been called before Syntactic analysis, so there is no need to perform syntax analysis specifically for Lua function calls.
Virtual Machine Execution
Like syntax analysis, our previous virtual machine execution part only supports one layer of Lua functions. In order to support function calls, the easiest way is to recursively call the virtual machine to execute, that is, the execute()
function. code show as below:
```rust, ignore ByteCode::Call(func, _) => { self. func_index = func as usize; match &self. stack[self. func_index] { Value::RustFunction(f) => { // previously supported Rust functions f(self); } Value::LuaFunction(f) => { // new Lua function let f = f. clone(); self.execute(&f); // recursively call the virtual machine! } f => panic!(“invalid function: {f:?}”), } }
However, special handling of the stack is required. During parsing, each time a new function is parsed, the stack pointer (the `sp` field in the `ParseProto` structure) starts from 0. Because during syntax analysis, the absolute starting address of the stack when the virtual machine is executed is not known. Then, when the virtual machine is executing, when accessing the stack, the stack index in the bytecode used needs to add the offset of the stack start address of the current function. For example, for the following Lua code:
```lua
local a, b = 1, 2
local function foo()
local x, y = 1, 2
end
foo()
When parsing the foo()
function definition, the stack addresses of the local variables x and y are 0 and 1, respectively. When the last line of code is executed and the foo()
function is called, the function foo
is placed at the absolute index 2 of the stack. At this time, the absolute indexes of the local variables x and y are 3 and 4. Then when the virtual machine executes, it needs to convert the relative addresses 0 and 1 into 3 and 4.
absolute relative
address address
+-----+ <---base of main function
0 | a | 0
+-----+
1 | b | 1
+-----+
2 | foo | 2
+-----+ <---base of foo()
3 | x | 0
+-----+
4 | y | 1
+-----+
| |
When executing the Rust function print()
before, in order to allow the print()
function to read the parameters, the func_index
member is set in ExeState
to point to the address of the function on the stack. Now call the Lua function, still the same. However, func_index
is renamed to base
here, and points to the next address of the function.
```rust, ignore ByteCode::Call(func, _) => { self.base += func as usize + 1; // Set the absolute address of the function on the stack match &self.stack[self.base-1] { Value::RustFunction(f) => { f(self); } Value::LuaFunction(f) => { let f = f. clone(); self. execute(&f); } f => panic!(“invalid function: {f:?}”), } self.base -= func as usize + 1; // restore }
All previous write operations to the stack were called `set_stack()` method, now need to add self.base offset:
```rust, ignore
fn set_stack(&mut self, dst: u8, v: Value) {
set_vec(&mut self.stack, self.base + dst as usize, v); // plus self.base
}
All previous read operations on the stack were directly self.stack[i]
, and now a new function get_stack()
is also extracted, and the self.base offset is added when accessing the stack:
rust, ignore
fn get_stack(&self, dst: u8) -> &Value {
&self.stack[self.base + dst as usize] // plus self.base
}
So far, we have completed the most basic definition and calling of Lua functions. Thanks to the power of recursion, the code changes are not big. But it’s just the beginning of the full feature. The next section adds support for parameters and return values.