5.5.6 Lexical Addressing - SICP Comparison Edition

Permalink copied!

One of the most common optimizations performed by compilers is the optimization of variable name lookup. Our compiler, as we have implemented it so far, generates code that uses the lookup-variable-value lookup_symbol_value operation of the evaluator machine. This searches for a This searches for a variable name by comparing it it with each variable name that is currently bound, working frame by frame outward through the runtime environment. This search can be expensive if the frames are deeply nested or if there are many variables. names. For example, consider the problem of looking up the value of x while evaluating the expression (* x y z) x * y * z in an application of the procedure function of five arguments that is returned by

Original	JavaScript
`(let ((x 3) (y 4)) (lambda (a b c d e) (let ((y (* a b x)) (z (+ c d x))) (* x y z))))`	`((x, y) => (a, b, c, d, e) => ((y, z) => x * y * z)(a * b * x, c + d + x))(3, 4)`

Original		JavaScript
Since a `let` expression is just syntactic sugar for a `lambda` combination, this expression is equivalent to `((lambda (x y) (lambda (a b c d e) ((lambda (y z) (* x y z)) (* a b x) (+ c d x)))) 3 4)`

Each time lookup-variable-value lookup_symbol_value searches for x, it must determine that the symbol x is not eq? to y or z (in the first frame), nor to a, b, c, d, or e (in the second frame). "x" is not equal to "y" or "z" (in the first frame), nor to "a", "b", "c", "d", or "e" (in the second frame).

Original		JavaScript
We will assume, for the moment, that our programs do not use `define`—that variables are bound only with `lambda`.

Because our language is lexically scoped, the runtime environment for any expression component will have a structure that parallels the lexical structure of the program in which the expression component appears.[1] Thus, the compiler can know, when it analyzes the above expression, that each time the procedure function is applied the variable x binding for x in (* x y z) x * y * z will be found two frames out from the current frame and will be the first variable binding in that frame.

We can exploit this fact by inventing a new kind of variable-lookup name-lookup operation, lexical-address-lookup, lexical_address_lookup, that takes as arguments an environment and a lexical address that consists of two numbers: a frame number, which specifies how many frames to pass over, and a displacement number, which specifies how many variables bindings to pass over in that frame. Lexical-address-lookup The operation lexical_address_lookup will produce the value of the variable name stored at that lexical address relative to the current environment. If we add the lexical-address-lookup lexical_address_lookup operation to our machine, we can make the compiler generate code that references variables names using this operation, rather than lookup-variable-value. lookup_symbol_value. Similarly, our compiled code can use a new lexical-address-set! lexical_address_assign operation instead of set-variable-value!. assign_symbol_value.

Original		JavaScript
		With lexical addressing, there is no need to include any symbolic references to names in the object code, and frames do not need to include symbols at run time.

In order to generate such code, the compiler must be able to determine the lexical address of a variable name it is about to compile a reference to. The lexical address of a variable name in a program depends on where one is in the code. For example, in the following program, the address of x in expression $e_1$ is (2,0)—two frames back and the first variable name in the frame. At that point y is at address (0,0) and c is at address (1,2). In expression $e_2$, x is at (1,0), y is at (1,1), and c is at (0,2).

Original	JavaScript
`((lambda (x y) (lambda (a b c d e) ((lambda (y z) e1) e2 (+ c d x)))) 3 4)`	`((x, y) => (a, b, c, d, e) => ((y, z) => $e_1$)($e_2$, c + d + x))(3, 4);`

One way for the compiler to produce code that uses lexical addressing is to maintain a data structure called a compile-time environment. This keeps track of which variables bindings will be at which positions in which frames in the runtime environment when a particular variable-access name-access operation is executed. The compile-time environment is a list of frames, each containing a list of variables. symbols. (There will of course be no values bound to the variables, since values are not computed at compile time.) There will be no values associated with the symbols, since values are not computed at compile time. (Exercise 5.50 will change this, as an optimization for constants.) The compile-time environment becomes an additional argument to compile and is passed along to each code generator. The top-level call to compile uses an empty compile-time environment. a compile-time-environment that includes the names of all primitive functions and primitive values. When a lambda body the body of a lambda expression is compiled, compile-lambda-body compile_lambda_body extends the compile-time environment by a frame containing the procedure's function's parameters, so that the sequence making up the body is compiled with that extended environment.

Original		JavaScript
		Similarly, when the body of a block is compiled, `compile_block` extends the compile-time environment by a frame containing the scanned-out local names of the body.

At each point in the compilation, compile-variable compile_name and compile-assignment compile_assignment_declaration use the compile-time environment in order to generate the appropriate lexical addresses.

Original		JavaScript
Exercises 5.43 through 5.47 describe how to complete this sketch of the lexical-addressing strategy in order to incorporate lexical lookup into the compiler. Exercise 5.49 describes another use for the compile-time environment.		Exercises 5.43 through 5.46 describe how to complete this sketch of the lexical-addressing strategy in order to incorporate lexical lookup into the compiler. Exercises 5.48 and 5.50 describe other uses for the compile-time environment.

Original		JavaScript

Exercise 5.43 Write a procedure function lexical-address-lookup lexical_address_lookup that implements the new lookup operation. It should take two arguments—a lexical address and a runtime environment—and return the value of the variable name stored at the specified lexical address. Lexical-address-lookup The function lexical_address_lookup should signal an error if the value of the variable of the name is the symbol *unassigned*.[2] string "*unassigned*". The footnote can be safely dropped, because the scanning method is the default in the JavaScript adaptation. Also write a procedure function lexical-address-set! lexical_address_assign that implements the operation that changes the value of the variable of the name at a specified lexical address.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 5.44 Modify the compiler to maintain the compile-time environment as described above. That is, add a compile-time-environment argument to compile and the various code generators, and extend it in compile-lambda-body. compile_lambda_body and compile_block.

Exercise 5.45 Write a procedure function find-variable find_symbol that takes as arguments a variable symbol and a compile-time environment and returns the lexical address of the variable symbol with respect to that environment. For example, in the program fragment that is shown above, the compile-time environment during the compilation of expression $e_1$ is

Original		JavaScript
`((y z) (a b c d e) (x y))`.		`list(list("y", "z"), list("a", "b", "c", "d", "e"), list("x", "y"))`

Find-variable The function find_symbol should produce

Original	JavaScript
`(find-variable 'c '((y z) (a b c d e) (x y)))(1 2)`	`find_symbol("c", list(list("y", "z"), list("a", "b", "c", "d", "e"), list("x", "y")));list(1, 2)`

Original	JavaScript
`(find-variable 'x '((y z) (a b c d e) (x y)))(2 0)`	`find_symbol("x", list(list("y", "z"), list("a", "b", "c", "d", "e"), list("x", "y")));list(2, 0)`

Original	JavaScript
`(find-variable 'w '((y z) (a b c d e) (x y)))not-found`	`find_symbol("w", list(list("y", "z"), list("a", "b", "c", "d", "e"), list("x", "y")));"not found"`

Exercise 5.46 Using find-variable find_symbol from exercise 5.45, rewrite compile-variable and compile-assignment compile_assignment_declaration and compile_name to output lexical-address instructions.

Original		JavaScript
In cases where `find-variable` returns `not-found` (that is, where the variable is not in the compile-time environment), you should have the code generators use the evaluator operations, as before, to search for the binding. (The only place a variable that is not found at compile time can be is in the global environment, which is part of the runtime environment but is not part of the compile-time environment.[3] Thus, if you wish, you may have the evaluator operations looks directly in the global environment, which can be obtained with the operation `(op get-global-environment)`, instead of having them search the whole runtime environment found in `env`.) This is not possible in an interactive system that keeps extending the program environment. Note the new `driver_loop` in 4.1.4 and the same idea recurring in `compile_and_go` in 5.5.7.		In cases where `find_symbol` returns `"not found"` (that is, where the name is not in the compile-time environment), you should report a compile-time error.

Test the modified compiler on a few simple cases, such as the nested lambda lambda combination at the beginning of this section.

Original		JavaScript
Exercise 5.47 We argued in section 4.1.6 that internal definitions for block structure should not be considered real `define`s. Rather, a procedure body should be interpreted as if the internal variables being defined were installed as ordinary `lambda` variables initialized to their correct values using `set!`. Section 4.1.6 and exercise 4.24 showed how to modify the metacircular interpreter to accomplish this by scanning out internal definitions. Modify the compiler to perform the same transformation before it compiles a procedure body. There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.		Exercise 5.48 In JavaScript, an attempt to assign a new value to a name that is declared as a constant leads to an error. Exercise 4.16 shows how to detect such errors at run time. With the techniques presented in this section, we can detect attempts to assign a new value to a constant at compile time. For this purpose, extend the functions `compile_lambda_body` and `compile_block` to record in the compile-time environment whether a name is declared as a variable (using `let` or as a parameter), or as a constant (using `const` or `function`). Modify `compile_assignment` to report an appropriate error when it detects an assignment to a constant. There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Original		JavaScript
Exercise 5.49 In this section we have focused on the use of the compile-time environment to produce lexical addresses. But there are other uses for compile-time environments. For instance, in exercise 5.42 we increased the efficiency of compiled code by open-coding primitive procedures. Our implementation treated the names of open-coded procedures as reserved words. If a program were to rebind such a name, the mechanism described in exercise 5.42 would still open-code it as a primitive, ignoring the new binding. For example, consider the procedure `(lambda (+ * a b x y) (+ (* a x) (* b y)))` which computes a linear combination of `x` and `y`. We might call it with arguments `+matrix`, `matrix`, and four matrices, but the open-coding compiler would still open-code the `+` and the `` in `(+ (* a x) (* b y))` as primitive `+` and `*`. Modify the open-coding compiler to consult the compile-time environment in order to compile the correct code for expressions involving the names of primitive procedures. (The code will work correctly as long as the program does not `define` or `set!` these names.) There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.		Exercise 5.50 Knowledge about constants at compile time opens the door to many optimizations that allow us to generate more efficient object code. In addition to the extension of the compile-time environment in exercise 5.48 to indicate names declared as constants, we may store the value of a constant if it is known at compile time, or other information that can help us optimize the code. A constant declaration such as `const` $name$ `=` $literal$`;` allows us to replace all occurrences of $name$ within the scope of the declaration by $literal$ so that $name$ doesn't have to be looked up in the runtime environment. This optimization is called constant propagation. Use an extended compile-time environment to store literal constants, and modify `compile_name` to use the stored constant in the generated `assign` instruction instead of the `lookup_symbol_value` operation. Function declaration is a derived component that expands to constant declaration. Let us assume that the names of primitive functions in the global environment are also considered constants. If we further extend our compile-time environment to keep track of which names refer to compiled functions and which ones to primitive functions, we can move the test that checks whether a function is compiled or primitive from run time to compile time. This makes the object code more efficient because it replaces a test that must be performed once per function application in the generated code by one that is performed by the compiler. Using such an extended compile-time environment, modify `compile_function_call` so that if it can be determined at compile time whether the called function is compiled or primitive, only the instructions in the `compiled_branch` or the `primitive_branch` are generated. Replacing constant names with their literal values as in part (a) paves the way for another optimization, namely replacing applications of primitive functions to literal values with the compile-time computed result. This optimization, called constant folding, replaces expressions such as `40 + 2` by `42` by performing the addition in the compiler. Extend the compiler to perform constant folding for arithmetic operations on numbers and for string concatenation. There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

[1] This is not true if we allow internal definitions, unless we scan them out. See exercise 5.47.

[2] This is the modification to variable lookup required if we implement the scanning method to eliminate internal definitions (exercise 5.47). We will need to eliminate these definitions in order for lexical addressing to work.

[3] Lexical addresses cannot be used to access variables in the global environment, because these names can be defined and redefined interactively at any time. With internal definitions scanned out, as in exercise 5.47, the only definitions the compiler sees are those at top level, which act on the global environment. Compilation of a definition does not cause the defined name to be entered in the compile-time environment.

< Previous

Next >

5.5.6 Lexical Addressing