5.5.1 Structure of the Compiler - SICP Comparison Edition


In section 4.1.7 we modified our original metacircular interpreter to separate analysis from execution. We analyzed each component to produce an execution function that took an environment as argument and performed the required operations. In our compiler, we will do essentially the same analysis. Instead of producing execution functions, however, we will generate sequences of instructions to be run by our register machine.

The function compile is the top-level dispatch in the compiler. It corresponds to the evaluate function of section 4.1.1, the analyze function of section 4.1.7, and the eval_dispatch entry point of the explicit-control evaluator in section 5.4.1 (eval, analyze, and eval-dispatch in the original Scheme version). The compiler, like the interpreters, uses the component-syntax functions defined in section 4.1.2.[1] The function compile performs a case analysis on the syntactic type of the component to be compiled. For each type of component, it dispatches to a specialized code generator:

Original

```scheme
(define (compile exp target linkage)
  (cond ((self-evaluating? exp)
         (compile-self-evaluating exp target linkage))
        ((quoted? exp) (compile-quoted exp target linkage))
        ((variable? exp)
         (compile-variable exp target linkage))
        ((assignment? exp)
         (compile-assignment exp target linkage))
        ((definition? exp)
         (compile-definition exp target linkage))
        ((if? exp) (compile-if exp target linkage))
        ((lambda? exp) (compile-lambda exp target linkage))
        ((begin? exp)
         (compile-sequence (begin-actions exp)
                           target
                           linkage))
        ((cond? exp) (compile (cond->if exp) target linkage))
        ((application? exp)
         (compile-application exp target linkage))
        (else
         (error "Unknown expression type -- COMPILE" exp))))
```

JavaScript

```javascript
function compile(component, target, linkage) {
    return is_literal(component)
           ? compile_literal(component, target, linkage)
           : is_name(component)
           ? compile_name(component, target, linkage)
           : is_application(component)
           ? compile_application(component, target, linkage)
           : is_operator_combination(component)
           ? compile(operator_combination_to_application(component),
                     target, linkage)
           : is_conditional(component)
           ? compile_conditional(component, target, linkage)
           : is_lambda_expression(component)
           ? compile_lambda_expression(component, target, linkage)
           : is_sequence(component)
           ? compile_sequence(sequence_statements(component),
                              target, linkage)
           : is_block(component)
           ? compile_block(component, target, linkage)
           : is_return_statement(component)
           ? compile_return_statement(component, target, linkage)
           : is_function_declaration(component)
           ? compile(function_decl_to_constant_decl(component),
                     target, linkage)
           : is_declaration(component)
           ? compile_declaration(component, target, linkage)
           : is_assignment(component)
           ? compile_assignment(component, target, linkage)
           : error(component, "unknown component type -- compile");
}
```

Targets and linkages

The function compile and the code generators that it calls take two arguments in addition to the component to compile. There is a target, which specifies the register in which the compiled code is to return the value of the component. There is also a linkage descriptor, which describes how the code resulting from the compilation of the component should proceed when it has finished its execution. The linkage descriptor can require the code to do one of the following three things:

proceed to the next instruction in sequence (this is specified by the linkage descriptor "next"),
jump to the current value of the continue register as part of returning from a function call (this is specified by the linkage descriptor "return"), or
jump to a named entry point (this is specified by using the designated label as the linkage descriptor).

For example, compiling the literal 5 with a target of the val register and a linkage of "next" should produce the instruction

Original	JavaScript
`(assign val (const 5))`	`assign("val", constant(5))`

Compiling the same literal with a linkage of "return" should produce the instructions

Original	JavaScript
`(assign val (const 5)) (goto (reg continue))`	`assign("val", constant(5)), go_to(reg("continue"))`

In the first case, execution will continue with the next instruction in the sequence. In the second case, we will jump to whatever entry point is stored in the continue register. In both cases, the value of the literal will be placed into the target register val.
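The two examples above can be reproduced with a small sketch. This is not the book's compile_literal: here an instruction sequence is just a JavaScript array of instruction strings with no register-use information, and the names compileLiteral and compileLinkage are hypothetical. It shows how the target selects the register to assign and how the linkage decides what, if anything, follows.

```javascript
// Hypothetical, simplified code generator for literals. Instructions are
// plain strings in the JavaScript register-machine notation, and an
// instruction sequence is an array of them (no register-use tracking).

// Instructions that implement a linkage descriptor.
function compileLinkage(linkage) {
  if (linkage === "next") {
    return []; // fall through to the next instruction
  }
  if (linkage === "return") {
    return ['go_to(reg("continue"))']; // jump to the entry point in continue
  }
  return [`go_to(label("${linkage}"))`]; // any other linkage names a label
}

// Place the literal's value in the target register, then apply the linkage.
function compileLiteral(value, target, linkage) {
  return [
    `assign("${target}", constant(${JSON.stringify(value)}))`,
    ...compileLinkage(linkage),
  ];
}

// compileLiteral(5, "val", "next") returns
//   ['assign("val", constant(5))']
// compileLiteral(5, "val", "return") returns
//   ['assign("val", constant(5))', 'go_to(reg("continue"))']
```

Keeping the linkage code in one helper means every code generator can end the same way, which is the design the text describes.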

JavaScript (this paragraph has no counterpart in the original):

Our compiler uses the `"return"` linkage when compiling the return expression of a return statement. Just as in the explicit-control evaluator, returning from a function call happens in three steps:

reverting the stack to the marker and restoring `continue` (which holds a continuation set up at the beginning of the function call),
computing the return value and placing it in `val`, and
jumping to the entry point in `continue`.

Compilation of a return statement explicitly generates code for reverting the stack and restoring `continue`. The return expression is compiled with target `val` and linkage `"return"` so that the generated code for computing the return value places the return value in `val` and ends by jumping to `continue`.

Instruction sequences and stack usage

Each code generator returns an instruction sequence containing the object code it has generated for the component. Code generation for a compound component is accomplished by combining the output from simpler code generators for subcomponents, just as evaluation of a compound component is accomplished by evaluating the subcomponents.

The simplest method for combining instruction sequences is a function called append_instruction_sequences, which takes as arguments two instruction sequences that are to be executed sequentially. It appends them and returns the combined sequence. (The original Scheme append-instruction-sequences takes any number of instruction sequences.) That is, if $seq_1$ and $seq_2$ are sequences of instructions, then evaluating

Original	JavaScript
`(append-instruction-sequences $seq_1$ $seq_2$)`	`append_instruction_sequences($seq_1$, $seq_2$)`

produces the sequence

Original	JavaScript
`$seq_1$ $seq_2$`	`$seq_1$ $seq_2$`

Whenever registers might need to be saved, the compiler's code generators use preserving, which is a more subtle method for combining instruction sequences. The function preserving takes three arguments: a set of registers and two instruction sequences that are to be executed sequentially. It appends the sequences in such a way that the contents of each register in the set are preserved over the execution of the first sequence, if this is needed for the execution of the second sequence. That is, if the first sequence modifies the register and the second sequence actually needs the register's original contents, then preserving wraps a save and a restore of the register around the first sequence before appending the sequences. Otherwise, preserving simply returns the appended instruction sequences. Thus, for example,

Original	JavaScript
`(preserving (list $reg_1$ $reg_2$) $seq_1$ $seq_2$)`	`preserving(list($reg_1$, $reg_2$), $seq_1$, $seq_2$)`

produces one of the following four sequences of instructions, depending on how $seq_1$ and $seq_2$ use $reg_1$ and $reg_2$:

Original:

\[ \begin{array}{l|l|l|l} \textit{seq}_1 & \texttt{(save}\ \textit{reg}_1\texttt{)} & \texttt{(save}\ \textit{reg}_2\texttt{)} & \texttt{(save}\ \textit{reg}_2\texttt{)} \\ \textit{seq}_2 & \textit{seq}_1 & \textit{seq}_1 & \texttt{(save}\ \textit{reg}_1\texttt{)} \\ & \texttt{(restore}\ \textit{reg}_1\texttt{)} & \texttt{(restore}\ \textit{reg}_2\texttt{)} & \textit{seq}_1 \\ & \textit{seq}_2 & \textit{seq}_2 & \texttt{(restore}\ \textit{reg}_1\texttt{)} \\ & & & \texttt{(restore}\ \textit{reg}_2\texttt{)} \\ & & & \textit{seq}_2 \end{array} \]

JavaScript:

\[ \begin{array}{l|l|l|l} \textit{seq}_1 & \texttt{save(}\textit{reg}_1\texttt{),} & \texttt{save(}\textit{reg}_2\texttt{),} & \texttt{save(}\textit{reg}_2\texttt{),} \\ \textit{seq}_2 & \textit{seq}_1 & \textit{seq}_1 & \texttt{save(}\textit{reg}_1\texttt{),} \\ & \texttt{restore(}\textit{reg}_1\texttt{),} & \texttt{restore(}\textit{reg}_2\texttt{),} & \textit{seq}_1 \\ & \textit{seq}_2 & \textit{seq}_2 & \texttt{restore(}\textit{reg}_1\texttt{),} \\ & & & \texttt{restore(}\textit{reg}_2\texttt{),} \\ & & & \textit{seq}_2 \end{array} \]
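The four outcomes above can be reproduced by a short sketch of preserving. It anticipates the register-use information described later in this section: each sequence is assumed to be an object { needs, modifies, instructions }, and the names appendTwo, union, and difference are hypothetical helpers, not the book's definitions. Registers are considered one at a time, so when both must be saved, the save for the second register ends up outermost, as in the fourth column.

```javascript
// Hypothetical sketch of preserving. An instruction sequence is an object
// { needs, modifies, instructions }: the registers it needs, the registers
// it modifies, and its instructions as strings.

// Set operations on arrays of register names (order-preserving).
const union = (a, b) => [...a, ...b.filter((r) => !a.includes(r))];
const difference = (a, b) => a.filter((r) => !b.includes(r));

// Run seq1 then seq2. Needs of seq2 that seq1 sets are not needs of the
// combination; anything either one modifies, the combination modifies.
function appendTwo(seq1, seq2) {
  return {
    needs: union(seq1.needs, difference(seq2.needs, seq1.modifies)),
    modifies: union(seq1.modifies, seq2.modifies),
    instructions: [...seq1.instructions, ...seq2.instructions],
  };
}

// Consider each register in turn. Wrap save/restore around seq1 only when
// seq1 modifies the register and seq2 needs its original contents.
function preserving(regs, seq1, seq2) {
  if (regs.length === 0) {
    return appendTwo(seq1, seq2);
  }
  const [reg, ...rest] = regs;
  if (seq1.modifies.includes(reg) && seq2.needs.includes(reg)) {
    const wrapped = {
      needs: union([reg], seq1.needs), // the save reads reg
      modifies: difference(seq1.modifies, [reg]), // reg is restored
      instructions: [`save("${reg}")`, ...seq1.instructions, `restore("${reg}")`],
    };
    return preserving(rest, wrapped, seq2);
  }
  return preserving(rest, seq1, seq2);
}

// Example: seq1 clobbers env and continue; seq2 needs both.
const seq1 = { needs: [], modifies: ["env", "continue"], instructions: ["seq1"] };
const seq2 = { needs: ["env", "continue"], modifies: ["val"], instructions: ["seq2"] };
// preserving(["env", "continue"], seq1, seq2).instructions is
//   ['save("continue")', 'save("env")', 'seq1',
//    'restore("env")', 'restore("continue")', 'seq2']
```

With $reg_1$ as "env" and $reg_2$ as "continue", the example yields the fourth column; if seq2 needed only env, the result would match the second column.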

By using preserving to combine instruction sequences, the compiler avoids unnecessary stack operations. This also isolates the details of whether or not to generate save and restore instructions within the preserving function, separating them from the concerns that arise in writing each of the individual code generators. In the original Scheme compiler, no save or restore instructions are explicitly produced by the code generators at all. In the JavaScript compiler, the only exceptions are that the code for calling a function saves continue and the code for returning from a function restores it. These corresponding save and restore instructions are explicitly generated by different calls to compile, not as a matched pair by preserving (as we will see in section 5.5.3).

In principle, we could represent an instruction sequence simply as a list of instructions. The function append_instruction_sequences could then combine instruction sequences by performing an ordinary list append. However, preserving would then be a complex operation, because it would have to analyze each instruction sequence to determine how the sequence uses its registers. The function preserving would be inefficient as well as complex, because it would have to analyze each of its instruction sequence arguments, even though these sequences might themselves have been constructed by calls to preserving, in which case their parts would have already been analyzed. To avoid such repetitious analysis we will associate with each instruction sequence some information about its register use. When we construct a basic instruction sequence we will provide this information explicitly, and the functions that combine instruction sequences will derive register-use information for the combined sequence from the information associated with the sequences being combined.

An instruction sequence will contain three pieces of information:

the set of registers that must be initialized before the instructions in the sequence are executed (these registers are said to be needed by the sequence),
the set of registers whose values are modified by the instructions in the sequence, and
the actual instructions (also called statements) in the sequence.

We will represent an instruction sequence as a list of its three parts. The constructor for instruction sequences is thus

Original	JavaScript
`(define (make-instruction-sequence needs modifies statements) (list needs modifies statements))`	`function make_instruction_sequence(needs, modifies, instructions) { return list(needs, modifies, instructions); }`

For example, the two-instruction sequence that looks up the value of the symbol "x" in the current environment, assigns the result to val, and then proceeds to the continuation requires registers env and continue to have been initialized, and modifies register val. This sequence would therefore be constructed as

Original	JavaScript
`(make-instruction-sequence '(env continue) '(val) '((assign val (op lookup-variable-value) (const x) (reg env)) (goto (reg continue))))`	`make_instruction_sequence(list("env", "continue"), list("val"), list(assign("val", list(op("lookup_symbol_value"), constant("x"), reg("env"))), go_to(reg("continue"))));`
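To make the three-part representation concrete, here is a sketch that mirrors make_instruction_sequence with a three-element JavaScript array in place of the book's list, plus selectors and register-use queries. All names are hypothetical, instructions are kept as strings only for illustration, and the selectors are simplified (for instance, they do not treat a bare label as a sequence).

```javascript
// Hypothetical sketch mirroring make_instruction_sequence with a
// three-element JavaScript array [needs, modifies, instructions] in place
// of the book's list. Instructions are kept as strings for illustration.

function makeInstructionSequence(needs, modifies, instructions) {
  return [needs, modifies, instructions];
}

// Selectors for the three parts.
const registersNeeded = (seq) => seq[0];
const registersModified = (seq) => seq[1];
const instructionsOf = (seq) => seq[2];

// Register-use queries that a combiner such as preserving can answer
// without re-scanning the instructions.
const needsRegister = (seq, reg) => registersNeeded(seq).includes(reg);
const modifiesRegister = (seq, reg) => registersModified(seq).includes(reg);

// The two-instruction lookup sequence from the text.
const lookupX = makeInstructionSequence(
  ["env", "continue"],
  ["val"],
  [
    'assign("val", list(op("lookup_symbol_value"), constant("x"), reg("env")))',
    'go_to(reg("continue"))',
  ]
);

// needsRegister(lookupX, "env") is true, while modifiesRegister(lookupX, "env")
// is false: when this sequence runs first, preserving never has to save env.
```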

The functions for combining instruction sequences are shown in section 5.5.4.

Exercise 5.33 In evaluating a function application, the explicit-control evaluator always saves and restores the env register around the evaluation of the function expression, saves and restores env around the evaluation of each argument expression (except the final one), saves and restores argl around the evaluation of each argument expression, and saves and restores fun around the evaluation of the argument-expression sequence (the operator, operands, and proc register in the original). For each of the following applications, say which of these save and restore operations are superfluous and thus could be eliminated by the compiler's preserving mechanism:

Original	JavaScript
`(f 'x 'y)`	`f("x", "y")`
`((f) 'x 'y)`	`f()("x", "y")`
`(f (g 'x) y)`	`f(g("x"), y)`
`(f (g 'x) 'y)`	`f(g("x"), "y")`


Exercise 5.34 Using the preserving mechanism, the compiler will avoid saving and restoring env around the evaluation of the function expression of an application in the case where the function expression is a name. We could also build such optimizations into the evaluator. Indeed, the explicit-control evaluator of section 5.4 already performs a similar optimization, by treating applications with no arguments as a special case.

Extend the explicit-control evaluator to recognize as a separate class of components applications whose function expression is a name, and to take advantage of this fact in evaluating such components.
Alyssa P. Hacker suggests that by extending the evaluator to recognize more and more special cases we could incorporate all the compiler's optimizations, and that this would eliminate the advantage of compilation altogether. What do you think of this idea?

[1] Notice, however, that our compiler is a JavaScript program (a Scheme program in the original), and the syntax functions that it uses to manipulate expressions are the actual JavaScript functions used with the metacircular evaluator. For the explicit-control evaluator, in contrast, we assumed that equivalent syntax operations were available as operations for the register machine. (Of course, when we simulated the register machine in JavaScript, we used the actual JavaScript functions in our register machine simulation.)
