In this section and the next we implement the code generators to which the
compileprocedurefunction
dispatches.
Compiling linkage code
In general, the output of each code generator will end with
instructions—generated by the
procedurefunctioncompile-linkage—thatcompile_linkage—that
implement the required linkage. If the linkage is
return"return"
then
we must generate the instruction
(goto (reg continue)).go_to(reg("continue")).
This needs the continue register and does not
modify any registers.
If the linkage is
next,
"next",
then we needn't include any additional instructions. Otherwise, the
linkage is a label, and we generate a
gotogo_to
to that label, an instruction that does not need or modify
any registers.[1]
The linkage code is appended to an instruction sequence by
preserving the
continue register, since a
return"return"
linkage will
require the continue register:
If the given instruction sequence modifies
continue and the linkage code needs it,
continue will be saved and restored.
function end_with_linkage(linkage, instruction_sequence) {
return preserving(list("continue"),
instruction_sequence,
compile_linkage(linkage));
}
Compiling simple
expressions
components
The code generators for
self-evaluating expressions, quotations, and variablesliteral expressions and names
construct instruction
sequences that assign the required value to the target register
and then proceed as specified by the linkage descriptor.
Original
JavaScript
The literal value is extracted at compile time from the component being
compiled and put into the constant part of the
assign instruction. For a name, an
instruction is generated to use the
lookup_symbol_value operation when
the compiled program is run, to look up the value associated with a
symbol in the current environment. Like a literal value, the symbol is
extracted at compile time from the component being compiled. Thus
symbol_of_name(component) is
executed only once, when the program is being compiled, and the symbol
appears as a constant in the
assign instruction.
All theseThese
assignment instructions modify the target register,
and the one that looks up a
variable
symbol
needs the env register.
Assignments and
definitionsdeclarations
are handled much as they are in the
interpreter.
We recursively generate code that computes the value to be
assigned to the variable, and append to it a
two-instruction sequence that actually sets or defines the
variable and assigns the value of the whole expression
(the symbol ok) to the target
register. The recursive compilation has target
val and linkage
next so that the code will
put its result into val and
continue with the code that is appended after it. The
appending is done preserving
env, since the environment is
needed for setting or defining the variable and the code
for the variable value could be the compilation of a
complex expression that might modify the registers in
arbitrary ways.
The function compile_assignment_declaration
recursively generates code that computes the value to be
associated with the symbol and appends to it a two-instruction
sequence that updates the value associated with the symbol
in the environment and assigns the value of the whole component
(the assigned value for an assignment or undefined for a declaration)
to the target register.
The recursive compilation has target
val and linkage
"next" so that the code
will put its result into val
and continue with the code that is appended after it. The
appending is done preserving env,
since the environment is needed for updating the symbol–value
association and the code for computing the value could be the compilation
of a complex expression that might modify the registers in
arbitrary ways.
The appended two-instruction sequence requires
env and val
and modifies the target. Note that although we preserve
env for this sequence, we do not preserve
val, because the
get-value-codeget_value_code
is designed to explicitly place its result in
val for use by this sequence.
(In fact, if we did preserve val, we would
have a bug, because this would cause the previous contents of
val to be restored right after the
get-value-codeget_value_code
is run.)
Compiling
conditional expressions
conditionals
The code for
an if expression
a conditional
compiled with a given target and linkage has the form
$\langle{}$compilation of predicate, target val, linkage "next"$\rangle$
test(list(op("is_falsy"), reg("val"))),
branch(label("false_branch")),
"true_branch",
$\langle{}$compilation of consequent with given target and given linkage or after_cond$\rangle$
"false_branch",
$\langle{}$compilation of alternative with given target and linkage$\rangle$
"after_cond"
To generate this code, we compile the predicate, consequent,
and alternative, and combine the resulting code with instructions
to test the predicate result and with newly generated labels to mark the
true and false branches and the end of the
conditional.[2]
In this arrangement of code, we must branch around the true branch
if the test is false. The only slight complication is in how the
linkage for the true branch should be handled. If the linkage for the
conditional is
return"return"
or a label, then the
true and false branches will both use this same linkage. If the linkage is
next,
"next",
the true branch ends with a jump around
the code for the false branch to the label at the end of the conditional.
Env
The env register
is preserved around the predicate code
because it could be needed by the true and false
branches, and continue is preserved because it could
be needed by the linkage code in those branches.
The code for the true and
false branches (which are not executed sequentially) is appended using a
special combiner
parallel-instruction-sequencesparallel_instruction_sequences
described in
section 5.5.4.
Original
JavaScript
Note that cond is a derived expression,
so all that the compiler needs to do to handle it is to apply the
cond->if transformer
(from section 4.1.2) and
compile the resulting if expression.
Compiling sequences
The compilation of
sequences (from procedure bodies or explicit begin
expressions)
parallels their evaluation.
statement sequences
parallels their evaluation
in the explicit-control evaluator with one exception:
If a return statement appears anywhere in a sequence, we treat
it as if it were the last statement.
Each
expression
statement
of the sequence is compiled—the last
expression
statement (or a return statement)
with
the linkage specified for the sequence, and the other
expressions
statements
with
linkage
next"next"
(to execute the rest of the
sequence). The instruction sequences for the individual
expressions
statements
are
appended to form a single instruction sequence, such that
env (needed for the rest of the
sequence)
and continue (possibly needed
for the linkage at the end of the sequence) are
preserved.
and continue (possibly needed
for the linkage at the end of the sequence)
are preserved.[3]
Treating a return statement as if it were the last statement in a sequence
avoids compiling any dead code after the return statement that can
never be executed.
Removing the is_return_statement check does not change the behavior
of the object program; however,
there are many reasons not to compile dead code, which are beyond the scope of this book
(security, compilation time, size of the object code, etc.),
and many compilers give warnings for dead code.[4]
Original
JavaScript
Compiling blocks
A
block is compiled by prepending an assign instruction to the compiled
body of the block. The assignment extends the current environment by a frame
that binds the names declared in the block to the value
"*unassigned*". This operation
both needs and modifies the env register.
function compile_block(stmt, target, linkage) {
const body = block_body(stmt);
const locals = scan_out_declarations(body);
const unassigneds = list_of_unassigned(locals);
return append_instruction_sequences(
make_instruction_sequence(list("env"), list("env"),
list(assign("env", list(op("extend_environment"),
constant(locals),
constant(unassigneds),
reg("env"))))),
compile(body, target, linkage));
}
Compiling
lambda
lambda
expressions
Lambda expressions
construct
procedures.functions.
The object code for a lambda expression must have the form
$\langle{}$construct function object and assign it to target register$\rangle$
$\langle{}$linkage$\rangle$
When we compile the lambda expression, we also generate the code for the
procedurefunction
body. Although the body won't be executed at the time of
procedurefunction
construction, it is convenient to insert it into the object code right after
the code for the
lambda.lambda expression.
If the linkage for the lambda expression is a label or
return,
"return",
this is fine. But if the linkage is
next,
"next",
we will need to skip around the code for
the
procedurefunction
body by using a linkage that jumps to a label that is inserted after the
body. The object code thus has the form
$\langle{}$construct function object and assign it to target register$\rangle$
$\langle{}$code for given linkage$\rangle$ $or$ go_to(label("after_lambda"))
$\langle{}$compilation of function body$\rangle$
"after_lambda"
Compile-lambda
The function
compile_lambda_expression
generates the code for constructing the
procedurefunction
object followed by the code for the
procedurefunction
body. The
procedurefunction
object will be constructed at run time by combining the current environment
(the environment at the point of definitiondeclaration) with the entry point to the
compiled
procedurefunction
body (a newly generated label).[5]
Compile-lambda
The function
compile_lambda_expression
uses the special combiner
tack-on-instruction-sequencetack_on_instruction_sequence
(from section 5.5.4) rather
than
append-instruction-sequencesappend_instruction_sequences
to append the
procedurefunction
body to the lambda expression code, because the
body is not part of the sequence of instructions that will be executed when
the combined sequence is entered; rather, it is in the sequence only because
that was a convenient place to put it.
Compile-lambda-body
The function
compile_lambda_body
constructs the code for the body of the
procedure.function.
This code begins with a label for the entry point. Next come instructions
that will cause the runtime evaluation environment to switch to the correct
environment for evaluating the
procedurefunction
body—namely, the
definition
environment of the
procedure,function,
extended to include the bindings of the
formal
parameters to the arguments
with which the
procedurefunction
is called. After this comes the code for the
sequence of expressions that makes up the
procedure body.
function body, augmented to ensure that it ends with a return statement.
Original
JavaScript
The sequence is compiled with linkage return and target
val so that it will end by returning from
the procedure with the procedure result in
val.
The
augmented body is compiled with target val so that its return value will be
placed in val. The linkage descriptor passed to this compilation is
irrelevant, as it will be ignored.[6]
Since a linkage argument is required, we arbitrarily pick "next".
To ensure that all functions end by executing a return statement,
compile_lambda_body
appends to the lambda body a return statement whose return expression is the literal
undefined.
To do so, it uses the function append_return_undefined,
which constructs the parser's tagged-list representation (from section 4.1.2) of a
sequence consisting of the body and a return undefined; statement.
function append_return_undefined(body) {
return list("sequence", list(body,
list("return_statement",
list("literal", undefined))));
}
This simple transformation of lambda bodies is a third way to ensure that a
function that does not return explicitly has the return value
undefined.
In the metacircular evaluator, we used a return-value object, which also played a
role in stopping a sequence evaluation.
In the explicit-control evaluator, functions that did not return explicitly continued
to an entry point that stored undefined
in val.
See exercise 5.36 for a more elegant way
to handle insertion of return statements.
Exercise 5.35
Footnote 4 pointed out that the compiler does not
identify all instances of
dead code. What would be required of a compiler to
detect all instances of dead code?
Hint: The answer depends on how we define dead code. One possible (and useful)
definition is code following a return statement in a sequence—but
what about code in the consequent
branch of if (false) $\ldots$ or
code following a
call to run_forever() in exercise 4.20?
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.36
The current design of append_return_undefined
is a bit crude: It always appends a
return undefined; to a lambda body, even if there is already a return statement in every execution
path of the body. Rewrite
append_return_undefined so that it inserts
return undefined; at the end of only
those paths that do not contain a return statement. Test your
solution on the functions below, substituting any expressions for
$e_1$ and
$e_2$
and any (non-return) statements for
$s_1$ and
$s_2$.
In t, a return statement should be
added either at both (*)'s or just at
(**).
In w and
h, a return statement should be added
at one of the (*)'s.
In m, no return statement should be added.
function t(b) { function w(b) { function m(b) { function h(b1, b2) {
if (b) { if (b) { if (b) { if (b1) {
$s_1~~$ return $e_1;~$ return $e_1;~$ return $e_1$;
(*) } else {
} else { } else { } else { if (b2) {
$s_2~~$ $s_1~~$ return $e_2$; $s_1$
(*) (*) (*)
} } } } else {
(**) (*) return $e_2$;
} } } }
(*)
}
(*)
}
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
[1]
This procedure
uses a feature of Lisp called
backquote (or quasiquote) that is handy for constructing
lists. Preceding a list with a backquote symbol is much like quoting it,
except that anything in the list that is flagged with a comma is
evaluated.
For example, if the value of linkage is the
symbol branch25,
then the expression
`((goto (label ,linkage)))
evaluates to the list
((goto (label branch25))).
Similarly, if the value of x is the list
(a b c),
then
`(1 2 ,x)
evaluates to the list
(1 2 a).
[2]
We can't just use the labels
true-branch,true_branch,false-branch,false_branch,
and
after-ifafter_cond
as shown above, because there might be more than one
ifconditional
in the program.
The compiler uses the
procedurefunctionmake-labelmake_label
to generate labels.
Make-label
The function
make_label
takes a symbolstring as argument and returns a new
symbolstring that begins with the
given symbolstring. For example, successive calls to
(make-label 'a)make_label("a")
would return
a1"a1",
a2"a2",
and so on.
Make-label
The function
make_label
can be implemented similarly to the generation of unique variable names in
the query language, as follows:
let label_counter = 0;
function new_label_number() {
label_counter = label_counter + 1;
return label_counter;
}
function make_label(string) {
return string + stringify(new_label_number());
}
[3]
The continue register would
be needed for a "return"
linkage, which can result from a compilation by
compile_and_go
(section 5.5.7).
[4]
Our compiler does not detect all dead code. For example,
a conditional statement whose consequent and alternative branches both end in a return
statement will not stop subsequent statements from being compiled. See
exercises 5.35 and 5.36.
[5]
We need machine operations to
implement a data structure for representing compiled
procedures,functions,
analogous to the structure for compound
proceduresfunctions
described in section 4.1.3:
function make_compiled_function(entry, env) {
return list("compiled_function", entry, env);
}
function is_compiled_function(fun) {
return is_tagged_list(fun, "compiled_function");
}
function compiled_function_entry(c_fun) {
return head(tail(c_fun));
}
function compiled_function_env(c_fun) {
return head(tail(tail(c_fun)));
}
[6]
The augmented function body is a sequence ending with a return statement.
Compilation of a sequence of statements
uses the linkage "next" for all its component statements except the last,
for which it uses the given linkage.
In this case, the last statement is a return statement, and
as we will see in section 5.5.3,
a return statement always uses the
"return" linkage
descriptor for its return expression. Thus all function bodies will end with a
"return"
linkage, not the "next" we
pass as the linkage argument to compile in
compile_lambda_body.