A conventional computer memory can be thought of as an array of
cubbyholes, each of which can contain a piece of information. Each
cubbyhole has a unique name, called its
address or
location. Typical memory systems provide two primitive operations:
one that fetches the data stored in a specified location and one that
assigns new data to a specified location. Memory addresses can be
incremented to support sequential access to some set of the
cubbyholes. More generally, many important data operations require
that memory addresses be treated as data, which can be stored in
memory locations and manipulated in machine registers. The
representation of list structure is one application of such
address arithmetic.
To model computer memory, we use a new kind of data structure called a
vector. Abstractly, a vector is a compound data object whose
individual elements can be accessed by means of an integer index in an
amount of time that is independent of the index.[1] In order to describe memory operations, we use two
primitive Scheme proceduresfunctions
for manipulating vectors:[2]
(vector-ref
vector n)vector_ref($vector$, $n$)
returns the $n$th element of the vector.
(vector-set!
vector n value
)vector_set($vector$, $n$, $value$)
sets the $n$th element of the vector to the
designated value.
For example, if v is a vector, then
(vector-ref v 5)vector_ref(v, 5)
gets the fifth entry in the vector v and
(vector-set! v 5 7)vector_set(v, 5, 7)
changes the value of the fifth entry of the vector
v
to 7.[3] For computer memory, this access can be implemented
through the use of address arithmetic to combine a base address
that specifies the beginning location of a vector in memory with an
index that specifies the offset of a particular element of the
vector.
Representing
Lisp
data
We can use vectors to implement the basic pair structures required for a
list-structured memory. Let us imagine that computer memory is divided into
two vectors:
the-carsthe_heads
and
the-cdrs.the_tails.
We will represent list structure as follows: A pointer to a pair is an index
into the two vectors. The
carhead
of the pair is the entry in
the-carsthe_heads
with the designated index, and the
cdrtail
of the pair is the entry in
the-cdrsthe_tails
with the designated index. We also need a representation for objects other
than pairs (such as numbers and
symbols)strings)
and a way to distinguish one kind of data from another. There are many
methods of accomplishing this, but they all reduce to using
typed pointers, that is, to extending the notion of
pointer to include information on data type.[4] The data type enables the system to
distinguish a pointer to a pair (which consists of the pair
data type and an index into the memory vectors) from pointers to other
kinds of data (which consist of some other data type and whatever is
being used to represent data of that type). Two data objects are
considered to be the same
(eq?(===)
if their pointers are identical.
Figure 5.17
Figure 5.18
illustrates the use of this method to represent
((1 2) 3 4),list(list(1, 2), 3, 4),
whose box-and-pointer diagram is also shown. We use letter prefixes to
denote the data-type information. Thus, a pointer to the pair with
index 5 is denoted p5, the empty list
is denoted by the pointer e0, and a pointer to
the number 4 is denoted n4. In the
box-and-pointer diagram, we have indicated at the lower left of each pair
the vector index that specifies where the
carhead
and
cdrtail
of the pair are stored. The blank locations in
the-carsthe_heads
and
the-cdrsthe_tails
may contain parts of other list structures (not of interest here).
Original
JavaScript
A pointer to a number, such as n4,
might consist of a type indicating numeric data together with the
actual representation of the number 4.[5]
To deal with numbers that are too large to be represented in the fixed
amount of space allocated for a single pointer, we could use a distinct
bignum data type, for which the pointer designates a list in which
the parts of the number are stored.[6]
Original
JavaScript
A
symbol might be represented as a typed pointer that designates a
sequence of the characters that form the symbol's
printed representation. This sequence is constructed by the
Lisp reader
when the character string is initially encountered in input. Since
we want two instances of a symbol
to be recognized as the same symbol by
eq?
and we want
eq?
to be a simple test for equality of pointers, we must ensure that if the
reader sees the same character string twice, it will use the same pointer
(to the same sequence of characters) to represent both occurrences.
To accomplish this, the reader maintains a table, traditionally called the
obarray, of all the symbols it has ever encountered. When the
reader encounters a character string and is about to construct a
symbol, it checks the obarray to see if it
has ever before seen the same character string. If it has not, it
uses the characters to construct a new symbol (a typed pointer to a
new character sequence) and enters this pointer in the obarray.
If the reader has seen the string before, it returns the
symbol pointer stored in the obarray.
This process of replacing character strings
by unique pointers is called
interning symbols.
A string
might be represented as a typed pointer that designates a
sequence of the characters that form the string's printed
representation. The parser constructs such a sequence
when it encounters a string literal, and the
string-concatenation operator + and
string-producing
primitive functions such as
stringify
construct such a sequence.
Since we want two instances of a string to
be recognized as the same string by
=== and we want
===
to
be a simple test for equality of pointers, we must ensure that if the
system sees the same string twice, it will use the same pointer (to
the same sequence of characters) to represent both occurrences. To
accomplish this, the system maintains a table, called the
string pool,
of all the strings it has ever encountered. When the system
is about to construct a string, it checks the string pool to see if it has ever
before seen the same string. If it has not, it
constructs a new string (a typed pointer to a new
character sequence) and enters this pointer in the string pool. If the
system has seen the string before, it returns the string pointer
stored in the string pool. This process of replacing strings by unique
pointers is called
string interning.
Implementing the primitive list operations
Given the above representation scheme, we can replace each
primitive list operation of a register machine with one or
more primitive vector operations. We will use two registers,
the-carsthe_heads
and
the-cdrs,the_tails,
to identify the memory vectors, and will
assume that
vector-refvector_ref
and
vector-set!vector_set
are available as primitive operations. We also assume that numeric
operations on pointers (such as incrementing a pointer, using a pair pointer
to index a vector, or adding two numbers) use only the index portion of
the typed pointer.
For example, we can make a register machine support the instructions
Cons
The operation
pair
is performed by allocating an unused index and storing the arguments to
conspair
in
the-carsthe_heads
and
the-cdrsthe_tails
at that indexed vector position. We presume that there is a special
register,
free, that always holds a pair pointer
containing the next available index, and that we can increment the index
part of that pointer to find the next free location.[7]
For example, the instruction
simply tests the equality of all fields in the registers, and
predicates such as
pair?,is_pair,null?,is_null,symbol?,is_string,
and
number?,is_number
need only check the type field.
Implementing stacks
Although our register machines use stacks, we need do nothing special
here, since stacks can be modeled in terms of lists. The stack can be
a list of the saved values, pointed to by a special register
the-stack.the_stack.
Thus,
(save reg)save($reg$)
can be implemented as
and
(perform (op initialize-stack))perform(list(op("initialize_stack")))
can be implemented as
Original
JavaScript
(assign the-stack (const ()))
assign("the_stack", constant(null))
These operations can be further expanded in terms of the vector
operations given above. In conventional computer architectures,
however, it is usually advantageous to allocate the stack as a
separate vector. Then pushing and popping the stack can be
accomplished by incrementing or decrementing an index into that
vector.
Exercise 5.20
Draw the box-and-pointer representation and the memory-vector representation
(as in figure 5.17)
(as in figure 5.18)
of the list structure produced by
Original
JavaScript
(define x (cons 1 2))
(define y (list x x))
const x = pair(1, 2);
const y = list(x, x);
with the free pointer initially
p1. What is the final value of
free$\,$? What
pointers represent the values of x and
y?
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.21
Implement register machines for the following
procedures.functions.
Assume that the list-structure memory operations are available as
machine primitives.
Recursive
count-leavescount_leaves
with explicit counter:
Original
JavaScript
(define (count-leaves tree)
(define (count-iter tree n)
(cond ((null? tree) n)
((not (pair? tree)) (+ n 1))
(else (count-iter (cdr tree)
(count-iter (car tree) n)))))
(count-iter tree 0))
function count_leaves(tree) {
function count_iter(tree, n) {
return is_null(tree)
? n
: ! is_pair(tree)
? n + 1
: count_iter(tail(tree),
count_iter(head(tree), n));
}
return count_iter(tree, 0);
}
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.22
Exercise 3.12 of
section 3.3.1
presented an
appendprocedurefunction
that appends two lists to form a new list and an
append!
procedure
append_mutator
function
that splices two lists together. Design a register machine to
implement
each of these
procedures.functions.
Assume that the list-structure memory operations are
available as primitive operations.
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
[1]
We could represent
memory as lists of items. However, the access time would then not be
independent of the index, since accessing the
$n$th element of a list requires
$n-1$
cdrtail
operations.
[2]
As mentioned in section 4.1.4
(footnote 2),
JavaScript supports vectors as data
structures and calls them arrays. We use the term vector in
this book, as it is the more common terminology.
The vector functions above are easily implemented using
JavaScript's primitive array support.
[3]
For completeness, we should specify a
make-vectormake_vector
operation that constructs vectors. However, in the present application we
will use vectors only to model fixed divisions of the computer
memory.
[4]
This is
precisely the same
tagged data idea we introduced in chapter 2 for
dealing with generic operations. Here, however, the data types are
included at the primitive machine level rather than constructed
through the use of lists.
Type information may be encoded in
a variety of ways, depending on the details of the machine on which the
LispJavaScript
system is to be implemented. The execution efficiency of
LispJavaScript
programs will be strongly dependent on how cleverly this choice is made, but
it is difficult to formulate general design rules for good choices. The
most straightforward way to implement typed pointers is to allocate a fixed
set of bits in each pointer to be a
type field that encodes the data type. Important questions to be
addressed in designing such a representation include the following:
How many type bits are required? How large must the vector indices
be? How efficiently can the primitive machine instructions be used to
manipulate the type fields of pointers? Machines that include special
hardware for the efficient handling of type fields are said to have
tagged architectures.
[5]
This decision on the
representation of numbers determines whether
eq?,===,
which tests equality of pointers, can be used to test for equality of
numbers. If the pointer contains the number itself, then equal numbers will
have the same pointer. But if the pointer contains the index of a location
where the number is stored, equal numbers will be guaranteed to have
equal pointers only if we are careful never to store the same number
in more than one location.
[6]
This is just like writing a
number as a sequence of digits, except that each digit is a
number between 0 and the largest number that can be stored in a single
pointer.
[7]
There are
other ways of finding free storage. For example, we could link together
all the unused pairs into a
free list. Our free locations are consecutive (and hence can be
accessed by incrementing a pointer) because we are using a compacting
garbage collector, as we will see in
section 5.3.2.
[8]
This is essentially the implementation of
conspair
in terms of
set-car!set_head
and
set-cdr!,set_tail,
as described in section 3.3.1.
The operation
get-new-pairget_new_pair
used in that implementation is realized here by the
free pointer.