Processors in Programming Languages

From Viewpoints Intelligent Archive
Jump to: navigation, search
really
come in a version that
sorry
trickle its way through but Burroughs had this distressing
set of traits
of having the best hardware architects
really in the world for many years and
they could never ever implement these designs partly
because the hardware architects are a little bit purest
major
figure here is Bob Barton who is a
well-known iconoclast
remarkable man he's I think
he said still
my first this is B five thousands the
third machine I ever learned I learned it in 1962
I happened to be in the Air Force and
they had a Burroughs 220 Burroughs
220 was this huge vacuum tube
thing with 5 or
bigger than this room and you could heat coffee
on top of it wait
old machine and had a couple of interesting features on it that actually
partially triggered off some of the ideas to be 5000
Martin first
was really a philosopher he was a philosophy
major at the University of Connecticut and
gotten interested in proving he's already reading philosophy
books and books not logic
a man of parts
and I guess he started coding around
1951 or sellin
boroughs bought a company called datatron
and they set up
they've made a machine called the 204
in the 2:05 were true run taste machines
and a 205 in the borough's 220 was the same
xcept that the tool fire was completely based on a drum
it had the registers on the drum also many
copies of the residues just placed around the drum so
that you didn't have to wait for a full revolution this way to get
things in those days and
they set up this whole latch up in a Safeway
store in Pasadena partly
because of that location they formed the early
association with Cal Tech back
canoes first machine was the
205 and the 220 and this
horrible thing known as mix in the
canoes books is a very reminiscent
of the 205 into 20 in
other words it was a decimal machine
ach position
had
10 decimal digits
and a sign digit and each
one of these had four bits in it so I had
44 bit words and the sign
position for some
reason known only to them and to God they
decided to have all ten possibilities
for the sign position okay
and because there's only plus and minus and then eight more they
decided to assign some meaning to those it was
meaning assigned to those extra sign positions that
that led to
ideas in the be 5,000 I won't go through the whole history but it's very
interesting to watch how these ideas come about one of the sign positions
was saying position six which marked
this thing as a piece of code
okay and in fact the
the machine would not
if you if you tried to
fetch through something if you
tried to fetch a piece of code there
for instance they were trying it would we try
and do a job in other
words they both protected things in a limited way in
memory and there are certain side effects that can be triggered off by
touching things in memory it was that idea which is
being played around with in the 50s that that led to part of what
to be 5,000 was okay the most important
thing about this is
that in 57 and 58 there
is a language developed called Algol 58
and alcohol
58 was a reaction
sort of a purist reaction of the Europeans
Plus Alan Perlis a few other people in this country
too Fortran so they thought
was absolutely detestable but a Fortran had come
arlier so they sat down and designed this thing it was cleaner
and except for the fact of not
having data structures except for arrays and
strings and integers and
floating point typical
alcohol data structures but not having Pascal
data structures this language is essentially what Pascal is
in other words it has one level
of definition
for global
procedures
etc
and [Music]
the some of the versions of this I could
o recursive subroutine calls and other ones couldn't the separate procedures
have local variables
there are a couple other features
language it was basically a very simple but vanilla
language and Barton
head of the project to put this on the burst to
20 and doing so they
a thing that had only been really discovered in
56 57 by by
a few people in Europe which is called the stack the
staff was originally invented for doing compiling and
you have to realize that back in these days in 1952-53
when they're still thinking about compiling
arithmetic expressions this was considered to be an AI problem
okay
what Fortran was was the world's first automatic
programming system that's what it was called automatic
programming was how to get away from writing having
ive explicit sequence when writing machine code
ok so the first task of automatic
programming was to not have to specify sequence
for thanks and the
first parser is back in the early 50s actually
went in and parenthesize effect Fortran yeah
they went and actually put all the parentheses in to
the source code okay
and then went through and removed all the parentheses by
going okay was unbelievable but
that this is the in the days before it be an F there's
no such thing as a compiler compiler so
some Germans
pop up Bower and
a couple of other people
you er can't remember the
other guy Sam Wilson says yeah Santa Valerie
invented
a stack algorithm for doing compiling
arithmetic expressions that used precedents it was the first
precedents algorithm Martin just loved this he just
sucked this up and
the basic idea I won't go
you know you either know it or you don't at
this point this is part of the
that used to be important for people to know but
is yeah the noise now as far as I'm concerned
but basically the idea is you kept
opera Diana from the arithmetic expression you kept operands
if you had something like a plus B times C as
you chucked your way through this thing
you would have shoved into
the stack everything
up until the time side which is the
something and do something with the city and the B and
either have to sometimes they have
two stacks you put the operators and 170 you plus there'd be
an A to B maybe
you push it all the way down so you'd have time
silent CPA here and
as soon as you could find out there's
nothing more for you to do over here and then you know you could generate
code for these guys okay
ou kept on going through in the stack kept
these intermediate results bargain very quickly realized that
a stack could be a dandy idea de for doing
subroutine calls in
fact they were just starting to mumble about the idea of recursive
subroutines for doing problems
so when they implemented the system on the on
the burros b20 they had the idea of
that they were
going to get enough do
things in this order and then generate code through
an assembler multi pass assembler and
they're going to use a stack for subroutine calls they
also had the idea that this
group of global contact should probably be
represented as some kind of table in storage
that had it in either pointers to
arrays allocated somewhere else or
had numbers in them
and in this particularly
implementation the code had to know what these were supposed to
be okay so one
fine guy I don't know when it was 59 or so
after all this had been done
Barton was reading a book
by Lucchese which logic and
what Lucchese which had done is in order to
do certain logical manipulations of things which requires symbolic
manipulation it invented a notation that
had no parentheses
and in fact it was a
I think Lucchese which is
notation was a prefix notation can't
remember can you remember Steve can't
remember whether this prefix anyway the two possibilities for an
expression like this or either saying
something like BC times a
nd
yeah I think the case which is was actually prefix
which there are various ways of writing
that you can say x b c
plus a or plus
a times b SI
prefix use a stack
until you get to something it will actually
has a couple of operands
and think
about Barton is one of these guys has a mind like a net
almost completely
intuitive
your life because he never knew
when he was going to have a good idea in fact he lived for quite a
nd he feared he was never going to have another the
fact he had good ideas all the time I used to absolutely
destroy PhD theses at
Utah where I ran into him because
he'd be working with a thesis student for six months and
all of a sudden he'd wake up in the morning with a total
solution to that thesis and that was the end of that
thesis and so the smart thesis students
like me stayed away from them we
didn't want him to solve our problem and
even though I was a natural
to be his thesis student I went to Dave Evans and
Barton would not talk for me for five years because of that
that's the kind of guy is he's sort of like
William F Buckley to talk to there's
an incredible command in the English language and he's very
very snide delightfully snide
it's a Barton okay
when this when
Bart made the connection between this and this
his mind just started working and
in 48 hours he had designed to be 5,000
okay it was one continuous hit according
was just literally stayed up for 48 hours and everything
fell into place I'll certainly give you an
idea the things that led into it
was the Barton's basic snobbery that
said I never want to write machine code again higher-level
anguages are a good idea then by god what's right in them
never want to write an operating system in the lower level
language never want to write anything in
the lower level language tomah's
the compilers weren't the
machines weren't a good environment for this kind of
thing so he had in the
back of his mind the idea what let's build a machine to run algal
if we think that's a good language and
in fact the Algol that they built it for was this Algol
58 B 5000 doesn't have
all the hooks in it for the
6006 he came along while they're in midstream
machine so here's here
are some of the things he thought about
first thing he did was to
say well
I know we're
gonna be running lots of different jobs in fact the first be
5000 you have to realize in fact most of these machines
had multiple processors on it right from the beginning
teeny little machine -
it was only a 4k 48 50 words had
swapped off a 32k drunk
this is a mainframe computer
and they don't knew they were gonna have to
do lots of fun lots of batch jobs and do compilations
and all those things one of the things that always happened in those days
as a program would start writing into somebody
else's programs core now would lead to strange
results and that
was not a good thing so one of things that they wanted to do is
to protect so we decided that
the
consequences I don't know how he all came to all of us at
once because the interesting thing about to be 5000 is not that
it has a stack that was the least interesting
complete innovation those days there's only one other machine
that has sort of a stack that was called the KDF right
so here's what he did he said
when a program is running
some kind of code here
ach
I only have two kinds of things that I'm really worrying
about there are things that denote operations
and those are things that are going to appeal
directly to the machine so those are
pretty safe and the only other thing I have to worry about
are things up to note variables I notice he didn't say values
because some higher-level languages
don't have values they have variables everything comes down to looking
like a like a variable and
so he asked himself what is a variable and
a variable actually is
some item in this list here or
some item in the list of the procedure and that's all it
is so he immediately misses came
out of this table already but immediately said
there are two kinds of variables are local and global and
all those things are our numbers that are offset
from the beginning of whatever this list is and
since we're going to be running so if
this is the red list here and
this is the red procedure and
over here we have another job that has a green
list and a green procedure
than
any particular variable
like this one down here it's
like variable 19 that's
red and this one over here is variable
19 that's green
completely two different things because they're reference
relative to this guy to this guy
they
said okay that's that's great what we need to have now
is a register that the code can't see the points
to this so that registered
it's called the PRT
whenever a piece of program
is running this PRT register is pointing to one of these table
I'm going to draw it as a table map
okay and now what code turns out to be is
something that can be rather small in
fact this thing I think was up to a thousand law
these are 48 that words
and so to
worry about any
one of these guys the code for that is only needs
to be ten bits long worry about
that and
for various reasons
they
could have done this there's whether they say it's really terrible
criticize this but you know after 20 years and a few
things have been discovered but
they decide also to make the operator
set be 10 bits long so
they could have a thousand operators never
only had more than a few and then to distinguish these two
there are a couple of extra bits
on as headers that told you what kind of code symbols
these were four kinds of code syllables
if I can still remember these
there's an operator
I think I want to get to use three of them
is basically
the operator a what is called a value
call and what is called a
name called
I'll
tell you what those are in a second
so code in this machine was 12
bits long because it had
48 bit words the
register up here for
would hold for instructions
at a time so to have a little instruction cache on
it and just attach those is one of the consequences
of e 5000 and only have to cycle memory every for instructions
now going back to how
executed Barton quickly discovered
that going to post fix is much more efficient
then he asked himself how
am I going to implement the stack I think the first way he thought
of it is that he would simply have another
register here called the s register
that would point into the top of memory
and there would be
the
stack then he realized he needed a couple
of registers to hold stuff
was going to go through the arithmetic part of the part
of the system and so put in two registers
two Hardware registers now actually
what I should do is
let's be consistent here
put the staff like this
okay so I put in a couple of hardware registers that
were logically the top of the stack called the a and B registers
the a and B registers had
an extra bit that marked whether there was something
in the register or not okay
let's call the presents bidder
so
the idea was if you chugged up to a-plus
and there
wasn't a + requires two operands and both
of these guys weren't on this this
machine would know to cycle the stack to make sure there are two operands
in the registers so
did the most efficient thing that it could
would only cycle a stack when absolutely
necessary to memory that all happened automatically
okay so
I'm very logical to think of these these
things are called syllables the code syllables
would have a register to know
what this was called but I'll call it was either called C or
P or something like that
here are
a bunch of code syllables so
this bc times a plus
would be turned into something that would look like
value call for wherever be was
maybe be is number three here so this
would be something like value call
three this
would be value call maybe seven
word see
our operator
50 or something which is a plus
and value call
whenever a is maybe 30
plus
I'm sorry this is x operator
45 which is plus
so
that's what that turns into and this thing
had been this thing had been in originally
an assignment statement into a variable called
e
and the code would have come out
he
underbar a sign
and that would have turned into here
would have been a name call
on wherever D is
these at 50
say Lane called 50 op
sign
they just we
numbers there and just draw these things in
okay so
what name call does is to
generate an address that becomes an operand assignment
is an operation in this machine and
there are a couple of interesting
this is penetrating at an instant
of the most important things you can ever do in
a higher-level language which is to be able to
give other senses for store and fetch
and in the hardware of this machine never been done
programming like reports in the hardware this
machine will talk about what's right and what's wrong with weight it turns
out you did it the wrong way but you be forgiven
because nobody thought before in fact borrows
use of it because most of the programs that borrows do not understand
this particular feature she
okay now a
wonderful thing
further
about this machine is that if a and
B or C we're procedures the
code is exactly the same okay
compiler does not care for
the following reason that one of
the bits in this program reference table
it's called
a flag bit and
it is there to distinguish direct
operands like integers and floating
point numbers and the
way integers and floating point were stored was such that
there's only one Plus that you needed there's
only one operation over here and the
system could tell whether there was an integer floating
point when the flag bit was zero it meant what was stored
here was either an integer or floating-point
okay when the flag bit was one
meant there was something that needed to cause
an interrupt to be triggered off to other parts of the machines hardware
yeah yeah this is what we
want to do is look at these guys when the thing is marked is
what so what I'm saying here is that
there's no such thing number one is the code knowing
what it's going to work on it doesn't it's
a direct translation of this which
in this form B could just as easily be an integer procedure
as anything else and your function
okay and number
two the code is not allowed to
see any of its own state and
note that the code cannot see
its segment that it's in unless
there happens to be a pointer for it in this program
reference tables everybody see that okay
the code contains no addresses and in
fact there's no way for the code to generate any addresses the
only addresses it can ever use are
that's been given in these registers this code
cannot see these registers the code cannot see the
a and the B register it can only use the
absolutely critical this is what it's so clever
all modern protection schemes are based on
this idea this is the idea that turned into what is now known as capabilities
but
done almost perfectly the first time around
when I was done in Baltics it
okay so let's take a look at
these are these
48-bit words with the flag bit of one
they had enough room in here that
they decided what they would
also do is to make the storage conform to the exact
objects that the programmer wanted to use which are little variable length
segments okay and the segment
he thing that not only has a starting address but
also has an ending address okay
so for instance
an array would have something like
the following format
his would be the base address of the array
this would
be the length of the array this
would be the fact that it is an array as
it's additional type information there'd be a couple
of extra bits here one of which would
indicate whether the thing was in storage or not
the idea is this bit where
zero when you ran into it here meant
you had to go out and fetch it before you could use it okay
so what this was was a protected
segmenting scape another interesting
one here is procedure
is one of these guys
think of this code if you want
as a base address
it has a length
as a present spit
so
forth and there a few other ones
okay so what do we have
here we have a scheme
I got draw I haven't shown everything
yet not to show how subroutines are done let's
review for a second we've
got a scheme and where the the code only refers
to off set of registers that can't see
no object in storage
can be referenced directly by the code the
code wants to see something there has to be one of these
guys in its program reference table
what's in the program
reference table is in there unambitious lee because the flag
bit it's not in there as bits if the code has to remember what
the heck it was so
what can you do with this well suppose
one
of the things you can do for instance is replace see
here with
a procedure suppose
we want to enrich in this thing without changing the code
you just put a one flag bit of one and point
this off to another piece of code down
here every time you go through and execute this code
and execute the procedure and generate the result from it
his procedure can just as easily access
a file one of these things can be a
prominent going to another process the process is one
of these whole conglomerations
another one over here
okay so the
in this machine you can implement directly
things like streams in UNIX
okay what you implement it just simply any variable
can act as a stream but
so far we've only talked to that about them as sources
let's go through the how
execute some code here
suppose suppose we have the code
e5
okay right at this point we don't
happen to know whether there is an array or a
procedure and if the caller doesn't know
either what it does is it says but
I'll do it different color so you can see it says now you call
oh I
know what the other one was
no it's no it's just small subscripts
actually it was was a small immediate
operands that's what it was
0 0 0 1 1
0 1 1 yeah
these are small integers
operators
o
this thing would be immediate
five now we put the five
in the top of the stack
then the
I'm not going to go through
right now exactly how
this how this does in fact
president was slightly more baroque way than
it needs to do then
would do a value call assuming it's on this side of the
assignment arrow do a value
EU never is one of
these guys in here down here so if you value call on
95 or something
now if it comes in here
and the flag bid is on the flag that is off
there's a little little more code
generated here if the five minutes is off it's
going to complain it's an opera
opera hat waiting in the stack if the flag
that is on here remember the five is is in
the stack already then
going to look further than if it sees it's an array what
it will do is automatically index
the array against the base address but what it does
is it checks the this guy against blanks first
and everything if everything is okay it generates this
other way generates a bound check error so BAM is checking
done automatically by the hardware here if everything is all right
then what it will fetch into the top of the stack is
the
value from that particular array
important thing to realize is that
arrays there's an array
point razor
things like this program reference table arrays have flag
bits also okay
so it's a multi-dimensional array fetching in one
that has a flag bit on it's going to trigger off another indexing
and we'll start peeling the staff dry from
all the indexes of the rail just appeal so rays are stored
as trees on the be 5000
until you get to something that's an operand then
I'll finally fetch that and answer it will have the answer
even more fun
that comes in here to E and just discovers
it's a procedure it does exactly the same thing except
he procedure with this guy as a parameter
so bharden having a somewhat
symmetric line
ask himself the question of what suppose I
had something like this
what should i do
there now the answers is this is a by
being on this side of the assignment operator this
part of the thing is going to be replaced with a name
what I want you to wind up with is an address
that this thing can operate
on okay so far so good and
happens with array is it ripples through the same way except it doesn't
fetch the last guy just we use the last guy in
the top of the stack this guy can
work on so for symmetry he decided
that if this guy happened
to be a procedure and it was called with the name call he should actually
do something rather than complain this was the master stroke
and what he does do is to call
in such a way that the procedure itself can discover
what it's being asked for a name or a value that
was a test you can make in this language
called Bal Gulf was in the beginning of a
saying if somebody want me to generate an address or to somebody want
a value from me and so in
the complex Bal gall procedures were actually
two-part procedures each part was a function that would produce
a value it was also a procedure for calculating
what that address was
that works for many
ou want to do turns out it doesn't work for all the things
you want to do but it's very close it certainly
allows you to simulate arrays and other
data structures I think you can see that if you
put this thing in you can simulate the semantics
of most data structures completely
cover up whatever representation is actually being
and this was something that was used to a limited extent as
programs didn't really understand this in
particular you can replace variables with files
you could have input
and output pipes it would fire off
things that's what a lot
now the final set of mumbo-jumbo
here is how the
stack was used to
also
store subroutine state
that
that way is as follows there's another register called the F
down here
let's
think of what it has to get shoved into the stack when you call us every
day somehow we
have to have the following information there's
a program counter that must be here somewhere
that is indexing
into this code notice the program counter is
relative to see
okay see see could
be anywhere this is was the first computer to ever have
so the various things we need to store are the program
counter of where we were and that counts
as guess what one
procedure segments these
two guys just bottle up
in there we need to
know where the previous stack
frame was and we have to know
where the current one is because we have to get these
guys relative to this frame register
so the several ways of doing this
at the be 5,000 does not do this in the cleanest possible
fashion a modern
way of doing this is as follows
is to
point in have the frame register point into
here and index local variables
negatively relative
to this so the local variables and the
frame information
the old program counter and stuff like that indexed
negatively going that way and use this
stuff here for the arithmetic operations in the new procedure
ok that's the way small talk does it
the way to be five they didn't realize
they could do this on the me five thousand so they they went through
a more complicated thing of
marking the stack first and a few other little goodies but
this is good enough to show
hat
what is in part of this thing is the Preet the old F
which is a pointer back down the stack
into the previous frame okay
information you so think chugs along as much
as it wants as soon as you hit a new procedure thing that triggers off
a procedure call it bottles up this guy and
that guy into the stack assuming
that in typical Polish postfix
form that these all of these variables
here have been loaded into the stack first
okay it just doesn't even know it's going to call
a procedure just go chucking along the early eventually
run into a procedure call and all of a sudden it discovers
all of the variables have already been
evaluated and they're stuffed in the stack in the right order
and when this F register goes in then the
parameters are just this
reason it knows is that these ten bits
that this thing is built up
into for these two guys is actually broken up into
two smaller segments
for global and local
so that's basically how the
machine works when
you do an inter process call you run into
something that's a process it bundles up
more than this it actually talks all
of the state of what's going on
into the top of the last sack that
you have and stashes that in a program
reference table in the operating system program reference table
the operating system is the scheduler okay
because that's what it is you want scheduler
a bunch of things that have just processed descriptors
in there and you just go from one to the other and
you're either executing or closing them up so the subroutine
call time on this 1961 the machine was
just a couple of microseconds
almost as fast if the operations were itself because
it have to do it didn't have to do much more than
what it was doing on operations the result of
this thing is that the procedures that you wrote
were very close in
thinking to the notion of extending the whole
operator set of the of the computer is like a programmed operator
set that you find in blessing machines like the
but fit
into this one homogenous scheme
so what we have on the unbe 5000
is have complete protection one
this is one of the few architectures that actually
exists today they're still selling these goddamn machines forget
what it's called fifty seven hundred fifty nine hundred out
every couple years they upgrade the
hardware on it but it's basically such a sound
design I have so much code written for it that they're still
selling the goddamn things the reason
is you can't break the operating system just
cannot break it because there's no way you
have to do extremely special things in order to be
able to get any kind of descriptors which what these things are called
- any piece of code or any piece
of segmenting the very first
he third one of the first multiprocessor
systems and again you can see why because
protection is nearly absolute
as you can get you can afford to have several processors kicking around at
the stuff that's there and the prizes are
put to good use because the storage swapping was so slow
that one of the processors would be running was
omething while the other guy was trying to swap stuff yeah
you have programmable data
on balance
checking automatic subroutine
calling and process switching
and a limited amount of code cache one
batch for every
okay so that's that's sort of a quick pass
through what to be 5,000 wise now
funny thing about this sort
of the despair of anybody who actually knows about
this machine is that almost nobody knows about these
ideas these ideas would look goddamn good on
a chip from Intel er Motorola today believe me
I think we can all use them and they're a couple of
things that we might like to do to to fix up going
after Pascal like data structures but this would be one dandy
Pascal machine and it's one of the things you have to realize
that one of Klauss worst teachers was Bob Barton
Klauss went and Dave Evans
okay Klaus was a graduate student
of Dave Evans when Dave and Harry husky when Dave was at Berkeley
and he
then came over to work with Bill McKinnon at
Stanford where they had a be five thousand and thousands
first two languages were
there's language called Euler which is a particularly
good good design it was implemented as
this this whole thing that led to Pascal P codes is
right here okay Pascal P codes
are sort of a crummy software way of doing what this machine did
in hardware this is what is so amazing about the
as most things that you would like to have in a programming
language they had to extend Balga or to
handle features like multi processing the
the protection the fact that you could actually change types
dynamically here which bound all didn't allow but it's
like they came up with a language called s Paul
I can't remember what s Paul stands
for anymore but it's basically an alcohol extended
for systems programming and it had all it was language
that used all these features directly that's what they wrote all the
operating system they never wrote a line of
assembly code there was never a sembly the
fate of this machine was kind of funny because
I remember when when
burrows was Hawking it they were
so proud of what this
machine could do but they went out and hired a bunch of college
graduates as salesmen and actually
taught them what the machine was worst
mistake they ever made because they immediately
went out and completely snowed all of data-processing
managers remember this is 1961-62 telling
them about this machine that was distinctly not like
IBM system they'd ever seen before and
it just scared the out of everybody they made up they
sense of humor they made up a game called compiler
games it was a board game that you played
like Monopoly except what I would do was show you it
would actually generate would take any piece of alcohol
code and generates burroughs syllables
he just went through it had the stacks the little
the stacks you know you just went through and it had a me
and you just went through the thing and by god the thing
would showed you how simple compiling was on
this system that scared everybody that's
how I learned how to compile when I was in the Air Force using
this damn compiler game I'm looking for one ever since
must be some around the other thing they
did is they came out without Fortran on
the grounds that alcohol 58 was infinitely better than the
soon to be a now the algaas they upgraded this to Algol
60 was infinitely better than Fortran which was
true and they could run Algol 60 with no
appreciable overheads unlike how and on the other
machines nobody bought it in fact
this machine languished for years until
they wrote a preprocessor the translated
Fortran into alcohol and then compiled that
into these things and that is
how that is how they finally sell the machine they fired all of the college
graduates and got guys like the old SDS
salesmen you ever asked them a question they would say I don't
know the answer but can I buy you a drink
sent them out and then they finally people started
noticing that the machines didn't crash very often like once every
two years and that
they could even these small machines could
twenty four thirty jobs at the same time interleaving
all the stuff they're incredibly efficient
okay so
generally speaking a a
system like this is this
is sort of the place where
you start thinking what you start about higher-level languages and if
you're thinking about doing Pascal a couple
of modifications this gives you a damn good Pascal
machine it is essentially the algorithm
that the Mesa people
put in micro code in the Xerox
PARC machines Mesa is sort of a super Pascal it's
the algorithm that Cosworth put in the
Loess machines
that were spin-offs of the park
stuff does pretty well what's
wrong with this
well first thing is
that it has such a determined way of
doing things that
one might ask the question
is how does this dough do less okay
the most natural way of doing Lisp here is to
have these guys point to segments that are only two words long
it turns out that is a disaster
because remember the thing thinks it's trying
to swap segments the whole system had
an assumption about how long segments are like they're an average of
40 words long which is a reasonable swapping
size and strings were longer than that so the
first first time Lisp was tried to be put
on the powers be 5000 just
he last time McCarthy ever looked at a machine like this
he made the incorrect assumption
that since list wouldn't run out of be 5000 that a
higher level architecture is a wrong idea that is false
but it was such a failure that an
apartment in fact Burroughs has compounded that error
over the years they grew to love
this with a passion approaching that
of religion and essentially decided
that didn't run on this wasn't a good system
okay which is the usual way he defined
the problem out of existence and
so as a result Barros has never entered the
mainstream of research never ever
and the current borrows machines don't do much better on Lisp than
the old ones did and I
found that this is sort of to my heart because I adapted this architecture
on the first machine I ever designed around
1967 or so tried to do a desktop
machine that directly executed a higher-level language
of the euler type but with multi
processes as being the basic idea that it turns out
it doesn't work the reason it
doesn't work is basically
cost performance
what
do I mean by that well this
to balance what this does for you with how
much machinery actually have to put in there with
how many times you're actually using it with
how many times you're going to change your mind and what the
data structures are going to be like and
notice you can you can make any kind of a
data structure you want here with a programmable system
is it just gets cumbersome because this isn't a good way
of extending things so when we finally
set out to do small touch we had a model in mind
and small talk actually if you squint at it
you discover that this thing
in small talk is what we call the instant state
and it's usually much smaller and
these stack frames and
most of the small talks are actually separate objects called activation
records which are allocated separately rather than
on a single stack and if you do that
then you wind up having an object oriented architecture
okay
that was that was partly led to by reflecting on how Simula might
be executed on this on this machine
but the thing that you have to take care of is
this basic assumption about what storage is
going to be the real question is how much
can you afford to pay for
machinery on data structures most of
don't know what they're going to be when you start out and
at Parc was instead of building any machinery on
this thing we tried to make the Machine run enough faster than
main memory so that we could change our minds frequently
in the microcode as we went along and
we finally staff settled
on a few standard formats larger than the number that's be 5,000
had that would do about 95% of all the
structures that you could construct in small
talk this particular direct
has anybody seen any of the bugs I know
Steve Steve Saunders knows all these but
notice one of the things that happens when you do a process
which is pretty unfortunate
is anybody see
in the stack that you don't want to have in the stack when
you do a process switch
and we can have integers in here that's okay
doesn't matter what else can be in here
yeah we can have some addresses of data and
addresses that they are absolute addresses
okay and that means that
you can have an one
of these array descriptors or a procedure descriptor in
here that's pointing to something in
storage and a frozen process that you're not going to use for
might like to do is to clear out core and
let the new process run for a while and
the problem is if you do that then the chances of these things
to these guys when you come back and is going to be zero so
that led to some immense first
set of immense clue juries which
persists actually to it to this day they're on the
6700 has this horrible
instruction I forgot what it's called but its
purpose is to chase down the stack and find these guys
and make them turn them from absolute guys
the relative guys will then get triggered back
it does it does this when when it something
is going to be moved that belongs to this guy yeah
barf terrible
that one we can't pin on Barton
he was long gone by the time they put that one in there
but it brings up the notion here that the mapping
scheme which worked so well on this fairly primitive machine
doesn't extend I think
the one that yeah
would check to see that
no reason okay the
storage manager made sure that that happened
but one of the one of the problems with it is that
in most of the early successful system they actually used
this structure as the map also in
other words the operating system would look through this thing
and this would be besides the program reference table
for the program it was also the storage mapping structure
of the whole system which is trying
to make do a little bit too much double duty
you see what I mean because you know you
have integers there's there are too many side
that can happen and make it go on so the weakest thing on
this machine they realized by the time we
this stuff at Parc was that the storage mechanism wasn't
sat down and thought about it we realized that we in
actually know almost nothing about allocating
storage it was
problem in the B 5,000 it's just be simply that
getting caught and we're always spending the most time trying
to figure out how to allocate storage how to
how to use it and
in small talk the scheme
that we had was to have much shorter
addresses and the addresses weren't based addresses at all but
actually names of objects
okay so the instead of pointing to anything in core these
things denoted the entire object and the object was found by
looking up in a step separate storage table that
would tell you where the thing was it's more
it's a cleaner solution to the thing and it actually worked
small talks schemed was the first
segmenting scheme that actually worked thanks to Ted
Kaler and Dan Ingalls they actually figured out how to do it
what you have to do in order to make one of these schemes work
is amazing why
should anybody go to that trouble well if
you are swapping objects
thank you always are interested in knowing is what does
this thing called
working set working
set in most operating systems is the
that you have to have in so that you won't swap more
than every like 10 milliseconds or something
like that in other words you want to have enough context
in storage so that you're not swapping on every other reference
and paging
is terribly inefficient because the matter how
allocator is it's always allocating the wrong objects
on pages and the
discovered when we did this in small talk was that
the packing efficiency of having objects
packed in the core by a compacting allocator was
a factor of 15 over what pages give
you okay that is
tremendous it's like having 15 times as much core
unfortunately the overhead for
getting objects is much worse right
you have many more objects and
several thousand objects and stories instead of a you know a
small number of pages you have to go through much
worse mapping scheme it's not just a table lookup anymore
you have to do hashing my name's
you have to fetch small objects off digital
disks which is not good when you're writing things out you have
which is worse and
that was what
caused this scheme to fail and the Flex machine but
then as I say Keller and Engels that
people pitching in from the sidelines came up
with a composite scheme that solved all of those problems
using partly a technique
of realizing that the last thing you ever wanted to have
in storage was written on objects
okay and so whether you whether the system wanted to
r not it was constantly jumping through every couple of seconds it would jump
through and write out every object that had been written on whether
it needed to or not and so only I
think something like 4% of the occasions
bring something in did you ever have to write anything out to make
room it was always clean stuff in court
that turned out to be the key that was adapted
from the great old algorithm on the SDS 940
the Butler Lampson thought out but this time for
objects rather than pages so the result of it is
that the page the swapping efficiency of
Smalltalk 76 worked out to be a factor
of 9 over paging in
fact the in that particular system never ran in
more than 80 K bytes ok every
demonstration we ever did in small effects 76 only swapped
an 80 K bytes something that is quite
and part actually just after we discovered at work we
actually decided to keep it at that size because that was a size
it was going to be possible in commercial machines
very very shortly the equivalent
performance to a system like list was equal to about
almost 10 times that amount of storage it
took a kinder list on the darada you
have to have something like 350 k words
16-bit words in order to get the same swapping
inefficiency and difference they have almost an order
of magnitude which was remarkable okay
so that is that
is the basis for thinking about these
systems now let's talk about a little bit about the wrist
chip
very interesting what they tried to do
wrist chip was designed
an innocence or at least a partial innocence
of the be 5000 which is probably just as well first
hing after learning something like this is to forget it you
know it's sort of a useful fashion you
don't want to remember anything directly just want to survive
the air but this yeah you wanted to seep through
your consciousness when you're designing but you don't want
to try and imitate it
one of the things to realize is that most
of the early machines were reduced instruction set
computers there's only the IBM
and Remington Rand corporation
and control data had this thing where they sold
computers on the night and
that led to a whole bunch of what we used to call it drop
screwdriver instructions debugging
the machine and screwdriver drops inside and shorts
something out the machine does something you
it a feature rather than a bug and write it into the 500
or so instructions
this is what literally what they used to do a
long time almost none of them can be
compiled - and most of them can't be understood by machine code
either you know they just sit there with
cubic feet of hardware to try and execute them
machines like the PDP one and
other delightful things that sprang out of the world
when computer he had these very very
simple way about going to do things and people noticed that they
were pretty efficient even though they hardly did
instruction the reason of course for that is that
yeah if didn't take any time the machine was incredibly
efficient it didn't have to hardly do anything machines
are really sort of like ticking clocks
o from in the sixties
what is the entropy of the program
so in Paterson and se
caen wanted
student project they wanted to do a processor they
had a chance to look at this thing called the OM
processor at Cal Tech which was a
attempt to do an extremely clean computer science
e type processor by Ivan
Sutherland and Carver Mead turned out that was an utter failure
it really was one of these
things where they got so enamored in dealing in the
what the machine was supposed to do they put it I mean
you know all the things that you read about in the books that are good
they tried to do in this thing and it turned out to
be a flop because the chip got very very large
still only a 16-bit processor
just didn't do very much so
basically what
what they did was to take a bunch of
C code and pascal code and
use a variety of ways of
analyzing it's a thing called a gibson
mix analyzer which is a dynamic way of
analyzing what percentage of instruction is getting executed
they look at the code statically
they came
up with I think everybody has the the document
they came up with a set of
percentages of what things got executed when and what you
had to do and they decided they had only optimized
two things one
was to try and have the
instructions that executed 90% of the time executed
one cycle and the other one was to
try to reduce subroutine overhead
okay and that is what they
that is what they did
and so they came up with this mead Conway
design which is
traightforward in architecture as you could want
and
several interesting things about the architecture I'm going to need some help here
sigh I actually didn't get a chance to prepare for this
because I I gave up my copy of that document which I haven't read for
like six months but
so call out
when I start going astray
because of the simplicity and
regularity on this thing they got an incredible
ratio of controls day to
act action state
you know what it is but
I think this is like 25% it's like 75%
maybe it's
30 70 that's about almost opposite
from what it is in most chips
most chips at least half of the stuff
that's on that ship being in control gates one kind
of another so control is very very simple here almost
all of the silicon area on this thing is used for doing real
things and because of that on a forty thousand transistor
chip they could build a full
32-bit machine that's
what they did so this is 32 bits wide
and survive
it really into two parts
and the
uninteresting part is sort of this part here
and that is where
things are actually done there
are bunches of latches and ALU
shifters shifters first I
Wi-Fi recall and
some buffers and so
forth that's where stuff actually gets done
on that ship then this part of
the chip has I think 132
registers and if I remember the number correctly these
registers are mapped registered registers
words they're not registers you can get to directly
but their registers that are used implicitly
by the code in the in the machine and the mapping
is kind of clever let's see if I can remember it
he register
space for each process is 32
so you can get to 32 registers
if I if I'm correct I believe nine
nine of these are
and the
other 24 are local to
each subroutine
then somebody thought of a real clever idea this
is an incredibly clever
yeah they had two problems they hadn't saw one problem
is how do you pass parameters back for instance
parameters are passed in the top of the stack just
as opera and the be 5000
and the other problem the problem
is how can I map these guys
in here in a very rapid way so I
don't have to futz around I want to have 0 over again somebody
got to what we should do is think
of these registers as
being overlaid like this
so
I guess they grow from the bottom to the top so let's do it that way
here's 24
what we do at 60
we're 16
cuts off we overlap the next set of 24
okay
result result of that is that these
two guys share six registers those
registers are what it used for passing parameters forward
and back these are if you will the in/out
variables of the Pascal type
thing that they're doing and thus on it
goes now what that means is that the
start address is of each register block that you care about
is on a boundary
that can be gotten to you by straight logic
okay in other words they only in
order to get one of these offsets they only have to or a
number in to what ever
they don't have to do any actual additions
just or whatever if you want
to get variable five and this guy
here variable
5 this guy you're just oaring a zero 101
into whatever
one this this thing is in this array
so there's no add here
to calculate any of these things just slam the things together
and whichever one is active goes
up and down how many let's see how
many of these is what is this 10 frames
or something 10 frames 12 frames
everybody know 16 so it's probably
9 frames or something like that so you could store
nine frames and of course in a language like
Pascal which is almost never used recursively
this is a perfectly reasonable
depth there's
statistics showed that most processes that were
written and passed down especially in C never
went to more than a depth like this so it means that the thing
never touches storage for all of its temporaries so
it just runs like a bat out of hell
right this
this particular this particular scheme is not nearly
as good for less for instance
for several reasons we'll look at it's not very good for
small target because as we noted the small has
both an object state and an activation state there's
no place in here for the object state it
has to stay outside so if you want to use this chip for
doing something like small talk then you have to
do something to it the
other drawback on this is that the code on
this thing what he thought was is
I want the object code to be like micro code so
the code is humongous every instruction is 32 bits
long whether you want it to be or not that's ridiculous okay
that is what the does that is what the design of this chip
is so if you're willing to put up with 32 bit code
on every squad this thing
will run each instruction that
it has in one cycle through
the thing so there's no appreciable overhead on any of the
calculations
that take place in here that use this thing the cycle
on this thing could be I think the chips that they're making now their first
ones are are executed 200
nanoseconds of crack and the design
this thing was to execute instructions at 100 nanosecond
crawling their way up to that now it's quite easy if you see
there's nothing but the thing is doing look at a 68000
which is less than twice the size of this and it's
a nightmare this thing is just laid out like Manhattan
Island and it's very very simple
the control is very simple so
now
you look at this thing you have
thinking thoughts because a hundred nanoseconds an instruction time
is not that bad especially if you know ahead of time that most
of those instructions are really working for you you've got something
that you're worth it's really worthwhile thinking about building
we know
or at least we pay lip service to the idea that we want
higher-level languages here but very high level
languages so the machine that simply execute
spaz gal and see very rapidly it's
not nearly as interesting and Atari research
as it is in HDD turns out
Apple is very interested in this machine they have been courting Patterson
for some time because
this is their language is
a Lisa is
a language that is
basically Pascal with classes okay
so those some of the things that small talk
so what can you do to this check
I just I think what
I'm gonna do is just give you a couple of hints and into
the next one one of these that we do on this thing will
go through the schema more detail one
of the things that we can do
make it a little bigger
what we sure would like
to feed in here our bike coats
these things like those
code syllables one of the most ridiculous things
in the world if you have a reduced instruction set
is to have the instructions be 32 bits long
why can't we make use of the fact that we're
actually doing a higher-level language unless we know where
is going to be it's really all relative addressing so
we'd like to we'd like to use that but
to keep the control simple at some place
we have to have a PLA that will translate one of these
guys to the wider paths necessary to control everything so
one of the simplest things that we can do here
is to
think of this area as being an instruction cache
now in all the all
of the studies on caching it's only a few facts have
remained one of the caching works where paging doesn't because
ratios between the two memory speeds are different
okay that's basically what it comes
to catching has very small objects
and the ratios are usually no worse than a factor of 10
so it works paging has very large objects
usually in the ratios or factors of thousand or a hundred so
it doesn't work the other thing that's
known about caching is that code behaves much better
than data does like
a reasonable set up machine code
has the
a very important feature that you
override it in the cache you never have to store it back
out so the cache is just sucking stuff in
then the question is what is the cache look like is
it a full of sociation is it a overlapping
code segment
thing that held has to do with how complex
the Association paths for interpreting
addresses are that went down this way this control
side here right up here would
be a barrel shifter
actually limited barrel shifter
that is taking
these syllables and feeding
them can think of these things is really proud of the
variable size maybe 8 16 and 32 besides
things what
it's doing is looking at the leading order bits of whatever's there
making a decision as to how much it's going to map down
through the PLA that's going to cause control to
determine standard sizes
true enough
yeah we should mention besides
being a general man about
town and a good fellow Steve Saunders thesis
was on minimization of code size through
various kinds of compiling techniques and by
going as he says by going to a variable
length by taking the language that you got and going a couple
of steps further than the risk people you can build a compiler
that will generate optimal size code which
can be peeled out just as easily as
variable size of a few things definitely
true providing that
the largest size is no larger than this
otherwise everything else is really quite easy
okay the two more things that you have to do to
make small talk or a language like
it work on this chip there has to be
a place where instance state is put I'm
going to talk about that more next time let's just think that
let's put their turns out in a
language like small talk there's almost no depth of stack
I think you can sort of because
control is always passing sideways to objects you're
rarely going deep you're going sideways all
the time and so the typical stack
depth in the small talks is less than five
so you need to have a cache for
instance state here
we can talk about what that looked like later
on then the final thing that we have to have to make
a scheme like this work for small talk or Lisp is
we have to let arithmetic run quickly
or another way of
putting it there is a subset of these byte codes that
have to fulfill two purposes one
is they have to refer to
operands that are protected like in the B
5000 okay these guys better
not know they're going after the integer or all is
lost so that
hey're picking up it has to be something that was protected by
one of those flag bits somehow and
cause an interrupt of what I picked it up as an address to
something more interesting in an integer okay
that procedure in Lisp is called unboxing
typical Lisp we're
small talk address space so what a small
talk is laid out in small talk 76
16 bits
pointers
so that gives
you a range of 65,000 things you can refer to
each
thing knows how long it is and the storage
manager knows where they are okay so these things
are actually names the average object
size in Smalltalk 76 was about
20 bytes so this
gives you 20 bytes
is about four and a half bits so
this gives you about a 20 and 1/2 bit address
space okay
equivalent objects does everybody understand
I'm sure in
other words so we we could refer to about a million words
using only 16
bits for a pointer size
turns out you'd like to refer
to more than a million words
ixteen is maybe a
power of two but boy that's all it is
everything else is wrong 24
is much better turns out but
then in order to there's
certain small things that you'd like to have implicitly encoded
empty addresses in particularly
encoded I forget what the way wound up
but at one point it was like - mm the integers between
- mm + mm
were encoded into a section
of that that is they never existed as explicit objects and storage
assistant
and resse is in this range that these things are actually integers
that it should unbox these are called boxed integers
there's a box around them
other images were actually encoded as real objects
part
smaller okay
small arms in this
okay and the
so whenever these are encountered
they have to be translated into something
that's a real integer that can be go through the
operation that translation has to be done at the last moment 3600
has a fair amount of circuitry to do that
one thing we could do on this
system what's supposed
to be moseying the thing up to 32 bits wide
like this machine is now
I don't think we feel too bad
about having a flag bit
the fine bit can be very simple in
fact I think in the scheme that I came up with
I put the flag bit forward there
and let it be zero if it
okay
and if it was if it wasn't one then the thing
else in the other words we give up half the address space
for numbers it's like what to be 5,000
does and then I think we can see
that if
we decide to have a particularly simple encoding between direct
operands and addresses
and by the way I
think you all realize that when I say one bit I mean
but one bit is enough for doing all
operating system things we want to do and essentially they're
what we're what we need to do is to have some of this
control state let the action
go ahead because
after the ALU there's this latch here
that has to be in the right state in order for the
operation to mean anything okay and so
one way of handling this problem of how can we write an
system in a really high-level language is
just say every time we do a primitive operator like
plus we realize that the plus may have lots
of different meanings the small cog for instance that are probably
25 different meanings for plus 25 different routines scattered
through the microcode and small talk code for plus 150
meanings are print but
there's one plus that better run as fast as the machine
can run it and that is the one that adds two integers
together okay the way to do that on
this machine is to just let
the right here you
get to get a chance to look at both flag bits just
let the operation perceive and
then the decision
whether to recycle is back on the C bus or the BB
bus like this machine has it has to
state of those two bits are if they're both zeros then
you got the right answer they're not both zeros you're going to have to
do something more in which case the way you to do that is
interrupt but then looks at these things more closely
basically ideas you have the generation generality
you're willing to pay for but if what
not general then you wanted to have no appreciable
overhead whatsoever you do
that you now have a machine that will run if this thing runs in 100 nanoseconds
doing what we doing what we said
it will run it'll still run a higher-level language and run all
things that have to work quickly to do an operating system at
that very same speed you've got
a very nice little chip we'll talk about in more detail next
eve probably has some comments
yes you
might mention something about the kinds of encoding
efficiencies that you get when you actually look at
syndrome
pop-pop-pop-pop
them do an operation with all the
attendant shuffling bits around if you
tell tell the operation what the compiler
knows about it namely whether its operands
we're
computed basically have to been pushed
on the stack or whether they're direct
in which case all you feed it is an address and it just gets it
you
can save a lot of Pushpa
that combined with
variable
size fields dynamically variable size fields
four addresses the only
happening locally
known names besides the locality varies
just to do those things straightforwardly
roll that together with the overlapping
in fact that's where the small named
what kind of packing efficiencies did
you get in here thesis over listen
bliss 11
was taken as justice unity
ctype language right yeah
thousand
benchmark immunity
my thesis stuff without any
optimization that was the set point just
just not customizing
straightforward nation on this variable
and that's that many fewer benches form
with that much smaller phone sessions
that's
why I say 30% like this the reason
this is a very nice for Atari this
is a very nice area and
generally speaking there are other things you'd rather optimize
first like the graphics situation but
once you sort
you can do there this is the next interesting thing which
is how can you run say
artificial intelligence programs of some scope on
a consumer level machine without going
six year you know the double cycle
to get enough memory on the things
this is one way of doing it small
talk was predicated on the idea that there would be some
sort of solid state swapping device like a bubble memory
that would match
up so it turns out a bubble memory is a perfect
device for the particular algorithm that we developed a
small talk 76 because it's not too fast
it turns out it works better
when the on a crummy disc it does huh yeah
what you have is not as good as you'd
be in a particular way we went about doing things where
it's much better and you
get a system that can do real live no IRA is PI
system which is an enormous enormous
system done in a small time he uses everything in them that the dam
system had there's huge millions of bytes of code swapped
in 80k bytes and it ran quite
acceptable e running byte
codes at the rate of about 300,000 per
okay this thing can run what we call byte codes which are those fast ones
at the rate of 10 million per
speaking of aortic recep another piece of work
diversity
this kind of a mechanism which is in fact a general
simple processor the only simple system
that one should wire into it is the exceptionally simple
one of small integer array
there are other simple powerful
subsystems in general simple systems
there are more primitives
out there that ought to run faster we have yeah
absolutely I'd like to I'd like to
some of those some
of the stuff that underlies set
set theory
really be useful yeah well as a
would have the same kind of disadvantage than 55-thousand he had that
programmer would realize that he needed it
but
I think there are a few
simple systems
on a different style than small
integers that would turn out to be the right ones
or a lot of what we do but
I think that you know the this
whole coprocessor thing it's not done very well right now
I don't
I don't think that's right I think
it should be you know you ought
to be a mother s you
that there are that
there are coherent small
simple systems will provide useful basis for things
address arithmetic is really where all the action
is going on and all the stuff small pogrom
is okay but there are a few other things like
hashing functions okay those
only become address or it would take through the contortions
or not why not pick one that's that's Universalist
and build it in reverse so that hashing
is a cheap thing just like indexing
okay think of the power ahead gets you also
reverse rotations for lost
transforms and teas and stuff we
know my list of what I came to Atari I first
started thinking about designing a processor
then I gradually dawned on me that a processor
is actually very low on the totem pole priorities
priorities I came up with us
at the number one in the number two priorities
have to do with sort of frame buffer
and graphics
chip a deadly embrace
between those two because they
trade off against each other so much number three is doing
storage management
ell
you one thing we trade trade this chip and
no storage manager for a 286
the storage manager chip any day of the week
and then finally down here is something
like this is a reasonable CPU that
will actually do something for you one of these
interesting things you can get
efficiency here and then give up a factor of 200
once you start swapping if you haven't worked
out a thing so it's it's a question of where
getting probably these things is that there's so much easier to understand
than these things that's
why no hardly anybody works on these it's nobody
said nobody really understand you build one of these things and
you don't build it programmable you're dead