Monday, February 22, 2010

Here is an interpreter for 01_ that I wrote in 01_. It compiles the 01_ source code into a bit code, and then interprets that bit code.

One limitation is that the top-level function evaluated is the first function defined in the source, instead of the function specified on the command line or inferred from the filename. This could easily be coded to be, in a fashion, specified from the command line. This would be somewhat awkward, since the 01_ code does not get the actual command line data, but rather, the contents of the file named in the command line, so the name of the function to be evaluated would have to be put into a file or piped into stdin to be received by the interpreter.

Another limitation is that it can only pass in a maximum of 9 arguments to the top level function. (Any additional arguments will be given nil.) This limit can be easily changed by changing the arglist function (and the interp and bitcode functions) to take a different number of arguments.
Bit code format

Here is the bit code format used by my 01_ interpreter.

Parenthesis indicates a list. Ellipses indicates zero or more of the
preceding item. A vertical bar indicates alternatives.

program = (name ones names functions)
names = (name name ...)
functions = (definitions definitions ...)
definitions = (definition definition ...)
definition = (patterns body)
patterns = (pattern ...)
pattern = bound-pattern | literal-pattern | wild-pattern
bound-pattern = 0 bits
literal-pattern = 1 0 bits
wild-pattern = 1 1 bits
bits = bit ...
bit = 0 | 1
ones = 1 ...
body = op op ...
op = push | concat
push = 0 value
concat = 1 value
value = literal | binding | funcall
binding = 0 ones
literal = 1 0 bits
funcall = 1 1 (bits ones)

The program has the name of the function to apply, the number of arguments that function takes, a pair of lists forming an association list of names to functions.

A push op pushes its value onto the local stack to be used as an argument to a function call. A concat op concatenates its result to the value that is returned by the current function.

A binding value has an index to the bound parameters of the function. Let n = the number of bound parameters, then 0 is the last parameter, and n - 1 is the first parameter.

A funcall value has the name of the function and the number of arguments that function takes.

All the counts, the binding index and the number of arguments taken by a function, are in base 1.
Library functions


== library

== force a pattern match failure
error 0 = error _.

concat a b = a b.

== lists

== first item in the list
head 0. = _.
head 11rest = 1 head rest.
head 10rest = 0 head rest.
head _ = _.

== the list with the first item removed
tail 0rest = rest.
tail 11rest = tail rest.
tail 10rest = tail rest.
tail _ = _.

list-encode _ = 0.
list-encode 1x = 11 list-encode x.
list-encode 0x = 10 list-encode x.

== association lists

== association list lookup
alist-lookup key klist dlist =
if-equal key head klist
head dlist
alist-lookup key tail klist tail dlist.
alist-lookup . _ . = _.

== association list update
alist-update key data klist dlist =
if-equal key head klist
concat list-encode data
tail dlist
concat list-encode head dlist
alist-update key data
tail klist
tail dlist.

alist-append key data klist dlist =
if-equal key head klist
concat list-encode concat head dlist
list-encode data
tail dlist
concat list-encode head dlist
alist-append key data
tail klist
tail dlist.

== stacks

== new empty stack
empty-stack = 0_.

== number of items in the stack (in base 1)
stack-count 0stack = _.
stack-count 1stack = 1 stack-count stack.

drop-stack-count 0stack = stack.
drop-stack-count 1stack = drop-stack-count stack.

== top of the stack
top 0. = _. == stack underflow
top stack = top' decrement-count stack-count stack drop-stack-count stack.

top' count 0stack-data = _.
top' count 10stack-data = 0 top' count drop-stack-bit count stack-data.
top' count 11stack-data = 1 top' count drop-stack-bit count stack-data.

decrement-count 1count = count.

drop-stack-bit _ stack-data = stack-data.
drop-stack-bit 1count 0stack-data = drop-stack-bit count stack-data.
drop-stack-bit 1count 10stack-data = drop-stack-bit count stack-data.
drop-stack-bit 1count 11stack-data = drop-stack-bit count stack-data.

== returns stack with data pushed on top
push data stack =
1 stack-count stack 0 push' data stack-count stack drop-stack-count stack.

push' _ count stack-data =
0 take-stack-bit count stack-data
push' _ count drop-stack-bit count stack-data.
push' 0data count stack-data =
10 take-stack-bit count stack-data
push' data count drop-stack-bit count stack-data.
push' 1data count stack-data =
11 take-stack-bit count stack-data
push' data count drop-stack-bit count stack-data.

take-stack-bit _ . = _.
take-stack-bit 1count 0stack-data = 0 take-stack-bit count stack-data.
take-stack-bit 1count 10stack-data = 10 take-stack-bit count stack-data.
take-stack-bit 1count 11stack-data = 11 take-stack-bit count stack-data.

== returns stack with the top item popped
pop 0. = 0_. == stack underflow
pop 10. = 0_. == 1 item optimization
pop 1stack = stack-count stack 0 pop' stack-count stack drop-stack-count stack.

pop' count 0stack-data =
take-stack-bit count stack-data
pop' count drop-stack-bit count stack-data.
pop' count 10stack-data =
take-stack-bit count stack-data
pop' count drop-stack-bit count stack-data.
pop' count 11stack-data =
take-stack-bit count stack-data
pop' count drop-stack-bit count stack-data.

== conditional
if-equal 0x 0y true false = if-equal x y true false.
if-equal 1x 1y true false = if-equal x y true false.
if-equal _ _ true . = true.
if-equal . . . false = false.

==
take-bits _ . = _.
take-bits . _ = _.
take-bits 0bits 0data = 0 take-bits bits data.
take-bits 1bits 0data = 0 take-bits bits data.
take-bits 0bits 1data = 1 take-bits bits data.
take-bits 1bits 1data = 1 take-bits bits data.

drop-bits _ data = data.
drop-bits . _ = _.
drop-bits 0bits 0data = drop-bits bits data.
drop-bits 1bits 0data = drop-bits bits data.
drop-bits 0bits 1data = drop-bits bits data.
drop-bits 1bits 1data = drop-bits bits data.

Tokenizer


== 01_ tokenizer
== token tokenized
== 0 0
== 1 1
== _ 00
== . 01
== = 10

== input stream to list of tokens
tokenize in = tokenize' in _.

tokenize' _ . = _.

== comment
tokenize' 00111101 00111101in flag = end-symbol flag
drop-comment in.

== 0
tokenize' 00110000in flag = end-symbol flag
100 tokenize' in _.

== 1
tokenize' 00110001in flag = end-symbol flag
110 tokenize' in _.

== _
tokenize' 01011111in flag = end-symbol flag
10100 tokenize' in _.

== .
tokenize' 00101110in flag = end-symbol flag
10110 tokenize' in _.

== =
tokenize' 00111101in flag = end-symbol flag
11100 tokenize' in _.

== whitespace
tokenize' 00100000in flag = end-symbol flag tokenize' in _. == SPC
tokenize' 00001001in flag = end-symbol flag tokenize' in _. == TAB
tokenize' 00001010in flag = end-symbol flag tokenize' in _. == LF
tokenize' 00001101in flag = end-symbol flag tokenize' in _. == CR

== symbol
tokenize' in . = take-symbol 00000000 in.

end-symbol _ = _.
end-symbol . = 0.

take-symbol _ in = tokenize' in 0.
take-symbol 0bits 0in = 10 take-symbol bits in.
take-symbol 0bits 1in = 11 take-symbol bits in.

== comment
drop-comment _ = _.
drop-comment 00001010in = tokenize' in _.
drop-comment in = drop-comment drop-bits 00000000 in.


==
test-tokenize in = test-tokenize-out tokenize in _.

test-tokenize-out _ . = _.
test-tokenize-out list flag = test-tokenize-out-one head list tail list flag.

test-tokenize-out-one 0_ list . = 00110000 test-tokenize-out list _.
test-tokenize-out-one 1_ list . = 00110001 test-tokenize-out list _.
test-tokenize-out-one 00_ list . = 01011111 test-tokenize-out list _.
test-tokenize-out-one 01_ list . = 00101110 test-tokenize-out list _.
test-tokenize-out-one 10_ list . = 00111101 test-tokenize-out list _.

test-tokenize-out-one one list flag = test-tokenize-out-sep flag one test-tokenize-out list 0.

test-tokenize-out-sep _ = _.
test-tokenize-out-sep . = 00100000.

Parser


== parse a list of 01_ tokens into a list of function names

parse-names tokens = parse-names' head tokens tail tokens _.

parse-names' _ . list = list.
parse-names' 0_ . . = error _. == expecting symbol
parse-names' 1_ . . = error _. == expecting symbol
parse-names' 00_ . . = error _. == expecting symbol
parse-names' 01_ . . = error _. == expecting symbol
parse-names' 10_ . . = error _. == expecting symbol
parse-names' name tokens list =
parse-names' head parse-drop-def head tokens tail tokens
tail parse-drop-def head tokens tail tokens
add-unique name list.

add-unique item _ = list-encode item.
add-unique item list = list-encode head list
if-equal item head list
tail list
add-unique item
tail list.

parse-drop-def 10_ tokens = parse-drop-def-body head tokens tail tokens.
parse-drop-def . tokens = parse-drop-def head tokens tail tokens.

parse-drop-def-body 01_ tokens = tokens.
parse-drop-def-body . tokens = parse-drop-def-body head tokens tail tokens.

test-parse-names in = test-tokenize-out parse-names tokenize in _.

== given a list of items, return a list of empty lists of equal length
== this can be used to initialize association lists
== keys cannot be nil
alist-empty-values keys = alist-empty-values' head keys tail keys.

alist-empty-values' _ . = _.
alist-empty-values' . keys = 0 alist-empty-values' head keys tail keys.

== collect arities from list of 01_ tokens
parse-arities names tokens = parse-arities' names
alist-empty-values names
head tokens
tail tokens.

parse-arities' names arities _ . = arities.
parse-arities' names arities name tokens =
parse-arities-count-args names arities name _
head tokens
tail tokens
_.

parse-arities-count-args names arities name count _ tokens flag = error _.
== unexpected EOF

parse-arities-count-args names arities name count 10_ tokens _ =
== got = without any bits
parse-arities' names
alist-update name count names arities
head parse-drop-def-body head tokens tail tokens
tail parse-drop-def-body head tokens tail tokens.

parse-arities-count-args names arities name count 10_ tokens . =
== got = with leading bits
parse-arities' names
alist-update name
concat 1 count
names arities
head parse-drop-def-body head tokens tail tokens
tail parse-drop-def-body head tokens tail tokens.

parse-arities-count-args names arities name count 0_ tokens . =
parse-arities-count-args names arities name count
head tokens
tail tokens
0.

parse-arities-count-args names arities name count 1_ tokens . =
parse-arities-count-args names arities name count
head tokens
tail tokens
0.

parse-arities-count-args names arities name count . tokens . =
parse-arities-count-args names arities name
concat 1 count
head tokens
tail tokens
_.

test-parse-arities in = test-parse-arities' parse-names tokenize in
tokenize in.
test-parse-arities' names tokens =
test-alist-out names parse-arities names tokens.

test-alist-out _ . = _.
test-alist-out klist dlist =
head klist
00111101
test-bits-out head dlist
00001010
test-alist-out tail klist tail dlist.

test-bits-out _ = _.
test-bits-out 0b = 00110000 test-bits-out b.
test-bits-out 1b = 00110001 test-bits-out b.

== parse
parse tokens = parse' parse-names tokens tokens.
parse' names tokens = parse'' names parse-arities names tokens tokens.
parse'' names arities tokens =
list-encode head names
list-encode head arities
list-encode names
list-encode parse-functions names arities
alist-empty-values names
tokens.

parse-functions names arities functions _ = functions.

parse-functions names arities functions tokens =
parse-functions names arities
alist-append head tokens
parse-def-args names arities _ _ _
head tail tokens
tail tail tokens
names
functions
parse-drop-def head tokens tail tokens.

parse-def-args names arities bindings bits patterns 10_ tokens = == =
if-equal bits _
list-encode patterns
list-encode concat patterns
list-encode concat 10 bits == literal pattern
parse-def-body names arities bindings _ _ _ head tokens tail tokens.

parse-def-args names arities bindings bits patterns 0_ tokens = == 0
parse-def-args names arities bindings
concat bits 0
patterns
head tokens
tail tokens.

parse-def-args names arities bindings bits patterns 1_ tokens = == 1
parse-def-args names arities bindings
concat bits 1
patterns
head tokens
tail tokens.

parse-def-args names arities bindings bits patterns 00_ tokens = == _
== literal pattern
parse-def-args names arities bindings
_
concat patterns
list-encode concat 10 bits
head tokens
tail tokens.

parse-def-args names arities bindings bits patterns 01_ tokens = == .
== wild pattern
parse-def-args names arities bindings
_
concat patterns
list-encode concat 11 bits
head tokens
tail tokens.

parse-def-args names arities bindings bits patterns arg tokens =
== bound pattern
parse-def-args names arities
concat list-encode arg
bindings
_
concat patterns
list-encode concat 0 bits
head tokens
tail tokens.

parse-def-body names arities bindings argcounts funcalls bits _ . = error _.

parse-def-body names arities bindings argcounts funcalls bits 0_ tokens =
parse-def-body names arities bindings
argcounts funcalls
concat bits 0
head tokens
tail tokens.

parse-def-body names arities bindings argcounts funcalls bits 1_ tokens =
parse-def-body names arities bindings
argcounts funcalls
concat bits 1
head tokens
tail tokens.

parse-def-body names arities bindings argcounts funcalls _ 01_ . = _.
== . end of definition

parse-def-body names arities bindings argcounts funcalls _ 00_ tokens =
== _ nil constant
list-encode if-equal argcounts _
110_ == concat nil
010 == push nil
check-argcounts argcounts funcalls
head argcounts
parse-def-body names arities bindings
decrement-argcounts argcounts head argcounts
decrement-funcalls argcounts funcalls head argcounts
_
head tokens
tail tokens.

parse-def-body names arities bindings argcounts funcalls _ token tokens =
== symbol: either binding or funcall
parse-def-try-binding names arities bindings argcounts funcalls tokens
token bindings _
alist-lookup token names arities.

parse-def-body names arities bindings argcounts funcalls _ token tokens =
== symbol: either binding or funcall
parse-def-try-binding names arities bindings argcounts funcalls
tokens
token bindings _
alist-lookup token names arities.

parse-def-body names arities bindings argcounts funcalls bits 00_ tokens =
== _ terminated constant
list-encode if-equal argcounts _
concat 110 bits == concat nil
concat 010 bits == push nil
check-argcounts argcounts funcalls
head argcounts
parse-def-body names arities bindings
decrement-argcounts argcounts head argcounts
decrement-funcalls argcounts funcalls head argcounts
_
head tokens
tail tokens.

parse-def-body names arities bindings argcounts funcalls bits token tokens =
== end of constant
list-encode if-equal argcounts _
concat 110 bits == concat nil
concat 010 bits == push nil
check-argcounts argcounts funcalls
head argcounts
parse-def-body names arities bindings
decrement-argcounts argcounts head argcounts
decrement-funcalls argcounts funcalls head argcounts
_
token
tokens.

parse-def-try-binding names arities bindings argcounts funcalls tokens
name _ . _ =
== not in bindings, nullary funcall
list-encode if-equal _ argcounts
concat 111 concat list-encode name 0
concat 011 concat list-encode name 0
check-argcounts argcounts funcalls
head argcounts
parse-def-body names arities bindings
decrement-argcounts argcounts head argcounts
decrement-funcalls argcounts funcalls head argcounts
_
head tokens
tail tokens.

parse-def-try-binding names arities bindings argcounts funcalls tokens
name _ . arity =
== not in bindings, push funcall
parse-def-body names arities bindings
concat list-encode arity
argcounts
concat list-encode concat list-encode name
list-encode arity
funcalls
_
head tokens
tail tokens.

parse-def-try-binding names arities bindings argcounts funcalls tokens
name bind-list bind-index arity =
if-equal name head bind-list
parse-def-emit-bound names arities bindings
argcounts funcalls tokens
bind-index
parse-def-try-binding names arities bindings
argcounts funcalls tokens
name
tail bind-list
concat 1 bind-index
arity.

parse-def-emit-bound names arities bindings argcounts funcalls tokens
bind-index =
list-encode if-equal _ argcounts
concat 10 bind-index == concat
concat 00 bind-index == push
check-argcounts argcounts funcalls
head argcounts
parse-def-body names arities bindings
decrement-argcounts argcounts head argcounts
decrement-funcalls argcounts funcalls head argcounts
_
head tokens
tail tokens.

end-constant argcounts _ = _.
end-constant argcounts bits =
list-encode if-equal argcounts _
concat 110 bits == concat constant
concat 010 bits. == push constant

check-argcounts _ . . = _.
check-argcounts argcounts funcalls _ =
list-encode concat if-equal _ tail argcounts
111_ == concat funcall
011 == push funcall
head funcalls
check-argcounts tail argcounts tail funcalls
head tail argcounts.
check-argcounts argcounts funcalls 1_ =
list-encode concat if-equal _ tail argcounts
111_ == concat funcall
011 == push funcall
head funcalls
check-argcounts tail argcounts tail funcalls
head tail argcounts.
check-argcounts . . . = _.

decrement-argcounts _ . = _.
decrement-argcounts argcounts _ =
decrement-argcounts tail argcounts head tail argcounts.
decrement-argcounts argcounts 1_ =
decrement-argcounts tail argcounts head tail argcounts.
decrement-argcounts argcounts 1argcount =
list-encode argcount
tail argcounts.

decrement-funcalls _ . . = _.
decrement-funcalls argcounts funcalls _ =
decrement-funcalls tail argcounts tail funcalls head tail argcounts.
decrement-funcalls argcounts funcalls 1_ =
decrement-funcalls tail argcounts tail funcalls head tail argcounts.
decrement-funcalls . funcalls . = funcalls.

Bit code interpreter


== 01_ bitcode interpreter

bitcode bitcode arg arg2 arg3 arg4 arg5 arg6 arg7 arg8 arg9 =
interpret-bitcode bitcode
arglist arg arg2 arg3 arg4 arg5 arg6 arg7 arg8 arg9.

interpret-bitcode bitcode args =
apply head tail tail bitcode == list of function names
head tail tail tail bitcode == list of functions
alist-lookup head bitcode == main function name
head tail tail bitcode == list of function names
head tail tail tail bitcode == list of functions
push-arguments head tail bitcode == main function arity
args
empty-stack.

arglist arg arg2 arg3 arg4 arg5 arg6 arg7 arg8 arg9 =
list-encode arg list-encode arg2 list-encode arg3
list-encode arg4 list-encode arg5 list-encode arg6
list-encode arg7 list-encode arg8 list-encode arg9.

push-arguments _ . stack = stack.
push-arguments 1count args stack =
push-arguments count
tail args
push head args
stack.

apply fnames functions function args =
match fnames functions args
head function
tail function.

match fnames functions args _ defs = error _. == pattern match failed

match fnames functions args def defs =
match2 fnames functions args defs
head head def == first pattern
tail head def == remaining patterns
tail def == body
empty-stack == bindings
args.

match2 fnames functions saved-args defs
_ patterns body bindings args
= eval fnames functions bindings
head body
tail body
empty-stack.

== binding
match2 fnames functions saved-args defs
0pattern-bits patterns body bindings args =
if-match pattern-bits
top args
match2 fnames functions saved-args defs
head patterns
tail patterns
body
push drop-head pattern-bits
top args
bindings
pop args
match fnames functions saved-args
head defs
tail defs.

== literal
match2 fnames functions saved-args defs
10pattern-bits patterns body bindings args =
if-equal pattern-bits
top args
match2 fnames functions saved-args defs
head patterns
tail patterns
body
bindings
pop args
match fnames functions saved-args
head defs
tail defs.

== wild
match2 fnames functions saved-args defs
11pattern-bits patterns body bindings args =
if-match pattern-bits
top args
match2 fnames functions saved-args defs
head patterns
tail patterns
body
bindings
pop args
match fnames functions saved-args
head defs
tail defs.

eval fnames functions bindings _ ops stack = _.

== push bound expr
eval fnames functions bindings
00index ops stack =
eval fnames functions bindings
head ops
tail ops
push get-binding index bindings
stack.

== concat bound expr
eval fnames functions bindings
10index ops 0. =
get-binding index bindings
eval fnames functions bindings
head ops
tail ops
empty-stack.

get-binding _ bindings = top bindings.
get-binding 1index bindings = get-binding index pop bindings.

== push literal expr
eval fnames functions bindings
010bits ops stack =
eval fnames functions bindings
head ops
tail ops
push bits stack.

== concat literal expr
eval fnames functions bindings
110bits ops 0. =
bits
eval fnames functions bindings
head ops
tail ops
empty-stack.

== push funcall expr
eval fnames functions bindings
011funcall ops stack =
eval fnames functions bindings
head ops
tail ops
push apply fnames functions
alist-lookup head funcall
fnames functions
pull-args head tail funcall
stack empty-stack
pop-args head tail funcall
stack.

== concat funcall expr
eval fnames functions bindings
111funcall ops stack =
apply fnames functions
alist-lookup head funcall
fnames functions
pull-args head tail funcall
stack empty-stack
eval fnames functions bindings
head ops
tail ops
pop-args head tail funcall
stack.

pull-args _ . args = args.
pull-args 1count stack args =
pull-args count
pop stack
push top stack
args.

pop-args _ stack = stack.
pop-args 1count stack =
pop-args count
pop stack.

if-match 0x 0y true false = if-match x y true false.
if-match 1x 1y true false = if-match x y true false.
if-match _ . true . = true.
if-match . . . false = false.

drop-head 0x 0y = drop-head x y.
drop-head 1x 1y = drop-head x y.
drop-head _ y = y.

Top-level


interp in arg arg2 arg3 arg4 arg5 arg6 arg7 arg8 arg9 =
interpret-bitcode parse tokenize in
arglist arg arg2 arg3 arg4 arg5 arg6 arg7 arg8 arg9.

compile in = parse tokenize in 0000000. == pad to at least an even byte

Code size comparisons

Since I've implemented multiple 01_ interpreters, here's a comparison of the source code sizes:
  • Haskell - 216 lines, 9k
  • Python - 424 lines, 12k
  • 01_ - 746 lines, 25k
  • Java (slow original version) - 782 lines, 25k
  • Java (fast new version including compiler to Java) - 1188 lines, 34k
  • C - 1527 lines, 37k

No comments:

Post a Comment