Monday, June 29, 2009

The extreme conservatism where I work had just about every Java project using JDK 1.4 until 2008, even though JDK 1.5 came out in 2003. New projects in 2008 started using JDK 1.6, but the existing projects continue to use JDK 1.4.

The best new feature of JDK 1.5 is generics. Most other people where I work don't seem to use generics beyond the collections classes, but even only that is huge improvement over JDK 1.4. I've only used generics beyond the collections classes sparingly at work -- just a handful of generic methods, where they make things simpler. I've made more extensive use of generics in my personal projects.

In JDK 1.4, one choice I've often had to make was whether an interface should use an array or a List. Arrays provide compile-time type checking, but Lists were usually much nicer to deal with. I usually decided on the array, and then using List.toArray() when passing or returning values. With generics, using a List is the clear choice.

Another thing is that, due a type-safety issue, arrays of parameterized types can't be instantiated, although they can be declared. A List of a parameterized type is just fine.
So,

C<T>[] array = new C<T>[size];

is illegal, but

List<C<T>> list = new List<C<T>>();

is fine. Of course, if, for some reason, an array really is needed,

class D extends C<T> { ... }

then an array of D can be used. (In all of these examples, T is a real class, and not a type parameter.)

Friday, June 26, 2009

Here is a program in the 01_ programming language that I wrote in my spare time. I then obfuscated it by removing all the comments and renaming all the identifiers. I don't expect anyone will want to guess what it does.

obfuscated ( )=' ~ )____(.
[0(0)=0[ ( ).[0(1)=1[ ( ).[..=_., [[[ [=[[[ [.
]0(0)=] ( ).]0(1)=] ( ).]1(0)=] ( ).]1(1)=] ( ).]_]]=]].
{ -=/ -_./__=_./00-_=00[00000000-./00( )=00[00000000( / ]00000000( ).
/01( )=01/ ( ,0)./11._=_./11.0_=11./11(0)=11/ ( ).} -=] { - -.
:01-=| -.|11_=_.| -={ - | } -.
~_=_.~00100000-=~ -.~00001001-=~ -.~00001010-=~ -.~00001101-=~ -.
~00100011-=# -.#_=_.#00001010-=~ -.# -=# ]00000000-.
~01100000-=0001100000~ -.~01101011-=0001101011~ -.~01110011-=0001110011~ -.
~01101001-=0001101001~ -.~01110110-=0001110110~ -.~01100011-=0001100011~ -.
~01100100-=0001100100~ -.~01110010-=010000101110000000101011~ -.
~00101110-=01000010111000[00000000-11~ ]00000000-.
~01100101-=0001100101~ -.~01000000-=0001000000~ -.
~00111111-=01000011111100[00000000-11~ ]00000000-.
~01111100-=0001111100~ -.~ -=~ ]00000000-.
^_.=_.^._=_.^ (0001100000)=0001100000^ ,0( ).
^0(01000010111000)=01000010111000[0000000000) ^ ( ]0000000000).
^0(01000011111100)=01000011111100[0000000000) ^ ( ]0000000000).
^0( )=[0000000000) ^ ( ]0000000000).
?_11=0001101001.?0(0)=? ( ).?1(1)=? ( ).?..=0001110110.'_.._..=_.
' (0001100100.0001100000.- + )=' ] ^0( ( ,01000100010001, ^0(1111{ - } - + ).
' (0001100000.- + ~ )=' } ( { (0001100000, - + ~ ).
' ( -0001100000.+ ~ )=' } ( { ( - ,0001100000+ ~ ).
' ( - +0001100000~ ^ )=` ( - + ~ ^ ).' ( - + ~ ^ )=' } ( { ( - , + ~ ^ ).
` ( -0001101011.+ ~ )=' ( ,010001001011, -11{ + } + ~ ).
` (.010001001011- + ~ )=' ( { - { + } + ~ ).
` ( -0001110011.+ ~ )=' ( ,010001010011, -11{ + } + ~ ).
` ( -010001010011+ ~ ^ )=' ( ,010011110011, - + { ~ } ~ ^ ).
` ( -010011110011+ ~ ^ )=' ( - { } + ,0001100000,010011010011,
{ + , - ,110001100000~ ^ ).
` ( -010011010011+ ~ ^ )=' ( { } + { + ,0001100000, - ,0001100000~ ^ ).
` ( -01000010111000+ ~ ^ )=[00000000+ ' ( - { ~ } ~ ^ ).
` ( -0001101001.+ ~ )=' ( - { + } + ~ ).
` (.0001110110.- + )=' (0001110110{ - } - + ).
` ( -0001100011.+ ~ )=' ( ,01000100001101, ( ,1101, +1111- ,0001100000+ ~ ).
`.(010001000011-.+ )=' : { - ( { : { } - } : { } - + ).
` ( -0001100100.+ ~ )=' ( - { + } + ~ ).
` ( -010001000100+ ~ ^ )=' , : { + ( ,010011100100, -11_0001100000~ ^ ).
` ( -010011100100+ ~ ^ )=' ( { + - ,0001100000~ ^ ).`..0001100101....=_.
` ( -0001000000.)._=' (0001110110- ,0001100000)__.
` ( -0001000000.+.)=' (0001101001- ,0001100000+ [00000000) ]00000000).
` ( -01000011111100.+_)=' (0001110110- ,0001100000+_).
` ( -01000011111100+ ~ ^ )=' ( ? ^ + - ,0001100000~ ^ ).
` ( -0001111100.+_)=' (0001110110- ,0001100000+_).
` ( -0001111100.+ ~ )=' ( ,01000010111000, ~11- ,0001100000+ ~ ).

Wednesday, June 24, 2009

I really like having source code available when diagnosing code problems. Of course, the source code for my company's code is available. However, source code is not always available for third-party libraries, and is rarely available for remote services.

For third-party Java libraries, it's possible to use javap to disassemble the code, which is something I've done a number of times. Sometimes the class files are obfuscated, so the class names are 0.0 and 0.1, and the method names are 000001 and 000002. In those cases, it's generally not worth it to go through the disassembled code.

For remote services, the usual thing to do is to dump the network traffic with tcpdump. It can show if our code is sending something wrong. But when SSL is being used, it's not as easy.

The good thing about Apache code and a lot of other free code is that the source code is available in publicly readable source control repositories, which means the revision history is available, which is also helpful.

There usually is documentation, but it can be ambiguous and there can be undocumented behavior. Having the source code usually more than makes up for shortcomings in the documentation.

When I look at third-party source code, it's usually not because I think there's something wrong with it. It's usually because there's something wrong with how I'm using it, and looking at the source leads to learning how to use it correctly.

Monday, June 22, 2009

I've written that I think just about any logging can be useful, and that I disagree with removing logging just because they clutter the logs, which is something some of my coworkers have done from time to time. I don't disagree with their removal of log messages in general as much as I disagree with their removal of the logging of stack traces, though. Whenever I have to debug something and run into an exception with an unlogged stack trace, I restore the logging of the stack trace of just about every exception that is caught in that file. Back to the topic of just about any logging being useful, though.

A few years ago, I used the logs to do quick and dirty profiling, since each log message included the thread and a timestamp to the millisecond. By looking at the time differentials between the log messages, I could see what code segments were taking the most time. In that case, most of the time spent waiting for HTTP responses, which was actually rather predictable.

The point remains, which is that just about any log message could contain information that would turn out to be useful in some unforeseen future situation.

I prefer my log messages to contain information that is likely to be useful, so that's why log statements that I write tend to be like

log("variable1="+variable1+",variable2="+variable2+",variable3="+variable3);

and don't have the verbiage that log statements written by other people have.

Friday, June 19, 2009

After a day that included a 3 hour meeting to make estimates on some ambiguously specified work that might be done weeks or months in the future and would probably change a bunch of times before then, I went home and got my 01_ interpreter to work. The meeting was long and boring, but having work lined up for the future is a good thing.

When I removed the memoizing from the interpreter, it worked correctly, though slowly. I eventually figured out that the problem had something to do with list concatenation of delayed values and pared down the code that reproduced the problem to

print _ = _.
print 0x = 00110000 print x.

== This should never be reached unless there is a bug in the interpreter.
print _ = 00101000 01100010 01110101 01100111 00101001.

concat x y = x y.

nil = _.

bug = print concat concat 0 nil 0_ 00001010.

which would print out 00 if the interpreter were working correctly. It printed out 0(bug) because the argument to print was a delayed value when forced the first time evaluated correctly to 0, so the first print _ = _. didn't match. However, when it was forced again, it evaluated to nil, so the print 0x = ... didn't match either, so it got to the second print _ = ....

Finally, I traced the problem to the assumption that the first list in a list concatenation was never nil, and the only way it could be forced to nil is when the list concatenation itself was forced. However, with memoization, the first list could be forced to nil outside of the list concatenation, and then the bug was that the list concatenation evaluated to nil instead of the second list. Once I figured that out, it was easy to fix.

Now the interpreter was fast enough and worked well enough to run a brainfuck interpreter

== brainfuck interpreter
== with 8-bit cells and an infinite array in both directions
brainfuck input code = bf compress code _ zeros zeros input.

== Infinite list of 0
zeros = 0 zeros.

== Compress instructions into 3 bits
compress _ = _.
compress 00111110code = 000 compress code. == >
compress 00111100code = 001 compress code. == <
compress 00101101code = 010 compress code. == -
compress 00101011code = 011 compress code. == +
compress 00101110code = 100 compress code. == .
compress 00101100code = 101 compress code. == ,
compress 01011011code = 110 compress code. == [
compress 01011101code = 111 compress code. == ]
compress code = compress tail code.

== End of code
bf _ . . . . = _.

== >
bf 000instructions previous-instructions data previous-data input =
bf instructions concat 000 previous-instructions
tail data concat head data previous-data input.

== <
bf 001instructions previous-instructions data previous-data input =
bf instructions concat 001 previous-instructions
concat head previous-data data tail previous-data input.

== -
bf 010instructions previous-instructions data previous-data input =
bf instructions concat 010 previous-instructions
concat head - data tail data previous-data input.

== +
bf 011instructions previous-instructions data previous-data input =
bf instructions concat 011 previous-instructions
concat head + data tail data previous-data input.

== .
bf 100instructions previous-instructions data previous-data input =
reverse head data
bf instructions concat 100 previous-instructions data previous-data input.

== ,
== Read EOF
bf 101instructions previous-instructions data previous-data _ =
bf instructions concat 101 previous-instructions
concat 00000000 tail data previous-data _.
== Read 8 bits
bf 101instructions previous-instructions data previous-data input =
bf instructions concat 101 previous-instructions
concat reverse head input tail data previous-data tail input.

== [
== Exit loop
bf 110instructions previous-instructions 00000000data previous-data input =
jump-forward _ instructions concat 110 previous-instructions
concat 00000000 data previous-data input.
== Enter loop
bf 110instructions previous-instructions data previous-data input =
bf instructions concat 110 previous-instructions data previous-data input.

== ]
== Exit loop
bf 111instructions previous-instructions 00000000data previous-data input =
bf instructions concat 111 previous-instructions
concat 00000000 data previous-data input.
== Continue loop
bf 111instructions previous-instructions data previous-data input =
jump-backward _ concat 111 instructions previous-instructions
data previous-data input.

== Reverse bits
reverse _ = _.
reverse 0bits = reverse bits 0.
reverse 1bits = reverse bits 1.

== First 8/3 bits
head bits = take 00000000 bits.
head3 bits = take 000 bits.
take 0counter 0bits = 0 take counter bits.
take 0counter 1bits = 1 take counter bits.
take . . = _.

== Drop first 8/3 bits
tail _ = _.
tail bits = drop 00000000 bits.
tail3 bits = drop 000 bits.
drop 0counter 0bits = drop counter bits.
drop 0counter 1bits = drop counter bits.
drop . bits = bits.

== Concatenate 2 lists
concat a b = a b.

== Skip out of loop
jump-forward _ 111instructions previous-instructions data previous-data input =
bf instructions concat 111 previous-instructions data previous-data input.
jump-forward 0nesting 111instructions previous-instructions
data previous-data input =
jump-forward nesting instructions concat 111 previous-instructions
data previous-data input.
jump-forward nesting 110instructions previous-instructions
data previous-data input =
jump-forward concat 0 nesting instructions
concat 110 previous-instructions data previous-data input.
jump-forward . _ . . . . = _.
jump-forward nesting instructions previous-instructions
data previous-data input =
jump-forward nesting
tail3 instructions concat head3 instructions previous-instructions
data previous-data input.

== Jump back to the start of loop
jump-backward _ instructions 110previous-instructions
data previous-data input =
bf instructions concat 110 previous-instructions data previous-data input.
jump-backward 0nesting instructions 110previous-instructions
data previous-data input =
jump-backward nesting concat 110 instructions previous-instructions
data previous-data input.
jump-backward nesting instructions 111previous-instructions
data previous-data input =
jump-backward concat 0 nesting
concat 111 instructions previous-instructions data previous-data input.
jump-backward . . _ . . . = _.
jump-backward nesting instructions previous-instructions
data previous-data input =
jump-backward nesting
concat head3 previous-instructions instructions
tail3 previous-instructions
data previous-data input.

== Increment
+ 0bits = 1 bits.
+ 1bits = 0 + bits.
+ _ = 1.

== Decrement
- 0bits = 1 - bits.
- 1bits = 0 bits.
- _ = _. == An infinite list of 1 would be more correct

The 01_ programming language

The 01_ programming language is an esoteric programming language in which the only data type is a list of bits. 01_ programs consist of function definitions. Functions are made of function calls, literal lists of bits, and list concatenation. There are no variables or side-effects. Pattern matching is used for conditionals. Evaluation is lazy.

Syntax and grammar

Whitespace separates tokens and is otherwise ignored. The characters 0, 1, ., _, and = are special tokens. The character sequence == introduces a comment, where the remaining characters in the line are ignored. Comments also separate tokens. Uninterrupted sequences of all non-whitespace characters other than the special tokens form symbol tokens.

Program = Definition*

Definition = Symbol Pattern* '=' Expression* '.'

Expression = Bit* '_' | Symbol | Symbol Expression*

Pattern = Bit* ( Symbol | '.' | '_' )

Bit = '0' | '1'
In other words, a program consists of a list of definitions. A definition defines a function for a given pattern to evaluate to a list of expressions, which are concatenated as the result of function's evaluation. An expression can be a literal list of bits, a symbol bound in the pattern, or a function call. Functions are right-associative.

A pattern starts with 0 or more literal bits which must be matched, followed by a symbol, which matches any remaining bits and binds the symbol to the remaining bits for the body of the function, or by ., which matches any remaining bits that will be ignored and is equivalent to binding a symbol that is not referenced in the body, or by _, which matches nil, the empty list, which also ends all finite lists.

As shorthand, if the final pattern before the = is at least one literal bit followed by ., the . may be omitted. Likewise, the final _ of an expression of at least one literal bit may be omitted if the expression is followed by a symbol or ., i.e. it is not followed by another literal list of bits.

Functions

A function consists of one or more definitions of a symbol with the same number of arguments. It is an error to add a definition of a symbol with a number of arguments other than the number it was first defined with.

When a function is evaluated, its arguments are matched against the patterns of its definitions in the order that the definitions were provided. The body of the first definition for which all the patterns are matched by their arguments is result. It is an error if none of the definitions match.

Interpreter implementation

Here is an implementation of a 01_ interpreter:

import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;

/**
* Interpreter for the 01_ language.
* Usage:
* java in01_ [-0 MAXBITS] [-1 INPUTFILE1] [-2 INPUTFILE2] ...
* SRCFILE ... FUNCTION
* Evaluates the specified FUNCTION after reading the specified
* SRCFILEs. The result is written to stdout. If MAXBITS is
* specified, the result is written as a string of 0s and 1s with a
* maximum length of MAXBITS. The first argument, if used, is stdin.
* The second argument, if used, is the file INPUTFILE1 if specified,
* and nil otherwise. The 3rd argument, if used, is the file
* INPUTFILE2, etc.
*/
public class in01_ {
public static void main(String[] args) throws Exception {
HashMap<Integer,String> inputfiles = new HashMap<Integer,String>();
int index = 0;
int maxbits = 0;
if (args.length >= 2 && "-0".equals(args[0])) {
index += 2;
maxbits = Integer.parseInt(args[1]);
}
while (index < args.length - 2) {
int inputfile;
try {
inputfile = Integer.parseInt(args[index]);
} catch (NumberFormatException e) {
break;
}
if (inputfile >= 0)
break;
inputfiles.put(-inputfile, args[index + 1]);
index += 2;
}
HashMap<String,Function> functions = new HashMap<String,Function>();
while (index < args.length - 1) {
readFile(functions, new FileReader(args[index]));
index++;
}
for (Function function : functions.values())
function.parseBodies(functions);
Function function = functions.get(args[args.length - 1]);
if (function == null)
throw new RuntimeException("Undefined symbol: " + args[args.length - 1]);
ArrayList<Value> funargs = new ArrayList<Value>();
if (function.getArity() > 0)
funargs.add(new StreamValue(System.in));
for (int i = 1; i < function.getArity(); i++) {
String inputfile = inputfiles.get(i);
if (inputfile == null)
funargs.add(Value.NIL);
else
funargs.add(new StreamValue(new FileInputStream(inputfile)));
}
if (maxbits > 0)
writeBits(maxbits, function.eval(funargs));
else
writeValue(System.out, function.eval(funargs));
System.out.flush();
}

private static void readFile(HashMap<String,Function> functions, Reader reader) {
DefReader defReader = new DefReader(reader);
for (Def def = defReader.nextDef(); def != null; def = defReader.nextDef()) {
Function function = functions.get(def.getName());
if (function == null)
functions.put(def.getName(), new Function(def));
else
function.addDef(def);
}
}

public static void writeBits(int maxbits, Value value) {
value = forceValue(value);
while (maxbits-- > 0 && value.head() != HeadValue.NIL) {
System.out.print(value.head() == HeadValue.ONE ? '1' : '0');
value = forceValue(value.tail());
}
}

public static void writeValue(OutputStream out, Value value) throws Exception {
int output = 0;
int bit = 7;
value = forceValue(value);
while (value.head() != HeadValue.NIL) {
if (value.head() == HeadValue.ONE)
output |= 1<<bit;
if (bit > 0) {
bit--;
} else {
out.write(output);
bit = 7;
output = 0;
}
value = forceValue(value.tail());
}
}

public static Value forceValue(Value value) {
while (value.head() == HeadValue.DELAYED)
value = value.force();
return value;
}

public static enum HeadValue {
ZERO, ONE, NIL, DELAYED;
}

public static abstract class Value {
public static final Value NIL = new Value() {
public HeadValue head() {
return HeadValue.NIL;
}

public Value tail() {
throw new RuntimeException();
}

public Value force() {
return this;
}
};

public abstract HeadValue head();
public abstract Value tail();
public abstract Value force();
public void makeForced() {
}
}

public static class StreamValue extends Value {
private InputStream in;
private HeadValue head;
private StreamValue tail;
private int bit;
private int inputByte;

private StreamValue(InputStream in, HeadValue head, int bit, int inputByte) {
assert head != HeadValue.NIL;
assert head != HeadValue.DELAYED || bit == 7;
this.in = in;
this.head = head;
this.tail = null;
this.bit = bit;
this.inputByte = inputByte;
}

public StreamValue(InputStream in) {
this(in, HeadValue.DELAYED, 7, 0);
}

public HeadValue head() {
return head;
}

public Value tail() {
assert head != HeadValue.DELAYED && head != HeadValue.NIL;
if (tail != null)
return tail;
if (bit == 0)
tail = new StreamValue(in);
else
tail = new StreamValue(in, bitValue(bit - 1), bit - 1, inputByte);
return tail;
}

private HeadValue bitValue(int bit) {
return (inputByte & (1 << bit)) != 0 ? HeadValue.ONE : HeadValue.ZERO;
}

public Value force() {
if (head != HeadValue.DELAYED)
return this;
try {
inputByte = in.read();
} catch (Exception e) {
throw new RuntimeException(e);
}
if (inputByte < 0)
head = HeadValue.NIL;
else
head = bitValue(bit);
return this;
}
}

public static class LiteralValue extends Value {
private HeadValue head;
private Value tail;

public LiteralValue(Token token, Value tail) {
assert token.getTokenType() == TokenType.ONE || token.getTokenType() == TokenType.ZERO;
head = token.getTokenType() == TokenType.ONE ? HeadValue.ONE : HeadValue.ZERO;
this.tail = tail;
}

public HeadValue head() {
return head;
}

public Value tail() {
return tail;
}

public Value force() {
return this;
}
}

public static class ConcatValue extends Value {
private Value headList;
private Value tailList;
private Value tail = null;

public ConcatValue(Value headList, Value tailList) {
assert headList.head() != HeadValue.NIL;
this.headList = headList;
this.tailList = tailList;
}

public HeadValue head() {
return headList.head() != HeadValue.NIL ? headList.head() : tailList.head();
}

public Value tail() {
assert head() != HeadValue.DELAYED;
if (tail == null) {
if (headList.head() == HeadValue.NIL) {
assert tailList.head() != HeadValue.DELAYED;
return tailList.tail();
} else if (headList.tail().head() == HeadValue.NIL) {
tail = tailList;
} else {
tail = new ConcatValue(headList.tail(), tailList);
}
}
return tail;
}

public Value force() {
headList = headList.force();
if (headList.head() == HeadValue.NIL)
return tailList;
if (!(headList instanceof ConcatValue))
return this;
ConcatValue head = (ConcatValue) headList;
return new ConcatValue(head.headList, new ConcatValue(head.tailList, tailList));
}
}

public static class FuncallValue extends Value {
private Function function;
private ArrayList<Value> args;
private Value forced = null;

public FuncallValue(Function function, ArrayList<Value> args) {
this.function = function;
this.args = args;
}

public HeadValue head() {
if (forced != null)
return forced.head();
return HeadValue.DELAYED;
}

public Value tail() {
if (forced != null)
return forced.tail();
assert false;
throw new RuntimeException();
}

public Value force() {
if (forced != null)
return forced;
return function.eval(args);
}

public void makeForced() {
if (forced == null)
forced = forceValue(this);
}
}

public static abstract class Expr {
public static final Expr NIL = new Expr() {
public Value eval(HashMap<String,Value> bindings) {
return Value.NIL;
}
};

public abstract Value eval(HashMap<String,Value> bindings);
}

public static class LiteralExpr extends Expr {
private Value value;

public LiteralExpr(ArrayList<Token> tokens, int start, int end) {
value = getValue(tokens, start, end);
}

public Value eval(HashMap<String,Value> bindings) {
return value;
}

private Value getValue(ArrayList<Token> tokens, int start, int end) {
if (start >= end)
return Value.NIL;
return new LiteralValue(tokens.get(start), getValue(tokens, start + 1, end));
}
}

public static class ConcatExpr extends Expr {
private Expr head;
private Expr tail;

public ConcatExpr(Expr head, Expr tail) {
this.head = head;
this.tail = tail;
}

public Value eval(HashMap<String,Value> bindings) {
Value headVal = head.eval(bindings);
if (headVal.head() == HeadValue.NIL)
return tail.eval(bindings);
return new ConcatValue(headVal, tail.eval(bindings));
}
}

public static class BoundExpr extends Expr {
private String name;

public BoundExpr(String name) {
this.name = name;
}

public Value eval(HashMap<String,Value> bindings) {
return bindings.get(name);
}
}

public static class FuncallExpr extends Expr {
private Function function;
private ArrayList<Expr> args;

public FuncallExpr(Function function, ArrayList<Expr> args) {
this.function = function;
this.args = args;
}

public Value eval(HashMap<String,Value> bindings) {
ArrayList<Value> argValues = new ArrayList<Value>(args.size());
for (Expr arg : args)
argValues.add(arg.eval(bindings));
return new FuncallValue(function, argValues);
}
}

public static class Function {
private ArrayList<Def> defs = new ArrayList<Def>();

public Function(Def def) {
defs.add(def);
}

public int getArity() {
return defs.get(0).getArity();
}

public void addDef(Def def) {
assert def.getName().equals(defs.get(0).getName());
if (def.getArity() != defs.get(0).getArity())
throw new RuntimeException(def.getName() + " defined as " + def.getArity() + "-ary, originally " + defs.get(0).getArity() + "-ary");
defs.add(def);
}

public void parseBodies(HashMap<String,Function> functions) {
for (Def def : defs)
def.parseBody(functions);
}

public Value eval(ArrayList<Value> args) {
for (Def def : defs) {
Value value = def.eval(args);
if (value != null)
return value;
}
throw new RuntimeException("No matching pattern for " + defs.get(0).getName());
}
}

public static class Def {
private String name;
private ArrayList<Pattern> patterns;
private ArrayList<Token> bodyTokens;
private Expr body = null;

public Def(String name, ArrayList<Pattern> patterns, ArrayList<Token> bodyTokens) {
this.name = name;
this.patterns = patterns;
this.bodyTokens = bodyTokens;
}

public String getName() {
return name;
}

public int getArity() {
return patterns.size();
}

public void parseBody(HashMap<String,Function> functions) {
HashSet<String> bindings = new HashSet<String>();
for (Pattern pattern : patterns)
if (pattern.getBinding().getTokenType() == TokenType.SYMBOL)
bindings.add(pattern.getBinding().getName());
body = new Parser(functions, bindings).getExpr();
if (body == null)
body = Expr.NIL;
}

private class Parser {
private int index = 0;
private HashMap<String,Function> functions;
private HashSet<String> bindings;

public Parser(HashMap<String,Function> functions, HashSet<String> bindings) {
this.functions = functions;
this.bindings = bindings;
}

public Expr getExpr() {
Expr expr = nextExpr();
if (index < bodyTokens.size())
return new ConcatExpr(expr, getExpr());
return expr;
}

public Expr nextExpr() {
if (index >= bodyTokens.size())
return null;
Token token = bodyTokens.get(index);
switch (token.getTokenType()) {
case ONE: case ZERO:
return literalExpr();
case NIL:
index++;
return Expr.NIL;
case SYMBOL:
index++;
if (bindings.contains(token.getName()))
return new BoundExpr(token.getName());
return funcallExpr(token);
default:
assert false;
throw new RuntimeException();
}
}

public LiteralExpr literalExpr() {
int start = index;
while (++index < bodyTokens.size()) {
switch (bodyTokens.get(index).getTokenType()) {
case ZERO: case ONE:
break;
case NIL:
return new LiteralExpr(bodyTokens, start, index++);
default:
return new LiteralExpr(bodyTokens, start, index);
}
}
return new LiteralExpr(bodyTokens, start, bodyTokens.size());
}

public FuncallExpr funcallExpr(Token token) {
Function function = functions.get(token.getName());
if (function == null)
throw new RuntimeException("Undefined symbol: " + token.getName() + " in body of " + name);
ArrayList<Expr> args = new ArrayList<Expr>(function.getArity());
for (int i = 0; i < function.getArity(); i++) {
Expr arg = nextExpr();
if (arg == null)
throw new RuntimeException("Not enough arguments for " + token.getName() + " in body of " + name + ": needs " + function.getArity());
args.add(arg);
}
return new FuncallExpr(function, args);
}
}

public Value eval(ArrayList<Value> args) {
HashMap<String,Value> bindings = new HashMap<String,Value>();
for (int i = 0; i < args.size(); i++) {
Pattern pattern = patterns.get(i);
Value match = pattern.tryMatch(args.get(i));
if (match == null)
return null;
if (pattern.getBinding().getTokenType() == TokenType.SYMBOL)
bindings.put(pattern.getBinding().getName(), match);
}
return body.eval(bindings);
}
}

public static class Pattern {
private Token binding;
private ArrayList<Boolean> bits;

public Pattern(Token binding, ArrayList<Boolean> bits) {
this.binding = binding;
this.bits = bits;
}

public Token getBinding() {
return binding;
}

public Value tryMatch(Value value) {
for (boolean bit : bits) {
value.makeForced();
value = forceValue(value);
switch (value.head()) {
case ZERO:
if (bit)
return null;
break;
case ONE:
if (!bit)
return null;
break;
case DELAYED:
assert false;
default:
return null;
}
value = value.tail();
}
if (binding.getTokenType() == TokenType.NIL) {
value.makeForced();
value = forceValue(value);
return value.head() == HeadValue.NIL ? value : null;
}
return value;
}
}

public static class DefReader {
private Tokenizer tokenizer;

public DefReader(Reader reader) {
this.tokenizer = new Tokenizer(reader);
}

public Def nextDef() {
Token name = tokenizer.nextToken();
if (name == null)
return null;
if (name.getTokenType() != TokenType.SYMBOL)
throw new RuntimeException("Expected symbol, got " + name.getName());
ArrayList<Pattern> patterns = getPatterns();
ArrayList<Token> body = getBody();
return new Def(name.getName(), patterns, body);
}

private ArrayList<Pattern> getPatterns() {
ArrayList<Pattern> patterns = new ArrayList<Pattern>();
ArrayList<Boolean> pattern = new ArrayList<Boolean>();
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
throw new RuntimeException("Expected =, got EOF");
switch (token.getTokenType()) {
case ZERO:
pattern.add(false);
break;
case ONE:
pattern.add(true);
break;
case SYMBOL:
for (Pattern p : patterns)
if (token.getName().equals(p.getBinding().getName()))
throw new RuntimeException("Multiple patterns bind " + token.getName());
/*FALLTHROUGH*/
case DOT:
case NIL:
patterns.add(new Pattern(token, pattern));
pattern = new ArrayList<Boolean>();
break;
case EQ:
if (pattern.size() > 0)
patterns.add(new Pattern(Token.DOT, pattern));
return patterns;
}
}
}

private ArrayList<Token> getBody() {
ArrayList<Token> body = new ArrayList<Token>();
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
throw new RuntimeException("Got unexpected EOF");
switch (token.getTokenType()) {
case EQ:
throw new RuntimeException("Got unexpected =");
case DOT:
return body;
default:
body.add(token);
break;
}
}
}
}

public static enum TokenType {
ZERO, ONE, DOT, NIL, EQ, SYMBOL;
}

public static class Token {
public static Token ZERO = new Token(TokenType.ZERO, "0");
public static Token ONE = new Token(TokenType.ONE, "1");
public static Token DOT = new Token(TokenType.DOT, ".");
public static Token NIL = new Token(TokenType.NIL, "_");
public static Token EQ = new Token(TokenType.EQ, "=");

private TokenType tokenType;
private String name;

private Token(TokenType tokenType, String name) {
this.tokenType = tokenType;
this.name = name;
}

public Token(String name) {
this(TokenType.SYMBOL, name);
}

public TokenType getTokenType() {
return tokenType;
}

public String getName() {
return name;
}
}

public static class Tokenizer {
private Reader reader;
private int nextChar = -1;

public Tokenizer(Reader reader) {
this.reader = reader;
}

private int getNextChar() {
if (nextChar < 0)
try {
return reader.read();
} catch (Exception e) {
throw new RuntimeException(e);
}
int result = nextChar;
nextChar = -1;
return result;
}

private void pushback(int pushedChar) {
assert nextChar < 0;
nextChar = pushedChar;
}

public Token nextToken() {
for (;;) {
int ch = getNextChar();
if (ch < 0)
return null;
switch (ch) {
case ' ': case '\t': case '\r': case '\n':
break;
case '0':
return Token.ZERO;
case '1':
return Token.ONE;
case '.':
return Token.DOT;
case '_':
return Token.NIL;
case '=':
ch = getNextChar();
if (ch != '=') {
pushback(ch);
return Token.EQ;
}
discardComment();
break;
default:
pushback(ch);
return readSymbol();
}
}
}

private void discardComment() {
for (;;) {
int ch = getNextChar();
if (ch < 0 || ch == '\n')
return;
}
}

private Token readSymbol() {
StringBuilder sb = new StringBuilder();
for (;;) {
int ch = getNextChar();
if (ch < 0)
return new Token(sb.toString());
switch (ch) {
case ' ': case '\t': case '\r': case '\n':
case '0': case '1': case '.': case '_': case '=':
pushback(ch);
return new Token(sb.toString());
default:
sb.append((char) ch);
break;
}
}
}
}
}

Thursday, June 18, 2009

In my down time at work, I was looking at my second 01_ interpreter to see why it was so slow on the 99 bottles of beer. The simplistic built-in Java profiling, enabled with the -Xprof flag, showed that a big chunk of time was spent in pattern matching. Every time a function was called, it would attempt to match the arguments to the patterns, and it would have to reevaluate the argument for the next pattern if the pattern failed to match. Furthermore, recursion would cause giant chains in the delayed function call values that had to be evaluated over and over.

So I changed FuncallValue to memoize its result when evaluated, and the 99 bottles of beer program was fast. However, I then uncovered some hard to diagnose bugs in the interpreter. It still runs all the examples that I've posted so far correctly.

After skimming some of the languages on the Esolang Wiki, I would say that, superficially, 01_, as a lazy, pattern-matching language, is most closely related to pax and Rhotor.

Some more simple example code that runs correctly on my interpreter. Given the definitions:

take 0x 0y = 0 take x y.
take 0x 1y = 1 take x y.
take _ . = _.

drop 0x 0y = drop x y.
drop 0x 1y = drop x y.
drop _ y = y.

concat x y = x y.

reverse 0 x = reverse x 0.
reverse 1 x = reverse x 1.
reverse _ = _.

This reverses the lines of its input:

tac _ = _.
tac in = tac drop-line in first-line in _.

drop-line _ = _.
drop-line 00001010in = in.
drop-line in = drop-line drop 00000000 in.

first-line _ ac = ac.
first-line 00001010in ac = ac 00001010.
first-line in ac = first-line drop 00000000 in concat ac take 00000000 in.

This caesar-encodes its input:

caesar _ = _.
caesar 011x = 011 reverse rotl3 reverse take 00000 x caesar drop 00000 x.
caesar 010x = 010 reverse rotl3 reverse take 00000 x caesar drop 00000 x.
caesar x = take 00000000 x caesar drop 00000000 x.

rotl3 00000 = 00000.
rotl3 11111 = 11111.
rotl3 01111 = 01111.
rotl3 10111 = 10111.
rotl3 00111 = 00111.
rotl3 11011 = 11011.
rotl3 x = rotl rotl rotl rotl rotl rotl rotl rotl rotl rotl rotl rotl rotl x.

rotl 10000 = 01011.
rotl 0x = 1 rotl x.
rotl 1x = 0 x.
rotl _ = _.

Note that the 4th character in rotl3 and rotl is the letter L and is not the number 1.

Wednesday, June 17, 2009

Sometimes when there is not much work to do at work, I write code for fun. During one such recent period, I came up with an idea for a programming language in which the only datatype is a list of bits. It would qualify as an esoteric language.

I decided it would have pattern-matching, lazy evaluation, and be side-effect free. Running a program would be the evaluation of a function where the parameter was standard input. The result of the function would be sent to standard output.

I wanted a minimal syntax. The first few special tokens that I decided on were 0 for the literal 0 bit, 1 for the literal 1 bit, = for introducing the function body in a function definition, . for ending a function body, and _ for the wildcard pattern. Sequences of all other non-whitespace characters would be symbol tokens, and whitespace would separate tokens. I would wind up making a few changes.

With 0 and 1 being tokens, symbols wouldn't be able to contain 0 or 1, which is a bit perverse.

Designing the pattern-matching syntax, I realized that I needed another token for nil, or the empty list. I considered various characters, when I realized that . left of the = could be syntactically distinct from its role of terminating the function body right of the =, so I could use . as nil. Using a period as nil, which also terminates a list, also appealed to me.

Then, I realized that I needed nil in the function body as well. My first thought was to use . and find some other character to terminate the function body, such as ;, as used by C and Pascal. However, I couldn't find a character that I liked as much as . for terminating the function, so I again considered various characters to represent nil. I realized that _ could be used in the function body distinct of its role as the wildcard pattern left of the =. However, it would be perverse and inelegant to have . be nil left of the =, while _ is nil right of the =.

So then I was choosing between using . as nil and _ as the wildcard pattern and the function definition terminator, versus using _ as nil and . as the wildcard pattern and the function definition terminator. The former had the appeal of . as nil, the list terminator, and _ is used by various languages as the wildcard pattern, making that fairly standard. However, I decided on the former, because I really liked . to terminate the function definition, and really disliked _ for that.

Later, I decided I wanted to allow comments. First, I considered # until the end of the line as in the Bourne shell and various other languages. However, partly inspired by -- as in Ada and various other languages, I realized that == would be otherwise syntactically invalid and would not introduce a new special character, so that would be how to add comments.

I decided to call the language 01_. From then on, it was pretty straightforward to design and it was straightforward to write an interpreter, albeit a very inefficient one. The main design decision was that multiple expressions in a function body would be concatenated as the result.

Some examples:

Hello World:
h = 01001000011001010110110001101100011011110010000001110111011011110111001001101100011001000010000100001010.

Self-printing program (It could be 73 characters shorter, but I wanted it to end with a newline):

q=d p d0010111000001010.p0b=00110000p b.p1b=00110001p b.p_=_.d=011100010011110101100100001000000111000000100000011001000011000000110000001100010011000000110001001100010011000100110000001100000011000000110000001100000011000100110000001100010011000000101110011100000011000001100010001111010011000000110000001100010011000100110000001100000011000000110000011100000010000001100010001011100111000000110001011000100011110100110000001100000011000100110001001100000011000000110000001100010111000000100000011000100010111001110000010111110011110101011111001011100110010000111101.

Fibonacci sequence:

fibonacci = 101 + fibonacci dropfirst fibonacci.

dropfirst 1x = x.

+ 0a b = 0 + a b.
+ a 0b = 0 + a b.
+ 1a 1b = 1 + a b.

99 bottles of beer (somewhat obfuscated):

== " of beer"
b = 0010000001101111011001100010000001100010011001010110010101110010.

== This is a cool function. It capitalizes its argument.
2 0113 = 0103.

== " on the wall"
33r = 001000000110111101101110001000000111010001101000011001010010000001110111011000010110110001101100.

== Little-endian modulus 10 decrement, i.e. BCD (binary coded decimal).
4 0000 = 1001.
4 _ = _.
4 02 = 1 4 2.
4 12 = 0 2.

== n bottles
5 0000. 0000 = 01101110011011110010000001101101011011110111001001100101 7 01110011.
5 0000. 1000 = 00110001 7.
5 0000. 3 = 0011 6 3 7 01110011.
5 2 3 = 0011 6 2 0011 6 3 7 01110011.

== Reverse bits.
6 _ = _.
6 02 = 6 2 0.
6 12 = 6 2 1.

== " bottle"
7 = 00100000011000100110111101110100011101000110110001100101.

== Decrement 10s.
8 2 0000 = 4 2.
8 2 . = 2.

== Verse.
9 0000. 0000 = 2 5 0000_0000 b 33r 0010110000100000 5 0000_0000 b 00101110000010100100011101101111001000000111010001101111001000000111010001101000011001010010000001110011011101000110111101110010011001010010000001100001011011100110010000100000011000100111010101111001001000000111001101101111011011010110010100100000011011010110111101110010011001010010110000100000 5 1001_1001 b 33r 0010111000001010.
9 2 3 = 5 2 3 b 33r 0010110000100000 5 2 3 b 001011100000101001010100011000010110101101100101001000000110111101101110011001010010000001100100011011110111011101101110001000000110000101101110011001000010000001110000011000010111001101110011001000000110100101110100001000000110000101110010011011110111010101101110011001000010110000100000 5 8 2 3 4 3 b 33r 001011100000101000001010 9 8 2 3 4 3.

99 = 9 1001_1001.


Here is my first implementation of a 01_ interpreter. It suffers from profligate stack usage and it doesn't allow unused values to be garbage collected. It's kind of a stupid implementation in that sense. I should have used linked lists to allow unused values to become unreferenced and garbage collected. However, it was straightforward to implement and, other than i01_.ConcatValue.getSuperIndex(), all the exercised code worked as intended on the first try. It has only been lightly tested, though.

I put it all in one file because an interpreter for a small language ought to be small. However, at 840 lines, it turned out to be a little bigger than I had envisioned.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

/**
* Interpreter for the 01_ language.
* Usage:
* java i01_
* Runs a 01_ repl (read-eval-print-loop).
* java i01_ -repl SRCFILE ...
* Runs a 01_ repl after reading the specified source files.
* java i01_ [-1 INPUTFILE1] [-2 INPUTFILE2] ... SRCFILE ... FUNCTION
* Evaluates the specified function after reading the specified
* source files. The first argument, if used, is stdin. The second
* argument, if used, is the file INPUTFILE1. The 3rd argument, if
* used, is the file INPUTFILE2, etc.
*
* The repl reader reads 01_ defs and recognizes non-01_ syntax for
* evaluating an expression:
* = EXPR.
* which evaluates the given expression and prints the result.
*/
public class i01_ {
private static final int MAX_OUTPUT_BITS = 1500;

public static void main(String[] args) throws Exception {
Interpreter interpreter = new Interpreter();

if (args.length != 0) {
boolean repl = false;
int start = 0;
HashMap<Integer,String> inputfiles = null;
if ("-repl".equals(args[0])) {
start = 1;
repl = true;
} else {
inputfiles = new HashMap<Integer,String>();
while (start < args.length - 1) {
try {
int i = Integer.parseInt(args[start]);
if (i >= 0)
break;
inputfiles.put(-i, args[start+1]);
start += 2;
} catch (NumberFormatException e) {
break;
}
}
}
for (int i = start; i < args.length - (repl ? 0 : 1); i++)
readFile(interpreter, args[i]);
if (!repl) {
interpreter.parseBodies();
Function function = interpreter.getFunction(args[args.length-1]);
ArrayList<Value> funargs = new ArrayList<Value>();
if (function.getArity() > 0)
funargs.add(new StreamValue(System.in));
for (int i = 1; i < function.getArity(); i++) {
String inputfile = inputfiles.get(i);
funargs.add(inputfile == null ? Value.NIL : new StreamValue(new FileInputStream(inputfile)));
}
writeAsBytes(function.eval(funargs, "(cmdline):"+function.getName()), System.out);
System.out.flush();
return;
}
}
repl(interpreter, new BufferedReader(new InputStreamReader(System.in)));
}

private static void repl(Interpreter interpreter, BufferedReader in) throws Exception {
Parser parser = new Parser();
for (;;) {
System.out.print(parser.isDefPending() ? "_ " : ". ");
System.out.flush();
String line = in.readLine();
if (line == null)
break;
try {
parser.parseLine(line, "(repl)");
Def def;
while ((def = parser.nextDef()) != null) {
if (def.getName() != null) {
interpreter.addDef(def);
System.out.println("= "+def.getName());
} else {
interpreter.parseBodies();
def.parseBody(interpreter);
System.out.print("= ");
writeAsBits(def.eval(null, "(repl)"), System.out);
System.out.println();
}
}
} catch (EvalException e) {
System.err.println(e.getMessage());
parser.reset();
} catch (ParseException e) {
System.err.println(e.getMessage());
parser.reset();
} catch (Exception e) {
e.printStackTrace();
parser.reset();
}
}
}

private static void readFile(Interpreter interpreter, String filename) throws Exception {
Parser parser = new Parser();
BufferedReader in = new BufferedReader(new FileReader(filename));
int linenumber = 0;
String location = null;
for (;;) {
String line = in.readLine();
if (line == null)
break;
linenumber++;
location = filename + ":" + linenumber;
parser.parseLine(line, location);
try {
Def def;
while ((def = parser.nextDef()) != null) {
if (def.getName() != null)
interpreter.addDef(def);
else
System.err.println(def.getLocation()+": Error: unexpected =");
}
} catch (ParseException e) {
System.err.println(e.getMessage());
}
}
}

private static void writeAsBytes(Value value, OutputStream out) throws Exception {
int shift = 8;
int b = 0;
for (int i = 0; value.notNil(i); i++) {
shift--;
b |= (value.get(i) ? 1 : 0) << shift;
if (shift <= 0) {
out.write(b);
shift = 8;
b = 0;
}
}
}

private static void writeAsBits(Value value, PrintStream out) throws Exception {
for (int i = 0; value.notNil(i); i++) {
out.print(value.get(i) ? "1" : "0");
if (i >= MAX_OUTPUT_BITS) {
out.print("...");
return;
}
}
out.print("_");
}

private static class Interpreter {
private HashMap<String,Function> functions = new HashMap<String,Function>();

public Function getFunction(String name) {
return functions.get(name);
}

public void addDef(Def def) {
Function function = functions.get(def.getName());
if (function == null)
functions.put(def.getName(), new Function(def));
else if (function.getArity() == def.getArity())
function.addDef(def);
else
throw new ParseException(def.getLocation() + ": Error: " + function.getName() + " defined to be " + function.getArity() + "-ary, def is " + def.getArity() + "-ary");
}

public void parseBodies() {
for (Function function : functions.values())
function.parseBodies(this);
}
}

public static class Function {
private int arity;
private String name;
private ArrayList<Def> defs = new ArrayList<Def>();

public Function(Def def) {
arity = def.getArity();
name = def.getName();
defs.add(def);
}

public int getArity() {
return arity;
}

public String getName() {
return name;
}

public void addDef(Def def) {
assert arity == def.getArity() && name.equals(def.getName());
defs.add(def);
}

public Value eval(ArrayList<Value> args, String stacktrace) {
for (Def def : defs) {
Value value = def.eval(args, stacktrace);
if (value != null)
return value;
}
throw new EvalException("No matching pattern: " + stacktrace);
}

public void parseBodies(Interpreter interpreter) {
for (Def def : defs)
def.parseBody(interpreter);
}
}

public static class Def {
private String name;
private String location;
private ArrayList<Pattern> patterns;
private ArrayList<Token> body;
private Expr expr;

public Def(Token name, ArrayList<Pattern> patterns, Token eq, ArrayList<Token> body) {
this.name = name != null ? name.getName() : null;
this.location = name != null ? name.getLocation() : eq.getLocation();
this.patterns = patterns;
this.body = body;
}

public int getArity() {
return patterns != null ? patterns.size() : 0;
}

public String getName() {
return name;
}

public String getLocation() {
return location;
}

public Value eval(ArrayList<Value> args, String stacktrace) {
assert args == null ? getArity() == 0 : args.size() == getArity();
if (getArity() == 0)
return expr.eval(null, stacktrace);
HashMap<String,Value> bindings = new HashMap<String,Value>();
for (int i = 0; i < getArity(); i++) {
Pattern pattern = patterns.get(i);
Value binding = pattern.getBindingIfMatches(args.get(i));
if (binding == null)
return null;
bindings.put(pattern.getName(), binding);
}
return expr.eval(bindings, stacktrace);
}

public void parseBody(Interpreter interpreter) {
HashSet<String> bindings = new HashSet<String>();
if (patterns != null)
for (Pattern pattern : patterns)
if (pattern.getName() != null)
bindings.add(pattern.getName());
ExprParser exprParser = new ExprParser(body, bindings, interpreter);
ArrayList<Expr> exprs = new ArrayList<Expr>();
for (;;) {
Expr nextExpr = exprParser.nextExpr();
if (nextExpr == null)
break;
exprs.add(nextExpr);
}
if (exprs.size() == 0)
expr = Expr.NIL;
else if (exprs.size() == 1)
expr = exprs.get(0);
else
expr = new ConcatExpr(exprs);
}
}

public static class Pattern {
private String name;
private boolean[] bits;
private boolean nilTerminated;

public Pattern(ArrayList<Token> tokens) {
nilTerminated = false;
int numberOfBits = 0;
for (Token token : tokens) {
switch (token.getTokenType()) {
case ZERO: case ONE:
numberOfBits++;
break;
case NIL:
nilTerminated = true;
break;
case SYMBOL:
name = token.getName();
break;
}
}
bits = new boolean[numberOfBits];
int i = 0;
for (Token token : tokens) {
switch (token.getTokenType()) {
case ZERO:
bits[i++] = false;
break;
case ONE:
bits[i++] = true;
break;
}
}
}

public String getName() {
return name;
}

public Value getBindingIfMatches(Value value) {
for (int i = 0; i < bits.length; i++)
if (!value.notNil(i) || value.get(i) != bits[i])
return null;
if (nilTerminated)
return value.notNil(bits.length) ? null : Value.NIL;
return name != null ? new TailValue(value, bits.length) : Value.NIL;
}
}

public static abstract class Value {
public static Value NIL = new Value() {
public boolean notNil(int index) {
return false;
}

public boolean get(int index) {
return false;
}
};

public abstract boolean notNil(int index);
public abstract boolean get(int index);
}

public static class StreamValue extends Value {
private InputStream in;
private ArrayList<Byte> bytes = new ArrayList<Byte>();
private boolean eof = false;

public StreamValue(InputStream in) {
this.in = in;
}

private void readToIndex(int index) {
assert index >= 0;
try {
while (!eof && bytes.size() <= index/8) {
int b = in.read();
if (b < 0)
eof = true;
else
bytes.add((byte) b);
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}

public boolean notNil(int index) {
readToIndex(index);
return bytes.size() > index/8;
}

public boolean get(int index) {
readToIndex(index);
if (bytes.size() <= index/8)
throw new RuntimeException("eof index="+index);
int b = bytes.get(index/8);
return (b & (1 << 7-(index%8))) != 0;
}
}

public static class TailValue extends Value {
private Value baseValue;
private int tailIndex;

public TailValue(Value baseValue, int tailIndex) {
this.baseValue = baseValue;
this.tailIndex = tailIndex;
}

public boolean notNil(int index) {
return baseValue.notNil(index+tailIndex);
}

public boolean get(int index) {
return baseValue.get(index+tailIndex);
}
}

public static class FuncallValue extends Value {
private Function function;
private ArrayList<Value> args;
private String stacktrace;
private Value value = null;

public FuncallValue(Function function, ArrayList<Value> args, String stacktrace) {
this.function = function;
this.args = args;
this.stacktrace = stacktrace;
}

private void getValue() {
if (value == null)
value = function.eval(args, stacktrace);
}

public boolean notNil(int index) {
getValue();
return value.notNil(index);
}

public boolean get(int index) {
getValue();
return value.get(index);
}
}

public static class ConcatValue extends Value {
private ArrayList<Value> values;
private int[] startIndices;

public ConcatValue(ArrayList<Value> values) {
this.values = values;
startIndices = new int[values.size()];
Arrays.fill(startIndices, -1);
startIndices[0] = 0;
}

private int getSuperIndex(int index) {
assert index >= 0;
assert startIndices.length == values.size();
outerLoop:
for (int i = 0; i < startIndices.length - 1; i++) {
if (startIndices[i+1] > index)
return i;
if (startIndices[i+1] < 0) {
Value value = values.get(i);
for (int j = 0; startIndices[i]+j <= index; j++)
if (!value.notNil(j)) {
startIndices[i+1] = startIndices[i]+j;
continue outerLoop;
}
return i;
}
}
return startIndices.length-1;
}

public boolean notNil(int index) {
int superIndex = getSuperIndex(index);
return values.get(superIndex).notNil(index - startIndices[superIndex]);
}

public boolean get(int index) {
int superIndex = getSuperIndex(index);
return values.get(superIndex).get(index - startIndices[superIndex]);
}
}

public static class LiteralValue extends Value {
private boolean[] bits;

public LiteralValue(boolean[] bits) {
this.bits = bits;
}

public boolean notNil(int index) {
assert index >= 0;
return index < bits.length;
}

public boolean get(int index) {
assert index >= 0;
return bits[index];
}
}

public static abstract class Expr {
public static final Expr NIL = new Expr() {
public Value eval(HashMap<String,Value> bindings, String stacktrace) {
return Value.NIL;
}
};

public abstract Value eval(HashMap<String,Value> bindings, String stacktrace);
}

public static class FuncallExpr extends Expr {
private Function function;
private ArrayList<Expr> args;
private String location;

public FuncallExpr(Function function, ArrayList<Expr> args, String location) {
this.function = function;
this.args = args;
this.location = location;
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
ArrayList<Value> argValues = null;
if (args != null) {
argValues = new ArrayList<Value>(args.size());
for (Expr arg : args)
argValues.add(arg.eval(bindings, stacktrace));
}
return new FuncallValue(function, argValues, location+":"+function.getName()+"\n"+stacktrace);
}
}

public static class ConcatExpr extends Expr {
private ArrayList<Expr> exprs;

public ConcatExpr(ArrayList<Expr> exprs) {
this.exprs = exprs;
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
ArrayList<Value> values = new ArrayList<Value>(exprs.size());
for (Expr expr : exprs)
values.add(expr.eval(bindings, stacktrace));
return new ConcatValue(values);
}
}

public static class LiteralExpr extends Expr {
private LiteralValue value;

public LiteralExpr(ArrayList<Token> tokens, int start, int end) {
boolean[] bits = new boolean[end-start];
for (int i = 0; i+start < end; i++)
bits[i] = tokens.get(i+start).getTokenType() == TokenType.ONE;
value = new LiteralValue(bits);
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
return value;
}
}

public static class BoundExpr extends Expr {
private String name;

public BoundExpr(String name) {
this.name = name;
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
return bindings.get(name);
}
}

public static class EvalException extends RuntimeException {
public EvalException(String message) {
super(message);
}
}

public static class ParseException extends RuntimeException {
public ParseException(String message) {
super(message);
}
}

public static class Parser {
private Tokenizer tokenizer = new Tokenizer();

private Token currentName = null;
private ArrayList<Token> currentPattern = null;
private ArrayList<Pattern> currentPatterns = null;
private Token currentEq = null;
private ArrayList<Token> currentBody = null;

public Def nextDef() {
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
return null;
if (currentEq != null) {
switch (token.getTokenType()) {
case DOT:
Def def = new Def(currentName, currentPatterns, currentEq, currentBody);
currentName = null;
currentPattern = null;
currentPatterns = null;
currentEq = null;
currentBody = null;
return def;
case EQ:
throw new ParseException(token.getLocation()+": Error: unexpected =");
default:
if (currentBody == null)
currentBody = new ArrayList<Token>();
currentBody.add(token);
break;
}
} else if (token.getTokenType() == TokenType.EQ) {
if (currentPattern != null) {
if (currentPatterns == null)
currentPatterns = new ArrayList<Pattern>();
currentPatterns.add(new Pattern(currentPattern));
currentPattern = null;
}
currentEq = token;
} else if (currentName == null) {
if (token.getTokenType() != TokenType.SYMBOL)
throw new ParseException(token.getLocation()+": Error: expected symbol, got " + token.getName());
currentName = token;
} else {
if (currentPattern == null)
currentPattern = new ArrayList<Token>();
currentPattern.add(token);
switch (token.getTokenType()) {
case DOT: case NIL: case SYMBOL:
if (currentPatterns == null)
currentPatterns = new ArrayList<Pattern>();
Pattern nextPattern = new Pattern(currentPattern);
if (nextPattern.getName() != null)
for (Pattern pattern : currentPatterns)
if (nextPattern.getName().equals(pattern.getName()))
throw new ParseException(token.getLocation()+": Error: multiple bindings of "+nextPattern.getName());
currentPatterns.add(nextPattern);
currentPattern = null;
}
}
}
}

public void parseLine(String line, String location) {
assert tokenizer.nextToken() == null;
tokenizer.tokenizeLine(line, location);
}

public void reset() {
currentName = null;
currentPattern = null;
currentPatterns = null;
currentEq = null;
currentBody = null;
tokenizer.reset();
}

public boolean isDefPending() {
return currentName != null || currentEq != null;
}
}

public static enum TokenType {
ZERO, ONE, DOT, NIL, EQ, SYMBOL;
}

public static class Token {
private TokenType tokenType;
private String name;
private String location;

public Token(TokenType tokenType, String name, String location) {
this.tokenType = tokenType;
this.name = name;
this.location = location;
}

public TokenType getTokenType() {
return tokenType;
}

public String getName() {
switch (tokenType) {
case ZERO: return "0";
case ONE: return "1";
case DOT: return ".";
case NIL: return "_";
case EQ: return "=";
}
return name;
}

public String getLocation() {
return location;
}
}

public static class Tokenizer {
private String line = null;
private String location;
private int index = 0;

public Token nextToken() {
if (line == null)
return null;
while (index < line.length()) {
switch (line.charAt(index)) {
case ' ': case '\t': case '\r': case '\n':
index++;
break;
case '0':
index++;
return new Token(TokenType.ZERO, null, location);
case '1':
index++;
return new Token(TokenType.ONE, null, location);
case '.':
index++;
return new Token(TokenType.DOT, null, location);
case '_':
index++;
return new Token(TokenType.NIL, null, location);
case '=':
index++;
if (index < line.length() && line.charAt(index) == '=') {
line = null;
return null;
}
return new Token(TokenType.EQ, null, location);
default:
return symbolToken();
}
}
line = null;
return null;
}

private Token symbolToken() {
int start = index;
while (++index < line.length()) {
switch (line.charAt(index)) {
case ' ': case '\t': case '\r': case '\n':
case '0': case '1': case '.': case '_': case '=':
return new Token(TokenType.SYMBOL, line.substring(start, index), location);
}
}
return new Token(TokenType.SYMBOL, line.substring(start), location);
}

public void tokenizeLine(String line, String location) {
assert this.line == null;
this.line = line;
this.location = location;
index = 0;
}

public void reset() {
line = null;
location = null;
}
}

public static class ExprParser {
private ArrayList<Token> tokens;
private HashSet<String> bindings;
private Interpreter interpreter;
private int index = 0;

public ExprParser(ArrayList<Token> tokens, HashSet<String> bindings, Interpreter interpreter) {
this.tokens = tokens;
this.bindings = bindings;
this.interpreter = interpreter;
}

public Expr nextExpr() {
if (index >= tokens.size())
return null;
Token token = tokens.get(index);
switch (token.getTokenType()) {
case ZERO: case ONE:
return literalExpr();
case NIL:
index++;
return Expr.NIL;
case SYMBOL:
index++;
if (bindings.contains(token.getName()))
return new BoundExpr(token.getName());
return funcallExpr(token);
default:
assert false;
throw new RuntimeException();
}
}

public LiteralExpr literalExpr() {
int start = index;
int end = tokens.size();
loop:
while (++index < tokens.size()) {
switch (tokens.get(index).getTokenType()) {
case ZERO: case ONE:
break;
case NIL:
end = index;
index++;
break loop;
default:
end = index;
break loop;
}
}
return new LiteralExpr(tokens, start, end);
}

public FuncallExpr funcallExpr(Token token) {
Function function = interpreter.getFunction(token.getName());
if (function == null)
throw new ParseException(token.getLocation()+": Error: undefined symbol: "+token.getName());
if (function.getArity() == 0)
return new FuncallExpr(function, null, token.getLocation());
ArrayList<Expr> args = new ArrayList<Expr>(function.getArity());
for (int i = 0; i < function.getArity(); i++) {
Expr arg = nextExpr();
if (arg == null)
throw new ParseException(token.getLocation()+": Error: function requires "+function.getArity()+" argument(s): "+function.getName());
args.add(arg);
}
return new FuncallExpr(function, args, token.getLocation());
}
}
}

Here's my second interpreter, which doesn't use a huge amount of stack and, in principle, ought to allow garbage collection. This implementation was harder to get right than the first one, and, as in the first one, getting ConcatValue to work correctly was one of the biggest hurdles. It's super slow, so I have not tested it much.

import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;

/**
* Interpreter for the 01_ language.
* Usage:
* java i01_ [-1 INPUTFILE1] [-2 INPUTFILE2] ... SRCFILE ... FUNCTION
* Evaluates the specified function after reading the specified
* source files. The first argument, if used, is stdin. The second
* argument, if used, is the file INPUTFILE1. The 3rd argument, if
* used, is the file INPUTFILE2, etc.
*/
public class in01_ {
public static void main(String[] args) throws Exception {
HashMap<Integer,String> inputfiles = new HashMap<Integer,String>();
int index = 0;
while (index < args.length - 2) {
int inputfile;
try {
inputfile = Integer.parseInt(args[index]);
} catch (NumberFormatException e) {
break;
}
if (inputfile >= 0)
break;
inputfiles.put(-inputfile, args[index + 1]);
index += 2;
}
HashMap<String,Function> functions = new HashMap<String,Function>();
while (index < args.length - 1) {
readFile(functions, new FileReader(args[index]));
index++;
}
for (Function function : functions.values())
function.parseBodies(functions);
Function function = functions.get(args[args.length - 1]);
if (function == null)
throw new RuntimeException("Undefined symbol: " + args[args.length - 1]);
ArrayList<Value> funargs = new ArrayList<Value>();
if (function.getArity() > 0)
funargs.add(new StreamValue(System.in));
for (int i = 1; i < function.getArity(); i++) {
String inputfile = inputfiles.get(i);
if (inputfile == null)
funargs.add(Value.NIL);
else
funargs.add(new StreamValue(new FileInputStream(inputfile)));
}
writeValue(System.out, function.eval(funargs));
System.out.flush();
}

private static void readFile(HashMap<String,Function> functions, Reader reader) {
DefReader defReader = new DefReader(reader);
for (Def def = defReader.nextDef(); def != null; def = defReader.nextDef()) {
Function function = functions.get(def.getName());
if (function == null)
functions.put(def.getName(), new Function(def));
else
function.addDef(def);
}
}

public static void writeValue(OutputStream out, Value value) throws Exception {
int output = 0;
int bit = 7;
value = forceValue(value);
while (value.head() != HeadValue.NIL) {
if (value.head() == HeadValue.ONE)
output |= 1<<bit;
if (bit > 0) {
bit--;
} else {
out.write(output);
bit = 7;
output = 0;
}
value = forceValue(value.tail());
}
}

public static Value forceValue(Value value) {
while (value.head() == HeadValue.DELAYED)
value = value.force();
return value;
}

public static enum HeadValue {
ZERO, ONE, NIL, DELAYED;
}

public static abstract class Value {
public static final Value NIL = new Value() {
public HeadValue head() {
return HeadValue.NIL;
}

public Value tail() {
throw new RuntimeException();
}

public Value force() {
return this;
}
};

public abstract HeadValue head();
public abstract Value tail();
public abstract Value force();
}

public static class StreamValue extends Value {
private InputStream in;
private HeadValue head;
private StreamValue tail;
private int bit;
private int inputByte;

private StreamValue(InputStream in, HeadValue head, int bit, int inputByte) {
assert head != HeadValue.NIL;
assert head != HeadValue.DELAYED || bit == 7;
this.in = in;
this.head = head;
this.tail = null;
this.bit = bit;
this.inputByte = inputByte;
}

public StreamValue(InputStream in) {
this(in, HeadValue.DELAYED, 7, 0);
}

public HeadValue head() {
return head;
}

public Value tail() {
assert head != HeadValue.DELAYED && head != HeadValue.NIL;
if (tail != null)
return tail;
if (bit == 0)
tail = new StreamValue(in);
else
tail = new StreamValue(in, bitValue(bit - 1), bit - 1, inputByte);
return tail;
}

private HeadValue bitValue(int bit) {
return (inputByte & (1 << bit)) != 0 ? HeadValue.ONE : HeadValue.ZERO;
}

public Value force() {
if (head != HeadValue.DELAYED)
return this;
try {
inputByte = in.read();
} catch (Exception e) {
throw new RuntimeException(e);
}
if (inputByte < 0)
head = HeadValue.NIL;
else
head = bitValue(bit);
return this;
}
}

public static class LiteralValue extends Value {
private HeadValue head;
private Value tail;

public LiteralValue(Token token, Value tail) {
assert token.getTokenType() == TokenType.ONE || token.getTokenType() == TokenType.ZERO;
head = token.getTokenType() == TokenType.ONE ? HeadValue.ONE : HeadValue.ZERO;
this.tail = tail;
}

public HeadValue head() {
return head;
}

public Value tail() {
return tail;
}

public Value force() {
return this;
}
}

public static class ConcatValue extends Value {
private Value headList;
private Value tailList;

private Value tail;

public ConcatValue(Value headList, Value tailList) {
assert headList.head() != HeadValue.NIL;
this.headList = headList;
this.tailList = tailList;
this.tail = null;
}

public HeadValue head() {
return headList.head();
}

public Value tail() {
assert headList.head() != HeadValue.DELAYED;
assert headList.head() != HeadValue.NIL;
if (tail == null) {
if (headList.tail().head() == HeadValue.NIL)
tail = tailList;
else
tail = new ConcatValue(headList.tail(), tailList);
}
return tail;
}

public Value force() {
headList = forceValue(headList);
if (headList.head() == HeadValue.NIL)
return tailList;
if (!(headList instanceof ConcatValue))
return this;
ConcatValue head = (ConcatValue) headList;
return new ConcatValue(head.headList, new ConcatValue(head.tailList, tailList));
}
}

public static class FuncallValue extends Value {
private Function function;
private ArrayList<Value> args;

public FuncallValue(Function function, ArrayList<Value> args) {
this.function = function;
this.args = args;
}

public HeadValue head() {
return HeadValue.DELAYED;
}

public Value tail() {
assert false;
throw new RuntimeException();
}

public Value force() {
return function.eval(args);
}
}

public static abstract class Expr {
public static final Expr NIL = new Expr() {
public Value eval(HashMap<String,Value> bindings) {
return Value.NIL;
}
};

public abstract Value eval(HashMap<String,Value> bindings);
}

public static class LiteralExpr extends Expr {
private Value value;

public LiteralExpr(ArrayList<Token> tokens, int start, int end) {
value = getValue(tokens, start, end);
}

public Value eval(HashMap<String,Value> bindings) {
return value;
}

private Value getValue(ArrayList<Token> tokens, int start, int end) {
if (start >= end)
return Value.NIL;
return new LiteralValue(tokens.get(start), getValue(tokens, start + 1, end));
}
}

public static class ConcatExpr extends Expr {
private Expr head;
private Expr tail;

public ConcatExpr(Expr head, Expr tail) {
this.head = head;
this.tail = tail;
}

public Value eval(HashMap<String,Value> bindings) {
return new ConcatValue(head.eval(bindings), tail.eval(bindings));
}
}

public static class BoundExpr extends Expr {
private String name;

public BoundExpr(String name) {
this.name = name;
}

public Value eval(HashMap<String,Value> bindings) {
return bindings.get(name);
}
}

public static class FuncallExpr extends Expr {
private Function function;
private ArrayList<Expr> args;

public FuncallExpr(Function function, ArrayList<Expr> args) {
this.function = function;
this.args = args;
}

public Value eval(HashMap<String,Value> bindings) {
ArrayList<Value> argValues = new ArrayList<Value>(args.size());
for (Expr arg : args)
argValues.add(arg.eval(bindings));
return new FuncallValue(function, argValues);
}
}

public static class Function {
private ArrayList<Def> defs = new ArrayList<Def>();

public Function(Def def) {
defs.add(def);
}

public int getArity() {
return defs.get(0).getArity();
}

public void addDef(Def def) {
assert def.getName().equals(defs.get(0).getName());
if (def.getArity() != defs.get(0).getArity())
throw new RuntimeException(def.getName() + " defined as " + def.getArity() + "-ary, originally " + defs.get(0).getArity() + "-ary");
defs.add(def);
}

public void parseBodies(HashMap<String,Function> functions) {
for (Def def : defs)
def.parseBody(functions);
}

public Value eval(ArrayList<Value> args) {
for (Def def : defs) {
Value value = def.eval(args);
if (value != null)
return value;
}
throw new RuntimeException("No matching pattern for " + defs.get(0).getName());
}
}

public static class Def {
private String name;
private ArrayList<Pattern> patterns;
private ArrayList<Token> bodyTokens;
private Expr body = null;

public Def(String name, ArrayList<Pattern> patterns, ArrayList<Token> bodyTokens) {
this.name = name;
this.patterns = patterns;
this.bodyTokens = bodyTokens;
}

public String getName() {
return name;
}

public int getArity() {
return patterns.size();
}

public void parseBody(HashMap<String,Function> functions) {
HashSet<String> bindings = new HashSet<String>();
for (Pattern pattern : patterns)
if (pattern.getBinding().getTokenType() == TokenType.SYMBOL)
bindings.add(pattern.getBinding().getName());
body = new Parser(functions, bindings).getExpr();
if (body == null)
body = Expr.NIL;
}

private class Parser {
private int index = 0;
private HashMap<String,Function> functions;
private HashSet<String> bindings;

public Parser(HashMap<String,Function> functions, HashSet<String> bindings) {
this.functions = functions;
this.bindings = bindings;
}

public Expr getExpr() {
Expr expr = nextExpr();
if (index < bodyTokens.size())
return new ConcatExpr(expr, getExpr());
return expr;
}

public Expr nextExpr() {
if (index >= bodyTokens.size())
return null;
Token token = bodyTokens.get(index);
switch (token.getTokenType()) {
case ONE: case ZERO:
return literalExpr();
case NIL:
index++;
return Expr.NIL;
case SYMBOL:
index++;
if (bindings.contains(token.getName()))
return new BoundExpr(token.getName());
return funcallExpr(token);
default:
assert false;
throw new RuntimeException();
}
}

public LiteralExpr literalExpr() {
int start = index;
while (++index < bodyTokens.size()) {
switch (bodyTokens.get(index).getTokenType()) {
case ZERO: case ONE:
break;
case NIL:
return new LiteralExpr(bodyTokens, start, index++);
default:
return new LiteralExpr(bodyTokens, start, index);
}
}
return new LiteralExpr(bodyTokens, start, bodyTokens.size());
}

public FuncallExpr funcallExpr(Token token) {
Function function = functions.get(token.getName());
if (function == null)
throw new RuntimeException("Undefined symbol: " + token.getName() + " in body of " + name);
ArrayList<Expr> args = new ArrayList<Expr>(function.getArity());
for (int i = 0; i < function.getArity(); i++) {
Expr arg = nextExpr();
if (arg == null)
throw new RuntimeException("Not enough arguments for " + token.getName() + " in body of " + name + ": needs " + function.getArity());
args.add(arg);
}
return new FuncallExpr(function, args);
}
}

public Value eval(ArrayList<Value> args) {
HashMap<String,Value> bindings = new HashMap<String,Value>();
for (int i = 0; i < args.size(); i++) {
Pattern pattern = patterns.get(i);
Value match = pattern.tryMatch(args.get(i));
if (match == null)
return null;
if (pattern.getBinding().getTokenType() == TokenType.SYMBOL)
bindings.put(pattern.getBinding().getName(), match);
}
return body.eval(bindings);
}
}

public static class Pattern {
private Token binding;
private ArrayList<Boolean> bits;

public Pattern(Token binding, ArrayList<Boolean> bits) {
this.binding = binding;
this.bits = bits;
}

public Token getBinding() {
return binding;
}

public Value tryMatch(Value value) {
for (boolean bit : bits) {
value = forceValue(value);
switch (value.head()) {
case ZERO:
if (bit)
return null;
break;
case ONE:
if (!bit)
return null;
break;
case DELAYED:
assert false;
default:
return null;
}
value = value.tail();
}
if (binding.getTokenType() == TokenType.NIL) {
value = forceValue(value);
return value.head() == HeadValue.NIL ? value : null;
}
return value;
}
}

public static class DefReader {
private Tokenizer tokenizer;

public DefReader(Reader reader) {
this.tokenizer = new Tokenizer(reader);
}

public Def nextDef() {
Token name = tokenizer.nextToken();
if (name == null)
return null;
if (name.getTokenType() != TokenType.SYMBOL)
throw new RuntimeException("Expected symbol, got " + name.getName());
ArrayList<Pattern> patterns = getPatterns();
ArrayList<Token> body = getBody();
return new Def(name.getName(), patterns, body);
}

private ArrayList<Pattern> getPatterns() {
ArrayList<Pattern> patterns = new ArrayList<Pattern>();
ArrayList<Boolean> pattern = new ArrayList<Boolean>();
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
throw new RuntimeException("Expected =, got EOF");
switch (token.getTokenType()) {
case ZERO:
pattern.add(false);
break;
case ONE:
pattern.add(true);
break;
case SYMBOL:
for (Pattern p : patterns)
if (token.getName().equals(p.getBinding().getName()))
throw new RuntimeException("Multiple patterns bind " + token.getName());
/*FALLTHROUGH*/
case DOT:
case NIL:
patterns.add(new Pattern(token, pattern));
pattern = new ArrayList<Boolean>();
break;
case EQ:
if (pattern.size() > 0)
patterns.add(new Pattern(Token.DOT, pattern));
return patterns;
}
}
}

private ArrayList<Token> getBody() {
ArrayList<Token> body = new ArrayList<Token>();
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
throw new RuntimeException("Got unexpected EOF");
switch (token.getTokenType()) {
case EQ:
throw new RuntimeException("Got unexpected =");
case DOT:
return body;
default:
body.add(token);
break;
}
}
}
}

public static enum TokenType {
ZERO, ONE, DOT, NIL, EQ, SYMBOL;
}

public static class Token {
public static Token ZERO = new Token(TokenType.ZERO, "0");
public static Token ONE = new Token(TokenType.ONE, "1");
public static Token DOT = new Token(TokenType.DOT, ".");
public static Token NIL = new Token(TokenType.NIL, "_");
public static Token EQ = new Token(TokenType.EQ, "=");

private TokenType tokenType;
private String name;

private Token(TokenType tokenType, String name) {
this.tokenType = tokenType;
this.name = name;
}

public Token(String name) {
this(TokenType.SYMBOL, name);
}

public TokenType getTokenType() {
return tokenType;
}

public String getName() {
return name;
}
}

public static class Tokenizer {
private Reader reader;
private int nextChar = -1;

public Tokenizer(Reader reader) {
this.reader = reader;
}

private int getNextChar() {
if (nextChar < 0)
try {
return reader.read();
} catch (Exception e) {
throw new RuntimeException(e);
}
int result = nextChar;
nextChar = -1;
return result;
}

private void pushback(int pushedChar) {
assert nextChar < 0;
nextChar = pushedChar;
}

public Token nextToken() {
for (;;) {
int ch = getNextChar();
if (ch < 0)
return null;
switch (ch) {
case ' ': case '\t': case '\r': case '\n':
break;
case '0':
return Token.ZERO;
case '1':
return Token.ONE;
case '.':
return Token.DOT;
case '_':
return Token.NIL;
case '=':
ch = getNextChar();
if (ch != '=') {
pushback(ch);
return Token.EQ;
}
discardComment();
break;
default:
pushback(ch);
return readSymbol();
}
}
}

private void discardComment() {
for (;;) {
int ch = getNextChar();
if (ch < 0 || ch == '\n')
return;
}
}

private Token readSymbol() {
StringBuilder sb = new StringBuilder();
for (;;) {
int ch = getNextChar();
if (ch < 0)
return new Token(sb.toString());
switch (ch) {
case ' ': case '\t': case '\r': case '\n':
case '0': case '1': case '.': case '_': case '=':
pushback(ch);
return new Token(sb.toString());
default:
sb.append((char) ch);
break;
}
}
}
}
}