Wednesday, June 17, 2009

Sometimes when there is not much work to do at work, I write code for fun. During one such recent period, I came up with an idea for a programming language in which the only datatype is a list of bits. It would qualify as an esoteric language.

I decided it would have pattern-matching, lazy evaluation, and be side-effect free. Running a program would be the evaluation of a function where the parameter was standard input. The result of the function would be sent to standard output.

I wanted a minimal syntax. The first few special tokens that I decided on were 0 for the literal 0 bit, 1 for the literal 1 bit, = for introducing the function body in a function definition, . for ending a function body, and _ for the wildcard pattern. Sequences of all other non-whitespace characters would be symbol tokens, and whitespace would separate tokens. I would wind up making a few changes.

With 0 and 1 being tokens, symbols wouldn't be able to contain 0 or 1, which is a bit perverse.

Designing the pattern-matching syntax, I realized that I needed another token for nil, or the empty list. I considered various characters, when I realized that . left of the = could be syntactically distinct from its role of terminating the function body right of the =, so I could use . as nil. Using a period as nil, which also terminates a list, also appealed to me.

Then, I realized that I needed nil in the function body as well. My first thought was to use . and find some other character to terminate the function body, such as ;, as used by C and Pascal. However, I couldn't find a character that I liked as much as . for terminating the function, so I again considered various characters to represent nil. I realized that _ could be used in the function body distinct of its role as the wildcard pattern left of the =. However, it would be perverse and inelegant to have . be nil left of the =, while _ is nil right of the =.

So then I was choosing between using . as nil and _ as the wildcard pattern and the function definition terminator, versus using _ as nil and . as the wildcard pattern and the function definition terminator. The former had the appeal of . as nil, the list terminator, and _ is used by various languages as the wildcard pattern, making that fairly standard. However, I decided on the former, because I really liked . to terminate the function definition, and really disliked _ for that.

Later, I decided I wanted to allow comments. First, I considered # until the end of the line as in the Bourne shell and various other languages. However, partly inspired by -- as in Ada and various other languages, I realized that == would be otherwise syntactically invalid and would not introduce a new special character, so that would be how to add comments.

I decided to call the language 01_. From then on, it was pretty straightforward to design and it was straightforward to write an interpreter, albeit a very inefficient one. The main design decision was that multiple expressions in a function body would be concatenated as the result.

Some examples:

Hello World:
h = 01001000011001010110110001101100011011110010000001110111011011110111001001101100011001000010000100001010.

Self-printing program (It could be 73 characters shorter, but I wanted it to end with a newline):

q=d p d0010111000001010.p0b=00110000p b.p1b=00110001p b.p_=_.d=011100010011110101100100001000000111000000100000011001000011000000110000001100010011000000110001001100010011000100110000001100000011000000110000001100000011000100110000001100010011000000101110011100000011000001100010001111010011000000110000001100010011000100110000001100000011000000110000011100000010000001100010001011100111000000110001011000100011110100110000001100000011000100110001001100000011000000110000001100010111000000100000011000100010111001110000010111110011110101011111001011100110010000111101.

Fibonacci sequence:

fibonacci = 101 + fibonacci dropfirst fibonacci.

dropfirst 1x = x.

+ 0a b = 0 + a b.
+ a 0b = 0 + a b.
+ 1a 1b = 1 + a b.

99 bottles of beer (somewhat obfuscated):

== " of beer"
b = 0010000001101111011001100010000001100010011001010110010101110010.

== This is a cool function. It capitalizes its argument.
2 0113 = 0103.

== " on the wall"
33r = 001000000110111101101110001000000111010001101000011001010010000001110111011000010110110001101100.

== Little-endian modulus 10 decrement, i.e. BCD (binary coded decimal).
4 0000 = 1001.
4 _ = _.
4 02 = 1 4 2.
4 12 = 0 2.

== n bottles
5 0000. 0000 = 01101110011011110010000001101101011011110111001001100101 7 01110011.
5 0000. 1000 = 00110001 7.
5 0000. 3 = 0011 6 3 7 01110011.
5 2 3 = 0011 6 2 0011 6 3 7 01110011.

== Reverse bits.
6 _ = _.
6 02 = 6 2 0.
6 12 = 6 2 1.

== " bottle"
7 = 00100000011000100110111101110100011101000110110001100101.

== Decrement 10s.
8 2 0000 = 4 2.
8 2 . = 2.

== Verse.
9 0000. 0000 = 2 5 0000_0000 b 33r 0010110000100000 5 0000_0000 b 00101110000010100100011101101111001000000111010001101111001000000111010001101000011001010010000001110011011101000110111101110010011001010010000001100001011011100110010000100000011000100111010101111001001000000111001101101111011011010110010100100000011011010110111101110010011001010010110000100000 5 1001_1001 b 33r 0010111000001010.
9 2 3 = 5 2 3 b 33r 0010110000100000 5 2 3 b 001011100000101001010100011000010110101101100101001000000110111101101110011001010010000001100100011011110111011101101110001000000110000101101110011001000010000001110000011000010111001101110011001000000110100101110100001000000110000101110010011011110111010101101110011001000010110000100000 5 8 2 3 4 3 b 33r 001011100000101000001010 9 8 2 3 4 3.

99 = 9 1001_1001.


Here is my first implementation of a 01_ interpreter. It suffers from profligate stack usage and it doesn't allow unused values to be garbage collected. It's kind of a stupid implementation in that sense. I should have used linked lists to allow unused values to become unreferenced and garbage collected. However, it was straightforward to implement and, other than i01_.ConcatValue.getSuperIndex(), all the exercised code worked as intended on the first try. It has only been lightly tested, though.

I put it all in one file because an interpreter for a small language ought to be small. However, at 840 lines, it turned out to be a little bigger than I had envisioned.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

/**
* Interpreter for the 01_ language.
* Usage:
* java i01_
* Runs a 01_ repl (read-eval-print-loop).
* java i01_ -repl SRCFILE ...
* Runs a 01_ repl after reading the specified source files.
* java i01_ [-1 INPUTFILE1] [-2 INPUTFILE2] ... SRCFILE ... FUNCTION
* Evaluates the specified function after reading the specified
* source files. The first argument, if used, is stdin. The second
* argument, if used, is the file INPUTFILE1. The 3rd argument, if
* used, is the file INPUTFILE2, etc.
*
* The repl reader reads 01_ defs and recognizes non-01_ syntax for
* evaluating an expression:
* = EXPR.
* which evaluates the given expression and prints the result.
*/
public class i01_ {
private static final int MAX_OUTPUT_BITS = 1500;

public static void main(String[] args) throws Exception {
Interpreter interpreter = new Interpreter();

if (args.length != 0) {
boolean repl = false;
int start = 0;
HashMap<Integer,String> inputfiles = null;
if ("-repl".equals(args[0])) {
start = 1;
repl = true;
} else {
inputfiles = new HashMap<Integer,String>();
while (start < args.length - 1) {
try {
int i = Integer.parseInt(args[start]);
if (i >= 0)
break;
inputfiles.put(-i, args[start+1]);
start += 2;
} catch (NumberFormatException e) {
break;
}
}
}
for (int i = start; i < args.length - (repl ? 0 : 1); i++)
readFile(interpreter, args[i]);
if (!repl) {
interpreter.parseBodies();
Function function = interpreter.getFunction(args[args.length-1]);
ArrayList<Value> funargs = new ArrayList<Value>();
if (function.getArity() > 0)
funargs.add(new StreamValue(System.in));
for (int i = 1; i < function.getArity(); i++) {
String inputfile = inputfiles.get(i);
funargs.add(inputfile == null ? Value.NIL : new StreamValue(new FileInputStream(inputfile)));
}
writeAsBytes(function.eval(funargs, "(cmdline):"+function.getName()), System.out);
System.out.flush();
return;
}
}
repl(interpreter, new BufferedReader(new InputStreamReader(System.in)));
}

private static void repl(Interpreter interpreter, BufferedReader in) throws Exception {
Parser parser = new Parser();
for (;;) {
System.out.print(parser.isDefPending() ? "_ " : ". ");
System.out.flush();
String line = in.readLine();
if (line == null)
break;
try {
parser.parseLine(line, "(repl)");
Def def;
while ((def = parser.nextDef()) != null) {
if (def.getName() != null) {
interpreter.addDef(def);
System.out.println("= "+def.getName());
} else {
interpreter.parseBodies();
def.parseBody(interpreter);
System.out.print("= ");
writeAsBits(def.eval(null, "(repl)"), System.out);
System.out.println();
}
}
} catch (EvalException e) {
System.err.println(e.getMessage());
parser.reset();
} catch (ParseException e) {
System.err.println(e.getMessage());
parser.reset();
} catch (Exception e) {
e.printStackTrace();
parser.reset();
}
}
}

private static void readFile(Interpreter interpreter, String filename) throws Exception {
Parser parser = new Parser();
BufferedReader in = new BufferedReader(new FileReader(filename));
int linenumber = 0;
String location = null;
for (;;) {
String line = in.readLine();
if (line == null)
break;
linenumber++;
location = filename + ":" + linenumber;
parser.parseLine(line, location);
try {
Def def;
while ((def = parser.nextDef()) != null) {
if (def.getName() != null)
interpreter.addDef(def);
else
System.err.println(def.getLocation()+": Error: unexpected =");
}
} catch (ParseException e) {
System.err.println(e.getMessage());
}
}
}

private static void writeAsBytes(Value value, OutputStream out) throws Exception {
int shift = 8;
int b = 0;
for (int i = 0; value.notNil(i); i++) {
shift--;
b |= (value.get(i) ? 1 : 0) << shift;
if (shift <= 0) {
out.write(b);
shift = 8;
b = 0;
}
}
}

private static void writeAsBits(Value value, PrintStream out) throws Exception {
for (int i = 0; value.notNil(i); i++) {
out.print(value.get(i) ? "1" : "0");
if (i >= MAX_OUTPUT_BITS) {
out.print("...");
return;
}
}
out.print("_");
}

private static class Interpreter {
private HashMap<String,Function> functions = new HashMap<String,Function>();

public Function getFunction(String name) {
return functions.get(name);
}

public void addDef(Def def) {
Function function = functions.get(def.getName());
if (function == null)
functions.put(def.getName(), new Function(def));
else if (function.getArity() == def.getArity())
function.addDef(def);
else
throw new ParseException(def.getLocation() + ": Error: " + function.getName() + " defined to be " + function.getArity() + "-ary, def is " + def.getArity() + "-ary");
}

public void parseBodies() {
for (Function function : functions.values())
function.parseBodies(this);
}
}

public static class Function {
private int arity;
private String name;
private ArrayList<Def> defs = new ArrayList<Def>();

public Function(Def def) {
arity = def.getArity();
name = def.getName();
defs.add(def);
}

public int getArity() {
return arity;
}

public String getName() {
return name;
}

public void addDef(Def def) {
assert arity == def.getArity() && name.equals(def.getName());
defs.add(def);
}

public Value eval(ArrayList<Value> args, String stacktrace) {
for (Def def : defs) {
Value value = def.eval(args, stacktrace);
if (value != null)
return value;
}
throw new EvalException("No matching pattern: " + stacktrace);
}

public void parseBodies(Interpreter interpreter) {
for (Def def : defs)
def.parseBody(interpreter);
}
}

public static class Def {
private String name;
private String location;
private ArrayList<Pattern> patterns;
private ArrayList<Token> body;
private Expr expr;

public Def(Token name, ArrayList<Pattern> patterns, Token eq, ArrayList<Token> body) {
this.name = name != null ? name.getName() : null;
this.location = name != null ? name.getLocation() : eq.getLocation();
this.patterns = patterns;
this.body = body;
}

public int getArity() {
return patterns != null ? patterns.size() : 0;
}

public String getName() {
return name;
}

public String getLocation() {
return location;
}

public Value eval(ArrayList<Value> args, String stacktrace) {
assert args == null ? getArity() == 0 : args.size() == getArity();
if (getArity() == 0)
return expr.eval(null, stacktrace);
HashMap<String,Value> bindings = new HashMap<String,Value>();
for (int i = 0; i < getArity(); i++) {
Pattern pattern = patterns.get(i);
Value binding = pattern.getBindingIfMatches(args.get(i));
if (binding == null)
return null;
bindings.put(pattern.getName(), binding);
}
return expr.eval(bindings, stacktrace);
}

public void parseBody(Interpreter interpreter) {
HashSet<String> bindings = new HashSet<String>();
if (patterns != null)
for (Pattern pattern : patterns)
if (pattern.getName() != null)
bindings.add(pattern.getName());
ExprParser exprParser = new ExprParser(body, bindings, interpreter);
ArrayList<Expr> exprs = new ArrayList<Expr>();
for (;;) {
Expr nextExpr = exprParser.nextExpr();
if (nextExpr == null)
break;
exprs.add(nextExpr);
}
if (exprs.size() == 0)
expr = Expr.NIL;
else if (exprs.size() == 1)
expr = exprs.get(0);
else
expr = new ConcatExpr(exprs);
}
}

public static class Pattern {
private String name;
private boolean[] bits;
private boolean nilTerminated;

public Pattern(ArrayList<Token> tokens) {
nilTerminated = false;
int numberOfBits = 0;
for (Token token : tokens) {
switch (token.getTokenType()) {
case ZERO: case ONE:
numberOfBits++;
break;
case NIL:
nilTerminated = true;
break;
case SYMBOL:
name = token.getName();
break;
}
}
bits = new boolean[numberOfBits];
int i = 0;
for (Token token : tokens) {
switch (token.getTokenType()) {
case ZERO:
bits[i++] = false;
break;
case ONE:
bits[i++] = true;
break;
}
}
}

public String getName() {
return name;
}

public Value getBindingIfMatches(Value value) {
for (int i = 0; i < bits.length; i++)
if (!value.notNil(i) || value.get(i) != bits[i])
return null;
if (nilTerminated)
return value.notNil(bits.length) ? null : Value.NIL;
return name != null ? new TailValue(value, bits.length) : Value.NIL;
}
}

public static abstract class Value {
public static Value NIL = new Value() {
public boolean notNil(int index) {
return false;
}

public boolean get(int index) {
return false;
}
};

public abstract boolean notNil(int index);
public abstract boolean get(int index);
}

public static class StreamValue extends Value {
private InputStream in;
private ArrayList<Byte> bytes = new ArrayList<Byte>();
private boolean eof = false;

public StreamValue(InputStream in) {
this.in = in;
}

private void readToIndex(int index) {
assert index >= 0;
try {
while (!eof && bytes.size() <= index/8) {
int b = in.read();
if (b < 0)
eof = true;
else
bytes.add((byte) b);
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}

public boolean notNil(int index) {
readToIndex(index);
return bytes.size() > index/8;
}

public boolean get(int index) {
readToIndex(index);
if (bytes.size() <= index/8)
throw new RuntimeException("eof index="+index);
int b = bytes.get(index/8);
return (b & (1 << 7-(index%8))) != 0;
}
}

public static class TailValue extends Value {
private Value baseValue;
private int tailIndex;

public TailValue(Value baseValue, int tailIndex) {
this.baseValue = baseValue;
this.tailIndex = tailIndex;
}

public boolean notNil(int index) {
return baseValue.notNil(index+tailIndex);
}

public boolean get(int index) {
return baseValue.get(index+tailIndex);
}
}

public static class FuncallValue extends Value {
private Function function;
private ArrayList<Value> args;
private String stacktrace;
private Value value = null;

public FuncallValue(Function function, ArrayList<Value> args, String stacktrace) {
this.function = function;
this.args = args;
this.stacktrace = stacktrace;
}

private void getValue() {
if (value == null)
value = function.eval(args, stacktrace);
}

public boolean notNil(int index) {
getValue();
return value.notNil(index);
}

public boolean get(int index) {
getValue();
return value.get(index);
}
}

public static class ConcatValue extends Value {
private ArrayList<Value> values;
private int[] startIndices;

public ConcatValue(ArrayList<Value> values) {
this.values = values;
startIndices = new int[values.size()];
Arrays.fill(startIndices, -1);
startIndices[0] = 0;
}

private int getSuperIndex(int index) {
assert index >= 0;
assert startIndices.length == values.size();
outerLoop:
for (int i = 0; i < startIndices.length - 1; i++) {
if (startIndices[i+1] > index)
return i;
if (startIndices[i+1] < 0) {
Value value = values.get(i);
for (int j = 0; startIndices[i]+j <= index; j++)
if (!value.notNil(j)) {
startIndices[i+1] = startIndices[i]+j;
continue outerLoop;
}
return i;
}
}
return startIndices.length-1;
}

public boolean notNil(int index) {
int superIndex = getSuperIndex(index);
return values.get(superIndex).notNil(index - startIndices[superIndex]);
}

public boolean get(int index) {
int superIndex = getSuperIndex(index);
return values.get(superIndex).get(index - startIndices[superIndex]);
}
}

public static class LiteralValue extends Value {
private boolean[] bits;

public LiteralValue(boolean[] bits) {
this.bits = bits;
}

public boolean notNil(int index) {
assert index >= 0;
return index < bits.length;
}

public boolean get(int index) {
assert index >= 0;
return bits[index];
}
}

public static abstract class Expr {
public static final Expr NIL = new Expr() {
public Value eval(HashMap<String,Value> bindings, String stacktrace) {
return Value.NIL;
}
};

public abstract Value eval(HashMap<String,Value> bindings, String stacktrace);
}

public static class FuncallExpr extends Expr {
private Function function;
private ArrayList<Expr> args;
private String location;

public FuncallExpr(Function function, ArrayList<Expr> args, String location) {
this.function = function;
this.args = args;
this.location = location;
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
ArrayList<Value> argValues = null;
if (args != null) {
argValues = new ArrayList<Value>(args.size());
for (Expr arg : args)
argValues.add(arg.eval(bindings, stacktrace));
}
return new FuncallValue(function, argValues, location+":"+function.getName()+"\n"+stacktrace);
}
}

public static class ConcatExpr extends Expr {
private ArrayList<Expr> exprs;

public ConcatExpr(ArrayList<Expr> exprs) {
this.exprs = exprs;
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
ArrayList<Value> values = new ArrayList<Value>(exprs.size());
for (Expr expr : exprs)
values.add(expr.eval(bindings, stacktrace));
return new ConcatValue(values);
}
}

public static class LiteralExpr extends Expr {
private LiteralValue value;

public LiteralExpr(ArrayList<Token> tokens, int start, int end) {
boolean[] bits = new boolean[end-start];
for (int i = 0; i+start < end; i++)
bits[i] = tokens.get(i+start).getTokenType() == TokenType.ONE;
value = new LiteralValue(bits);
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
return value;
}
}

public static class BoundExpr extends Expr {
private String name;

public BoundExpr(String name) {
this.name = name;
}

public Value eval(HashMap<String,Value> bindings, String stacktrace) {
return bindings.get(name);
}
}

public static class EvalException extends RuntimeException {
public EvalException(String message) {
super(message);
}
}

public static class ParseException extends RuntimeException {
public ParseException(String message) {
super(message);
}
}

public static class Parser {
private Tokenizer tokenizer = new Tokenizer();

private Token currentName = null;
private ArrayList<Token> currentPattern = null;
private ArrayList<Pattern> currentPatterns = null;
private Token currentEq = null;
private ArrayList<Token> currentBody = null;

public Def nextDef() {
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
return null;
if (currentEq != null) {
switch (token.getTokenType()) {
case DOT:
Def def = new Def(currentName, currentPatterns, currentEq, currentBody);
currentName = null;
currentPattern = null;
currentPatterns = null;
currentEq = null;
currentBody = null;
return def;
case EQ:
throw new ParseException(token.getLocation()+": Error: unexpected =");
default:
if (currentBody == null)
currentBody = new ArrayList<Token>();
currentBody.add(token);
break;
}
} else if (token.getTokenType() == TokenType.EQ) {
if (currentPattern != null) {
if (currentPatterns == null)
currentPatterns = new ArrayList<Pattern>();
currentPatterns.add(new Pattern(currentPattern));
currentPattern = null;
}
currentEq = token;
} else if (currentName == null) {
if (token.getTokenType() != TokenType.SYMBOL)
throw new ParseException(token.getLocation()+": Error: expected symbol, got " + token.getName());
currentName = token;
} else {
if (currentPattern == null)
currentPattern = new ArrayList<Token>();
currentPattern.add(token);
switch (token.getTokenType()) {
case DOT: case NIL: case SYMBOL:
if (currentPatterns == null)
currentPatterns = new ArrayList<Pattern>();
Pattern nextPattern = new Pattern(currentPattern);
if (nextPattern.getName() != null)
for (Pattern pattern : currentPatterns)
if (nextPattern.getName().equals(pattern.getName()))
throw new ParseException(token.getLocation()+": Error: multiple bindings of "+nextPattern.getName());
currentPatterns.add(nextPattern);
currentPattern = null;
}
}
}
}

public void parseLine(String line, String location) {
assert tokenizer.nextToken() == null;
tokenizer.tokenizeLine(line, location);
}

public void reset() {
currentName = null;
currentPattern = null;
currentPatterns = null;
currentEq = null;
currentBody = null;
tokenizer.reset();
}

public boolean isDefPending() {
return currentName != null || currentEq != null;
}
}

public static enum TokenType {
ZERO, ONE, DOT, NIL, EQ, SYMBOL;
}

public static class Token {
private TokenType tokenType;
private String name;
private String location;

public Token(TokenType tokenType, String name, String location) {
this.tokenType = tokenType;
this.name = name;
this.location = location;
}

public TokenType getTokenType() {
return tokenType;
}

public String getName() {
switch (tokenType) {
case ZERO: return "0";
case ONE: return "1";
case DOT: return ".";
case NIL: return "_";
case EQ: return "=";
}
return name;
}

public String getLocation() {
return location;
}
}

public static class Tokenizer {
private String line = null;
private String location;
private int index = 0;

public Token nextToken() {
if (line == null)
return null;
while (index < line.length()) {
switch (line.charAt(index)) {
case ' ': case '\t': case '\r': case '\n':
index++;
break;
case '0':
index++;
return new Token(TokenType.ZERO, null, location);
case '1':
index++;
return new Token(TokenType.ONE, null, location);
case '.':
index++;
return new Token(TokenType.DOT, null, location);
case '_':
index++;
return new Token(TokenType.NIL, null, location);
case '=':
index++;
if (index < line.length() && line.charAt(index) == '=') {
line = null;
return null;
}
return new Token(TokenType.EQ, null, location);
default:
return symbolToken();
}
}
line = null;
return null;
}

private Token symbolToken() {
int start = index;
while (++index < line.length()) {
switch (line.charAt(index)) {
case ' ': case '\t': case '\r': case '\n':
case '0': case '1': case '.': case '_': case '=':
return new Token(TokenType.SYMBOL, line.substring(start, index), location);
}
}
return new Token(TokenType.SYMBOL, line.substring(start), location);
}

public void tokenizeLine(String line, String location) {
assert this.line == null;
this.line = line;
this.location = location;
index = 0;
}

public void reset() {
line = null;
location = null;
}
}

public static class ExprParser {
private ArrayList<Token> tokens;
private HashSet<String> bindings;
private Interpreter interpreter;
private int index = 0;

public ExprParser(ArrayList<Token> tokens, HashSet<String> bindings, Interpreter interpreter) {
this.tokens = tokens;
this.bindings = bindings;
this.interpreter = interpreter;
}

public Expr nextExpr() {
if (index >= tokens.size())
return null;
Token token = tokens.get(index);
switch (token.getTokenType()) {
case ZERO: case ONE:
return literalExpr();
case NIL:
index++;
return Expr.NIL;
case SYMBOL:
index++;
if (bindings.contains(token.getName()))
return new BoundExpr(token.getName());
return funcallExpr(token);
default:
assert false;
throw new RuntimeException();
}
}

public LiteralExpr literalExpr() {
int start = index;
int end = tokens.size();
loop:
while (++index < tokens.size()) {
switch (tokens.get(index).getTokenType()) {
case ZERO: case ONE:
break;
case NIL:
end = index;
index++;
break loop;
default:
end = index;
break loop;
}
}
return new LiteralExpr(tokens, start, end);
}

public FuncallExpr funcallExpr(Token token) {
Function function = interpreter.getFunction(token.getName());
if (function == null)
throw new ParseException(token.getLocation()+": Error: undefined symbol: "+token.getName());
if (function.getArity() == 0)
return new FuncallExpr(function, null, token.getLocation());
ArrayList<Expr> args = new ArrayList<Expr>(function.getArity());
for (int i = 0; i < function.getArity(); i++) {
Expr arg = nextExpr();
if (arg == null)
throw new ParseException(token.getLocation()+": Error: function requires "+function.getArity()+" argument(s): "+function.getName());
args.add(arg);
}
return new FuncallExpr(function, args, token.getLocation());
}
}
}

Here's my second interpreter, which doesn't use a huge amount of stack and, in principle, ought to allow garbage collection. This implementation was harder to get right than the first one, and, as in the first one, getting ConcatValue to work correctly was one of the biggest hurdles. It's super slow, so I have not tested it much.

import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;

/**
* Interpreter for the 01_ language.
* Usage:
* java i01_ [-1 INPUTFILE1] [-2 INPUTFILE2] ... SRCFILE ... FUNCTION
* Evaluates the specified function after reading the specified
* source files. The first argument, if used, is stdin. The second
* argument, if used, is the file INPUTFILE1. The 3rd argument, if
* used, is the file INPUTFILE2, etc.
*/
public class in01_ {
public static void main(String[] args) throws Exception {
HashMap<Integer,String> inputfiles = new HashMap<Integer,String>();
int index = 0;
while (index < args.length - 2) {
int inputfile;
try {
inputfile = Integer.parseInt(args[index]);
} catch (NumberFormatException e) {
break;
}
if (inputfile >= 0)
break;
inputfiles.put(-inputfile, args[index + 1]);
index += 2;
}
HashMap<String,Function> functions = new HashMap<String,Function>();
while (index < args.length - 1) {
readFile(functions, new FileReader(args[index]));
index++;
}
for (Function function : functions.values())
function.parseBodies(functions);
Function function = functions.get(args[args.length - 1]);
if (function == null)
throw new RuntimeException("Undefined symbol: " + args[args.length - 1]);
ArrayList<Value> funargs = new ArrayList<Value>();
if (function.getArity() > 0)
funargs.add(new StreamValue(System.in));
for (int i = 1; i < function.getArity(); i++) {
String inputfile = inputfiles.get(i);
if (inputfile == null)
funargs.add(Value.NIL);
else
funargs.add(new StreamValue(new FileInputStream(inputfile)));
}
writeValue(System.out, function.eval(funargs));
System.out.flush();
}

private static void readFile(HashMap<String,Function> functions, Reader reader) {
DefReader defReader = new DefReader(reader);
for (Def def = defReader.nextDef(); def != null; def = defReader.nextDef()) {
Function function = functions.get(def.getName());
if (function == null)
functions.put(def.getName(), new Function(def));
else
function.addDef(def);
}
}

public static void writeValue(OutputStream out, Value value) throws Exception {
int output = 0;
int bit = 7;
value = forceValue(value);
while (value.head() != HeadValue.NIL) {
if (value.head() == HeadValue.ONE)
output |= 1<<bit;
if (bit > 0) {
bit--;
} else {
out.write(output);
bit = 7;
output = 0;
}
value = forceValue(value.tail());
}
}

public static Value forceValue(Value value) {
while (value.head() == HeadValue.DELAYED)
value = value.force();
return value;
}

public static enum HeadValue {
ZERO, ONE, NIL, DELAYED;
}

public static abstract class Value {
public static final Value NIL = new Value() {
public HeadValue head() {
return HeadValue.NIL;
}

public Value tail() {
throw new RuntimeException();
}

public Value force() {
return this;
}
};

public abstract HeadValue head();
public abstract Value tail();
public abstract Value force();
}

public static class StreamValue extends Value {
private InputStream in;
private HeadValue head;
private StreamValue tail;
private int bit;
private int inputByte;

private StreamValue(InputStream in, HeadValue head, int bit, int inputByte) {
assert head != HeadValue.NIL;
assert head != HeadValue.DELAYED || bit == 7;
this.in = in;
this.head = head;
this.tail = null;
this.bit = bit;
this.inputByte = inputByte;
}

public StreamValue(InputStream in) {
this(in, HeadValue.DELAYED, 7, 0);
}

public HeadValue head() {
return head;
}

public Value tail() {
assert head != HeadValue.DELAYED && head != HeadValue.NIL;
if (tail != null)
return tail;
if (bit == 0)
tail = new StreamValue(in);
else
tail = new StreamValue(in, bitValue(bit - 1), bit - 1, inputByte);
return tail;
}

private HeadValue bitValue(int bit) {
return (inputByte & (1 << bit)) != 0 ? HeadValue.ONE : HeadValue.ZERO;
}

public Value force() {
if (head != HeadValue.DELAYED)
return this;
try {
inputByte = in.read();
} catch (Exception e) {
throw new RuntimeException(e);
}
if (inputByte < 0)
head = HeadValue.NIL;
else
head = bitValue(bit);
return this;
}
}

public static class LiteralValue extends Value {
private HeadValue head;
private Value tail;

public LiteralValue(Token token, Value tail) {
assert token.getTokenType() == TokenType.ONE || token.getTokenType() == TokenType.ZERO;
head = token.getTokenType() == TokenType.ONE ? HeadValue.ONE : HeadValue.ZERO;
this.tail = tail;
}

public HeadValue head() {
return head;
}

public Value tail() {
return tail;
}

public Value force() {
return this;
}
}

public static class ConcatValue extends Value {
private Value headList;
private Value tailList;

private Value tail;

public ConcatValue(Value headList, Value tailList) {
assert headList.head() != HeadValue.NIL;
this.headList = headList;
this.tailList = tailList;
this.tail = null;
}

public HeadValue head() {
return headList.head();
}

public Value tail() {
assert headList.head() != HeadValue.DELAYED;
assert headList.head() != HeadValue.NIL;
if (tail == null) {
if (headList.tail().head() == HeadValue.NIL)
tail = tailList;
else
tail = new ConcatValue(headList.tail(), tailList);
}
return tail;
}

public Value force() {
headList = forceValue(headList);
if (headList.head() == HeadValue.NIL)
return tailList;
if (!(headList instanceof ConcatValue))
return this;
ConcatValue head = (ConcatValue) headList;
return new ConcatValue(head.headList, new ConcatValue(head.tailList, tailList));
}
}

public static class FuncallValue extends Value {
private Function function;
private ArrayList<Value> args;

public FuncallValue(Function function, ArrayList<Value> args) {
this.function = function;
this.args = args;
}

public HeadValue head() {
return HeadValue.DELAYED;
}

public Value tail() {
assert false;
throw new RuntimeException();
}

public Value force() {
return function.eval(args);
}
}

public static abstract class Expr {
public static final Expr NIL = new Expr() {
public Value eval(HashMap<String,Value> bindings) {
return Value.NIL;
}
};

public abstract Value eval(HashMap<String,Value> bindings);
}

public static class LiteralExpr extends Expr {
private Value value;

public LiteralExpr(ArrayList<Token> tokens, int start, int end) {
value = getValue(tokens, start, end);
}

public Value eval(HashMap<String,Value> bindings) {
return value;
}

private Value getValue(ArrayList<Token> tokens, int start, int end) {
if (start >= end)
return Value.NIL;
return new LiteralValue(tokens.get(start), getValue(tokens, start + 1, end));
}
}

public static class ConcatExpr extends Expr {
private Expr head;
private Expr tail;

public ConcatExpr(Expr head, Expr tail) {
this.head = head;
this.tail = tail;
}

public Value eval(HashMap<String,Value> bindings) {
return new ConcatValue(head.eval(bindings), tail.eval(bindings));
}
}

public static class BoundExpr extends Expr {
private String name;

public BoundExpr(String name) {
this.name = name;
}

public Value eval(HashMap<String,Value> bindings) {
return bindings.get(name);
}
}

public static class FuncallExpr extends Expr {
private Function function;
private ArrayList<Expr> args;

public FuncallExpr(Function function, ArrayList<Expr> args) {
this.function = function;
this.args = args;
}

public Value eval(HashMap<String,Value> bindings) {
ArrayList<Value> argValues = new ArrayList<Value>(args.size());
for (Expr arg : args)
argValues.add(arg.eval(bindings));
return new FuncallValue(function, argValues);
}
}

public static class Function {
private ArrayList<Def> defs = new ArrayList<Def>();

public Function(Def def) {
defs.add(def);
}

public int getArity() {
return defs.get(0).getArity();
}

public void addDef(Def def) {
assert def.getName().equals(defs.get(0).getName());
if (def.getArity() != defs.get(0).getArity())
throw new RuntimeException(def.getName() + " defined as " + def.getArity() + "-ary, originally " + defs.get(0).getArity() + "-ary");
defs.add(def);
}

public void parseBodies(HashMap<String,Function> functions) {
for (Def def : defs)
def.parseBody(functions);
}

public Value eval(ArrayList<Value> args) {
for (Def def : defs) {
Value value = def.eval(args);
if (value != null)
return value;
}
throw new RuntimeException("No matching pattern for " + defs.get(0).getName());
}
}

public static class Def {
private String name;
private ArrayList<Pattern> patterns;
private ArrayList<Token> bodyTokens;
private Expr body = null;

public Def(String name, ArrayList<Pattern> patterns, ArrayList<Token> bodyTokens) {
this.name = name;
this.patterns = patterns;
this.bodyTokens = bodyTokens;
}

public String getName() {
return name;
}

public int getArity() {
return patterns.size();
}

public void parseBody(HashMap<String,Function> functions) {
HashSet<String> bindings = new HashSet<String>();
for (Pattern pattern : patterns)
if (pattern.getBinding().getTokenType() == TokenType.SYMBOL)
bindings.add(pattern.getBinding().getName());
body = new Parser(functions, bindings).getExpr();
if (body == null)
body = Expr.NIL;
}

private class Parser {
private int index = 0;
private HashMap<String,Function> functions;
private HashSet<String> bindings;

public Parser(HashMap<String,Function> functions, HashSet<String> bindings) {
this.functions = functions;
this.bindings = bindings;
}

public Expr getExpr() {
Expr expr = nextExpr();
if (index < bodyTokens.size())
return new ConcatExpr(expr, getExpr());
return expr;
}

public Expr nextExpr() {
if (index >= bodyTokens.size())
return null;
Token token = bodyTokens.get(index);
switch (token.getTokenType()) {
case ONE: case ZERO:
return literalExpr();
case NIL:
index++;
return Expr.NIL;
case SYMBOL:
index++;
if (bindings.contains(token.getName()))
return new BoundExpr(token.getName());
return funcallExpr(token);
default:
assert false;
throw new RuntimeException();
}
}

public LiteralExpr literalExpr() {
int start = index;
while (++index < bodyTokens.size()) {
switch (bodyTokens.get(index).getTokenType()) {
case ZERO: case ONE:
break;
case NIL:
return new LiteralExpr(bodyTokens, start, index++);
default:
return new LiteralExpr(bodyTokens, start, index);
}
}
return new LiteralExpr(bodyTokens, start, bodyTokens.size());
}

public FuncallExpr funcallExpr(Token token) {
Function function = functions.get(token.getName());
if (function == null)
throw new RuntimeException("Undefined symbol: " + token.getName() + " in body of " + name);
ArrayList<Expr> args = new ArrayList<Expr>(function.getArity());
for (int i = 0; i < function.getArity(); i++) {
Expr arg = nextExpr();
if (arg == null)
throw new RuntimeException("Not enough arguments for " + token.getName() + " in body of " + name + ": needs " + function.getArity());
args.add(arg);
}
return new FuncallExpr(function, args);
}
}

public Value eval(ArrayList<Value> args) {
HashMap<String,Value> bindings = new HashMap<String,Value>();
for (int i = 0; i < args.size(); i++) {
Pattern pattern = patterns.get(i);
Value match = pattern.tryMatch(args.get(i));
if (match == null)
return null;
if (pattern.getBinding().getTokenType() == TokenType.SYMBOL)
bindings.put(pattern.getBinding().getName(), match);
}
return body.eval(bindings);
}
}

public static class Pattern {
private Token binding;
private ArrayList<Boolean> bits;

public Pattern(Token binding, ArrayList<Boolean> bits) {
this.binding = binding;
this.bits = bits;
}

public Token getBinding() {
return binding;
}

public Value tryMatch(Value value) {
for (boolean bit : bits) {
value = forceValue(value);
switch (value.head()) {
case ZERO:
if (bit)
return null;
break;
case ONE:
if (!bit)
return null;
break;
case DELAYED:
assert false;
default:
return null;
}
value = value.tail();
}
if (binding.getTokenType() == TokenType.NIL) {
value = forceValue(value);
return value.head() == HeadValue.NIL ? value : null;
}
return value;
}
}

public static class DefReader {
private Tokenizer tokenizer;

public DefReader(Reader reader) {
this.tokenizer = new Tokenizer(reader);
}

public Def nextDef() {
Token name = tokenizer.nextToken();
if (name == null)
return null;
if (name.getTokenType() != TokenType.SYMBOL)
throw new RuntimeException("Expected symbol, got " + name.getName());
ArrayList<Pattern> patterns = getPatterns();
ArrayList<Token> body = getBody();
return new Def(name.getName(), patterns, body);
}

private ArrayList<Pattern> getPatterns() {
ArrayList<Pattern> patterns = new ArrayList<Pattern>();
ArrayList<Boolean> pattern = new ArrayList<Boolean>();
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
throw new RuntimeException("Expected =, got EOF");
switch (token.getTokenType()) {
case ZERO:
pattern.add(false);
break;
case ONE:
pattern.add(true);
break;
case SYMBOL:
for (Pattern p : patterns)
if (token.getName().equals(p.getBinding().getName()))
throw new RuntimeException("Multiple patterns bind " + token.getName());
/*FALLTHROUGH*/
case DOT:
case NIL:
patterns.add(new Pattern(token, pattern));
pattern = new ArrayList<Boolean>();
break;
case EQ:
if (pattern.size() > 0)
patterns.add(new Pattern(Token.DOT, pattern));
return patterns;
}
}
}

private ArrayList<Token> getBody() {
ArrayList<Token> body = new ArrayList<Token>();
for (;;) {
Token token = tokenizer.nextToken();
if (token == null)
throw new RuntimeException("Got unexpected EOF");
switch (token.getTokenType()) {
case EQ:
throw new RuntimeException("Got unexpected =");
case DOT:
return body;
default:
body.add(token);
break;
}
}
}
}

public static enum TokenType {
ZERO, ONE, DOT, NIL, EQ, SYMBOL;
}

public static class Token {
public static Token ZERO = new Token(TokenType.ZERO, "0");
public static Token ONE = new Token(TokenType.ONE, "1");
public static Token DOT = new Token(TokenType.DOT, ".");
public static Token NIL = new Token(TokenType.NIL, "_");
public static Token EQ = new Token(TokenType.EQ, "=");

private TokenType tokenType;
private String name;

private Token(TokenType tokenType, String name) {
this.tokenType = tokenType;
this.name = name;
}

public Token(String name) {
this(TokenType.SYMBOL, name);
}

public TokenType getTokenType() {
return tokenType;
}

public String getName() {
return name;
}
}

public static class Tokenizer {
private Reader reader;
private int nextChar = -1;

public Tokenizer(Reader reader) {
this.reader = reader;
}

private int getNextChar() {
if (nextChar < 0)
try {
return reader.read();
} catch (Exception e) {
throw new RuntimeException(e);
}
int result = nextChar;
nextChar = -1;
return result;
}

private void pushback(int pushedChar) {
assert nextChar < 0;
nextChar = pushedChar;
}

public Token nextToken() {
for (;;) {
int ch = getNextChar();
if (ch < 0)
return null;
switch (ch) {
case ' ': case '\t': case '\r': case '\n':
break;
case '0':
return Token.ZERO;
case '1':
return Token.ONE;
case '.':
return Token.DOT;
case '_':
return Token.NIL;
case '=':
ch = getNextChar();
if (ch != '=') {
pushback(ch);
return Token.EQ;
}
discardComment();
break;
default:
pushback(ch);
return readSymbol();
}
}
}

private void discardComment() {
for (;;) {
int ch = getNextChar();
if (ch < 0 || ch == '\n')
return;
}
}

private Token readSymbol() {
StringBuilder sb = new StringBuilder();
for (;;) {
int ch = getNextChar();
if (ch < 0)
return new Token(sb.toString());
switch (ch) {
case ' ': case '\t': case '\r': case '\n':
case '0': case '1': case '.': case '_': case '=':
pushback(ch);
return new Token(sb.toString());
default:
sb.append((char) ch);
break;
}
}
}
}
}

No comments:

Post a Comment