Reflection-Based S-Expression Parser

I’m really quite proud of this little reflection-based S-expression parser that I wrote for our GPCE Mint tutorial.

We wanted to have a parser so we don’t have to construct our ASTs using Java code. The problem was that we’ll probably have ten different mini-languages, and we didn’t want to write a parser for each language. Reflection was the answer, of course.

For this parser to work, you need to define the base class of the class hierarchy (e.g. [cci lang=”java”]Exp[/cci]) and the subclasses as static nested classes inside the same outer class (e.g. all inside [cci lang=”java”]UnstagedLint[/cci]).

You then simply specify the base class, and the parser will search for the subclasses. The names of the S-expressions are the simple names of the subclasses (e.g. Add and Int if there are [cci lang=”java”]Add[/cci] and [cci lang=”java”]Int[/cci] subclasses).

When encountering an S-expression, the parser will try all of the constructors of the corresponding class. As arguments for a constructor, the parser can process all primitive types, the boxed types, strings, recursive occurrences of the base class (e.g. [cci lang=”java”]Add[/cci] containing two [cci lang=”java”]Exp[/cci] instances), as well as other class hierarchies, which are parsed using subparsers.

These subparsers work the same way the main parser works. For example, if a [cci lang=”java”]Val[/cci] constructor needs a [cci lang=”java”]Value[/cci] argument, and there is a subparser registered for the [cci lang=”java”]Val[/cci] hierarchy, the main parser will delegate to that subparser. Here’s an example:

[cc lang=”java”]DataTypes.Exp e =
new LintParser(DataTypes.Exp.class)
.addSubParser(new LintParser(DataTypes.Value.class))
.parse(“(Add (Val (IntValue 1)) (Val (IntValue 2)))”);[/cc]

This creates a parser for expressions of class [cci lang=”java”]DataTypes.Exp[/cci], and it registers a subparser for expressions of class [cci lang=”java”]DataTypes.Value[/cci]. Then it parses the string [cci](Add (Val (IntValue 1)) (Val (IntValue 2)))[/cci]. The [cci](IntValue 1)[/cci] and [cci](IntValue 2)[/cci] substrings are parsed using the subparser.

It’s really quite elegant. Especially the ease of integrating the subparsers surprised me. Here’s the parser in full.

[cc lang=”java” lines=”20″]package parser;

import java.lang.reflect.*;
import java.util.*;

/**
* A reflection-based S-expression parser returning an expression of type E.
* Written by Mathias Ricken and Edwin Westbrook.
*/
public class LintParser {
/** Convenience method to parse string s into an expression of class c,
* using the specified subparsers as help, if the parser encounters non-primitive,
* non-string arguments to S-expressions not within the class hierarcy under class c.
* @param c class of the expression
* @param s string to parse
* @param subParsers subparsers for non-primitive, no-string values
* @return expression of class c */
public static T parse(Class c, String s, LintParser… subParsers) {
LintParser p = new LintParser(c);
for(LintParser sp: subParsers) {
p.addSubParser(sp);
}
return p.parse(s);
}

/** Exception being thrown if the string cannot be parsed. */
public static class ParseException extends RuntimeException { }

/** The class of the expression (e.g. lint1_arithmetic.UnstagedLint.Exp). */
private Class _expClass;

/** The class enclosing the expression classes (e.g. lint1_arithmetic.UnstagedLint). */
private Class _enclosingClass;

/** A map from S-expression names to classes (e.g. Int -> lint1_arithmetic.UnstagedLint.Int). */
private HashMap> _classes =
new HashMap>();

/** A map from classes for which subparsers have been registered to the subparsers. */
private HashMap,LintParser> _subParsers =
new HashMap,LintParser>();

/** Create a new parser for expressions of class expClass.
* @param expClass the class of the expressions to be parsed */
public LintParser(Class expClass) {
_expClass = expClass;
_enclosingClass = _expClass.getEnclosingClass();
// find all concrete, static subclasses of expClass contained in the same enclosing class
for(Class c: _enclosingClass.getDeclaredClasses()) {
if (((c.getModifiers() & Modifier.ABSTRACT)==0) &&
((c.getModifiers() & Modifier.STATIC)!=0)) {
try {
Class contained = c.asSubclass(_expClass);
_classes.put(contained.getSimpleName(), (Class)contained);
println(contained.getSimpleName());
}
catch(ClassCastException cce) { /* skip */ }
}
}
}

/** Add a subparser for expressions of type S. Return this parser (not the subparser!)
* for method chaining.
* @param p subparser for expressions of type S
* @return this parser (for expressions of type E) */
public LintParser addSubParser(LintParser p) {
_subParsers.put(p._expClass, p);
return this;
}

/** Parse the string.
* @param s the string to be parsed
* @return the expression of type E */
public E parse(String s) {
println(“parse ‘”+s+”‘”);
RestoreStringTokenizer st = new RestoreStringTokenizer(s.trim());
return parse(st);
}

/** Parse the tokens in the tokenizer.
* @param s the tokenizer with the tokens
* @return the expression of type E */
protected E parse(RestoreStringTokenizer s) {
// position to restore everything
final int sPos = s.getPosition();

// read (
parseWhitespace(s);
String word = s.nextToken();
println(“word: ‘”+word+”‘”);
if (!word.equals(“(“)) {
s.restorePosition(sPos);
throw new ParseException();
}

// read S-expression name
parseWhitespace(s);
word = s.nextToken();
println(“word: ‘”+word+”‘”);
if (!_classes.containsKey(word)) {
// don’t know what that is, restore
s.restorePosition(sPos);
throw new ParseException();
}

// known subclass of E
Class c = _classes.get(word);
println(“Class: “+c.getSimpleName());
parseWhitespace(s);

// position to restore after trying out a constructor
final int sTryCtorPos = s.getPosition();

Constructor ctor;
for(Constructor ctor2: c.getDeclaredConstructors()) {
println(“\ttrying ctor “+ctor2);

// this is necessary since c.getDeclaredConstructors() returns Constructor
// but we want Constructor
try { ctor = c.getConstructor(ctor2.getParameterTypes()); }
catch(NoSuchMethodException nsme) { throw new AssertionError(“Should never happen.”); }

// try to parse the arguments for this constructor
Object[] args = new Object[ctor.getParameterTypes().length];
int argIndex = 0;
try {
for(Class paramC: ctor.getParameterTypes()) {
// primitive types and their boxed types
if ((paramC==int.class) || (paramC==Integer.class)) {
args[argIndex] = parseLeaf(s,Integer.class);
}
else if ((paramC==boolean.class) || (paramC==Boolean.class)) {
args[argIndex] = parseLeaf(s,Boolean.class);
}
else if ((paramC==long.class) || (paramC==Long.class)) {
args[argIndex] = parseLeaf(s,Long.class);
}
else if ((paramC==double.class) || (paramC==Double.class)) {
args[argIndex] = parseLeaf(s,Double.class);
}
else if ((paramC==float.class) || (paramC==Float.class)) {
args[argIndex] = parseLeaf(s,Float.class);
}
else if ((paramC==byte.class) || (paramC==Byte.class)) {
args[argIndex] = parseLeaf(s,Byte.class);
}
else if ((paramC==short.class) || (paramC==Short.class)) {
args[argIndex] = parseLeaf(s,Short.class);
}
// char or Character
else if ((paramC==char.class) || (paramC==Character.class)) {
// parse as string
Object temp = parseLeaf(s,String.class);
// must be exactly one character
if (temp.toString().length()!=1) throw new ParseException();
args[argIndex] = new Character(temp.toString().charAt(0));
}
// strings
else if (paramC==String.class) {
args[argIndex] = parseLeaf(s,String.class);
}
// recursively parse expressions of type E
else if (_expClass.equals(paramC)) {
args[argIndex] = parse(s);
}
// try to use a subparser
else if (_subParsers.containsKey(paramC)) {
// try one of the subparsers
args[argIndex] = _subParsers.get(paramC).parse(s);
}
else {
// don’t know what that is
// restore happens below
throw new ParseException();
}
++argIndex;
}
}
catch(ParseException pe) {
// this constructor didn’t work out, we need to restore and try another constructor
s.restorePosition(sTryCtorPos);
// continue with next constructor
continue;
}
try {
// we read all values required for this constructor
// make sure that the next token is )
parseWhitespace(s);
word = s.nextToken();
println(“word: ‘”+word+”‘”);
if (!word.equals(“)”)) {
// it wasn’t ), we need to restore and try another constructor
s.restorePosition(sTryCtorPos);
// continue with next constructor
continue;
}
parseWhitespace(s);

// successfully used this constructor
println(“new “+ctor.getDeclaringClass().getSimpleName()+” “+
java.util.Arrays.toString(args));
E e = (E)ctor.newInstance(args);
return e;
}
catch(Exception e) {
// something went wrong using this constructor, we need to restore and try another
s.restorePosition(sTryCtorPos);
// continue with next constructor
continue;
}
}

s.restorePosition(sPos);
throw new ParseException();
}

/** Parse a leaf of class c, which must have a unary constructor taking a String, from
* the tokenizer s.
* @param s tokenizer from which to parse
* @param c the class, which has a unary constructor taking a String, for that we want to create a value
* @return parsed value of class c */
protected Object parseLeaf(RestoreStringTokenizer s, Class c) {
// position so we can undo this attempt to parse
final int sPos = s.getPosition();

String word = null;
try {
// get the unary constructor taking String
Constructor ctor = c.getDeclaredConstructor(String.class);

// skip whitespace, get the string, skip whitespace
parseWhitespace(s);
word = s.nextToken();
println(“word: ‘”+word+”‘”);
Object o = ctor.newInstance(word);
parseWhitespace(s);

return o;
}
catch(NoSuchMethodException nsme) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
catch(InstantiationException ie) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
catch(IllegalAccessException iae) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
catch(InvocationTargetException ite) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
}

/** Parse whitespace. Stop before the next non-whitespace token, or when there are no
* more tokens.
* @param s tokenizer from which to parse whitespace */
protected void parseWhitespace(RestoreStringTokenizer s) {
String word;
int sPrevPos;
while(s.hasMoreTokens()) {
sPrevPos = s.getPosition();
word = s.nextToken();
if (!word.trim().equals(“”)) {
s.restorePosition(sPrevPos);
break;
}
}
}

/** Debug method to print, comment System.out.println call out to disable debug printing */
public static void println(Object o) {
// System.out.println(o.toString());
}

/** This tokenizer stores all read tokens in a stack so they can easily be
* restored. This isn’t as memory as it could be, but it’s good enough for
* small examples. */
public static class RestoreStringTokenizer extends StringTokenizer {
/** Stack of read tokens. */
private Stack _read = new Stack();
/** Stack of restored tokens. */
private Stack _restored = new Stack();
/** Create a new string tokenizer that can restore positions. */
public RestoreStringTokenizer(String s) { super(s,” ()”,true); }
/** Return the current position. */
public int getPosition() { return _read.size(); }
/** Rewind to a previous position. This only goes backward, not forward. */
public void restorePosition(int pos) {
while(_read.size()>pos) {
_restored.push(_read.pop());
}
}
// overridden methods from StringTokenizer
public int countTokens() { return _restored.size() + super.countTokens(); }
public boolean hasMoreTokens() { return (_restored.size()>0) || super.hasMoreTokens(); }
public boolean hasMoreElements() { return hasMoreTokens(); }
public String nextToken() {
String token = (_restored.size()>0) ? _restored.pop() : super.nextToken();
_read.push(token);
return token;
}
public Object nextElement() { return nextToken(); }
public String nextToken(String delim) {
if (_restored.size()>0) return _restored.pop();
return super.nextToken(delim);
}
}

public static void main(String[] args) {
lint2_cond.UnstagedLint.Exp e = parse
(lint2_cond.UnstagedLint.Exp.class,”(Ifz (Int 1) (Int 10) (Mul(Int 3)(Add(Int 5)(Int 10))))”);
System.out.println(new lint2_cond.UnstagedLint.Program(e).peval(lint2_cond.UnstagedLint.env0));

lint.DataTypes.Value v = parse(lint.DataTypes.Value.class, “(IntValue 1)”);
System.out.println(v);

lint.DataTypes.Exp e2 =
new LintParser(lint.DataTypes.Exp.class)
.addSubParser(new LintParser(lint.DataTypes.Value.class))
.parse(“(Add (Val (IntValue 1)) (Val (IntValue 2)))”);
System.out.println(new lint.DataTypes.Program(e2).peval(lint.DataTypes.env0, lint.DataTypes.fenv0));

lint.DataTypes.Exp e3 =
LintParser.parse(lint.DataTypes.Exp.class, “(Add (Val (IntValue 1)) (Val (IntValue 2)))”,
new LintParser(lint.DataTypes.Value.class));
System.out.println(new lint.DataTypes.Program(e3).peval(lint.DataTypes.env0, lint.DataTypes.fenv0));

}
}[/cc]

Here is the definition of a simple language for arithmetic expressions that uses the parser:

[cc lang=”java” lines=”20″]package lint1_arithmetic;

public class UnstagedLint {

/*
type exp = Int of int
| Var of string
| Add of exp * exp
| Sub of exp * exp
| Mul of exp * exp
| Div of exp * exp
type def = Definition of string * int
type prog = Program of def list * exp
*/

public static interface Exp {
public int eval(Env e);
}

public static class Int implements Exp {
private int _value;
public Int(int value) {
_value = value;
}
public int eval(Env e) {
return _value;
}
}

public static class Var implements Exp {
private String _s;
public Var(String s) {
_s = s;
}
public int eval(Env e) {
return e.get(_s);
}
}

public static abstract class BinOp implements Exp {
protected Exp _left, _right;
public BinOp(Exp left, Exp right) {
_left = left;
_right = right;
}
}

public static class Add extends BinOp {
public Add(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) + _right.eval(e);
}
}

public static class Sub extends BinOp {
public Sub(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) – _right.eval(e);
}
}

public static class Mul extends BinOp {
public Mul(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) * _right.eval(e);
}
}

public static class Div extends BinOp {
public Div(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) / _right.eval(e);
}
}

/*
exception EnvironmentException
let env0 = fun x -> raise EnvironmentException
let fenv0 = env0
let ext env x v = fun y -> if x=y then v else env y
*/

public static class EnvironmentException extends RuntimeException { }

// environment; implemented as a function object from String to Exp
public static interface Env {
public int get(String y);
}
public static final Env env0 = new Env() {
public int get(String s) { throw new EnvironmentException(); }
};

public static Env ext(final Env env, final String x, final int v) {
return new Env() {
public int get(String y) {
if (x.equals(y)) return v; else return env.get(y);
}
};
}

public static class Definition {
private String _name;
private int _v;
public Definition(String name, int v) {
_name = name;
_v = v;
}
public String name() { return _name; }
public int value() { return _v; }
}

public static class Program {
public Definition[] _defs;
public Exp _body;
public Program(Exp body, Definition… defs) {
_defs = defs;
_body = body;
}

public int peval(Env env) {
return peval(env,0);
}

private int peval(final Env env, int defIndex) {
// match p with
if (_defs.length<=defIndex) { // Program ([],e) -> eval e env
return _body.eval(env);
}
else {
// |Program (Definition (n,v)::tl,e) -> peval (Program(tl,e)) (ext env n v)
final Definition d = _defs[defIndex];
return peval(ext(env, d.name(), d.value()),defIndex+1);
}
}
}

/*
let rec eval e env fenv =
match e with
Int i -> i
| Var s -> env s
| Add (e1,e2) -> (eval e1 env fenv)+(eval e2 env fenv)
| Sub (e1,e2) -> (eval e1 env fenv)-(eval e2 env fenv)
| Mul (e1,e2) -> (eval e1 env fenv)*(eval e2 env fenv)
| Div (e1,e2) -> (eval e1 env fenv)/(eval e2 env fenv)

let rec peval p env =
match p with
Program ([],e) -> eval e env
|Program (Definition (s1,i1)::tl,e) -> peval (Program(tl,e)) (ext env s1 i1)

*/

public static Program term10times20plus30 = new Program
(new Add(new Mul(new Var(“x”), new Var(“y”)), new Int(30)),
new Definition(“x”, 10),
new Definition(“y”, 20));

public static Program parsedTerm10times20plus30 = new Program
(parser.LintParser.parse(Exp.class, “(Add (Mul (Var x) (Var y)) (Int 30))”),
new Definition(“x”, 10),
new Definition(“y”, 20));

public static Program termWhatX = new Program
(new Var(“x”));

public static Program parsedTermWhatX = new Program
(parser.LintParser.parse(Exp.class, “(Var x)”));

public static void main(String[] args) {
int i = term10times20plus30.peval(env0);
System.out.println(“term10times20plus30 = “+i);

i = parsedTerm10times20plus30.peval(env0);
System.out.println(“parsedTerm10times20plus30 = “+i);

try {
i = termWhatX.peval(env0);
System.out.println(“termWhatX = “+i);
throw new AssertionError(“This should throw an EnvironmentException”);
}
catch(EnvironmentException ee) { System.out.println(“termWhatX threw “+ee); /* expected */ }

try {
i = parsedTermWhatX.peval(env0);
System.out.println(“parsedTermWhatX = “+i);
throw new AssertionError(“This should throw an EnvironmentException”);
}
catch(EnvironmentException ee) { System.out.println(“parsedTermWhatX threw “+ee); /* expected */ }
}
}[/cc]

Share

About Mathias

Software development engineer. Principal developer of DrJava. Recent Ph.D. graduate from the Department of Computer Science at Rice University.
This entry was posted in Mint. Bookmark the permalink.

One Response to Reflection-Based S-Expression Parser

  1. Pingback: BSD License for the Reflection-Based S-Expression Parser | A Concurrent Affair

Leave a Reply