I’m really quite proud of this little reflection-based S-expression parser that I wrote for our GPCE Mint tutorial.
We wanted to have a parser so we don’t have to construct our ASTs using Java code. The problem was that we’ll probably have ten different mini-languages, and we didn’t want to write a parser for each language. Reflection was the answer, of course.
For this parser to work, you need to define the base class of the class hierarchy (e.g. [cci lang=”java”]Exp[/cci]) and the subclasses as static nested classes inside the same outer class (e.g. all inside [cci lang=”java”]UnstagedLint[/cci]).
You then simply specify the base class, and the parser will search for the subclasses. The names of the S-expressions are the simple names of the subclasses (e.g. Add and Int if there are [cci lang=”java”]Add[/cci] and [cci lang=”java”]Int[/cci] subclasses).
When encountering an S-expression, the parser will try all of the constructors of the corresponding class. As arguments for a constructor, the parser can process all primitive types, the boxed types, strings, recursive occurrences of the base class (e.g. [cci lang=”java”]Add[/cci] containing two [cci lang=”java”]Exp[/cci] instances), as well as other class hierarchies, which are parsed using subparsers.
These subparsers work the same way the main parser works. For example, if a [cci lang=”java”]Val[/cci] constructor needs a [cci lang=”java”]Value[/cci] argument, and there is a subparser registered for the [cci lang=”java”]Val[/cci] hierarchy, the main parser will delegate to that subparser. Here’s an example:
[cc lang=”java”]DataTypes.Exp e =
new LintParser
.addSubParser(new LintParser
.parse(“(Add (Val (IntValue 1)) (Val (IntValue 2)))”);[/cc]
This creates a parser for expressions of class [cci lang=”java”]DataTypes.Exp[/cci], and it registers a subparser for expressions of class [cci lang=”java”]DataTypes.Value[/cci]. Then it parses the string [cci](Add (Val (IntValue 1)) (Val (IntValue 2)))[/cci]. The [cci](IntValue 1)[/cci] and [cci](IntValue 2)[/cci] substrings are parsed using the subparser.
It’s really quite elegant. Especially the ease of integrating the subparsers surprised me. Here’s the parser in full.
[cc lang=”java” lines=”20″]package parser;
import java.lang.reflect.*;
import java.util.*;
/**
* A reflection-based S-expression parser returning an expression of type E.
* Written by Mathias Ricken and Edwin Westbrook.
*/
public class LintParser
/** Convenience method to parse string s into an expression of class c,
* using the specified subparsers as help, if the parser encounters non-primitive,
* non-string arguments to S-expressions not within the class hierarcy under class c.
* @param c class of the expression
* @param s string to parse
* @param subParsers subparsers for non-primitive, no-string values
* @return expression of class c */
public static
LintParser
for(LintParser> sp: subParsers) {
p.addSubParser(sp);
}
return p.parse(s);
}
/** Exception being thrown if the string cannot be parsed. */
public static class ParseException extends RuntimeException { }
/** The class of the expression (e.g. lint1_arithmetic.UnstagedLint.Exp). */
private Class
/** The class enclosing the expression classes (e.g. lint1_arithmetic.UnstagedLint). */
private Class> _enclosingClass;
/** A map from S-expression names to classes (e.g. Int -> lint1_arithmetic.UnstagedLint.Int). */
private HashMap
new HashMap
/** A map from classes for which subparsers have been registered to the subparsers. */
private HashMap
new HashMap
/** Create a new parser for expressions of class expClass.
* @param expClass the class of the expressions to be parsed */
public LintParser(Class
_expClass = expClass;
_enclosingClass = _expClass.getEnclosingClass();
// find all concrete, static subclasses of expClass contained in the same enclosing class
for(Class> c: _enclosingClass.getDeclaredClasses()) {
if (((c.getModifiers() & Modifier.ABSTRACT)==0) &&
((c.getModifiers() & Modifier.STATIC)!=0)) {
try {
Class extends E> contained = c.asSubclass(_expClass);
_classes.put(contained.getSimpleName(), (Class extends E>)contained);
println(contained.getSimpleName());
}
catch(ClassCastException cce) { /* skip */ }
}
}
}
/** Add a subparser for expressions of type S. Return this parser (not the subparser!)
* for method chaining.
* @param p subparser for expressions of type S
* @return this parser (for expressions of type E) */
public LintParser p) {
_subParsers.put(p._expClass, p);
return this;
}
/** Parse the string.
* @param s the string to be parsed
* @return the expression of type E */
public E parse(String s) {
println(“parse ‘”+s+”‘”);
RestoreStringTokenizer st = new RestoreStringTokenizer(s.trim());
return parse(st);
}
/** Parse the tokens in the tokenizer.
* @param s the tokenizer with the tokens
* @return the expression of type E */
protected E parse(RestoreStringTokenizer s) {
// position to restore everything
final int sPos = s.getPosition();
// read (
parseWhitespace(s);
String word = s.nextToken();
println(“word: ‘”+word+”‘”);
if (!word.equals(“(“)) {
s.restorePosition(sPos);
throw new ParseException();
}
// read S-expression name
parseWhitespace(s);
word = s.nextToken();
println(“word: ‘”+word+”‘”);
if (!_classes.containsKey(word)) {
// don’t know what that is, restore
s.restorePosition(sPos);
throw new ParseException();
}
// known subclass of E
Class extends E> c = _classes.get(word);
println(“Class: “+c.getSimpleName());
parseWhitespace(s);
// position to restore after trying out a constructor
final int sTryCtorPos = s.getPosition();
Constructor extends E> ctor;
for(Constructor> ctor2: c.getDeclaredConstructors()) {
println(“\ttrying ctor “+ctor2);
// this is necessary since c.getDeclaredConstructors() returns Constructor>
// but we want Constructor extends E>
try { ctor = c.getConstructor(ctor2.getParameterTypes()); }
catch(NoSuchMethodException nsme) { throw new AssertionError(“Should never happen.”); }
// try to parse the arguments for this constructor
Object[] args = new Object[ctor.getParameterTypes().length];
int argIndex = 0;
try {
for(Class> paramC: ctor.getParameterTypes()) {
// primitive types and their boxed types
if ((paramC==int.class) || (paramC==Integer.class)) {
args[argIndex] = parseLeaf(s,Integer.class);
}
else if ((paramC==boolean.class) || (paramC==Boolean.class)) {
args[argIndex] = parseLeaf(s,Boolean.class);
}
else if ((paramC==long.class) || (paramC==Long.class)) {
args[argIndex] = parseLeaf(s,Long.class);
}
else if ((paramC==double.class) || (paramC==Double.class)) {
args[argIndex] = parseLeaf(s,Double.class);
}
else if ((paramC==float.class) || (paramC==Float.class)) {
args[argIndex] = parseLeaf(s,Float.class);
}
else if ((paramC==byte.class) || (paramC==Byte.class)) {
args[argIndex] = parseLeaf(s,Byte.class);
}
else if ((paramC==short.class) || (paramC==Short.class)) {
args[argIndex] = parseLeaf(s,Short.class);
}
// char or Character
else if ((paramC==char.class) || (paramC==Character.class)) {
// parse as string
Object temp = parseLeaf(s,String.class);
// must be exactly one character
if (temp.toString().length()!=1) throw new ParseException();
args[argIndex] = new Character(temp.toString().charAt(0));
}
// strings
else if (paramC==String.class) {
args[argIndex] = parseLeaf(s,String.class);
}
// recursively parse expressions of type E
else if (_expClass.equals(paramC)) {
args[argIndex] = parse(s);
}
// try to use a subparser
else if (_subParsers.containsKey(paramC)) {
// try one of the subparsers
args[argIndex] = _subParsers.get(paramC).parse(s);
}
else {
// don’t know what that is
// restore happens below
throw new ParseException();
}
++argIndex;
}
}
catch(ParseException pe) {
// this constructor didn’t work out, we need to restore and try another constructor
s.restorePosition(sTryCtorPos);
// continue with next constructor
continue;
}
try {
// we read all values required for this constructor
// make sure that the next token is )
parseWhitespace(s);
word = s.nextToken();
println(“word: ‘”+word+”‘”);
if (!word.equals(“)”)) {
// it wasn’t ), we need to restore and try another constructor
s.restorePosition(sTryCtorPos);
// continue with next constructor
continue;
}
parseWhitespace(s);
// successfully used this constructor
println(“new “+ctor.getDeclaringClass().getSimpleName()+” “+
java.util.Arrays.toString(args));
E e = (E)ctor.newInstance(args);
return e;
}
catch(Exception e) {
// something went wrong using this constructor, we need to restore and try another
s.restorePosition(sTryCtorPos);
// continue with next constructor
continue;
}
}
s.restorePosition(sPos);
throw new ParseException();
}
/** Parse a leaf of class c, which must have a unary constructor taking a String, from
* the tokenizer s.
* @param s tokenizer from which to parse
* @param c the class, which has a unary constructor taking a String, for that we want to create a value
* @return parsed value of class c */
protected Object parseLeaf(RestoreStringTokenizer s, Class> c) {
// position so we can undo this attempt to parse
final int sPos = s.getPosition();
String word = null;
try {
// get the unary constructor taking String
Constructor ctor = c.getDeclaredConstructor(String.class);
// skip whitespace, get the string, skip whitespace
parseWhitespace(s);
word = s.nextToken();
println(“word: ‘”+word+”‘”);
Object o = ctor.newInstance(word);
parseWhitespace(s);
return o;
}
catch(NoSuchMethodException nsme) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
catch(InstantiationException ie) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
catch(IllegalAccessException iae) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
catch(InvocationTargetException ite) {
// something went wrong, restore and abort
s.restorePosition(sPos);
throw new ParseException();
}
}
/** Parse whitespace. Stop before the next non-whitespace token, or when there are no
* more tokens.
* @param s tokenizer from which to parse whitespace */
protected void parseWhitespace(RestoreStringTokenizer s) {
String word;
int sPrevPos;
while(s.hasMoreTokens()) {
sPrevPos = s.getPosition();
word = s.nextToken();
if (!word.trim().equals(“”)) {
s.restorePosition(sPrevPos);
break;
}
}
}
/** Debug method to print, comment System.out.println call out to disable debug printing */
public static void println(Object o) {
// System.out.println(o.toString());
}
/** This tokenizer stores all read tokens in a stack so they can easily be
* restored. This isn’t as memory as it could be, but it’s good enough for
* small examples. */
public static class RestoreStringTokenizer extends StringTokenizer {
/** Stack of read tokens. */
private Stack
/** Stack of restored tokens. */
private Stack
/** Create a new string tokenizer that can restore positions. */
public RestoreStringTokenizer(String s) { super(s,” ()”,true); }
/** Return the current position. */
public int getPosition() { return _read.size(); }
/** Rewind to a previous position. This only goes backward, not forward. */
public void restorePosition(int pos) {
while(_read.size()>pos) {
_restored.push(_read.pop());
}
}
// overridden methods from StringTokenizer
public int countTokens() { return _restored.size() + super.countTokens(); }
public boolean hasMoreTokens() { return (_restored.size()>0) || super.hasMoreTokens(); }
public boolean hasMoreElements() { return hasMoreTokens(); }
public String nextToken() {
String token = (_restored.size()>0) ? _restored.pop() : super.nextToken();
_read.push(token);
return token;
}
public Object nextElement() { return nextToken(); }
public String nextToken(String delim) {
if (_restored.size()>0) return _restored.pop();
return super.nextToken(delim);
}
}
public static void main(String[] args) {
lint2_cond.UnstagedLint.Exp e = parse
(lint2_cond.UnstagedLint.Exp.class,”(Ifz (Int 1) (Int 10) (Mul(Int 3)(Add(Int 5)(Int 10))))”);
System.out.println(new lint2_cond.UnstagedLint.Program(e).peval(lint2_cond.UnstagedLint.env0));
lint.DataTypes.Value v = parse(lint.DataTypes.Value.class, “(IntValue 1)”);
System.out.println(v);
lint.DataTypes.Exp e2 =
new LintParser
.addSubParser(new LintParser
.parse(“(Add (Val (IntValue 1)) (Val (IntValue 2)))”);
System.out.println(new lint.DataTypes.Program(e2).peval(lint.DataTypes.env0, lint.DataTypes.fenv0));
lint.DataTypes.Exp e3 =
LintParser.parse(lint.DataTypes.Exp.class, “(Add (Val (IntValue 1)) (Val (IntValue 2)))”,
new LintParser
System.out.println(new lint.DataTypes.Program(e3).peval(lint.DataTypes.env0, lint.DataTypes.fenv0));
}
}[/cc]
Here is the definition of a simple language for arithmetic expressions that uses the parser:
[cc lang=”java” lines=”20″]package lint1_arithmetic;
public class UnstagedLint {
/*
type exp = Int of int
| Var of string
| Add of exp * exp
| Sub of exp * exp
| Mul of exp * exp
| Div of exp * exp
type def = Definition of string * int
type prog = Program of def list * exp
*/
public static interface Exp {
public int eval(Env e);
}
public static class Int implements Exp {
private int _value;
public Int(int value) {
_value = value;
}
public int eval(Env e) {
return _value;
}
}
public static class Var implements Exp {
private String _s;
public Var(String s) {
_s = s;
}
public int eval(Env e) {
return e.get(_s);
}
}
public static abstract class BinOp implements Exp {
protected Exp _left, _right;
public BinOp(Exp left, Exp right) {
_left = left;
_right = right;
}
}
public static class Add extends BinOp {
public Add(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) + _right.eval(e);
}
}
public static class Sub extends BinOp {
public Sub(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) – _right.eval(e);
}
}
public static class Mul extends BinOp {
public Mul(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) * _right.eval(e);
}
}
public static class Div extends BinOp {
public Div(Exp left, Exp right) {
super(left, right);
}
public int eval(Env e) {
return _left.eval(e) / _right.eval(e);
}
}
/*
exception EnvironmentException
let env0 = fun x -> raise EnvironmentException
let fenv0 = env0
let ext env x v = fun y -> if x=y then v else env y
*/
public static class EnvironmentException extends RuntimeException { }
// environment; implemented as a function object from String to Exp
public static interface Env {
public int get(String y);
}
public static final Env env0 = new Env() {
public int get(String s) { throw new EnvironmentException(); }
};
public static Env ext(final Env env, final String x, final int v) {
return new Env() {
public int get(String y) {
if (x.equals(y)) return v; else return env.get(y);
}
};
}
public static class Definition {
private String _name;
private int _v;
public Definition(String name, int v) {
_name = name;
_v = v;
}
public String name() { return _name; }
public int value() { return _v; }
}
public static class Program {
public Definition[] _defs;
public Exp _body;
public Program(Exp body, Definition… defs) {
_defs = defs;
_body = body;
}
public int peval(Env env) {
return peval(env,0);
}
private int peval(final Env env, int defIndex) {
// match p with
if (_defs.length<=defIndex) {
// Program ([],e) -> eval e env
return _body.eval(env);
}
else {
// |Program (Definition (n,v)::tl,e) -> peval (Program(tl,e)) (ext env n v)
final Definition d = _defs[defIndex];
return peval(ext(env, d.name(), d.value()),defIndex+1);
}
}
}
/*
let rec eval e env fenv =
match e with
Int i -> i
| Var s -> env s
| Add (e1,e2) -> (eval e1 env fenv)+(eval e2 env fenv)
| Sub (e1,e2) -> (eval e1 env fenv)-(eval e2 env fenv)
| Mul (e1,e2) -> (eval e1 env fenv)*(eval e2 env fenv)
| Div (e1,e2) -> (eval e1 env fenv)/(eval e2 env fenv)
let rec peval p env =
match p with
Program ([],e) -> eval e env
|Program (Definition (s1,i1)::tl,e) -> peval (Program(tl,e)) (ext env s1 i1)
*/
public static Program term10times20plus30 = new Program
(new Add(new Mul(new Var(“x”), new Var(“y”)), new Int(30)),
new Definition(“x”, 10),
new Definition(“y”, 20));
public static Program parsedTerm10times20plus30 = new Program
(parser.LintParser.parse(Exp.class, “(Add (Mul (Var x) (Var y)) (Int 30))”),
new Definition(“x”, 10),
new Definition(“y”, 20));
public static Program termWhatX = new Program
(new Var(“x”));
public static Program parsedTermWhatX = new Program
(parser.LintParser.parse(Exp.class, “(Var x)”));
public static void main(String[] args) {
int i = term10times20plus30.peval(env0);
System.out.println(“term10times20plus30 = “+i);
i = parsedTerm10times20plus30.peval(env0);
System.out.println(“parsedTerm10times20plus30 = “+i);
try {
i = termWhatX.peval(env0);
System.out.println(“termWhatX = “+i);
throw new AssertionError(“This should throw an EnvironmentException”);
}
catch(EnvironmentException ee) { System.out.println(“termWhatX threw “+ee); /* expected */ }
try {
i = parsedTermWhatX.peval(env0);
System.out.println(“parsedTermWhatX = “+i);
throw new AssertionError(“This should throw an EnvironmentException”);
}
catch(EnvironmentException ee) { System.out.println(“parsedTermWhatX threw “+ee); /* expected */ }
}
}[/cc]
Pingback: BSD License for the Reflection-Based S-Expression Parser | A Concurrent Affair