I’m really quite proud of this little reflection-based S-expression parser that I wrote for our GPCE Mint tutorial.
We wanted to have a parser so we don’t have to construct our ASTs using Java code. The problem was that we’ll probably have ten different mini-languages, and we didn’t want to write a parser for each language. Reflection was the answer, of course.
For this parser to work, you need to define the base class of the class hierarchy (e.g. Exp
) and the subclasses as static nested classes inside the same outer class (e.g. all inside UnstagedLint
).
You then simply specify the base class, and the parser will search for the subclasses. The names of the S-expressions are the simple names of the subclasses (e.g. Add and Int if there are Add
and Int
subclasses).
When encountering an S-expression, the parser will try all of the constructors of the corresponding class. As arguments for a constructor, the parser can process all primitive types, the boxed types, strings, recursive occurrences of the base class (e.g. Add
containing two Exp
instances), as well as other class hierarchies, which are parsed using subparsers.
These subparsers work the same way the main parser works. For example, if a Val
constructor needs a Value
argument, and there is a subparser registered for the Val
hierarchy, the main parser will delegate to that subparser. Here’s an example:
1 2 3 4 | DataTypes.Exp e = new LintParser<DataTypes.Exp>(DataTypes.Exp.class) .addSubParser(new LintParser<DataTypes.Value>(DataTypes.Value.class)) .parse("(Add (Val (IntValue 1)) (Val (IntValue 2)))"); |
This creates a parser for expressions of class DataTypes.Exp
, and it registers a subparser for expressions of class DataTypes.Value
. Then it parses the string (Add (Val (IntValue 1)) (Val (IntValue 2)))
. The (IntValue 1)
and (IntValue 2)
substrings are parsed using the subparser.
It’s really quite elegant. Especially the ease of integrating the subparsers surprised me. Here’s the parser in full.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 | package parser; import java.lang.reflect.*; import java.util.*; /** * A reflection-based S-expression parser returning an expression of type E. * Written by Mathias Ricken and Edwin Westbrook. */ public class LintParser<E> { /** Convenience method to parse string s into an expression of class c, * using the specified subparsers as help, if the parser encounters non-primitive, * non-string arguments to S-expressions not within the class hierarcy under class c. * @param c class of the expression * @param s string to parse * @param subParsers subparsers for non-primitive, no-string values * @return expression of class c */ public static <T> T parse(Class<T> c, String s, LintParser<?>... subParsers) { LintParser<T> p = new LintParser<T>(c); for(LintParser<?> sp: subParsers) { p.addSubParser(sp); } return p.parse(s); } /** Exception being thrown if the string cannot be parsed. */ public static class ParseException extends RuntimeException { } /** The class of the expression (e.g. lint1_arithmetic.UnstagedLint.Exp). */ private Class<E> _expClass; /** The class enclosing the expression classes (e.g. lint1_arithmetic.UnstagedLint). */ private Class<?> _enclosingClass; /** A map from S-expression names to classes (e.g. Int -> lint1_arithmetic.UnstagedLint.Int). */ private HashMap<String,Class<? extends E>> _classes = new HashMap<String,Class<? extends E>>(); /** A map from classes for which subparsers have been registered to the subparsers. */ private HashMap<Class<?>,LintParser<?>> _subParsers = new HashMap<Class<?>,LintParser<?>>(); /** Create a new parser for expressions of class expClass. * @param expClass the class of the expressions to be parsed */ public LintParser(Class<E> expClass) { _expClass = expClass; _enclosingClass = _expClass.getEnclosingClass(); // find all concrete, static subclasses of expClass contained in the same enclosing class for(Class<?> c: _enclosingClass.getDeclaredClasses()) { if (((c.getModifiers() & Modifier.ABSTRACT)==0) && ((c.getModifiers() & Modifier.STATIC)!=0)) { try { Class<? extends E> contained = c.asSubclass(_expClass); _classes.put(contained.getSimpleName(), (Class<? extends E>)contained); println(contained.getSimpleName()); } catch(ClassCastException cce) { /* skip */ } } } } /** Add a subparser for expressions of type S. Return this parser (not the subparser!) * for method chaining. * @param p subparser for expressions of type S * @return this parser (for expressions of type E) */ public <S> LintParser<E> addSubParser(LintParser<S> p) { _subParsers.put(p._expClass, p); return this; } /** Parse the string. * @param s the string to be parsed * @return the expression of type E */ public E parse(String s) { println("parse '"+s+"'"); RestoreStringTokenizer st = new RestoreStringTokenizer(s.trim()); return parse(st); } /** Parse the tokens in the tokenizer. * @param s the tokenizer with the tokens * @return the expression of type E */ protected E parse(RestoreStringTokenizer s) { // position to restore everything final int sPos = s.getPosition(); // read ( parseWhitespace(s); String word = s.nextToken(); println("word: '"+word+"'"); if (!word.equals("(")) { s.restorePosition(sPos); throw new ParseException(); } // read S-expression name parseWhitespace(s); word = s.nextToken(); println("word: '"+word+"'"); if (!_classes.containsKey(word)) { // don't know what that is, restore s.restorePosition(sPos); throw new ParseException(); } // known subclass of E Class<? extends E> c = _classes.get(word); println("Class: "+c.getSimpleName()); parseWhitespace(s); // position to restore after trying out a constructor final int sTryCtorPos = s.getPosition(); Constructor<? extends E> ctor; for(Constructor<?> ctor2: c.getDeclaredConstructors()) { println("\ttrying ctor "+ctor2); // this is necessary since c.getDeclaredConstructors() returns Constructor<?> // but we want Constructor<? extends E> try { ctor = c.getConstructor(ctor2.getParameterTypes()); } catch(NoSuchMethodException nsme) { throw new AssertionError("Should never happen."); } // try to parse the arguments for this constructor Object[] args = new Object[ctor.getParameterTypes().length]; int argIndex = 0; try { for(Class<?> paramC: ctor.getParameterTypes()) { // primitive types and their boxed types if ((paramC==int.class) || (paramC==Integer.class)) { args[argIndex] = parseLeaf(s,Integer.class); } else if ((paramC==boolean.class) || (paramC==Boolean.class)) { args[argIndex] = parseLeaf(s,Boolean.class); } else if ((paramC==long.class) || (paramC==Long.class)) { args[argIndex] = parseLeaf(s,Long.class); } else if ((paramC==double.class) || (paramC==Double.class)) { args[argIndex] = parseLeaf(s,Double.class); } else if ((paramC==float.class) || (paramC==Float.class)) { args[argIndex] = parseLeaf(s,Float.class); } else if ((paramC==byte.class) || (paramC==Byte.class)) { args[argIndex] = parseLeaf(s,Byte.class); } else if ((paramC==short.class) || (paramC==Short.class)) { args[argIndex] = parseLeaf(s,Short.class); } // char or Character else if ((paramC==char.class) || (paramC==Character.class)) { // parse as string Object temp = parseLeaf(s,String.class); // must be exactly one character if (temp.toString().length()!=1) throw new ParseException(); args[argIndex] = new Character(temp.toString().charAt(0)); } // strings else if (paramC==String.class) { args[argIndex] = parseLeaf(s,String.class); } // recursively parse expressions of type E else if (_expClass.equals(paramC)) { args[argIndex] = parse(s); } // try to use a subparser else if (_subParsers.containsKey(paramC)) { // try one of the subparsers args[argIndex] = _subParsers.get(paramC).parse(s); } else { // don't know what that is // restore happens below throw new ParseException(); } ++argIndex; } } catch(ParseException pe) { // this constructor didn't work out, we need to restore and try another constructor s.restorePosition(sTryCtorPos); // continue with next constructor continue; } try { // we read all values required for this constructor // make sure that the next token is ) parseWhitespace(s); word = s.nextToken(); println("word: '"+word+"'"); if (!word.equals(")")) { // it wasn't ), we need to restore and try another constructor s.restorePosition(sTryCtorPos); // continue with next constructor continue; } parseWhitespace(s); // successfully used this constructor println("new "+ctor.getDeclaringClass().getSimpleName()+" "+ java.util.Arrays.toString(args)); E e = (E)ctor.newInstance(args); return e; } catch(Exception e) { // something went wrong using this constructor, we need to restore and try another s.restorePosition(sTryCtorPos); // continue with next constructor continue; } } s.restorePosition(sPos); throw new ParseException(); } /** Parse a leaf of class c, which must have a unary constructor taking a String, from * the tokenizer s. * @param s tokenizer from which to parse * @param c the class, which has a unary constructor taking a String, for that we want to create a value * @return parsed value of class c */ protected Object parseLeaf(RestoreStringTokenizer s, Class<?> c) { // position so we can undo this attempt to parse final int sPos = s.getPosition(); String word = null; try { // get the unary constructor taking String Constructor ctor = c.getDeclaredConstructor(String.class); // skip whitespace, get the string, skip whitespace parseWhitespace(s); word = s.nextToken(); println("word: '"+word+"'"); Object o = ctor.newInstance(word); parseWhitespace(s); return o; } catch(NoSuchMethodException nsme) { // something went wrong, restore and abort s.restorePosition(sPos); throw new ParseException(); } catch(InstantiationException ie) { // something went wrong, restore and abort s.restorePosition(sPos); throw new ParseException(); } catch(IllegalAccessException iae) { // something went wrong, restore and abort s.restorePosition(sPos); throw new ParseException(); } catch(InvocationTargetException ite) { // something went wrong, restore and abort s.restorePosition(sPos); throw new ParseException(); } } /** Parse whitespace. Stop before the next non-whitespace token, or when there are no * more tokens. * @param s tokenizer from which to parse whitespace */ protected void parseWhitespace(RestoreStringTokenizer s) { String word; int sPrevPos; while(s.hasMoreTokens()) { sPrevPos = s.getPosition(); word = s.nextToken(); if (!word.trim().equals("")) { s.restorePosition(sPrevPos); break; } } } /** Debug method to print, comment System.out.println call out to disable debug printing */ public static void println(Object o) { // System.out.println(o.toString()); } /** This tokenizer stores all read tokens in a stack so they can easily be * restored. This isn't as memory as it could be, but it's good enough for * small examples. */ public static class RestoreStringTokenizer extends StringTokenizer { /** Stack of read tokens. */ private Stack<String> _read = new Stack<String>(); /** Stack of restored tokens. */ private Stack<String> _restored = new Stack<String>(); /** Create a new string tokenizer that can restore positions. */ public RestoreStringTokenizer(String s) { super(s," ()",true); } /** Return the current position. */ public int getPosition() { return _read.size(); } /** Rewind to a previous position. This only goes backward, not forward. */ public void restorePosition(int pos) { while(_read.size()>pos) { _restored.push(_read.pop()); } } // overridden methods from StringTokenizer public int countTokens() { return _restored.size() + super.countTokens(); } public boolean hasMoreTokens() { return (_restored.size()>0) || super.hasMoreTokens(); } public boolean hasMoreElements() { return hasMoreTokens(); } public String nextToken() { String token = (_restored.size()>0) ? _restored.pop() : super.nextToken(); _read.push(token); return token; } public Object nextElement() { return nextToken(); } public String nextToken(String delim) { if (_restored.size()>0) return _restored.pop(); return super.nextToken(delim); } } public static void main(String[] args) { lint2_cond.UnstagedLint.Exp e = parse (lint2_cond.UnstagedLint.Exp.class,"(Ifz (Int 1) (Int 10) (Mul(Int 3)(Add(Int 5)(Int 10))))"); System.out.println(new lint2_cond.UnstagedLint.Program(e).peval(lint2_cond.UnstagedLint.env0)); lint.DataTypes.Value v = parse(lint.DataTypes.Value.class, "(IntValue 1)"); System.out.println(v); lint.DataTypes.Exp e2 = new LintParser<lint.DataTypes.Exp>(lint.DataTypes.Exp.class) .addSubParser(new LintParser<lint.DataTypes.Value>(lint.DataTypes.Value.class)) .parse("(Add (Val (IntValue 1)) (Val (IntValue 2)))"); System.out.println(new lint.DataTypes.Program(e2).peval(lint.DataTypes.env0, lint.DataTypes.fenv0)); lint.DataTypes.Exp e3 = LintParser.parse(lint.DataTypes.Exp.class, "(Add (Val (IntValue 1)) (Val (IntValue 2)))", new LintParser<lint.DataTypes.Value>(lint.DataTypes.Value.class)); System.out.println(new lint.DataTypes.Program(e3).peval(lint.DataTypes.env0, lint.DataTypes.fenv0)); } } |
Here is the definition of a simple language for arithmetic expressions that uses the parser:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 | package lint1_arithmetic; public class UnstagedLint { /* type exp = Int of int | Var of string | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp type def = Definition of string * int type prog = Program of def list * exp */ public static interface Exp { public int eval(Env e); } public static class Int implements Exp { private int _value; public Int(int value) { _value = value; } public int eval(Env e) { return _value; } } public static class Var implements Exp { private String _s; public Var(String s) { _s = s; } public int eval(Env e) { return e.get(_s); } } public static abstract class BinOp implements Exp { protected Exp _left, _right; public BinOp(Exp left, Exp right) { _left = left; _right = right; } } public static class Add extends BinOp { public Add(Exp left, Exp right) { super(left, right); } public int eval(Env e) { return _left.eval(e) + _right.eval(e); } } public static class Sub extends BinOp { public Sub(Exp left, Exp right) { super(left, right); } public int eval(Env e) { return _left.eval(e) - _right.eval(e); } } public static class Mul extends BinOp { public Mul(Exp left, Exp right) { super(left, right); } public int eval(Env e) { return _left.eval(e) * _right.eval(e); } } public static class Div extends BinOp { public Div(Exp left, Exp right) { super(left, right); } public int eval(Env e) { return _left.eval(e) / _right.eval(e); } } /* exception EnvironmentException let env0 = fun x -> raise EnvironmentException let fenv0 = env0 let ext env x v = fun y -> if x=y then v else env y */ public static class EnvironmentException extends RuntimeException { } // environment; implemented as a function object from String to Exp public static interface Env { public int get(String y); } public static final Env env0 = new Env() { public int get(String s) { throw new EnvironmentException(); } }; public static Env ext(final Env env, final String x, final int v) { return new Env() { public int get(String y) { if (x.equals(y)) return v; else return env.get(y); } }; } public static class Definition { private String _name; private int _v; public Definition(String name, int v) { _name = name; _v = v; } public String name() { return _name; } public int value() { return _v; } } public static class Program { public Definition[] _defs; public Exp _body; public Program(Exp body, Definition... defs) { _defs = defs; _body = body; } public int peval(Env env) { return peval(env,0); } private int peval(final Env env, int defIndex) { // match p with if (_defs.length<=defIndex) { // Program ([],e) -> eval e env return _body.eval(env); } else { // |Program (Definition (n,v)::tl,e) -> peval (Program(tl,e)) (ext env n v) final Definition d = _defs[defIndex]; return peval(ext(env, d.name(), d.value()),defIndex+1); } } } /* let rec eval e env fenv = match e with Int i -> i | Var s -> env s | Add (e1,e2) -> (eval e1 env fenv)+(eval e2 env fenv) | Sub (e1,e2) -> (eval e1 env fenv)-(eval e2 env fenv) | Mul (e1,e2) -> (eval e1 env fenv)*(eval e2 env fenv) | Div (e1,e2) -> (eval e1 env fenv)/(eval e2 env fenv) let rec peval p env = match p with Program ([],e) -> eval e env |Program (Definition (s1,i1)::tl,e) -> peval (Program(tl,e)) (ext env s1 i1) */ public static Program term10times20plus30 = new Program (new Add(new Mul(new Var("x"), new Var("y")), new Int(30)), new Definition("x", 10), new Definition("y", 20)); public static Program parsedTerm10times20plus30 = new Program (parser.LintParser.parse(Exp.class, "(Add (Mul (Var x) (Var y)) (Int 30))"), new Definition("x", 10), new Definition("y", 20)); public static Program termWhatX = new Program (new Var("x")); public static Program parsedTermWhatX = new Program (parser.LintParser.parse(Exp.class, "(Var x)")); public static void main(String[] args) { int i = term10times20plus30.peval(env0); System.out.println("term10times20plus30 = "+i); i = parsedTerm10times20plus30.peval(env0); System.out.println("parsedTerm10times20plus30 = "+i); try { i = termWhatX.peval(env0); System.out.println("termWhatX = "+i); throw new AssertionError("This should throw an EnvironmentException"); } catch(EnvironmentException ee) { System.out.println("termWhatX threw "+ee); /* expected */ } try { i = parsedTermWhatX.peval(env0); System.out.println("parsedTermWhatX = "+i); throw new AssertionError("This should throw an EnvironmentException"); } catch(EnvironmentException ee) { System.out.println("parsedTermWhatX threw "+ee); /* expected */ } } } |
Pingback: BSD License for the Reflection-Based S-Expression Parser | A Concurrent Affair