Reflection-Based S-Expression Parser

I’m really quite proud of this little reflection-based S-expression parser that I wrote for our GPCE Mint tutorial.

We wanted to have a parser so we don’t have to construct our ASTs using Java code. The problem was that we’ll probably have ten different mini-languages, and we didn’t want to write a parser for each language. Reflection was the answer, of course.

For this parser to work, you need to define the base class of the class hierarchy (e.g. Exp) and the subclasses as static nested classes inside the same outer class (e.g. all inside UnstagedLint).

You then simply specify the base class, and the parser will search for the subclasses. The names of the S-expressions are the simple names of the subclasses (e.g. Add and Int if there are Add and Int subclasses).

When encountering an S-expression, the parser will try all of the constructors of the corresponding class. As arguments for a constructor, the parser can process all primitive types, the boxed types, strings, recursive occurrences of the base class (e.g. Add containing two Exp instances), as well as other class hierarchies, which are parsed using subparsers.

These subparsers work the same way the main parser works. For example, if a Val constructor needs a Value argument, and there is a subparser registered for the Val hierarchy, the main parser will delegate to that subparser. Here’s an example:

1
2
3
4
DataTypes.Exp e =
    new LintParser<DataTypes.Exp>(DataTypes.Exp.class)
    .addSubParser(new LintParser<DataTypes.Value>(DataTypes.Value.class))
    .parse("(Add (Val (IntValue 1)) (Val (IntValue 2)))");

This creates a parser for expressions of class DataTypes.Exp, and it registers a subparser for expressions of class DataTypes.Value. Then it parses the string (Add (Val (IntValue 1)) (Val (IntValue 2))). The (IntValue 1) and (IntValue 2) substrings are parsed using the subparser.

It’s really quite elegant. Especially the ease of integrating the subparsers surprised me. Here’s the parser in full.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
package parser;

import java.lang.reflect.*;
import java.util.*;

/**
 * A reflection-based S-expression parser returning an expression of type E.
 * Written by Mathias Ricken and Edwin Westbrook.
 */

public class LintParser<E> {
    /** Convenience method to parse string s into an expression of class c,
      * using the specified subparsers as help, if the parser encounters non-primitive,
      * non-string arguments to S-expressions not within the class hierarcy under class c.
      * @param c class of the expression
      * @param s string to parse
      * @param subParsers subparsers for non-primitive, no-string values
      * @return expression of class c */

    public static <T> T parse(Class<T> c, String s, LintParser<?>... subParsers) {
        LintParser<T> p = new LintParser<T>(c);
        for(LintParser<?> sp: subParsers) {
            p.addSubParser(sp);
        }
        return p.parse(s);
    }
   
    /** Exception being thrown if the string cannot be parsed. */
    public static class ParseException extends RuntimeException { }
   
    /** The class of the expression (e.g. lint1_arithmetic.UnstagedLint.Exp). */
    private Class<E> _expClass;
   
    /** The class enclosing the expression classes (e.g. lint1_arithmetic.UnstagedLint). */
    private Class<?> _enclosingClass;
   
    /** A map from S-expression names to classes (e.g. Int -> lint1_arithmetic.UnstagedLint.Int). */
    private HashMap<String,Class<? extends E>> _classes =
        new HashMap<String,Class<? extends E>>();
   
    /** A map from classes for which subparsers have been registered to the subparsers. */
    private HashMap<Class<?>,LintParser<?>> _subParsers =
        new HashMap<Class<?>,LintParser<?>>();
   
    /** Create a new parser for expressions of class expClass.
      * @param expClass the class of the expressions to be parsed */

    public LintParser(Class<E> expClass) {
        _expClass = expClass;
        _enclosingClass = _expClass.getEnclosingClass();
        // find all concrete, static subclasses of expClass contained in the same enclosing class
        for(Class<?> c: _enclosingClass.getDeclaredClasses()) {
            if (((c.getModifiers() & Modifier.ABSTRACT)==0) &&
                ((c.getModifiers() & Modifier.STATIC)!=0)) {
                try {
                    Class<? extends E> contained = c.asSubclass(_expClass);
                    _classes.put(contained.getSimpleName(), (Class<? extends E>)contained);
                    println(contained.getSimpleName());
                }
                catch(ClassCastException cce) { /* skip */ }
            }
        }
    }
   
    /** Add a subparser for expressions of type S. Return this parser (not the subparser!)
      * for method chaining.
      * @param p subparser for expressions of type S
      * @return this parser (for expressions of type E) */

    public <S> LintParser<E> addSubParser(LintParser<S> p) {
        _subParsers.put(p._expClass, p);
        return this;
    }    
   
    /** Parse the string.
      * @param s the string to be parsed
      * @return the expression of type E */

    public E parse(String s) {
        println("parse '"+s+"'");
        RestoreStringTokenizer st = new RestoreStringTokenizer(s.trim());
        return parse(st);
    }
   
    /** Parse the tokens in the tokenizer.
      * @param s the tokenizer with the tokens
      * @return the expression of type E */

    protected E parse(RestoreStringTokenizer s) {
        // position to restore everything
        final int sPos = s.getPosition();
       
        // read (
        parseWhitespace(s);
        String word = s.nextToken();
        println("word: '"+word+"'");
        if (!word.equals("(")) {
            s.restorePosition(sPos);
            throw new ParseException();
        }
       
        // read S-expression name
        parseWhitespace(s);
        word = s.nextToken();
        println("word: '"+word+"'");
        if (!_classes.containsKey(word)) {
            // don't know what that is, restore
            s.restorePosition(sPos);
            throw new ParseException();
        }
       
        // known subclass of E
        Class<? extends E> c = _classes.get(word);
        println("Class: "+c.getSimpleName());        
        parseWhitespace(s);
       
        // position to restore after trying out a constructor
        final int sTryCtorPos = s.getPosition();
       
        Constructor<? extends E> ctor;
        for(Constructor<?> ctor2: c.getDeclaredConstructors()) {
            println("\ttrying ctor "+ctor2);
           
            // this is necessary since c.getDeclaredConstructors() returns Constructor<?>
            // but we want Constructor<? extends E>
            try { ctor = c.getConstructor(ctor2.getParameterTypes()); }
            catch(NoSuchMethodException nsme) { throw new AssertionError("Should never happen."); }
           
            // try to parse the arguments for this constructor
            Object[] args = new Object[ctor.getParameterTypes().length];
            int argIndex = 0;
            try {
                for(Class<?> paramC: ctor.getParameterTypes()) {
                    // primitive types and their boxed types
                    if ((paramC==int.class) || (paramC==Integer.class)) {
                        args[argIndex] = parseLeaf(s,Integer.class);
                    }
                    else if ((paramC==boolean.class) || (paramC==Boolean.class)) {
                        args[argIndex] = parseLeaf(s,Boolean.class);
                    }
                    else if ((paramC==long.class) || (paramC==Long.class)) {
                        args[argIndex] = parseLeaf(s,Long.class);
                    }
                    else if ((paramC==double.class) || (paramC==Double.class)) {
                        args[argIndex] = parseLeaf(s,Double.class);
                    }
                    else if ((paramC==float.class) || (paramC==Float.class)) {
                        args[argIndex] = parseLeaf(s,Float.class);
                    }
                    else if ((paramC==byte.class) || (paramC==Byte.class)) {
                        args[argIndex] = parseLeaf(s,Byte.class);
                    }
                    else if ((paramC==short.class) || (paramC==Short.class)) {
                        args[argIndex] = parseLeaf(s,Short.class);
                    }
                    // char or Character
                    else if ((paramC==char.class) || (paramC==Character.class)) {
                        // parse as string
                        Object temp = parseLeaf(s,String.class);
                        // must be exactly one character
                        if (temp.toString().length()!=1) throw new ParseException();
                        args[argIndex] = new Character(temp.toString().charAt(0));
                    }                    
                    // strings
                    else if (paramC==String.class) {
                        args[argIndex] = parseLeaf(s,String.class);
                    }
                    // recursively parse expressions of type E
                    else if (_expClass.equals(paramC)) {
                        args[argIndex] = parse(s);
                    }
                    // try to use a subparser
                    else if (_subParsers.containsKey(paramC)) {
                        // try one of the subparsers
                        args[argIndex] = _subParsers.get(paramC).parse(s);
                    }                    
                    else {
                        // don't know what that is
                        // restore happens below
                        throw new ParseException();
                    }
                    ++argIndex;
                }
            }
            catch(ParseException pe) {
                // this constructor didn't work out, we need to restore and try another constructor
                s.restorePosition(sTryCtorPos);
                // continue with next constructor
                continue;
            }
            try {
                // we read all values required for this constructor
                // make sure that the next token is )
                parseWhitespace(s);
                word = s.nextToken();
                println("word: '"+word+"'");
                if (!word.equals(")")) {
                    // it wasn't ), we need to restore and try another constructor
                    s.restorePosition(sTryCtorPos);
                    // continue with next constructor
                    continue;
                }
                parseWhitespace(s);
               
                // successfully used this constructor
                println("new "+ctor.getDeclaringClass().getSimpleName()+" "+
                        java.util.Arrays.toString(args));
                E e = (E)ctor.newInstance(args);
                return e;
            }
            catch(Exception e) {
                // something went wrong using this constructor, we need to restore and try another
                s.restorePosition(sTryCtorPos);
                // continue with next constructor
                continue;
            }
        }
       
        s.restorePosition(sPos);
        throw new ParseException();
    }
   
    /** Parse a leaf of class c, which must have a unary constructor taking a String, from
      * the tokenizer s.
      * @param s tokenizer from which to parse
      * @param c the class, which has a unary constructor taking a String, for that we want to create a value
      * @return parsed value of class c */

    protected Object parseLeaf(RestoreStringTokenizer s, Class<?> c) {
        // position so we can undo this attempt to parse
        final int sPos = s.getPosition();
       
        String word = null;
        try {
            // get the unary constructor taking String
            Constructor ctor = c.getDeclaredConstructor(String.class);
           
            // skip whitespace, get the string, skip whitespace
            parseWhitespace(s);
            word = s.nextToken();
            println("word: '"+word+"'");
            Object o = ctor.newInstance(word);
            parseWhitespace(s);
           
            return o;
        }
        catch(NoSuchMethodException nsme) {
            // something went wrong, restore and abort
            s.restorePosition(sPos);
            throw new ParseException();
        }
        catch(InstantiationException ie) {
            // something went wrong, restore and abort
            s.restorePosition(sPos);
            throw new ParseException();
        }
        catch(IllegalAccessException iae) {
            // something went wrong, restore and abort
            s.restorePosition(sPos);
            throw new ParseException();
        }
        catch(InvocationTargetException ite) {
            // something went wrong, restore and abort
            s.restorePosition(sPos);
            throw new ParseException();
        }
    }
   
    /** Parse whitespace. Stop before the next non-whitespace token, or when there are no
      * more tokens.
      * @param s tokenizer from which to parse whitespace */

    protected void parseWhitespace(RestoreStringTokenizer s) {
        String word;
        int sPrevPos;
        while(s.hasMoreTokens()) {
            sPrevPos = s.getPosition();
            word = s.nextToken();
            if (!word.trim().equals("")) {
                s.restorePosition(sPrevPos);
                break;
            }
        }
    }
   
    /** Debug method to print, comment System.out.println call out to disable debug printing */
    public static void println(Object o) {
        // System.out.println(o.toString());
    }
   
    /** This tokenizer stores all read tokens in a stack so they can easily be
      * restored. This isn't as memory as it could be, but it's good enough for
      * small examples. */

    public static class RestoreStringTokenizer extends StringTokenizer {
        /** Stack of read tokens. */
        private Stack<String> _read = new Stack<String>();
        /** Stack of restored tokens. */
        private Stack<String> _restored = new Stack<String>();
        /** Create a new string tokenizer that can restore positions. */
        public RestoreStringTokenizer(String s) { super(s," ()",true); }
        /** Return the current position. */
        public int getPosition() { return _read.size(); }
        /** Rewind to a previous position. This only goes backward, not forward. */
        public void restorePosition(int pos) {
            while(_read.size()>pos) {
                _restored.push(_read.pop());
            }
        }
        // overridden methods from StringTokenizer
        public int countTokens() { return _restored.size() + super.countTokens(); }
        public boolean hasMoreTokens() { return (_restored.size()>0) || super.hasMoreTokens(); }
        public boolean hasMoreElements() { return hasMoreTokens(); }
        public String nextToken() {
            String token = (_restored.size()>0) ? _restored.pop() : super.nextToken();
            _read.push(token);
            return token;
        }
        public Object nextElement() { return nextToken(); }
        public String nextToken(String delim) {
            if (_restored.size()>0) return _restored.pop();
            return super.nextToken(delim);
        }
    }
       
    public static void main(String[] args) {
        lint2_cond.UnstagedLint.Exp e = parse
            (lint2_cond.UnstagedLint.Exp.class,"(Ifz (Int 1) (Int 10) (Mul(Int 3)(Add(Int 5)(Int 10))))");
        System.out.println(new lint2_cond.UnstagedLint.Program(e).peval(lint2_cond.UnstagedLint.env0));
       
        lint.DataTypes.Value v = parse(lint.DataTypes.Value.class, "(IntValue 1)");
        System.out.println(v);
       
        lint.DataTypes.Exp e2 =
            new LintParser<lint.DataTypes.Exp>(lint.DataTypes.Exp.class)
            .addSubParser(new LintParser<lint.DataTypes.Value>(lint.DataTypes.Value.class))
            .parse("(Add (Val (IntValue 1)) (Val (IntValue 2)))");
        System.out.println(new lint.DataTypes.Program(e2).peval(lint.DataTypes.env0, lint.DataTypes.fenv0));

        lint.DataTypes.Exp e3 =
            LintParser.parse(lint.DataTypes.Exp.class, "(Add (Val (IntValue 1)) (Val (IntValue 2)))",
                             new LintParser<lint.DataTypes.Value>(lint.DataTypes.Value.class));
        System.out.println(new lint.DataTypes.Program(e3).peval(lint.DataTypes.env0, lint.DataTypes.fenv0));

    }
}

Here is the definition of a simple language for arithmetic expressions that uses the parser:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
package lint1_arithmetic;

public class UnstagedLint {
   
    /*
     type exp = Int of int
     | Var of string
     | Add of exp * exp
     | Sub of exp * exp
     | Mul of exp * exp
     | Div of exp * exp
     type def = Definition of string * int
     type prog = Program of def list * exp
     */

   
    public static interface Exp {
        public int eval(Env e);
    }
   
    public static class Int implements Exp {
        private int _value;
        public Int(int value) {
            _value = value;
        }
        public int eval(Env e) {
            return _value;
        }
    }
   
    public static class Var implements Exp {
        private String _s;
        public Var(String s) {
            _s = s;
        }
        public int eval(Env e) {
            return e.get(_s);
        }
    }
   
    public static abstract class BinOp implements Exp {
        protected Exp _left, _right;
        public BinOp(Exp left, Exp right) {
            _left = left;
            _right = right;
        }
    }
   
    public static class Add extends BinOp {
        public Add(Exp left, Exp right) {
            super(left, right);
        }
        public int eval(Env e) {
            return _left.eval(e) + _right.eval(e);
        }
    }
   
    public static class Sub extends BinOp {
        public Sub(Exp left, Exp right) {
            super(left, right);
        }
        public int eval(Env e) {
            return _left.eval(e) - _right.eval(e);
        }
    }
   
    public static class Mul extends BinOp {
        public Mul(Exp left, Exp right) {
            super(left, right);
        }
        public int eval(Env e) {
            return _left.eval(e) * _right.eval(e);
        }
    }
   
    public static class Div extends BinOp {
        public Div(Exp left, Exp right) {
            super(left, right);
        }
        public int eval(Env e) {
            return _left.eval(e) / _right.eval(e);
        }
    }
   
    /*
     exception EnvironmentException
     let env0 = fun x -> raise EnvironmentException
     let fenv0 = env0
     let ext env x v = fun y -> if x=y then v else env y
     */

   
    public static class EnvironmentException extends RuntimeException { }
   
    // environment; implemented as a function object from String to Exp
    public static interface Env {
        public int get(String y);
    }
    public static final Env env0 = new Env() {
        public int get(String s) { throw new EnvironmentException(); }
    };
   
    public static Env ext(final Env env, final String x, final int v) {
        return new Env() {
            public int get(String y) {
                if (x.equals(y)) return v; else return env.get(y);
            }
        };
    }
   
    public static class Definition {
        private String _name;
        private int _v;
        public Definition(String name, int v) {
            _name = name;
            _v = v;
        }
        public String name() { return _name; }
        public int value() { return _v; }
    }

    public static class Program {
        public Definition[] _defs;
        public Exp _body;
        public Program(Exp body, Definition... defs) {
            _defs = defs;
            _body = body;
        }
       
        public int peval(Env env) {
            return peval(env,0);
        }
       
        private int peval(final Env env, int defIndex) {
            // match p with
            if (_defs.length<=defIndex) {
                //     Program ([],e) -> eval e env
                return _body.eval(env);
            }
            else {
                //    |Program (Definition (n,v)::tl,e) -> peval (Program(tl,e)) (ext env n v)
                final Definition d = _defs[defIndex];
                return peval(ext(env, d.name(), d.value()),defIndex+1);
            }
        }
    }
   
    /*
     let rec eval e env fenv =
     match e with
     Int i -> i
     | Var s -> env s
     | Add (e1,e2) -> (eval e1 env fenv)+(eval e2 env fenv)
     | Sub (e1,e2) -> (eval e1 env fenv)-(eval e2 env fenv)
     | Mul (e1,e2) -> (eval e1 env fenv)*(eval e2 env fenv)
     | Div (e1,e2) -> (eval e1 env fenv)/(eval e2 env fenv)
     
     let rec peval p env =
     match p with
     Program ([],e) -> eval e env
     |Program (Definition (s1,i1)::tl,e) -> peval (Program(tl,e)) (ext env s1 i1)
     
     */

   
    public static Program term10times20plus30 = new Program
        (new Add(new Mul(new Var("x"), new Var("y")), new Int(30)),
         new Definition("x", 10),
         new Definition("y", 20));

    public static Program parsedTerm10times20plus30 = new Program
        (parser.LintParser.parse(Exp.class, "(Add (Mul (Var x) (Var y)) (Int 30))"),
         new Definition("x", 10),
         new Definition("y", 20));

    public static Program termWhatX = new Program
        (new Var("x"));

    public static Program parsedTermWhatX = new Program
        (parser.LintParser.parse(Exp.class, "(Var x)"));
       
    public static void main(String[] args) {
        int i = term10times20plus30.peval(env0);
        System.out.println("term10times20plus30 = "+i);
       
        i = parsedTerm10times20plus30.peval(env0);
        System.out.println("parsedTerm10times20plus30 = "+i);
       
        try {
            i = termWhatX.peval(env0);
            System.out.println("termWhatX = "+i);
            throw new AssertionError("This should throw an EnvironmentException");
        }
        catch(EnvironmentException ee) { System.out.println("termWhatX threw "+ee); /* expected */ }

        try {
            i = parsedTermWhatX.peval(env0);
            System.out.println("parsedTermWhatX = "+i);
            throw new AssertionError("This should throw an EnvironmentException");
        }
        catch(EnvironmentException ee) { System.out.println("parsedTermWhatX threw "+ee); /* expected */ }
    }
}
Share

About Mathias

Software development engineer. Principal developer of DrJava. Recent Ph.D. graduate from the Department of Computer Science at Rice University.
This entry was posted in Mint. Bookmark the permalink.

One Response to Reflection-Based S-Expression Parser

  1. Pingback: BSD License for the Reflection-Based S-Expression Parser | A Concurrent Affair

Leave a Reply