At work, we deployed the latest version of our code on Wednesday night. It included a new query language. We used ANTLR to do the parsing.
We had never tested the new queries in a production setting, but the deployments to our Alpha, Gamma, and two regional clusters worked without problems. A few hours after deploying to the last clusters, a customer started using it, and that’s when all hell broke lose. I ended up rolling back to the previous version at 4 AM on Thursday.
A quick analysis by my coworkers yesterday showed that most of the time was spent doing the parsing.
The grammar that we used was essentially this one, which required backtracking by ANTLR:
E ::= ( E )
| C
C ::= ( E and E )
| ( E or E )
We never even let our test query finish, but it had been running for 45 minutes non-stop. I suggested a small change, which amounted to something like this:
E ::= ( E R
R ::= )
| and E )
| or E )
Sometimes it is helpful that I spent grad school in the programming languages group.
The parser that ANTLR generated now parsed our test query in under a second. Sure, we changed an LL(*) parser into an LL(1) parser, but I never would have thought that changing the grammar could result in an improvement by a factor of 2700.