Category Archives: Languages

GLL Parsing

GLL Parsing is Generalized LL Parsing, which is a generalized parser using LL parsing (left-to-right input, leftmost derivation). GLL parsing allows left-recursion. LL has several advantages over LR (easier to understand, can be written by a person, can be implemented with recursive descent). However, LL by itself is fairly limited, it doesn’t allow left-recursion (see Limitations of LL(1) Parsing).

What looks like a fundamental paper came out in 2010 from Elizabeth Scott and Adrian Johnstone: GLL Parsing

Recursive Descent (RD) parsers are popular because their control flow follows the structure of the grammar and hence they are easy to write and to debug. However, the class of grammars which admit RD parsers is very limited. Backtracking techniques may be used to extend this class, but can have explosive runtimes and cannot deal with grammars with left recursion. Tomita-style RNGLR parsers are fully general but are based on LR techniques and do not have the direct relationship with the grammar that an RD parser has. We develop the fully general GLL parsing technique which is recursive descent-like, and has the property that the parse follows closely the structure of the grammar rules, but uses RNGLR-like machinery to handle non determinism. The resulting recognisers run in worst-case cubic time and can be built even for left recursive grammars.

(the entire proceedings of the conference this paper was introduced at: Preliminary Proceedings of the Ninth Workshop on Language Descriptions Tools and Applications LDTA 2009 – page 113).

Another recent thesis uses GLL parsers to facilitate the design and development of domain specific languages (DSLs):

Language embedding (extending a language by embedding another language within it) is one big arena for generalized parsing techniques.

Daniel Spiewak has started doing work with GLL parsers – see his series of articles

Laurie Tratt has been writing about parsing lately. He wrote a very quoted paper (“Parsing: The Solved Problem that Isn’t” in 2011) saying that grammar composition is important, that ambiguity is important. He also wrote a paper showing how to do left-recursion safely in PEG parsing, even though he seems to think that PEG is not the way to go.

Ben Basten wrote a paper on ambiguity detection methods for context-free grammars, which is useful once you start using generalized parsing techniques (since it is easier to end up with unwanted ambiguities in your grammar). LL and LR don’t tolerate ambiguity, which is both good and bad – it’s a straightjacket, but it’s convenient. There’s probably some proof showing that good grammars are ambiguous, but we only want certain kinds of ambiguity, the kinds that lead to more expressiveness than the confusion they can introduce.

Parsing

GLR parsing

GLR is the way to go. Really.

Marpa

active as of late 2012. http://jeffreykegler.github.com/Marpa/. Paper: http://cloud.github.com/downloads/jeffreykegler/Marpa-theory/recce.pdf. Google groups: https://groups.google.com/forum/#!forum/marpa-parser

Accent

last update in 2006? http://accent.compilertools.net/

Elkhound

Not really active any more.

Elkhound home page and Google talk that Scott McPeak gave in 2006. Also 2004 paper. http://www.cs.berkeley.edu/~necula/Papers/elkhound_cc04.pdf

DParser

http://dparser.sourceforge.net/

Scannerless GLR

http://oai.cwi.nl/oai/asset/12772/12772A.pdf

ASF+SDF -> Rascal

http://www.meta-environment.org/ and http://www.rascal-mpl.org/. See http://en.wikipedia.org/wiki/ASF%2BSDF_Meta_Environment too.

GLL Parsing

Maybe GLL is the way to go?

GLL Parsing paper by Elizabeth Scott and Adrian Johnstone. Other papers

Stratego/XT and Spoofax.

http://strategoxt.org/ and http://strategoxt.org/Spoofax.

Books

Dick Grune wrote a book on just parsing, named Parsing Techniques/

Articles

http://tratt.net/laurie/tech_articles/articles/parsing_the_solved_problem_that_isnt

Packrat/PEG

Not sure that these are worth studying, I think they are a dead end. Yes, you can compose two PEG grammars together, but a PEG grammar is more limited than LR . This is the canonical page: http://bford.info/packrat/. ANTLR is the most significant parser generator using PEG parsing.

Other

http://dinosaur.compilertools.net/ – home page for Lex and Yacc.

Converge is interesting: http://convergepl.org/

Cling is an interactive C++ interpreter based on LLVM and clang.

Some useful notes on parsing and compilers from Lutz Hamel at University of Rhode Island: CSC 402 – Lecture Notes

Why I prefer LALR parsers – “[elimination of left-recursion] is achieved by converting left-recursion to right-recursion; which will always convert an S-attribute grammar into an L-attribute grammar; and seriously complicate any L-attribute grammar.”

http://code.google.com/p/blacc/wiki/DesignLeftRecursion