There are many ways to write the lexical specification for a grammar. But the performance of the generated token manager varies significantly depending on how you do this. Here are a few tips:
Prefer string literals wherever possible. For example,

SKIP : { " " | "\t" | "\n" }

is more efficient than

SKIP : { < ([" ", "\t", "\n"])+ > }

because the first form contains only string literals, for which JavaCC generates a DFA, whereas the second form requires an NFA, which is slower to run.
Similarly,

MORE : { < ~[] > }

is better than

TOKEN : { < (~[])+ > }

Of course, if your grammar dictates that one of these cannot be used, then you don't have a choice; but try to use < ~[] > as much as possible.
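To see where a single-character MORE rule fits naturally, here is one common idiom: consuming a block comment one character at a time inside a lexical state. This sketch is illustrative and not part of the original text (the state name WithinComment is assumed):

```
SKIP : { "/*" : WithinComment }            // enter the comment state
< WithinComment > SKIP : { "*/" : DEFAULT }  // leave it on "*/"
< WithinComment > MORE : { < ~[] > }         // accumulate any other character
```

Because "*/" is longer than any single character matched by < ~[] >, the longest-match rule ends the comment correctly.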
Avoid defining a token as a choice of string literals, such as:

< NONE : "\"none\"" | "\'none\'" >

Instead, have two different token kinds and use a nonterminal that is a choice between them. The above example can be written as:

< NONE1 : "\"none\"" >
| < NONE2 : "\'none\'" >

and define a nonterminal called None() as:

void None() : {} { <NONE1> | <NONE2> }

This will make recognition much faster. Note, however, that if the choice is between two complex regular expressions, it is OK to keep the choice within the token.
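The tips above can be combined into one minimal grammar file. This is an illustrative sketch, not from the original text; the parser class name Example is assumed:

```
options { STATIC = false; }

PARSER_BEGIN(Example)
public class Example {}
PARSER_END(Example)

// String literals only, so the token manager uses a fast DFA.
SKIP : { " " | "\t" | "\n" }

// Two simple tokens instead of one token with a choice of literals.
TOKEN :
{
  < NONE1 : "\"none\"" >
| < NONE2 : "\'none\'" >
}

// The choice is moved into the parser as a nonterminal.
void None() : {} { <NONE1> | <NONE2> }
```

Generating and compiling this with javacc should show both literals recognized as separate tokens while None() accepts either one.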