Type Definition regex::dfa::StatePtr
type StatePtr = u32;
StatePtr is a 32-bit pointer to the start of a row in the transition table.
It has many special values. There are two types of special values: sentinels and flags.
Sentinels correspond to special states that carry some kind of significance. There are three such states: unknown, dead and quit states.
Unknown states are states that haven't been computed yet. An unknown state indicates that the transition must be filled in so that it points to either an existing cached state or a new state altogether. In general, an unknown state means "follow the NFA's epsilon transitions."
Dead states are states that can never lead to a match, no matter what subsequent input is observed. This means that the DFA should quit immediately and return the longest match it has found thus far.
Quit states are states that imply the DFA is not capable of matching the regex correctly. Currently, this is only used when a Unicode word boundary exists in the regex and a non-ASCII byte is observed.
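As a minimal sketch of how the three sentinels might be told apart, the snippet below defines one constant per sentinel and maps each to the reaction described above. The concrete values, the Action enum and the classify function are assumptions made for illustration; this page does not state how the sentinels are encoded.

```rust
type StatePtr = u32;

// Assumed encoding, for illustration only: the sentinels sit above every
// ordinary state pointer so that a single comparison can detect them.
const STATE_UNKNOWN: StatePtr = 1 << 31;
const STATE_DEAD: StatePtr = STATE_UNKNOWN + 1;
const STATE_QUIT: StatePtr = STATE_DEAD + 1;

/// Hypothetical summary of what a search loop does with a state pointer.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Unknown state: compute the missing transition, then continue.
    ComputeTransition,
    /// Dead state: no match is reachable, report the longest match so far.
    StopWithLastMatch,
    /// Quit state: the DFA cannot match correctly on this input.
    GiveUp,
    /// Not a sentinel: an ordinary (possibly flagged) state pointer.
    Continue,
}

/// Maps a state pointer to the action its sentinel value calls for.
fn classify(ptr: StatePtr) -> Action {
    match ptr {
        STATE_UNKNOWN => Action::ComputeTransition,
        STATE_DEAD => Action::StopWithLastMatch,
        STATE_QUIT => Action::GiveUp,
        _ => Action::Continue,
    }
}
```

Because each sentinel is a single distinguished value rather than a bit pattern, an exact comparison is enough to recognize it.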
The other type of state pointer is a state pointer with special flag bits. There are two flags: a start flag and a match flag. The lower bits of both kinds always contain a "valid" StatePtr (indicated by the STATE_MAX mask).
The start flag means that the state is a start state, and therefore may be subject to special prefix scanning optimizations.
The match flag means that the state is a match state, and therefore the current position in the input (while searching) should be recorded.
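The flag scheme can be sketched with a few bit operations. The bit positions chosen for STATE_START and STATE_MATCH, and the helper functions is_start, is_match and strip_flags, are assumptions for illustration; the text above only says that STATE_MAX masks out the valid pointer in the lower bits.

```rust
type StatePtr = u32;

// Assumed bit positions, chosen for illustration only.
const STATE_START: StatePtr = 1 << 30;
const STATE_MATCH: StatePtr = 1 << 29;
// Every bit below the flags. Masking with it recovers the valid pointer.
const STATE_MAX: StatePtr = STATE_MATCH - 1;

/// True if the pointer carries the start flag.
fn is_start(ptr: StatePtr) -> bool {
    (ptr & STATE_START) != 0
}

/// True if the pointer carries the match flag.
fn is_match(ptr: StatePtr) -> bool {
    (ptr & STATE_MATCH) != 0
}

/// Clears the flag bits, leaving a pointer usable as a row offset.
fn strip_flags(ptr: StatePtr) -> StatePtr {
    ptr & STATE_MAX
}
```

With this layout, any pointer that carries a flag compares greater than STATE_MAX, so one comparison separates plain pointers from those needing special handling.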
The above exists mostly in the service of making the inner loop fast. In particular, the innermost loop looks something like this:
while state <= STATE_MAX and i < len(text):
    state = state.next[text[i]]
    i += 1
This is nice because it lets us execute a lazy DFA as if it were an entirely offline DFA (i.e., with very few instructions). The loop will quit only when we need to examine a case that needs special attention.
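The loop can be sketched in Rust over a flat transition table in which a StatePtr is the offset of a row. This is a toy with a fully precomputed table for the regex ab, not the lazy DFA itself. The 256-entry row width, the constant values, and the run_hot_loop and toy_table functions are all assumptions made for illustration.

```rust
type StatePtr = u32;

// Assumed values, for illustration only.
const STATE_MATCH: StatePtr = 1 << 29;
const STATE_MAX: StatePtr = STATE_MATCH - 1;
const STATE_DEAD: StatePtr = (1 << 31) + 1;
// One table entry per possible input byte (an assumed row width).
const ROW: usize = 256;

/// Runs the hot loop until a special pointer or the end of the input.
/// Returns the pointer the loop stopped on and the position reached.
fn run_hot_loop(table: &[StatePtr], start: StatePtr, text: &[u8]) -> (StatePtr, usize) {
    let mut state = start;
    let mut i = 0;
    // A single comparison against STATE_MAX rejects sentinels and flagged
    // pointers alike, so the loop body stays tiny.
    while state <= STATE_MAX && i < text.len() {
        state = table[state as usize + text[i] as usize];
        i += 1;
    }
    (state, i)
}

/// Builds a toy table for `ab`: row 0 --a--> row 1 --b--> row 2 (match).
fn toy_table() -> Vec<StatePtr> {
    // Every transition not set below leads to the dead state.
    let mut table = vec![STATE_DEAD; 3 * ROW];
    table[b'a' as usize] = ROW as StatePtr;
    table[ROW + b'b' as usize] = (2 * ROW) as StatePtr | STATE_MATCH;
    table
}
```

After the loop exits, the caller inspects the returned pointer: a match flag means the position should be recorded, a dead state means the search is over, and a plain pointer means the input simply ran out.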