Struct regex::dfa::Fsm

pub struct Fsm<'a> {
    prog: &'a Program,
    start: u32,
    at: usize,
    quit_after_match: bool,
    last_match_si: u32,
    last_cache_flush: usize,
    cache: &'a mut CacheInner,
}

Fsm encapsulates the actual execution of the DFA.

Fields

prog contains the NFA instruction opcodes. DFA execution uses either the dfa instructions or the dfa_reverse instructions from exec::ExecReadOnly. (It never uses ExecReadOnly.nfa, which may have Unicode opcodes that cannot be executed by the DFA.)

start: The start state. We record it here because the pointer may change when the cache is wiped.

at: The current position in the input.

quit_after_match: Whether to quit after seeing the first match, e.g., when the caller uses is_match or shortest_match.

last_match_si: The last state that matched.

When no match has occurred, this is set to STATE_UNKNOWN.

This is only useful when matching regex sets. The last match state is useful because it contains all of the match instructions seen, thereby allowing us to enumerate which regexes in the set matched.

last_cache_flush: The input position of the last cache flush. We use this to determine if we're thrashing in the cache too often. If so, the DFA quits so that we can fall back to the NFA algorithm.

cache: All cached DFA information that is persisted between searches.
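The thrashing check that relies on the position of the last cache flush could look roughly like the sketch below. The function name should_give_up, the flush counter, and both thresholds are assumptions made for illustration and are not taken from this page.

```rust
/// Hypothetical thrashing check. The thresholds (3 flushes, 10 bytes of
/// progress per cached state) are illustrative assumptions only.
fn should_give_up(
    flush_count: u64,
    at: usize,
    last_cache_flush: usize,
    num_states: usize,
) -> bool {
    // Distance covered since the last flush. Taking the absolute difference
    // lets the same check serve forward and reverse scans.
    let progressed = if at >= last_cache_flush {
        at - last_cache_flush
    } else {
        last_cache_flush - at
    };
    // Give up only after several flushes that each bought little progress.
    flush_count >= 3 && progressed <= 10 * num_states
}
```

When a check like this returns true, the caller would stop the DFA and fall back to the NFA algorithm.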

Methods

impl<'a> Fsm<'a>

Executes the DFA on a forward NFA.

{qcur,qnext} are scratch ordered sets which may be non-empty.

Executes the DFA on a reverse NFA.

next_si transitions to the next state, where the transition input corresponds to text[i].

This elides bounds checks, and is therefore unsafe.

Computes the next state given the current state and the current input byte (which may be EOF).

If STATE_DEAD is returned, then there is no valid state transition. This implies that no permutation of future input can lead to a match state.

STATE_UNKNOWN can never be returned.

Follows the epsilon transitions starting at (and including) ip. The resulting states are inserted into the ordered set q.

Conditional epsilon transitions (i.e., empty width assertions) are only followed if they are satisfied by the given flags, which should represent the flags set at the current location in the input.

If the current location corresponds to the empty string, then only the end line and/or end text flags may be set. If the current location corresponds to a real byte in the input, then only the start line and/or start text flags may be set.

As an exception to the above, when finding the initial state, any of the above flags may be set:

If matching starts at the beginning of the input, then start text and start line should be set. If the input is empty, then end text and end line should also be set.

If matching starts after the beginning of the input, then start line should be set only if the preceding byte is \n. End line should never be set in this case. (Even if the following byte is a \n, it will be handled in a subsequent DFA state.)
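As a rough illustration of following epsilon transitions under a set of flags, here is a minimal sketch. The Inst, Look and Flags types are simplified stand-ins invented for this example, and a Vec plays the role of the ordered set q. The real instruction set and set type are not shown on this page.

```rust
// Simplified, hypothetical types: not the crate's actual definitions.
#[derive(Clone, Copy)]
enum Look { StartLine, EndLine, StartText, EndText }

#[derive(Clone, Copy, Default)]
struct Flags { start_line: bool, end_line: bool, start_text: bool, end_text: bool }

impl Flags {
    fn satisfies(&self, look: Look) -> bool {
        match look {
            Look::StartLine => self.start_line,
            Look::EndLine => self.end_line,
            Look::StartText => self.start_text,
            Look::EndText => self.end_text,
        }
    }
}

#[derive(Clone, Copy)]
enum Inst {
    Match,
    Byte(u8),
    Split(usize, usize),    // unconditional epsilon to both targets
    EmptyLook(Look, usize), // conditional epsilon
}

/// Inserts `ip` and everything reachable from it without consuming input
/// into `q`, preserving first-visit order.
fn follow_epsilons(prog: &[Inst], ip: usize, flags: Flags, q: &mut Vec<usize>) {
    let mut stack = vec![ip];
    while let Some(ip) = stack.pop() {
        if q.contains(&ip) {
            continue; // already visited
        }
        q.push(ip);
        match prog[ip] {
            Inst::Match | Inst::Byte(_) => {}
            Inst::Split(first, second) => {
                // Push `second` first so that `first` is explored first.
                stack.push(second);
                stack.push(first);
            }
            Inst::EmptyLook(look, goto) => {
                if flags.satisfies(look) {
                    stack.push(goto);
                }
            }
        }
    }
}
```

The linear q.contains scan keeps the sketch short. A sparse set would make the membership test constant time.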

Find a previously computed state matching the given set of instructions and is_match bool.

The given set of instructions should represent a single state in the NFA along with all states reachable without consuming any input.

The is_match bool should be true if and only if the preceding DFA state contains an NFA matching state. The cached state produced here will then signify a match. (This enables us to delay a match by one byte, in order to account for the EOF sentinel byte.)

If the cache is full, then it is wiped before caching a new state.

The current state should be specified if it exists, since it will need to be preserved if the cache clears itself. (Start states are always saved, so they should not be passed here.) It takes a mutable pointer to the index because if the cache is cleared, the state's location may change.

Produces a key suitable for describing a state in the DFA cache.

The key invariant here is that equivalent keys are produced for any two sets of ordered NFA states (and toggling of whether the previous NFA states contain a match state) that do not discriminate a match for any input.

Specifically, q should be an ordered set of NFA states and is_match should be true if and only if the previous NFA states contained a match state.
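One way to build such a key is to serialize the match flag followed by the instruction pointers in set order. The sketch below assumes a plain variable-length integer encoding. The names state_key and write_varint are hypothetical, and the encoding used by the actual cache may differ.

```rust
/// Appends `n` as a little-endian base-128 varint.
fn write_varint(buf: &mut Vec<u8>, mut n: u32) {
    while n >= 0x80 {
        buf.push((n as u8) | 0x80); // low 7 bits plus a continuation bit
        n >>= 7;
    }
    buf.push(n as u8);
}

/// Hypothetical key builder: a flag byte, then each instruction pointer
/// in the order it appears in the ordered set `q`.
fn state_key(q: &[u32], is_match: bool) -> Vec<u8> {
    let mut key = vec![is_match as u8];
    for &ip in q {
        write_varint(&mut key, ip);
    }
    key
}
```

Two calls with the same ordered states and the same is_match value produce identical keys, so the key can be used directly for a hash map lookup.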

Clears the cache, but saves and restores current_state if it is not none.

The current state must be provided here in case its location in the cache changes.

This returns false if the cache is not cleared and the DFA should give up.
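The save-and-restore step can be pictured with the following sketch. It models the cache as a plain Vec of opaque state keys and uses indices as state pointers. Both choices, and the function signature, are simplifying assumptions for illustration, and the give-up decision is left out.

```rust
// Hypothetical, simplified cache: each state is just its key bytes.
struct Cache {
    states: Vec<Vec<u8>>,
}

/// Wipes the cache, then re-inserts the start state and (if given) the
/// current state, updating both pointers to their new locations.
fn clear_cache_and_save(cache: &mut Cache, start: &mut usize, current: Option<&mut usize>) {
    // Copy out the states that must survive the wipe.
    let start_state = cache.states[*start].clone();
    let current_state = current.as_ref().map(|si| cache.states[**si].clone());

    cache.states.clear();

    cache.states.push(start_state);
    *start = 0;

    if let (Some(si), Some(state)) = (current, current_state) {
        cache.states.push(state);
        *si = cache.states.len() - 1; // the state's location has changed
    }
}
```

Passing the current state by mutable reference is what lets the caller keep using its pointer after the wipe.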

Wipes the state cache, but saves and restores the current start state.

This returns false if the cache is not cleared and the DFA should give up.

Restores the given state back into the cache, and returns a pointer to it.

Returns the next state given the current state si and current byte b. {qcur,qnext} are used as scratch space for storing ordered NFA states.

This tries to fetch the next state from the cache, but if that fails, it computes the next state, caches it and returns a pointer to it.

The pointer can be to a real state, or it can be STATE_DEAD. STATE_UNKNOWN cannot be returned.

None is returned if a new state could not be allocated (i.e., the DFA ran out of space and thinks it's running too slowly).
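The cache-then-compute behaviour described above is sketched below with a flat transition table. The Transitions type, the sentinel values, and the closure-based compute step are assumptions chosen to keep the example self-contained.

```rust
type StatePtr = u32;

// Illustrative sentinel values; the real constants may be defined differently.
const STATE_UNKNOWN: StatePtr = 1 << 31;
const STATE_DEAD: StatePtr = STATE_UNKNOWN + 1;

/// Hypothetical transition table: one row per state, one column per byte class.
struct Transitions {
    table: Vec<StatePtr>,
    num_byte_classes: usize,
}

impl Transitions {
    fn new(num_states: usize, num_byte_classes: usize) -> Self {
        Transitions {
            table: vec![STATE_UNKNOWN; num_states * num_byte_classes],
            num_byte_classes,
        }
    }

    /// Returns the cached transition if present. Otherwise runs `compute`,
    /// stores its result, and returns it. `None` means the state could not
    /// be allocated and the caller should give up.
    fn next_state<F>(&mut self, si: usize, class: usize, compute: F) -> Option<StatePtr>
    where
        F: FnOnce() -> Option<StatePtr>,
    {
        let idx = si * self.num_byte_classes + class;
        let cached = self.table[idx];
        if cached != STATE_UNKNOWN {
            return Some(cached);
        }
        let next = compute()?;
        debug_assert!(next != STATE_UNKNOWN);
        self.table[idx] = next;
        Some(next)
    }
}
```

Because STATE_DEAD is stored like any other pointer, a dead transition is also computed only once.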

Computes and returns the start state, where searching begins at position at in text. If the state has already been computed, then it is pulled from the cache. If the state hasn't been cached, then it is computed, cached and a pointer to it is returned.

This may return STATE_DEAD but never STATE_UNKNOWN.

Computes the set of starting flags for the given position in text.

This should only be used when executing the DFA forwards over the input.
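For a forward search, the rules for the initial flags given earlier translate into something like this sketch. The EmptyFlags struct and the start_flags signature are hypothetical, and only the four line and text flags named on this page are modelled.

```rust
// Hypothetical flag set covering only the four assertions discussed here.
#[derive(Debug, Default, PartialEq, Clone, Copy)]
struct EmptyFlags {
    start_text: bool,
    end_text: bool,
    start_line: bool,
    end_line: bool,
}

/// Computes the initial flags for a forward search beginning at `at`.
/// Assumes `at <= text.len()`.
fn start_flags(text: &[u8], at: usize) -> EmptyFlags {
    let mut flags = EmptyFlags::default();
    if at == 0 {
        flags.start_text = true;
        flags.start_line = true;
        if text.is_empty() {
            // Empty input: the start is also the end.
            flags.end_text = true;
            flags.end_line = true;
        }
    } else if text[at - 1] == b'\n' {
        // Mid-input: only start line can hold, and only after a newline.
        flags.start_line = true;
    }
    flags
}
```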

Computes the set of starting flags for the given position in text.

This should only be used when executing the DFA in reverse over the input.

Returns a reference to a State given a pointer to it.

Adds the given state to the DFA.

This allocates room for transitions out of this state in self.cache.trans. The transitions can be set with the returned StatePtr.

If None is returned, then the state limit was reached and the DFA should quit.

Quickly finds the next occurrence of any literal prefixes in the regex. If there are no literal prefixes, then the current position is returned. If there are literal prefixes and one could not be found, then None is returned.

This should only be called when the DFA is in a start state.
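The three outcomes of the prefix scan can be illustrated with a naive search. The function below is a sketch under the assumption that prefixes are available as byte slices. A real implementation would likely use a faster literal searcher.

```rust
/// Hypothetical prefix scan over `text`, starting at `at`.
/// - No prefixes: the current position is returned unchanged.
/// - Otherwise: the first position at or after `at` where any prefix
///   begins, or `None` if no prefix occurs.
fn prefix_at(prefixes: &[&[u8]], text: &[u8], at: usize) -> Option<usize> {
    if prefixes.is_empty() {
        return Some(at);
    }
    // Naive scan: try every candidate position in turn.
    (at..=text.len()).find(|&i| prefixes.iter().any(|p| text[i..].starts_with(p)))
}
```

Returning None lets the caller conclude immediately that no match is possible from this start state.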

Returns the number of byte classes required to discriminate transitions in each state.

invariant: num_byte_classes() == len(State.next)

Given an input byte or the special EOF sentinel, return its corresponding byte class.

Like byte_class, but explicitly for u8s.
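A byte class map and its EOF handling might be sketched as follows. The ByteClasses type, the use of Option<u8> to model the EOF sentinel, and the example class boundaries are all assumptions. The sketch also assumes class numbers increase with byte value, so the class of byte 255 is the largest.

```rust
/// Hypothetical byte class table: map[b] is the class of byte b.
struct ByteClasses {
    map: [u8; 256],
}

impl ByteClasses {
    /// Number of classes, with one extra class reserved for EOF.
    fn num_byte_classes(&self) -> usize {
        (self.map[255] as usize + 1) + 1
    }

    /// Class of a real input byte.
    fn u8_class(&self, b: u8) -> usize {
        self.map[b as usize] as usize
    }

    /// Class of an input byte, or of EOF when `b` is `None`.
    fn byte_class(&self, b: Option<u8>) -> usize {
        match b {
            Some(b) => self.u8_class(b),
            None => self.num_byte_classes() - 1,
        }
    }
}

/// Example table with three classes: below 'a', 'a'..='z', above 'z'.
fn example_classes() -> ByteClasses {
    let mut map = [0u8; 256];
    for (b, class) in map.iter_mut().enumerate() {
        *class = if b < b'a' as usize {
            0
        } else if b <= b'z' as usize {
            1
        } else {
            2
        };
    }
    ByteClasses { map }
}
```

With such a table, each state needs only num_byte_classes() transition slots instead of 257.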

Returns true if the DFA should continue searching past the first match.

Leftmost first semantics in the DFA are preserved by not following NFA transitions after the first match is seen.

On occasion, we want to avoid leftmost first semantics to find either the longest match (for reverse search) or all possible matches (for regex sets).
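Read literally, the two exceptions above suggest a predicate along these lines. The parameters is_reverse and num_patterns are hypothetical names standing in for whatever the program actually records.

```rust
/// Hypothetical predicate: keep searching past the first match when running
/// in reverse (longest match) or when matching a regex set (all matches).
fn continue_past_first_match(is_reverse: bool, num_patterns: usize) -> bool {
    is_reverse || num_patterns > 1
}
```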

Returns true if there is a prefix we can quickly search for.

Sets the STATE_START bit in the given state pointer if and only if we have a prefix to scan for.

If there's no prefix, then it's a waste to treat the start state specially.

Returns the approximate heap space currently used by the DFA. It is used to determine whether the DFA's state cache needs to be wiped. Namely, it is possible that for certain regexes on certain inputs, a new state could be created for every byte of input. (This is bad for memory use, so we bound it with a cache.)
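A size estimate of this kind can be as simple as summing the bytes held by the transition table and the stored states. The Cache layout and the needs_wipe helper below are assumptions for illustration. The estimate deliberately ignores allocator overhead and spare capacity.

```rust
use std::mem::size_of;

// Hypothetical cache layout: a flat transition table plus per-state key bytes.
struct Cache {
    trans: Vec<u32>,
    states: Vec<Vec<u8>>,
}

impl Cache {
    /// Approximate heap usage in bytes.
    fn approximate_size(&self) -> usize {
        let trans_bytes = self.trans.len() * size_of::<u32>();
        let state_bytes: usize = self.states.iter().map(|s| s.len()).sum();
        trans_bytes + state_bytes
    }

    /// True when the estimate exceeds the configured limit.
    fn needs_wipe(&self, limit: usize) -> bool {
        self.approximate_size() > limit
    }
}
```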

Trait Implementations

impl<'a> Debug for Fsm<'a>

Formats the value using the given formatter.

Auto Trait Implementations

impl<'a> Send for Fsm<'a>

impl<'a> Sync for Fsm<'a>