Struct regex::dfa::Fsm
pub struct Fsm<'a> {
    prog: &'a Program,
    start: u32,
    at: usize,
    quit_after_match: bool,
    last_match_si: u32,
    last_cache_flush: usize,
    cache: &'a mut CacheInner,
}
Fsm encapsulates the actual execution of the DFA.
Fields
prog: &'a Program
prog contains the NFA instruction opcodes. DFA execution uses either the dfa instructions or the dfa_reverse instructions from exec::ExecReadOnly. (It never uses ExecReadOnly.nfa, which may have Unicode opcodes that cannot be executed by the DFA.)
start: u32
The start state. We record it here because the pointer may change when the cache is wiped.
at: usize
The current position in the input.
quit_after_match: bool
Should we quit after seeing the first match? For example, when the caller uses is_match or shortest_match.
last_match_si: u32
The last state that matched.
When no match has occurred, this is set to STATE_UNKNOWN.
This is only useful when matching regex sets. The last match state is useful because it contains all of the match instructions seen, thereby allowing us to enumerate which regexes in the set matched.
last_cache_flush: usize
The input position of the last cache flush. We use this to determine if we're thrashing in the cache too often. If so, the DFA quits so that we can fall back to the NFA algorithm.
cache: &'a mut CacheInner
All cached DFA information that is persisted between searches.
Methods
impl<'a> Fsm<'a>
pub fn forward(
prog: &'a Program,
cache: &RefCell<ProgramCacheInner>,
quit_after_match: bool,
text: &[u8],
at: usize
) -> Result<usize>
pub fn reverse(
prog: &'a Program,
cache: &RefCell<ProgramCacheInner>,
quit_after_match: bool,
text: &[u8],
at: usize
) -> Result<usize>
pub fn forward_many(
prog: &'a Program,
cache: &RefCell<ProgramCacheInner>,
matches: &mut [bool],
text: &[u8],
at: usize
) -> Result<usize>
fn exec_at(
&mut self,
qcur: &mut SparseSet,
qnext: &mut SparseSet,
text: &[u8]
) -> Result<usize>
Executes the DFA on a forward NFA.
{qcur,qnext} are scratch ordered sets which may be non-empty.
fn exec_at_reverse(
&mut self,
qcur: &mut SparseSet,
qnext: &mut SparseSet,
text: &[u8]
) -> Result<usize>
Executes the DFA on a reverse NFA.
unsafe fn next_si(&self, si: u32, text: &[u8], i: usize) -> u32
next_si transitions to the next state, where the transition input corresponds to text[i].
This elides bounds checks, and is therefore unsafe.
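The lookup described above can be pictured as an index into a flat transition table. The following is a minimal sketch under stated assumptions: the `TransitionTable` type, its row-offset state pointers, and the `classes` map are hypothetical and not the crate's definitions, and it uses checked indexing where the real method elides bounds checks.

```rust
/// Hypothetical flat transition table: one row of `num_classes` entries per state.
/// A state "pointer" is assumed to be the offset of the state's row in `table`.
struct TransitionTable {
    table: Vec<u32>,
    num_classes: usize,
    /// Maps every possible byte value to its byte class.
    classes: [u8; 256],
}

impl TransitionTable {
    /// Returns the state reached from `si` on input `text[i]`.
    /// Unlike the unsafe original, this panics on out-of-range input.
    fn next_si(&self, si: u32, text: &[u8], i: usize) -> u32 {
        let class = self.classes[text[i] as usize] as usize;
        debug_assert!(class < self.num_classes);
        self.table[si as usize + class]
    }
}

/// Builds a two-state table that separates b'a' (class 1) from all other bytes (class 0).
fn example_table() -> TransitionTable {
    let mut classes = [0u8; 256];
    classes[b'a' as usize] = 1;
    TransitionTable {
        // Row at offset 0: other -> 0, 'a' -> 2. Row at offset 2: other -> 0, 'a' -> 2.
        table: vec![0, 2, 0, 2],
        num_classes: 2,
        classes,
    }
}
```

Because the row offset is stored directly in the state pointer, a transition costs one class lookup and one table read.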
fn exec_byte(
&mut self,
qcur: &mut SparseSet,
qnext: &mut SparseSet,
si: u32,
b: Byte
) -> Option<u32>
Computes the next state given the current state and the current input byte (which may be EOF).
If STATE_DEAD is returned, then there is no valid state transition. This implies that no permutation of future input can lead to a match state.
STATE_UNKNOWN can never be returned.
fn follow_epsilons(&mut self, ip: u32, q: &mut SparseSet, flags: EmptyFlags)
Follows the epsilon transitions starting at (and including) ip. The resulting states are inserted into the ordered set q.
Conditional epsilon transitions (i.e., empty width assertions) are only followed if they are satisfied by the given flags, which should represent the flags set at the current location in the input.
If the current location corresponds to the empty string, then only the end line and/or end text flags may be set. If the current location corresponds to a real byte in the input, then only the start line and/or start text flags may be set.
As an exception to the above, when finding the initial state, any of the above flags may be set:
If matching starts at the beginning of the input, then start text and start line should be set. If the input is empty, then end text and end line should also be set.
If matching starts after the beginning of the input, then only start line should be set if the preceding byte is \n. End line should never be set in this case. (Even if the following byte is a \n, it will be handled in a subsequent DFA state.)
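The traversal described above can be sketched with an explicit stack. This is an illustration only: the `Inst` enum and the one-field `EmptyFlags` are simplified, hypothetical stand-ins for the crate's types, and a plain `Vec` plays the role of the ordered set.

```rust
/// Hypothetical, simplified NFA instructions (not the crate's instruction type).
#[derive(Clone, Copy)]
enum Inst {
    /// Unconditional epsilon transition to two targets.
    Split(usize, usize),
    /// Conditional epsilon transition: followed only if the start-line flag is set.
    StartLine(usize),
    /// Consumes a byte, so the traversal stops here.
    Byte(u8),
    /// Match instruction.
    Match,
}

/// Flags describing the current input location (assumed shape).
#[derive(Clone, Copy, Default)]
struct EmptyFlags {
    start_line: bool,
}

/// Collects every instruction reachable from `ip` (inclusive) via epsilon
/// transitions, in the order each is first visited.
fn follow_epsilons(prog: &[Inst], ip: usize, q: &mut Vec<usize>, flags: EmptyFlags) {
    let mut stack = vec![ip];
    while let Some(ip) = stack.pop() {
        if q.contains(&ip) {
            continue; // already visited
        }
        q.push(ip);
        match prog[ip] {
            Inst::Split(first, second) => {
                // Push the second branch first so the first branch is explored first.
                stack.push(second);
                stack.push(first);
            }
            Inst::StartLine(next) => {
                if flags.start_line {
                    stack.push(next);
                }
            }
            Inst::Byte(_) | Inst::Match => {}
        }
    }
}
```

With the program `[Split(1, 3), StartLine(2), Byte(b'a'), Match]`, instruction 2 is reached only when `start_line` is set. A real implementation would use a set with constant-time membership tests rather than `Vec::contains`.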
fn cached_state(
&mut self,
q: &SparseSet,
state_flags: StateFlags,
current_state: Option<&mut u32>
) -> Option<u32>
Find a previously computed state matching the given set of instructions and is_match bool.
The given set of instructions should represent a single state in the NFA along with all states reachable without consuming any input.
The is_match bool should be true if and only if the preceding DFA state contains an NFA matching state. The cached state produced here will then signify a match. (This enables us to delay a match by one byte, in order to account for the EOF sentinel byte.)
If the cache is full, then it is wiped before caching a new state.
The current state should be specified if it exists, since it will need to be preserved if the cache clears itself. (Start states are always saved, so they should not be passed here.) It takes a mutable pointer to the index because if the cache is cleared, the state's location may change.
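The lookup-or-insert behaviour, including the wipe that relocates the current state, might look roughly like the sketch below. The `StateKey` and `StateCache` types, the `max_states` limit, and the `flush_count` counter are assumptions made for illustration. Unlike the documented method, this version never gives up, so it returns a plain `u32`.

```rust
use std::collections::HashMap;

/// Hypothetical state key: the ordered NFA instruction pointers plus a match flag.
#[derive(Clone, PartialEq, Eq, Hash)]
struct StateKey {
    insts: Vec<u32>,
    is_match: bool,
}

/// A bounded cache mapping state keys to state indices (an assumed design).
struct StateCache {
    map: HashMap<StateKey, u32>,
    states: Vec<StateKey>,
    max_states: usize,
    flush_count: usize,
}

impl StateCache {
    fn new(max_states: usize) -> Self {
        StateCache { map: HashMap::new(), states: Vec::new(), max_states, flush_count: 0 }
    }

    /// Returns the index of the state for `key`, creating it if needed.
    /// If the cache is full it is wiped first; `current_state`, when given,
    /// is re-inserted and its index is updated in place.
    fn cached_state(&mut self, key: StateKey, current_state: Option<&mut u32>) -> u32 {
        if let Some(&si) = self.map.get(&key) {
            return si;
        }
        if self.states.len() >= self.max_states {
            // Copy the current state out before the wipe invalidates its index.
            let saved = current_state.as_ref().map(|si| self.states[**si as usize].clone());
            self.map.clear();
            self.states.clear();
            self.flush_count += 1;
            if let (Some(si), Some(state)) = (current_state, saved) {
                *si = self.insert(state);
            }
        }
        self.insert(key)
    }

    /// Appends a state and records its index under its key.
    fn insert(&mut self, key: StateKey) -> u32 {
        let si = self.states.len() as u32;
        self.states.push(key.clone());
        self.map.insert(key, si);
        si
    }
}
```

Passing the current state as `Option<&mut u32>` lets the cache rewrite the caller's index after a wipe, which is the reason the text gives for taking a mutable pointer.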
fn cached_state_key(
&mut self,
q: &SparseSet,
state_flags: &mut StateFlags
) -> Option<State>
Produces a key suitable for describing a state in the DFA cache.
The key invariant here is that equivalent keys are produced for any two sets of ordered NFA states (and toggling of whether the previous NFA states contain a match state) that do not discriminate a match for any input.
Specifically, q should be an ordered set of NFA states and is_match should be true if and only if the previous NFA states contained a match state.
fn clear_cache_and_save(&mut self, current_state: Option<&mut u32>) -> bool
Clears the cache, but saves and restores current_state if it is not None.
The current state must be provided here in case its location in the cache changes.
This returns false if the cache is not cleared and the DFA should give up.
fn clear_cache(&mut self) -> bool
Wipes the state cache, but saves and restores the current start state.
This returns false if the cache is not cleared and the DFA should give up.
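One way to decide between wiping and giving up is to compare how far the search has advanced since the previous flush. The sketch below is a guess at the shape of such a heuristic: the `FlushTracker` type, the `flush_count` counter, and both thresholds are hypothetical values chosen for illustration, not the crate's.

```rust
/// Hypothetical bookkeeping for deciding whether to wipe the cache or give up.
struct FlushTracker {
    /// Input position at the time of the last flush.
    last_cache_flush: usize,
    /// Number of flushes so far.
    flush_count: usize,
}

impl FlushTracker {
    /// Assumed thresholds, chosen for illustration only.
    const MIN_FLUSHES: usize = 3;
    const MIN_BYTES_PER_STATE: usize = 10;

    /// Returns true if the cache should be wiped and the search continued,
    /// or false if the DFA appears to be thrashing and should give up.
    fn try_clear(&mut self, at: usize, num_states: usize) -> bool {
        let progressed = at.saturating_sub(self.last_cache_flush);
        if self.flush_count >= Self::MIN_FLUSHES
            && progressed <= Self::MIN_BYTES_PER_STATE * num_states
        {
            return false;
        }
        self.last_cache_flush = at;
        self.flush_count += 1;
        true
    }
}
```

The idea is that a cache which fills up again after only a few bytes of input per state is not paying for itself, so falling back to the NFA algorithm is the cheaper option.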
fn restore_state(&mut self, state: State) -> Option<u32>
Restores the given state back into the cache, and returns a pointer to it.
fn next_state(
&mut self,
qcur: &mut SparseSet,
qnext: &mut SparseSet,
si: u32,
b: Byte
) -> Option<u32>
Returns the next state given the current state si and current byte b. {qcur,qnext} are used as scratch space for storing ordered NFA states.
This tries to fetch the next state from the cache, but if that fails, it computes the next state, caches it and returns a pointer to it.
The pointer can be to a real state, or it can be STATE_DEAD. STATE_UNKNOWN cannot be returned.
None is returned if a new state could not be allocated (i.e., the DFA ran out of space and thinks it's running too slowly).
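The try-the-cache-then-compute pattern can be sketched as a lazily filled table. Everything here is an assumption made for illustration: the `LazyDfa` type, the sentinel encodings, the use of plain state indices, and the `compute` closure that stands in for the NFA simulation step.

```rust
/// Sentinel values (hypothetical encodings, chosen for this sketch).
const STATE_UNKNOWN: u32 = u32::MAX;
const STATE_DEAD: u32 = u32::MAX - 1;

/// A lazily filled transition table with `num_classes` entries per state.
struct LazyDfa {
    trans: Vec<u32>,
    num_classes: usize,
    /// Counts how often a transition had to be computed instead of looked up.
    computed: usize,
}

impl LazyDfa {
    fn new(num_states: usize, num_classes: usize) -> Self {
        LazyDfa {
            trans: vec![STATE_UNKNOWN; num_states * num_classes],
            num_classes,
            computed: 0,
        }
    }

    /// Returns the next state for (`si`, `class`), computing and caching it on a miss.
    /// `si` is a plain state index here. `compute` returns None when no new state
    /// could be allocated, and that None is passed through to the caller.
    fn next_state<F>(&mut self, si: u32, class: usize, compute: F) -> Option<u32>
    where
        F: FnOnce(u32, usize) -> Option<u32>,
    {
        let idx = si as usize * self.num_classes + class;
        let cached = self.trans[idx];
        if cached != STATE_UNKNOWN {
            return Some(cached);
        }
        let next = compute(si, class)?;
        self.computed += 1;
        self.trans[idx] = next;
        Some(next)
    }
}
```

Since a computed entry overwrites `STATE_UNKNOWN`, the function can hand back a real state or `STATE_DEAD` but never `STATE_UNKNOWN`, provided `compute` itself never produces it.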
fn start_state(
&mut self,
q: &mut SparseSet,
empty_flags: EmptyFlags,
state_flags: StateFlags
) -> Option<u32>
Computes and returns the start state, where searching begins at position at in text. If the state has already been computed, then it is pulled from the cache. If the state hasn't been cached, then it is computed, cached and a pointer to it is returned.
This may return STATE_DEAD but never STATE_UNKNOWN.
fn start_flags(&self, text: &[u8], at: usize) -> (EmptyFlags, StateFlags)
Computes the set of starting flags for the given position in text.
This should only be used when executing the DFA forwards over the input.
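For the forward direction, the initial-state rules given under follow_epsilons translate into a few comparisons. The sketch below covers only the four text and line flags; the `EmptyFlags` struct is an assumed shape, and the `StateFlags` half of the real return value is left out.

```rust
/// Assumed flag set; the crate's `EmptyFlags` may be shaped differently.
#[derive(Debug, Default, PartialEq, Eq)]
struct EmptyFlags {
    start: bool,
    end: bool,
    start_line: bool,
    end_line: bool,
}

/// Computes the flags for a forward search beginning at `at`.
/// Panics if `at` is greater than `text.len()`.
fn start_flags(text: &[u8], at: usize) -> EmptyFlags {
    let mut flags = EmptyFlags::default();
    if at == 0 {
        // At the beginning of the input: start text and start line.
        flags.start = true;
        flags.start_line = true;
    } else if text[at - 1] == b'\n' {
        // After the beginning: start line only, and only after a newline.
        flags.start_line = true;
    }
    if text.is_empty() {
        // Empty input: the end flags apply as well.
        flags.end = true;
        flags.end_line = true;
    }
    flags
}
```

For the input `a\nb`, position 2 yields only `start_line`, while position 1 yields no flags at all, even though the byte at that position is a newline.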
fn start_flags_reverse(
&self,
text: &[u8],
at: usize
) -> (EmptyFlags, StateFlags)
Computes the set of starting flags for the given position in text.
This should only be used when executing the DFA in reverse over the input.
fn state(&self, si: u32) -> &State
Returns a reference to a State given a pointer to it.
fn add_state(&mut self, state: State) -> Option<u32>
Adds the given state to the DFA.
This allocates room for transitions out of this state in self.cache.trans. The transitions can be set with the returned StatePtr.
If None is returned, then the state limit was reached and the DFA should quit.
fn prefix_at(&self, text: &[u8], at: usize) -> Option<usize>
Quickly finds the next occurrence of any literal prefixes in the regex. If there are no literal prefixes, then the current position is returned. If there are literal prefixes and one could not be found, then None is returned.
This should only be called when the DFA is in a start state.
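The three outcomes described above (no prefixes, prefix found, prefix not found) can be reproduced with a naive scan. This sketch takes the literal prefixes as an explicit slice, which is an assumption. A real implementation would presumably use an optimized literal searcher rather than testing every position.

```rust
/// Finds the start of the next occurrence of any literal in `prefixes`,
/// searching `text` from position `at`.
fn prefix_at(prefixes: &[&[u8]], text: &[u8], at: usize) -> Option<usize> {
    if prefixes.is_empty() {
        // Nothing to scan for: the current position is returned unchanged.
        return Some(at);
    }
    // Try each candidate position in order and report the first one where
    // some prefix matches. None means no prefix occurs at or after `at`.
    (at..=text.len()).find(|&i| prefixes.iter().any(|p| text[i..].starts_with(p)))
}
```

A `None` result lets the caller stop immediately, since no match can begin anywhere in the remaining input.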
fn num_byte_classes(&self) -> usize
Returns the number of byte classes required to discriminate transitions in each state.
invariant: num_byte_classes() == len(State.next)
fn byte_class(&self, b: Byte) -> usize
Given an input byte or the special EOF sentinel, return its corresponding byte class.
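A byte class map with an extra class reserved for the EOF sentinel could be sketched as follows. The `Byte` wrapper, the `ByteClasses` type, and the choice to give EOF the highest class number are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for `Byte`: values 0..=255 are real bytes, 256 is EOF.
#[derive(Clone, Copy)]
struct Byte(u16);

impl Byte {
    fn byte(b: u8) -> Byte { Byte(b as u16) }
    fn eof() -> Byte { Byte(256) }
    fn is_eof(self) -> bool { self.0 == 256 }
}

/// Maps bytes to equivalence classes; EOF gets one extra class of its own.
struct ByteClasses {
    classes: [u8; 256],
}

impl ByteClasses {
    /// Number of classes, counting the extra EOF class.
    /// Assumes class numbers never decrease as the byte value increases,
    /// so byte 255 holds the largest class.
    fn num_byte_classes(&self) -> usize {
        self.classes[255] as usize + 1 + 1
    }

    /// Class of a real byte.
    fn u8_class(&self, b: u8) -> usize {
        self.classes[b as usize] as usize
    }

    /// Class of a real byte or of the EOF sentinel.
    fn byte_class(&self, b: Byte) -> usize {
        if b.is_eof() { self.num_byte_classes() - 1 } else { self.u8_class(b.0 as u8) }
    }
}

/// Builds classes that separate ASCII digits from the bytes before and after them.
fn digit_classes() -> ByteClasses {
    let mut classes = [0u8; 256];
    for b in 0..256usize {
        classes[b] = if b < b'0' as usize { 0 } else if b <= b'9' as usize { 1 } else { 2 };
    }
    ByteClasses { classes }
}
```

With three real classes the table needs four columns per state, which is what the invariant on num_byte_classes expresses.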
fn u8_class(&self, b: u8) -> usize
Like byte_class, but explicitly for u8s.
fn continue_past_first_match(&self) -> bool
Returns true if the DFA should continue searching past the first match.
Leftmost first semantics in the DFA are preserved by not following NFA transitions after the first match is seen.
On occasion, we want to avoid leftmost first semantics to find either the longest match (for reverse search) or all possible matches (for regex sets).
fn has_prefix(&self) -> bool
Returns true if there is a prefix we can quickly search for.
fn start_ptr(&self, si: u32) -> u32
Sets the STATE_START bit in the given state pointer if and only if we have a prefix to scan for.
If there's no prefix, then it's a waste to treat the start state specially.
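Tagging a state pointer with a flag bit might look like the sketch below. The bit position chosen for `STATE_START` is hypothetical, `has_prefix` is passed in as a parameter rather than read from the program, and `untagged` is a helper invented for the example.

```rust
/// Hypothetical tag bit; the crate's actual bit layout may differ.
const STATE_START: u32 = 1 << 30;

/// Tags `si` as a start state only when a prefix scan is available.
fn start_ptr(si: u32, has_prefix: bool) -> u32 {
    if has_prefix { si | STATE_START } else { si }
}

/// Strips the tag to recover the underlying state index.
fn untagged(si: u32) -> u32 {
    si & !STATE_START
}
```

Leaving the bit clear when there is no prefix means the search loop never takes the special start-state path for nothing.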
fn approximate_size(&self) -> usize
Approximate size returns the approximate heap space currently used by the DFA. It is used to determine whether the DFA's state cache needs to be wiped. Namely, it is possible that for certain regexes on certain inputs, a new state could be created for every byte of input. (This is bad for memory use, so we bound it with a cache.)