Struct regex::literal::BoyerMooreSearch

pub struct BoyerMooreSearch {
    pattern: Vec<u8>,
    skip_table: Vec<usize>,
    guard: u8,
    guard_reverse_idx: usize,
    md2_shift: usize,
}

An implementation of Tuned Boyer-Moore as laid out by Andrew Hume and Daniel Sunday in "Fast String Searching". O(n) in the size of the input.

Fast string searching algorithms come in many variations, but they can generally be described in terms of three main components.

The skip loop is where the string searcher wants to spend as much time as possible. Exactly which character in the pattern the skip loop examines varies from algorithm to algorithm, but in the simplest case this loop repeatedly looks at the input character aligned with the last character of the pattern and jumps forward in the input if that character does not occur in the pattern. Robert Boyer and J Moore called this the "fast" loop in their original paper.
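The simplest case can be sketched as follows. This is only an illustration: the free function `simple_skip_loop` and its `window_end` parameter (the haystack index aligned with the last pattern byte) are assumed names, not part of this struct's API, and the linear `contains` check stands in for the table lookup a tuned searcher would use.

```rust
/// Illustrative only: advance `window_end` until the haystack byte under it
/// occurs somewhere in `pattern`, or the window runs off the haystack.
fn simple_skip_loop(haystack: &[u8], pattern: &[u8], mut window_end: usize) -> Option<usize> {
    if pattern.is_empty() {
        return None;
    }
    while window_end < haystack.len() {
        if pattern.contains(&haystack[window_end]) {
            // A match could overlap this position, so hand off to the match loop.
            return Some(window_end);
        }
        // No window covering this byte can match: jump a full pattern length.
        window_end += pattern.len();
    }
    None
}
```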

The match loop is responsible for actually examining the whole potentially matching substring. In order to fail faster, the match loop sometimes has a guard test attached. The guard test uses frequency analysis of the characters in the pattern to choose the least frequently occurring character and checks it first, so that match failures are found as quickly as possible.
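A guarded match check might look like the sketch below. The function name and signature are assumptions made for illustration; `guard` and `guard_reverse_idx` mirror the fields of this struct, with the reverse index taken to be the guard's distance from the last byte of the pattern.

```rust
/// Illustrative only: does `pattern` end exactly at `window_end` in `haystack`?
fn check_match_with_guard(
    haystack: &[u8],
    pattern: &[u8],
    guard: u8,
    guard_reverse_idx: usize,
    window_end: usize,
) -> bool {
    // The window must lie fully inside the haystack, and the guard index
    // must point inside the pattern.
    if pattern.is_empty()
        || guard_reverse_idx >= pattern.len()
        || window_end >= haystack.len()
        || window_end + 1 < pattern.len()
    {
        return false;
    }
    // Guard test: compare the rarest pattern byte first to fail fast.
    if haystack[window_end - guard_reverse_idx] != guard {
        return false;
    }
    // Full comparison of the candidate window.
    let window_start = window_end + 1 - pattern.len();
    &haystack[window_start..=window_end] == pattern
}
```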

The shift rule governs how the algorithm will shuffle its test window in the event of a failure during the match loop. Certain shift rules allow the worst-case run time of the algorithm to be shown to be O(n) in the size of the input rather than O(nm) in the size of the input and the size of the pattern (as naive Boyer-Moore is).

"Fast String Searching", in addition to presenting a tuned algorithm, provides a comprehensive taxonomy of the many different flavors of string searchers. Under that taxonomy TBM, the algorithm implemented here, uses an unrolled fast skip loop with memchr fallback, a forward match loop with guard, and the mini Sunday's delta shift rule. To unpack that you'll have to read the paper.

Fields

pattern: Vec<u8>

The pattern we are going to look for in the haystack.

skip_table: Vec<usize>

The skip table for the skip loop.

Maps the character at the end of the input to a shift.

guard: u8

The guard character (least frequently occurring char).

guard_reverse_idx: usize

The reverse-index of the guard character in the pattern.

md2_shift: usize

Daniel Sunday's mini generalized delta2 shift table.

We use a skip loop, so we only have to provide a shift for the skip char (last char). This is why it is a mini shift rule.

Methods

impl BoyerMooreSearch

Create a new string searcher, performing whatever compilation steps are required.

Find the pattern in haystack, returning the offset of the start of the first occurrence of the pattern in haystack.
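To show how the three components fit together, here is a compact, self-contained sketch of a search with the same contract (offset of the first occurrence, or `None`). It is not the implementation behind this method: it is a plain free function with assumed names, it omits the guard test, the loop unrolling and the memchr fallback, and it returns `Some(0)` for an empty pattern as an arbitrary convention.

```rust
/// Illustrative only: skip loop + match loop + mini delta2 shift.
fn find_sketch(haystack: &[u8], pattern: &[u8]) -> Option<usize> {
    let m = pattern.len();
    if m == 0 {
        return Some(0); // convention chosen for this sketch
    }
    // Skip table: distance from the rightmost occurrence of each byte to the
    // end of the pattern. Bytes absent from the pattern skip a full length.
    let mut skip = [m; 256];
    for (i, &b) in pattern.iter().enumerate() {
        skip[b as usize] = m - 1 - i;
    }
    // Mini delta2 shift: distance back to the previous occurrence of the
    // last byte, or the full pattern length if there is none.
    let last = pattern[m - 1];
    let md2 = pattern[..m - 1]
        .iter()
        .rposition(|&b| b == last)
        .map_or(m, |i| m - 1 - i);

    let mut window_end = m - 1;
    while window_end < haystack.len() {
        // Skip loop: advance until the last pattern byte is under window_end.
        let s = skip[haystack[window_end] as usize];
        if s != 0 {
            window_end += s;
            continue;
        }
        // Match loop: compare the whole candidate window.
        let start = window_end + 1 - m;
        if &haystack[start..=window_end] == pattern {
            return Some(start);
        }
        // Shift rule: move the window after a failed match.
        window_end += md2;
    }
    None
}
```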

The key heuristic that decides whether BoyerMooreSearch should be used at all.

See rust-lang/regex/issues/408.

Tuned Boyer-Moore is actually pretty slow! It turns out a hand-rolled platform-specific memchr routine with a bit of frequency analysis sprinkled on top actually wins most of the time. However, there are a few cases where Tuned Boyer-Moore still wins.

If the haystack is random, frequency analysis doesn't help us, so Boyer-Moore will win for sufficiently large needles. Unfortunately, there is no obvious way to determine this ahead of time.

If the pattern itself consists of very common characters, frequency analysis won't get us anywhere. The most extreme example of this is a pattern like eeeeeeeeeeeeeeee. Fortunately, this case is wholly determined by the pattern, so we can actually implement the heuristic.

A third case is if the pattern is sufficiently long. The idea here is that once the pattern gets long enough the Tuned Boyer-Moore skip loop will start making strides long enough to beat the asm deep magic that is memchr.
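The two cases that depend only on the pattern could be combined roughly as in the sketch below. Every name and number here is hypothetical: `is_common_byte` is a crude stand-in for a real byte frequency table, and `MIN_PATTERN_LEN` and `LONG_PATTERN_LEN` are placeholder cutoffs, not the values used by this crate.

```rust
/// Hypothetical stand-in for a frequency table: a handful of bytes that are
/// very common in English text.
fn is_common_byte(b: u8) -> bool {
    matches!(b, b' ' | b'e' | b't' | b'a' | b'o' | b'i' | b'n')
}

/// Hypothetical cutoffs, chosen only for illustration.
const MIN_PATTERN_LEN: usize = 8;
const LONG_PATTERN_LEN: usize = 64;

/// Illustrative only: should a Boyer-Moore searcher be preferred over a
/// memchr-based one for this pattern?
fn should_use_boyer_moore(pattern: &[u8]) -> bool {
    // Very short patterns cannot make long strides.
    if pattern.len() < MIN_PATTERN_LEN {
        return false;
    }
    // Long patterns: the skip loop takes strides big enough to compete.
    if pattern.len() >= LONG_PATTERN_LEN {
        return true;
    }
    // Patterns made only of common bytes: even the rarest byte is a poor
    // memchr target, so frequency analysis does not help.
    pattern.iter().all(|&b| is_common_byte(b))
}
```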

Check to see if there is a match at the given position.

Skip forward according to the shift table.

Returns the offset of the next occurrence of the last char in the pattern, or None if it never reappears. If skip_loop hits the backstop it will leave early.
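A table-driven skip loop with a backstop might be sketched as follows. The signature is an assumption, and so is the backstop behaviour: this sketch treats `backstop` as a hard limit and reports `None` once it is reached, which may differ from what the real method does when it leaves early. The table is assumed to hold 0 for the last pattern byte.

```rust
/// Illustrative only: advance `window_end` by table lookups until the byte
/// under it is the last byte of the pattern (table entry 0), or until the
/// window reaches `backstop` or the end of the haystack.
fn skip_loop_sketch(
    haystack: &[u8],
    skip_table: &[usize; 256],
    mut window_end: usize,
    backstop: usize,
) -> Option<usize> {
    while window_end < haystack.len() && window_end < backstop {
        let skip = skip_table[haystack[window_end] as usize];
        if skip == 0 {
            // The last pattern byte sits under window_end.
            return Some(window_end);
        }
        window_end += skip;
    }
    None
}
```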

Compute the ufast skip table.
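One common way to build such a table is sketched below, assuming the table is indexed by byte value and stores the distance from the rightmost occurrence of each byte to the end of the pattern. The function name is illustrative.

```rust
/// Illustrative only: build a 256-entry skip table for `pattern`.
fn skip_table_for(pattern: &[u8]) -> Vec<usize> {
    // Bytes that never occur in the pattern allow a full-length skip.
    let mut table = vec![pattern.len(); 256];
    for (i, &b) in pattern.iter().enumerate() {
        // Later occurrences overwrite earlier ones, so each entry ends up
        // as the distance from the rightmost occurrence to the pattern end.
        // The last byte of the pattern therefore maps to 0.
        table[b as usize] = pattern.len() - 1 - i;
    }
    table
}
```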

Select the guard character based off of the precomputed frequency table.
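Guard selection can be sketched as picking the pattern byte with the lowest rank in a frequency table. The `freq_rank` parameter here is a hypothetical table supplied by the caller (lower means rarer), and the function name and return shape are assumptions.

```rust
/// Illustrative only: return the rarest byte of `pattern` according to
/// `freq_rank`, together with its distance from the end of the pattern.
/// Returns `None` for an empty pattern.
fn select_guard_sketch(pattern: &[u8], freq_rank: &[u8; 256]) -> Option<(u8, usize)> {
    let (idx, &byte) = pattern
        .iter()
        .enumerate()
        // On ties, `min_by_key` keeps the first (leftmost) candidate.
        .min_by_key(|&(_, &b)| freq_rank[b as usize])?;
    Some((byte, pattern.len() - 1 - idx))
}
```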

If there is another occurrence of the skip char, shift to it, otherwise just shift to the next window.
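That rule can be sketched as a single scan over the pattern. The function name and the `Option` return (used here so that an empty pattern has no shift) are assumptions for illustration.

```rust
/// Illustrative only: distance from the last byte of `pattern` back to its
/// previous occurrence, or the full pattern length if it does not occur
/// anywhere else.
fn md2_shift_sketch(pattern: &[u8]) -> Option<usize> {
    let last = *pattern.last()?;
    let len = pattern.len();
    // Search everything except the final byte, from right to left.
    let shift = match pattern[..len - 1].iter().rposition(|&b| b == last) {
        Some(i) => len - 1 - i, // shift to the previous occurrence
        None => len,            // no other occurrence: shift a whole window
    };
    Some(shift)
}
```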

Trait Implementations

impl Clone for BoyerMooreSearch

Returns a copy of the value.

Performs copy-assignment from source.

impl Debug for BoyerMooreSearch
[src]

Formats the value using the given formatter.

Auto Trait Implementations

impl Send for BoyerMooreSearch

impl Sync for BoyerMooreSearch