Struct regex::internal::Compiler

pub struct Compiler {
    insts: Vec<MaybeInst>,
    compiled: Program,
    capture_name_idx: HashMap<String, usize>,
    num_exprs: usize,
    size_limit: usize,
    suffix_cache: SuffixCache,
    utf8_seqs: Option<Utf8Sequences>,
    byte_classes: ByteClassSet,
}

A compiler translates a regular expression AST to a sequence of instructions. The sequence of instructions represents an NFA.

Methods

impl Compiler
Create a new regular expression compiler.

Various options can be set before calling compile on an expression.

The size of the resulting program is limited by size_limit. If the program approximately exceeds the given size (in bytes), then compilation will stop and return an error.

If bytes is true, then the program is compiled as a byte based automaton, which incorporates UTF-8 decoding into the machine. If it's false, then the automaton is Unicode scalar value based, i.e., an engine utilizing such an automaton is responsible for UTF-8 decoding.

The specific invariant is that when returning a byte based machine, neither the Char nor the Ranges instructions are produced. Conversely, when producing a Unicode scalar value machine, the Bytes instruction is never produced.
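As a rough illustration of that invariant, the sketch below models a single literal character compiled in both modes. `ToyStep` and `literal_steps` are hypothetical names invented for this example and are not part of the regex crate; the real instruction set (Char, Ranges, Bytes) is richer than this.

```rust
/// Hypothetical stand-in for the two instruction families; not the crate's `Inst`.
#[derive(Debug, PartialEq)]
enum ToyStep {
    /// Unicode scalar value mode: the engine decodes UTF-8 and compares chars.
    Char(char),
    /// Byte mode: UTF-8 decoding is baked into the machine, one byte per step.
    Byte(u8),
}

/// Emit the steps needed to match the single literal `c`.
/// When `bytes` is true only `Byte` steps are produced; otherwise only `Char`.
fn literal_steps(c: char, bytes: bool) -> Vec<ToyStep> {
    if bytes {
        let mut buf = [0u8; 4];
        // `char::encode_utf8` writes the UTF-8 encoding of `c` into `buf`.
        let steps: Vec<ToyStep> = c.encode_utf8(&mut buf).bytes().map(ToyStep::Byte).collect();
        steps
    } else {
        vec![ToyStep::Char(c)]
    }
}

fn main() {
    // U+00E9 is two bytes in UTF-8, so byte mode needs two steps.
    println!("{:?}", literal_steps('\u{e9}', true));
    println!("{:?}", literal_steps('\u{e9}', false));
}
```

The two modes never mix: a caller can tell which kind of machine it holds from the kind of steps it sees.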

Note that dfa(true) implies bytes(true).

When disabled, the program compiled may match arbitrary bytes.

When enabled (the default), all compiled programs exclusively match valid UTF-8 bytes.

When set, the machine returned is suitable for use in the DFA matching engine.

In particular, this ensures that if the regex is not anchored in the beginning, then a preceding .*? is included in the program. (The NFA based engines handle the preceding .*? explicitly, which is difficult or impossible in the DFA engine.)

When set, the machine returned is suitable for matching text in reverse. In particular, all concatenations are flipped.
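To make "all concatenations are flipped" concrete, here is a minimal sketch over a toy AST. `ToyAst` and `reverse` are assumptions made for this example; the actual compiler may well flip the order while emitting instructions rather than by rewriting the AST.

```rust
/// Hypothetical, heavily simplified expression tree; not the crate's AST.
#[derive(Debug, Clone, PartialEq)]
enum ToyAst {
    Lit(char),
    Concat(Vec<ToyAst>),
    Alt(Vec<ToyAst>),
}

/// Build the expression that matches the same strings read right to left.
/// Only concatenations change order; alternation branches keep theirs.
fn reverse(ast: &ToyAst) -> ToyAst {
    match ast {
        ToyAst::Lit(c) => ToyAst::Lit(*c),
        ToyAst::Concat(es) => ToyAst::Concat(es.iter().rev().map(reverse).collect()),
        ToyAst::Alt(es) => ToyAst::Alt(es.iter().map(reverse).collect()),
    }
}

fn main() {
    // ab(c|de) reversed is (c|ed)ba.
    let ast = ToyAst::Concat(vec![
        ToyAst::Lit('a'),
        ToyAst::Lit('b'),
        ToyAst::Alt(vec![
            ToyAst::Lit('c'),
            ToyAst::Concat(vec![ToyAst::Lit('d'), ToyAst::Lit('e')]),
        ]),
    ]);
    println!("{:?}", reverse(&ast));
}
```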

Compile a regular expression given its AST.

The compiler is guaranteed to succeed unless the program exceeds the specified size limit. If the size limit is exceeded, then compilation stops and returns an error.
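The size check can be pictured roughly as follows. This is a sketch under stated assumptions: `SizeLimitedCompiler`, `ToyCompileError`, and the use of a `u32` per instruction are invented for illustration, and the real compiler's size accounting is only approximate and may differ.

```rust
use std::mem::size_of;

/// Hypothetical error type for this sketch; not the crate's `Error`.
#[derive(Debug, PartialEq)]
enum ToyCompileError {
    /// The program grew past the limit (in bytes) carried here.
    TooBig(usize),
}

/// Hypothetical compiler whose "instructions" are just `u32` code points.
struct SizeLimitedCompiler {
    insts: Vec<u32>,
    size_limit: usize,
}

impl SizeLimitedCompiler {
    fn new(size_limit: usize) -> Self {
        Self { insts: Vec::new(), size_limit }
    }

    /// Approximate the program size as instruction count times instruction size.
    fn check_size(&self) -> Result<(), ToyCompileError> {
        if self.insts.len() * size_of::<u32>() > self.size_limit {
            Err(ToyCompileError::TooBig(self.size_limit))
        } else {
            Ok(())
        }
    }

    /// Emit one instruction per character, stopping as soon as the limit is exceeded.
    fn compile_literal(mut self, text: &str) -> Result<Vec<u32>, ToyCompileError> {
        for c in text.chars() {
            self.insts.push(c as u32);
            self.check_size()?;
        }
        Ok(self.insts)
    }
}

fn main() {
    println!("{:?}", SizeLimitedCompiler::new(16).compile_literal("abc"));
    println!("{:?}", SizeLimitedCompiler::new(8).compile_literal("abc"));
}
```

Checking after each emitted instruction means compilation stops early instead of building an oversized program first and rejecting it afterwards.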

Compile expr into self.insts, returning a patch on success, or an error if we run out of memory.

All of the c_* methods of the compiler share the contract outlined here.

The main thing that a c_* method does is mutate self.insts to add a list of mostly compiled instructions required to execute the given expression. self.insts contains MaybeInsts rather than Insts because there is some backpatching required.

The Patch value returned by each c_* method provides metadata about the compiled instructions emitted to self.insts. The entry member of the patch refers to the first instruction (the entry point), while the hole member contains zero or more offsets to partial instructions that need to be backpatched. The c_* routine can't know where its list of instructions is going to jump to after execution, so it is up to the caller to patch these jumps to point to the right place. So when compiling some expression, e, we would end up with a situation that looks like:

self.insts = [ ..., i1, i2, ..., iexit1, ..., iexitn, ...]
                    ^              ^             ^
                    |                \         /
                  entry                \     /
                                        hole

To compile two expressions, e1 and e2, concatenated together we would do:

let patch1 = self.c(e1);
let patch2 = self.c(e2);

which leaves us with a situation that looks like:

self.insts = [ ..., i1, ..., iexit1, ..., i2, ..., iexit2 ]
                    ^        ^            ^        ^
                    |        |            |        |
               entry1        hole1   entry2        hole2

Then to merge the two patches together into one we would backpatch hole1 with entry2 and return a new patch that enters at entry1 and has hole2 for a hole. In fact, if you look at the c_concat method you will see that it does exactly this, though it handles a list of expressions rather than just the two that we use for an example.
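The backpatching contract described above can be sketched end to end with a toy compiler for byte literals. Everything here (`ToyInst`, `ToyPatch`, `ToyCompiler`, `compile_literal`) is a simplified assumption made for illustration; the crate's own MaybeInst, Patch, and c_concat are more general.

```rust
/// Toy instruction set. A `goto` of `None` plays the role of an unpatched hole.
#[derive(Debug, Clone, PartialEq)]
enum ToyInst {
    /// Match byte `b`, then continue at `goto`.
    Byte { b: u8, goto: Option<usize> },
    /// Report a match.
    Match,
}

/// Metadata returned by each toy `c_*` method.
#[derive(Debug)]
struct ToyPatch {
    /// Index of the first instruction emitted.
    entry: usize,
    /// Indices of instructions whose `goto` still needs filling in.
    holes: Vec<usize>,
}

#[derive(Default)]
struct ToyCompiler {
    insts: Vec<ToyInst>,
}

impl ToyCompiler {
    /// Emit one instruction; it cannot know where to jump next, so it leaves a hole.
    fn c_byte(&mut self, b: u8) -> ToyPatch {
        let pc = self.insts.len();
        self.insts.push(ToyInst::Byte { b, goto: None });
        ToyPatch { entry: pc, holes: vec![pc] }
    }

    /// Backpatch every hole so that it jumps to `target`.
    fn fill(&mut self, holes: &[usize], target: usize) {
        for &pc in holes {
            if let ToyInst::Byte { goto, .. } = &mut self.insts[pc] {
                *goto = Some(target);
            }
        }
    }

    /// Concatenate: patch each piece's holes with the next piece's entry and
    /// return a patch that enters at the first piece and keeps the last holes.
    fn c_concat(&mut self, bytes: &[u8]) -> Option<ToyPatch> {
        let mut iter = bytes.iter();
        let first = self.c_byte(*iter.next()?);
        let entry = first.entry;
        let mut holes = first.holes;
        for &b in iter {
            let next = self.c_byte(b);
            self.fill(&holes, next.entry);
            holes = next.holes;
        }
        Some(ToyPatch { entry, holes })
    }

    /// Terminate the program: the remaining holes jump to a final `Match`.
    fn finish(&mut self, patch: &ToyPatch) {
        let pc = self.insts.len();
        self.insts.push(ToyInst::Match);
        self.fill(&patch.holes, pc);
    }
}

/// Compile a byte literal into a fully patched toy program.
fn compile_literal(bytes: &[u8]) -> Vec<ToyInst> {
    let mut c = ToyCompiler::default();
    match c.c_concat(bytes) {
        Some(patch) => c.finish(&patch),
        None => c.insts.push(ToyInst::Match),
    }
    c.insts
}

fn main() {
    for (pc, inst) in compile_literal(b"ab").iter().enumerate() {
        println!("{:02} {:?}", pc, inst);
    }
}
```

Here the hole left by the first byte is filled with the entry of the second, and the hole of the second is filled only once the caller knows where the final instruction lives.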

Auto Trait Implementations

impl Send for Compiler

impl Sync for Compiler