Struct regex::internal::Compiler
pub struct Compiler {
    insts: Vec<MaybeInst>,
    compiled: Program,
    capture_name_idx: HashMap<String, usize>,
    num_exprs: usize,
    size_limit: usize,
    suffix_cache: SuffixCache,
    utf8_seqs: Option<Utf8Sequences>,
    byte_classes: ByteClassSet,
}
A compiler translates a regular expression AST to a sequence of instructions. The sequence of instructions represents an NFA.
Fields
insts: Vec<MaybeInst>
compiled: Program
capture_name_idx: HashMap<String, usize>
num_exprs: usize
size_limit: usize
suffix_cache: SuffixCache
utf8_seqs: Option<Utf8Sequences>
byte_classes: ByteClassSet
Methods
impl Compiler
pub fn new() -> Self
Create a new regular expression compiler.
Various options can be set before calling compile on an expression.
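Each option method takes self by value and returns Self, so options can be chained from new() before the final compile call. The sketch below shows that by-value builder pattern on a hypothetical, stripped-down CompilerOptions type. Its name, its fields, the placeholder size limit and the is_bytes helper are assumptions for illustration and are not the crate's internals. The is_bytes helper encodes the documented rule that dfa(true) implies bytes(true).

```rust
/// Hypothetical stand-in for the option-carrying part of `Compiler`.
#[derive(Debug, Clone)]
struct CompilerOptions {
    size_limit: usize,
    bytes: bool,
    only_utf8: bool,
    dfa: bool,
    reverse: bool,
}

impl CompilerOptions {
    fn new() -> Self {
        CompilerOptions {
            size_limit: 10 * (1 << 20), // arbitrary placeholder, not the crate's default
            bytes: false,
            only_utf8: true, // documented default
            dfa: false,
            reverse: false,
        }
    }

    // Each setter consumes `self` and hands it back, which enables chaining.
    fn size_limit(mut self, size_limit: usize) -> Self {
        self.size_limit = size_limit;
        self
    }
    fn bytes(mut self, yes: bool) -> Self {
        self.bytes = yes;
        self
    }
    fn only_utf8(mut self, yes: bool) -> Self {
        self.only_utf8 = yes;
        self
    }
    fn dfa(mut self, yes: bool) -> Self {
        self.dfa = yes;
        self
    }
    fn reverse(mut self, yes: bool) -> Self {
        self.reverse = yes;
        self
    }

    /// Effective byte mode: a DFA program is always byte based.
    fn is_bytes(&self) -> bool {
        self.bytes || self.dfa
    }
}
```

For example, CompilerOptions::new().size_limit(1 << 16).dfa(true).reverse(true) yields options for which is_bytes() is true even though bytes(true) was never called.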
pub fn size_limit(self, size_limit: usize) -> Self
The size of the resulting program is limited by size_limit. If the program exceeds the given size (in bytes, approximately), then compilation stops and returns an error.
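The sketch below shows one way such an approximate check can work. It assumes the size is estimated as the number of instructions times the in-memory size of one instruction, and that the check runs after every push so compilation stops early. The toy Inst enum, the CompiledTooBig error type, push_checked and the formula are all hypothetical. The real check_size may account for more than this.

```rust
use std::mem::size_of;

/// Toy instruction set (hypothetical), just large enough to have a realistic size.
#[allow(dead_code)]
enum Inst {
    Match,
    Char { c: char, goto: usize },
    Split { goto1: usize, goto2: usize },
}

/// Hypothetical error carrying the limit that was exceeded.
#[derive(Debug, PartialEq)]
struct CompiledTooBig(usize);

/// Estimate program size as `len * size_of::<Inst>()` and compare it to the limit.
fn check_size(insts: &[Inst], size_limit: usize) -> Result<(), CompiledTooBig> {
    let approx_bytes = insts.len() * size_of::<Inst>();
    if approx_bytes > size_limit {
        Err(CompiledTooBig(size_limit))
    } else {
        Ok(())
    }
}

/// Push an instruction, then re-check the limit so compilation can stop early.
fn push_checked(
    insts: &mut Vec<Inst>,
    inst: Inst,
    size_limit: usize,
) -> Result<(), CompiledTooBig> {
    insts.push(inst);
    check_size(insts, size_limit)
}
```

With a limit of two instructions' worth of bytes, the first two pushes succeed and the third returns Err(CompiledTooBig(limit)).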
pub fn bytes(self, yes: bool) -> Self
If bytes is true, then the program is compiled as a byte-based automaton, which incorporates UTF-8 decoding into the machine. If it's false, then the automaton is Unicode scalar value based, i.e., an engine utilizing such an automaton is responsible for UTF-8 decoding.
The specific invariant is that when returning a byte-based machine, neither the Char nor the Ranges instructions are produced. Conversely, when producing a Unicode scalar value machine, the Bytes instruction is never produced.
Note that dfa(true) implies bytes(true).
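Here is what "byte based" means concretely. A Unicode scalar value program can test a whole char with one Char instruction. A byte-based program must spell the same character out as its UTF-8 encoding, one byte test per instruction. The sketch below derives those byte tests with the standard library's char::encode_utf8. The ByteRange type, char_to_byte_ranges and matches_prefix are hypothetical stand-ins, and ByteRange is not the crate's Bytes instruction. Character classes need more work than a single char: the utf8_seqs field suggests class ranges are split into sequences of byte ranges (regex-syntax provides utf8::Utf8Sequences for that), which this sketch does not attempt.

```rust
/// Hypothetical stand-in for a byte instruction: matches one byte in `start..=end`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ByteRange {
    start: u8,
    end: u8,
}

/// Lower a single `char` into the byte tests a byte-based program would need.
fn char_to_byte_ranges(c: char) -> Vec<ByteRange> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf)
        .as_bytes()
        .iter()
        .map(|&b| ByteRange { start: b, end: b })
        .collect()
}

/// Run the byte tests in sequence against the start of `haystack`.
fn matches_prefix(seq: &[ByteRange], haystack: &[u8]) -> bool {
    haystack.len() >= seq.len()
        && seq
            .iter()
            .zip(haystack)
            .all(|(r, &b)| r.start <= b && b <= r.end)
}
```

For instance, 'a' needs a single byte test (0x61), while '☃' (U+2603) needs three: 0xE2, 0x98, 0x83.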
pub fn only_utf8(self, yes: bool) -> Self
When disabled, the compiled program may match arbitrary bytes.
When enabled (the default), all compiled programs exclusively match valid UTF-8 bytes.
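The difference shows up as soon as a program can match a single "any" unit. In the hypothetical sketch below, match_any (an invented name) behaves differently depending on only_utf8. When only_utf8 is off, it consumes one arbitrary byte. When it is on, it consumes one complete UTF-8 encoded scalar value and refuses to match invalid bytes such as 0xFF. This illustrates the matching semantics only, not how the compiler enforces them.

```rust
/// Match one "any character" at the start of `haystack`, returning the bytes consumed.
/// Hypothetical: with `only_utf8 == false` the match may cover any single byte.
fn match_any(haystack: &[u8], only_utf8: bool) -> Option<usize> {
    if haystack.is_empty() {
        return None;
    }
    if !only_utf8 {
        return Some(1);
    }
    // A UTF-8 scalar value is 1 to 4 bytes long; the shortest prefix that
    // decodes successfully is exactly one complete character.
    for len in 1..=haystack.len().min(4) {
        if let Ok(s) = std::str::from_utf8(&haystack[..len]) {
            return s.chars().next().map(char::len_utf8);
        }
    }
    None
}
```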
pub fn dfa(self, yes: bool) -> Self
When set, the machine returned is suitable for use in the DFA matching engine.
In particular, this ensures that if the regex is not anchored at the beginning, then a preceding .*? is included in the program. (The NFA-based engines handle the preceding .*? explicitly, which is difficult or impossible in the DFA engine.)
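Why the .*? prefix matters: an unanchored search must allow a match to begin at every position. A leading lazy .*? achieves that by keeping the start state alive on every byte. The sketch below simulates a tiny NFA for a literal pattern as a set of active states. When unanchored is true, it re-adds the start state before each byte, which is the effect the .*? prefix has. The run function and its state encoding are hypothetical, and the sketch handles only literals.

```rust
/// Simulate the NFA for a literal `lit` over `haystack` as a set of states.
/// State `s` means "the first `s` bytes of `lit` have been matched".
/// Returns the end offset of the first match to complete, if any.
fn run(lit: &[u8], haystack: &[u8], unanchored: bool) -> Option<usize> {
    let n = lit.len();
    if n == 0 {
        return Some(0);
    }
    let mut active = vec![false; n + 1];
    active[0] = true;
    for (i, &b) in haystack.iter().enumerate() {
        if unanchored {
            // The `.*?` loop: a match may also start at this position.
            active[0] = true;
        }
        let mut next = vec![false; n + 1];
        for s in 0..n {
            if active[s] && lit[s] == b {
                next[s + 1] = true;
            }
        }
        if next[n] {
            return Some(i + 1);
        }
        active = next;
    }
    None
}
```

Anchored, "abc" is found in "abcx" but not in "xxabcx". Unanchored, the same search over "xxabcx" reports a match ending at offset 5.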
pub fn reverse(self, yes: bool) -> Self
When set, the machine returned is suitable for matching text in reverse. In particular, all concatenations are flipped.
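Flipping concatenations is what lets a reverse program recognize a match while reading text from right to left. The minimal sketch below works on a hypothetical three-variant expression tree; Expr is not the crate's Hir. Concatenation children are reversed recursively. Alternation branches are reversed individually but kept in their original order, since the description above mentions flipping only concatenations.

```rust
/// Hypothetical, simplified expression tree (not regex-syntax's `Hir`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Char(char),
    Concat(Vec<Expr>),
    Alternate(Vec<Expr>),
}

/// Build the expression that matches the reverse of whatever `e` matches.
fn reverse(e: &Expr) -> Expr {
    match e {
        Expr::Char(c) => Expr::Char(*c),
        // Concatenations are flipped, and their parts are reversed recursively.
        Expr::Concat(parts) => Expr::Concat(parts.iter().rev().map(reverse).collect()),
        // Alternation branches are reversed individually but keep their order.
        Expr::Alternate(alts) => Expr::Alternate(alts.iter().map(reverse).collect()),
    }
}

/// Helper: a concatenation of literal characters.
fn lit(s: &str) -> Expr {
    Expr::Concat(s.chars().map(Expr::Char).collect())
}
```

Reversing lit("abc") gives lit("cba"), and reversing twice returns the original tree.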
pub fn compile(self, exprs: &[Hir]) -> Result<Program, Error>
Compile a regular expression given its AST.
The compiler is guaranteed to succeed unless the program exceeds the specified size limit. If the size limit is exceeded, then compilation stops and returns an error.
fn compile_one(self, expr: &Hir) -> Result<Program, Error>
fn compile_many(self, exprs: &[Hir]) -> Result<Program, Error>
fn compile_finish(self) -> Result<Program, Error>
fn c(&mut self, expr: &Hir) -> Result<Patch, Error>
Compile expr into self.insts, returning a patch on success, or an error if we run out of memory.
All of the c_* methods of the compiler share the contract outlined here.
The main thing that a c_* method does is mutate self.insts to add a list of mostly compiled instructions required to execute the given expression. self.insts contains MaybeInsts rather than Insts because there is some backpatching required.
The Patch value returned by each c_* method provides metadata about the compiled instructions emitted to self.insts. The entry member of the patch refers to the first instruction (the entry point), while the hole member contains zero or more offsets to partial instructions that need to be backpatched.
A c_* routine cannot know where its list of instructions is going to jump after execution, so it is up to the caller to patch these jumps to point to the right place. After compiling some expression e, we end up with a situation that looks like this:
self.insts = [ ..., i1, i2, ..., iexit1, ..., iexitn, ...]
                    ^              ^            ^
                    |               \          /
                  entry               \      /
                                        hole
To compile two expressions, e1 and e2, concatenated together, we would do:
let patch1 = self.c(e1)?;
let patch2 = self.c(e2)?;
which leaves us with a situation that looks like this:
self.insts = [ ..., i1, ..., iexit1, ..., i2, ..., iexit2 ]
                    ^          ^          ^          ^
                    |          |          |          |
                  entry1     hole1      entry2     hole2
To merge the two patches into one, we backpatch hole1 with entry2 and return a new patch that enters at entry1 and has hole2 as its hole. In fact, the c_concat method does exactly this, though it handles a list of expressions rather than just the two used in this example.
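The patch-and-backpatch contract can be sketched on a drastically simplified model, and everything below is hypothetical. An instruction is either a Char test with a goto or Match. A MaybeInst is either finished or a Char whose goto is still unknown. A hole is just the index of such a pending instruction; the real Hole and Patch types carry more cases. c_concat follows the walkthrough: it compiles each piece, fills the previous holes with the next piece's entry, and returns the first entry together with the last holes.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Char { c: char, goto: usize },
    Match,
}

#[derive(Debug)]
enum MaybeInst {
    Compiled(Inst),
    Uncompiled(char), // a Char instruction whose `goto` is not known yet
}

/// Entry point plus the holes the caller must backpatch.
struct Patch {
    entry: usize,
    holes: Vec<usize>,
}

#[derive(Default)]
struct Compiler {
    insts: Vec<MaybeInst>,
}

impl Compiler {
    fn c_char(&mut self, c: char) -> Patch {
        let pc = self.insts.len();
        self.insts.push(MaybeInst::Uncompiled(c));
        Patch { entry: pc, holes: vec![pc] }
    }

    /// Backpatch every hole so that it jumps to `goto`.
    fn fill(&mut self, holes: Vec<usize>, goto: usize) {
        for h in holes {
            let c = match self.insts[h] {
                MaybeInst::Uncompiled(c) => c,
                MaybeInst::Compiled(_) => panic!("instruction {h} is not a hole"),
            };
            self.insts[h] = MaybeInst::Compiled(Inst::Char { c, goto });
        }
    }

    /// Concatenate: hole i is filled with entry i + 1. Returns `None` for empty input.
    fn c_concat(&mut self, chars: &[char]) -> Option<Patch> {
        let (&first, rest) = chars.split_first()?;
        let Patch { entry, mut holes } = self.c_char(first);
        for &c in rest {
            let next = self.c_char(c);
            self.fill(holes, next.entry);
            holes = next.holes;
        }
        Some(Patch { entry, holes })
    }

    /// Append `Match`, point the final holes at it, and unwrap every instruction.
    fn finish(mut self, patch: Patch) -> Vec<Inst> {
        let pc = self.insts.len();
        self.insts.push(MaybeInst::Compiled(Inst::Match));
        self.fill(patch.holes, pc);
        self.insts
            .into_iter()
            .map(|mi| match mi {
                MaybeInst::Compiled(inst) => inst,
                MaybeInst::Uncompiled(_) => panic!("unfilled hole"),
            })
            .collect()
    }
}
```

Compiling ['a', 'b'] produces a patch with entry 0 and hole 1. Finishing it yields Char a -> 1, Char b -> 2, Match, exactly the hole1-to-entry2 backpatch described above.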
fn c_capture(&mut self, first_slot: usize, expr: &Hir) -> Result<Patch, Error>
fn c_dotstar(&mut self) -> Result<Patch, Error>
fn c_literal(&mut self, chars: &[char]) -> Result<Patch, Error>
fn c_char(&mut self, c: char) -> Result<Patch, Error>
fn c_class(&mut self, ranges: &[ClassUnicodeRange]) -> Result<Patch, Error>
fn c_bytes(&mut self, bytes: &[u8]) -> Result<Patch, Error>
fn c_byte(&mut self, b: u8) -> Result<Patch, Error>
fn c_class_bytes(&mut self, ranges: &[ClassBytesRange]) -> Result<Patch, Error>
fn c_empty_look(&mut self, look: EmptyLook) -> Result<Patch, Error>
fn c_concat<'a, I>(&mut self, exprs: I) -> Result<Patch, Error> where
I: IntoIterator<Item = &'a Hir>,
fn c_alternate(&mut self, exprs: &[Hir]) -> Result<Patch, Error>
fn c_repeat(&mut self, rep: &Repetition) -> Result<Patch, Error>
fn c_repeat_zero_or_one(
&mut self,
expr: &Hir,
greedy: bool
) -> Result<Patch, Error>
fn c_repeat_zero_or_more(
&mut self,
expr: &Hir,
greedy: bool
) -> Result<Patch, Error>
fn c_repeat_one_or_more(
&mut self,
expr: &Hir,
greedy: bool
) -> Result<Patch, Error>
fn c_repeat_range_min_or_more(
&mut self,
expr: &Hir,
greedy: bool,
min: u32
) -> Result<Patch, Error>
fn c_repeat_range(
&mut self,
expr: &Hir,
greedy: bool,
min: u32,
max: u32
) -> Result<Patch, Error>
fn fill(&mut self, hole: Hole, goto: usize)
fn fill_to_next(&mut self, hole: Hole)
fn fill_split(
&mut self,
hole: Hole,
goto1: Option<usize>,
goto2: Option<usize>
) -> Hole
fn push_compiled(&mut self, inst: Inst)
fn push_hole(&mut self, inst: InstHole) -> Hole
fn push_split_hole(&mut self) -> Hole
fn check_size(&self) -> Result<(), Error>