Struct regex_syntax::ast::parse::ParserI

struct ParserI<'s, P> {
    parser: P,
    pattern: &'s str,
}

ParserI is the internal parser implementation.

We use this separate type so that we can carry the provided pattern string along with us. In particular, a Parser's internal state is not tied to any one pattern, but a ParserI's is.

This type also lets us use ParserI<&Parser> in production code while retaining the convenience of ParserI<Parser> for tests, which sometimes work against the internal interface of the parser.
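The borrowed-versus-owned arrangement described above can be sketched with `std::borrow::Borrow`. This is a minimal illustration, not the crate's code: `Config` and `Internal` are hypothetical stand-ins for `Parser` and `ParserI`, and the single `ignore_whitespace` field is an assumption made for the example.

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for the parser state/configuration.
#[derive(Debug, Default)]
struct Config {
    ignore_whitespace: bool,
}

/// Hypothetical analogue of `ParserI`: a configuration tied to one pattern.
struct Internal<'s, P> {
    parser: P,
    pattern: &'s str,
}

impl<'s, P: Borrow<Config>> Internal<'s, P> {
    fn new(parser: P, pattern: &'s str) -> Self {
        Internal { parser, pattern }
    }

    /// Works the same whether `P` is `Config` (owned) or `&Config` (borrowed).
    fn config(&self) -> &Config {
        self.parser.borrow()
    }

    fn pattern(&self) -> &str {
        self.pattern
    }
}
```

With this shape, `Internal::new(&shared, "a|b")` reuses a long-lived configuration, while `Internal::new(Config::default(), "x+")` owns a throwaway one, which is the convenient form for tests.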

Fields

parser: P

The parser state/configuration.

pattern: &'s str

The full regular expression provided by the user.

Methods

impl<'s, P: Borrow<Parser>> ParserI<'s, P>

Build an internal parser from a parser configuration and a pattern.

Return a reference to the parser state.

Return a reference to the pattern being parsed.

Create a new error with the given span and error type.

Return the current offset of the parser.

The offset starts at 0 from the beginning of the regular expression pattern string.

Return the current line number of the parser.

The line number starts at 1.

Return the current column of the parser.

The column number starts at 1 and is reset whenever a \n is seen.
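The offset, line and column rules above can be sketched as follows. `Pos` and `advance` are hypothetical names, and the return value here only reports whether a character was consumed, which simplifies how the real parser reports end of input.

```rust
/// Hypothetical position record: byte offset (0-based), line and column (1-based).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pos {
    offset: usize,
    line: usize,
    column: usize,
}

impl Pos {
    fn start() -> Pos {
        Pos { offset: 0, line: 1, column: 1 }
    }
}

/// Advance `pos` past the character at `pos.offset`, if there is one.
/// Returns false when the end of `pattern` has already been reached.
fn advance(pattern: &str, pos: &mut Pos) -> bool {
    let c = match pattern[pos.offset..].chars().next() {
        Some(c) => c,
        None => return false,
    };
    // The offset counts bytes, so multi-byte characters move it by more than 1.
    pos.offset += c.len_utf8();
    if c == '\n' {
        pos.line += 1;
        pos.column = 1; // a newline resets the column
    } else {
        pos.column += 1;
    }
    true
}
```

Note that the offset is measured in bytes while the column counts characters, so the two diverge as soon as the pattern contains non-ASCII text.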

Return the next capturing index. Each subsequent call increments the internal index.

The span given should correspond to the location of the opening parenthesis.

If the capture limit is exceeded, then an error is returned.

Adds the given capture name to this parser. If this capture name has already been used, then an error is returned.
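As a rough sketch of the capture bookkeeping described above: the `Captures` type, its fields, and the error strings are all hypothetical, and the choice of `u32::MAX` as the limit and of 1 as the first index are assumptions for illustration.

```rust
/// Hypothetical capture bookkeeping: a running index plus the names seen so far.
struct Captures {
    next_index: u32,
    names: Vec<String>,
}

impl Captures {
    fn new() -> Captures {
        Captures { next_index: 0, names: Vec::new() }
    }

    /// Hand out the next capture index. Each call increments the counter,
    /// and an error is returned once the counter would overflow.
    fn next_capture_index(&mut self) -> Result<u32, String> {
        let next = self
            .next_index
            .checked_add(1)
            .ok_or("capture limit exceeded")?;
        self.next_index = next;
        Ok(next)
    }

    /// Record a capture name, rejecting names that were already used.
    fn add_capture_name(&mut self, name: &str) -> Result<(), String> {
        if self.names.iter().any(|n| n.as_str() == name) {
            return Err(format!("duplicate capture name: {}", name));
        }
        self.names.push(name.to_string());
        Ok(())
    }
}
```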

Return whether the parser should ignore whitespace or not.

Return the character at the current position of the parser.

This panics if the current position does not point to a valid char.

Return the character at the given position.

This panics if the given position does not point to a valid char.

Bump the parser to the next Unicode scalar value.

If the end of the input has been reached, then false is returned.

If the substring starting at the current position of the parser has the given prefix, then bump the parser to the character immediately following the prefix and return true. Otherwise, don't bump the parser and return false.
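The conditional bump described above amounts to a prefix test followed by an offset update. A minimal sketch, assuming a hypothetical `Cursor` type that tracks only a byte offset:

```rust
/// Hypothetical cursor: a pattern plus a byte offset into it.
struct Cursor<'s> {
    pattern: &'s str,
    offset: usize,
}

impl<'s> Cursor<'s> {
    /// If the remaining input starts with `prefix`, skip past it and return
    /// true. Otherwise leave the offset untouched and return false.
    fn bump_if(&mut self, prefix: &str) -> bool {
        if self.pattern[self.offset..].starts_with(prefix) {
            self.offset += prefix.len();
            true
        } else {
            false
        }
    }
}
```

A caller positioned just after `(` could try `bump_if("?P<")` to detect a named group; on failure the cursor has not moved, so it can go on to try other prefixes.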

Returns true if and only if the parser is positioned at a look-around prefix. The conditions under which this returns true must always correspond to a regular expression that would otherwise be considered invalid.

This should only be called immediately after parsing the opening of a group or a set of flags.

Bump the parser, and if the x flag is enabled, bump through any subsequent spaces. Return true if and only if the parser is not at EOF.

If the x flag is enabled (i.e., whitespace insensitivity with comments), then this will advance the parser through all whitespace and comments to the next non-whitespace non-comment byte.

If the x flag is disabled, then this is a no-op.

This should be used selectively throughout the parser where arbitrary whitespace is permitted when the x flag is enabled. For example, { 5 , 6} is equivalent to {5,6}.
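The skipping behaviour described above can be sketched as a standalone function. `skip_space` is a hypothetical name, and the rule that a comment starts at `#` and runs to the end of the line is an assumption taken from the regex crate's documented `x` flag syntax; the real parser also records the comments it finds, which this sketch does not.

```rust
/// Return the byte offset of the first character at or after `offset` that is
/// neither whitespace nor part of a `#` comment. When `ignore_whitespace` is
/// false, this is a no-op and `offset` is returned unchanged.
fn skip_space(pattern: &str, mut offset: usize, ignore_whitespace: bool) -> usize {
    if !ignore_whitespace {
        return offset;
    }
    let mut in_comment = false;
    for c in pattern[offset..].chars() {
        if in_comment {
            // A comment ends at the newline, which is itself skipped.
            in_comment = c != '\n';
        } else if c == '#' {
            in_comment = true;
        } else if !c.is_whitespace() {
            break;
        }
        offset += c.len_utf8();
    }
    offset
}
```

For instance, starting at offset 1 in `{ 5 , 6}` with the flag enabled lands on the `5` at offset 2, which is why the spaced form reads the same as `{5,6}`.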

Peek at the next character in the input without advancing the parser.

If the input has been exhausted, then this returns None.

Like peek, but will ignore spaces when the parser is in whitespace insensitive mode.

Returns true if the next call to bump would return false.

Return the current position of the parser, which includes the offset, line and column.

Create a span at the current position of the parser. Both the start and end of the span are set.

Create a span that covers the current character.

Parse and push a single alternation on to the parser's internal stack. If the top of the stack already has an alternation, then add to that instead of pushing a new one.

The concatenation given corresponds to a single alternation branch. The concatenation returned starts the next branch and is empty.

This assumes the parser is currently positioned at | and will advance the parser to the character following |.

Pushes or adds the given branch of an alternation to the parser's internal stack of state.

Parse and push a group AST (and its parent concatenation) on to the parser's internal stack. Return a fresh concatenation corresponding to the group's sub-AST.

If a set of flags was found (with no group), then the concatenation is returned with that set of flags added.

This assumes that the parser is currently positioned on the opening parenthesis. It advances the parser to the character at the start of the sub-expression (or adjoining expression).

If there was a problem parsing the start of the group, then an error is returned.

Pop a group AST from the parser's internal stack and set the group's AST to the given concatenation. Return the concatenation containing the group.

This assumes that the parser is currently positioned on the closing parenthesis and advances the parser to the character following the ).

If no such group could be popped, then an unopened group error is returned.

Pop the last state from the parser's internal stack, if it exists, and add the given concatenation to it. There either must be no state or a single alternation item on the stack. Any other scenario produces an error.

This assumes that the parser has advanced to the end.

Parse the opening of a character class and push the current class parsing context onto the parser's stack. This assumes that the parser is positioned at an opening [. The given union should correspond to the union of set items built up before seeing the [.

If there was a problem parsing the opening of the class, then an error is returned. Otherwise, a new union of set items for the class is returned (which may be populated with either a ] or a -).

Parse the end of a character class set and pop the character class parser stack. The union given corresponds to the last union built before seeing the closing ]. The union returned corresponds to the parent character class set with the nested class added to it.

This assumes that the parser is positioned at a ] and will advance the parser to the byte immediately following the ].

If the stack is empty after popping, then this returns the final "top-level" character class AST (where a "top-level" character class is one that is not nested inside any other character class).

If there is no corresponding opening bracket on the parser's stack, then an error is returned.

Return an "unclosed class" error whose span points to the most recently opened class.

This should only be called while parsing a character class.

Push the current set of class items on to the class parser's stack as the left hand side of the given operator.

A fresh set union is returned, which should be used to build the right hand side of this operator.

Pop a character class set from the character class parser stack. If the top of the stack is just an item (not an operation), then return the given set unchanged. If the top of the stack is an operation, then the given set will be used as the right hand side of the operation on the top of the stack. In that case, the binary operation is returned as a set.

impl<'s, P: Borrow<Parser>> ParserI<'s, P>

Parse the regular expression into an abstract syntax tree.

Parse the regular expression and return an abstract syntax tree with all of the comments found in the pattern.

Parses an uncounted repetition operation. An uncounted repetition operator includes ?, * and +, but does not include the {m,n} syntax. The given kind should correspond to the operator observed by the caller.

This assumes that the parser is currently positioned at the repetition operator and advances the parser to the first character after the operator. (Note that the operator may include a single additional ?, which makes the operator ungreedy.)

The caller should include the concatenation that is being built. The concatenation returned includes the repetition operator applied to the last expression in the given concatenation.

Parses a counted repetition operation. A counted repetition operator corresponds to the {m,n} syntax, and does not include the ?, * or + operators.

This assumes that the parser is currently positioned at the opening { and advances the parser to the first character after the operator. (Note that the operator may include a single additional ?, which makes the operator ungreedy.)

The caller should include the concatenation that is being built. The concatenation returned includes the repetition operator applied to the last expression in the given concatenation.

Parse a group (which contains a sub-expression) or a set of flags.

If a group was found, then it is returned with an empty AST. If a set of flags is found, then that set is returned.

The parser should be positioned at the opening parenthesis.

This advances the parser to the character before the start of the sub-expression (in the case of a group) or to the closing parenthesis immediately following the set of flags.

Errors

If flags are given and incorrectly specified, then a corresponding error is returned.

If a capture name is given and it is incorrectly specified, then a corresponding error is returned.

Parses a capture group name. Assumes that the parser is positioned at the first character in the name following the opening < (and may possibly be EOF). This advances the parser to the first character following the closing >.

The caller must provide the capture index of the group for this name.

Parse a sequence of flags starting at the current character.

This advances the parser to the character immediately following the flags, which is guaranteed to be either : or ).

Errors

If any flags are duplicated, then an error is returned.

If the negation operator is used more than once, then an error is returned.

If no flags could be found or if the negation operation is not followed by any flags, then an error is returned.

Parse the current character as a flag. Do not advance the parser.

Errors

If the flag is not recognized, then an error is returned.

Parse a primitive AST, e.g., a literal, non-set character class or assertion.

This assumes that the parser expects a primitive at the current location, i.e., all other non-primitive cases have been handled. For example, if the parser's position is at |, then | will be treated as a literal (e.g., inside a character class).

This advances the parser to the first character immediately following the primitive.

Parse an escape sequence as a primitive AST.

This assumes the parser is positioned at the start of the escape sequence, i.e., \. It advances the parser to the first position immediately following the escape sequence.

Parse an octal representation of a Unicode codepoint up to 3 digits long. This expects the parser to be positioned at the first octal digit and advances the parser to the first character immediately following the octal number. This also assumes that parsing octal escapes is enabled.

Assuming the preconditions are met, this routine can never fail.
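To see why the octal routine cannot fail, note that three octal digits encode at most 0o777 = 511, which is always a valid Unicode scalar value. A minimal sketch under that reasoning, with `parse_octal` as a hypothetical free function operating on the remaining input:

```rust
/// Read up to three octal digits from the start of `input`. Returns the
/// decoded character and the number of bytes consumed.
///
/// Precondition (as in the text): `input` starts with an octal digit.
fn parse_octal(input: &str) -> (char, usize) {
    let len = input
        .bytes()
        .take(3)
        .take_while(|b| (b'0'..=b'7').contains(b))
        .count();
    assert!(len > 0, "caller must position the input at an octal digit");
    let value = u32::from_str_radix(&input[..len], 8).expect("valid octal digits");
    // At most 0o777 = 511, far below the surrogate range, so this never fails.
    let c = char::from_u32(value).expect("value is a Unicode scalar value");
    (c, len)
}
```

The digit count is capped at three, so a fourth octal digit is left for the caller to treat as an ordinary literal.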

Parse a hex representation of a Unicode codepoint. This handles both hex notations, i.e., \xFF and \x{FFFF}. This expects the parser to be positioned at the x, u or U prefix. The parser is advanced to the first character immediately following the hexadecimal literal.

Parse an N-digit hex representation of a Unicode codepoint. This expects the parser to be positioned at the first digit and will advance the parser to the first character immediately following the escape sequence.

The number of digits given must be 2 (for \xNN), 4 (for \uNNNN) or 8 (for \UNNNNNNNN).
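The fixed-width case can be sketched as below. `parse_hex_digits` and its error strings are hypothetical; the point is that, unlike the octal case, the decoded value may fail to be a Unicode scalar value (for example a surrogate such as D800), so the routine has to be fallible.

```rust
/// Decode exactly `digits` hexadecimal digits from the start of `input`.
/// Returns the character and the number of bytes consumed, or a message if
/// the digits are missing, malformed, or do not form a Unicode scalar value.
fn parse_hex_digits(input: &str, digits: usize) -> Result<(char, usize), String> {
    assert!(matches!(digits, 2 | 4 | 8), "digits must be 2, 4 or 8");
    let hex = input.get(..digits).ok_or("unexpected end of input")?;
    // Check the digits explicitly: `from_str_radix` alone would accept a sign.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("invalid hex digits: {:?}", hex));
    }
    let value = u32::from_str_radix(hex, 16).map_err(|e| e.to_string())?;
    let c = char::from_u32(value)
        .ok_or_else(|| format!("{:#x} is not a Unicode scalar value", value))?;
    Ok((c, digits))
}
```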

Parse a hex representation of any Unicode scalar value. This expects the parser to be positioned at the opening brace { and will advance the parser to the first character following the closing brace }.

Parse a decimal number into a u32 while trimming leading and trailing whitespace.

This expects the parser to be positioned at the first position where a decimal digit could occur. This will advance the parser to the byte immediately following the last contiguous decimal digit.

If no decimal digit could be found or if there was a problem parsing the complete set of digits into a u32, then an error is returned.
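A sketch of the decimal routine, with `parse_decimal` as a hypothetical free function over the remaining input. It consumes trailing whitespace as well as leading whitespace, which is one reading of "trimming" above and should be treated as an assumption.

```rust
/// Parse a decimal `u32` from the start of `input`, skipping surrounding
/// whitespace. Returns the value and the number of bytes consumed.
fn parse_decimal(input: &str) -> Result<(u32, usize), String> {
    let after_ws = input.trim_start();
    let digits_len = after_ws.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits_len == 0 {
        return Err("expected a decimal digit".to_string());
    }
    let (digits, rest) = after_ws.split_at(digits_len);
    // Overflow (more than u32::MAX) surfaces here as a parse error.
    let value = digits
        .parse::<u32>()
        .map_err(|_| format!("{} does not fit in a u32", digits))?;
    let consumed = input.len() - rest.trim_start().len();
    Ok((value, consumed))
}
```

Both failure modes from the text show up: `,6}` has no digit at all, and `4294967296` is one more than `u32::MAX`.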

Parse a standard character class consisting primarily of characters or character ranges, but can also contain nested character classes of any type (sans .).

This assumes the parser is positioned at the opening [. If parsing is successful, then the parser is advanced to the position immediately following the closing ].

Parse a single primitive item in a character class set. The item to be parsed can either be one of a simple literal character, a range between two simple literal characters or a "primitive" character class like \w or \p{Greek}.

If an invalid escape is found, or if a character class is found where a simple literal is expected (e.g., in a range), then an error is returned.

Parse a single item in a character class as a primitive, where the primitive either consists of a verbatim literal or a single escape sequence.

This assumes the parser is positioned at the beginning of a primitive, and advances the parser to the first position after the primitive if successful.

Note that it is the caller's responsibility to report an error if an illegal primitive was parsed.

Parses the opening of a character class set. This includes the opening bracket along with ^ if present to indicate negation. This also starts parsing the opening set of unioned items if applicable, since there are special rules applied to certain characters in the opening of a character class. For example, [^]] is the class of all characters not equal to ]. (] would need to be escaped in any other position.) Similarly for -.

In all cases, the op inside the returned ast::ClassBracketed is an empty union. This empty union should be replaced with the actual item when it is popped from the parser's stack.

This assumes the parser is positioned at the opening [ and advances the parser to the first non-special byte of the character class.

An error is returned if EOF is found.

Attempt to parse an ASCII character class, e.g., [:alnum:].

This assumes the parser is positioned at the opening [.

If no valid ASCII character class could be found, then this does not advance the parser and None is returned. Otherwise, the parser is advanced to the first byte following the closing ] and the corresponding ASCII class is returned.
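The all-or-nothing behaviour described above (advance only on success, otherwise return None without moving) can be sketched as follows. `maybe_ascii_class` here is a hypothetical free function; the list of class names and the `[:^name:]` negation form are assumptions taken from the regex crate's documented syntax rather than from this page.

```rust
/// Try to read an ASCII class such as `[:alnum:]` or `[:^digit:]` from the
/// start of `input`. On success, returns the class name, whether it is
/// negated, and the number of bytes consumed. On failure, returns `None`,
/// so the caller's position stays where it was.
fn maybe_ascii_class(input: &str) -> Option<(&str, bool, usize)> {
    const NAMES: [&str; 14] = [
        "alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
        "lower", "print", "punct", "space", "upper", "word", "xdigit",
    ];
    let rest = input.strip_prefix("[:")?;
    let (negated, rest) = match rest.strip_prefix('^') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    let end = rest.find(":]")?;
    let name = &rest[..end];
    if !NAMES.iter().any(|n| *n == name) {
        return None;
    }
    // Bytes before the name, then the name itself, then the closing ":]".
    let consumed = input.len() - rest.len() + end + 2;
    Some((name, negated, consumed))
}
```

Because nothing is consumed on failure, the caller can fall back to parsing `[` as the start of an ordinary nested class.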

Parse a Unicode class in either the single character notation, \pN or the multi-character bracketed notation, \p{Greek}. This assumes the parser is positioned at the p (or P for negation) and will advance the parser to the character immediately following the class.

Note that this does not check whether the class name is valid or not.
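The two notations can be told apart by whether a `{` follows the `p`. A minimal sketch, with `UnicodeClass` and `parse_unicode_class` as hypothetical names; it covers only the two forms named above and, like the routine it illustrates, does not check the class name against any Unicode table.

```rust
/// Hypothetical parsed form of `\pN`, `\PN`, `\p{Name}` or `\P{Name}`.
#[derive(Debug, PartialEq, Eq)]
struct UnicodeClass<'s> {
    negated: bool,
    name: &'s str,
}

/// `input` starts at the `p` or `P` (the backslash is already consumed).
/// Returns the class and the number of bytes consumed.
fn parse_unicode_class(input: &str) -> Result<(UnicodeClass<'_>, usize), String> {
    let mut chars = input.chars();
    let negated = match chars.next() {
        Some('p') => false,
        Some('P') => true,
        _ => return Err("expected 'p' or 'P'".to_string()),
    };
    let rest = chars.as_str();
    if let Some(body) = rest.strip_prefix('{') {
        // Bracketed notation: the name runs up to the closing brace.
        let end = body.find('}').ok_or("unclosed '{' in Unicode class")?;
        let name = &body[..end];
        Ok((UnicodeClass { negated, name }, 1 + 1 + end + 1))
    } else {
        // Single character notation: the name is exactly one character.
        let c = rest.chars().next().ok_or("unexpected end of input after 'p'")?;
        let name = &rest[..c.len_utf8()];
        Ok((UnicodeClass { negated, name }, 1 + c.len_utf8()))
    }
}
```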

Parse a Perl character class, e.g., \d or \W. This assumes the parser is currently at a valid character class name and will be advanced to the character immediately following the class.

Trait Implementations

impl<'s, P: Clone> Clone for ParserI<'s, P>

Returns a copy of the value.

Performs copy-assignment from source.

impl<'s, P: Debug> Debug for ParserI<'s, P>

Formats the value using the given formatter.

Auto Trait Implementations

impl<'s, P> Send for ParserI<'s, P> where
    P: Send

impl<'s, P> Sync for ParserI<'s, P> where
    P: Sync