Function regex::utf8::decode_utf8
pub fn decode_utf8(src: &[u8]) -> Option<(char, usize)>
Decode a single UTF-8 sequence into a single Unicode codepoint from src.
If no valid UTF-8 sequence could be found, then None is returned.
Otherwise, the decoded codepoint and the number of bytes read are returned.
The number of bytes read (for a valid UTF-8 sequence) is guaranteed to be 1, 2, 3 or 4.
Note that a UTF-8 sequence is invalid if it is incorrect UTF-8, encodes a codepoint that is out of range (surrogate codepoints are out of range), or is not the shortest possible UTF-8 sequence for that codepoint.
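The contract described above can be illustrated with a small stand-in built only on the standard library. This is a sketch, not the crate's implementation: decode_first is a hypothetical name, and it assumes decoding starts at the first byte of src. It relies on std::str::from_utf8, which rejects surrogate codepoints and non-shortest (overlong) encodings.

```rust
/// Hypothetical stand-in with the same documented contract as decode_utf8:
/// returns the first codepoint in `src` and the number of bytes it occupies,
/// or None if `src` does not begin with a valid UTF-8 sequence.
fn decode_first(src: &[u8]) -> Option<(char, usize)> {
    // A single UTF-8 sequence is at most 4 bytes, so only that prefix matters.
    let prefix = &src[..src.len().min(4)];
    let valid = match std::str::from_utf8(prefix) {
        Ok(s) => s,
        // valid_up_to() is the length of the longest valid leading portion;
        // re-validating that portion cannot fail.
        Err(e) => std::str::from_utf8(&prefix[..e.valid_up_to()]).ok()?,
    };
    // An empty valid portion means the first sequence itself was invalid
    // (or `src` was empty), so there is nothing to decode.
    let c = valid.chars().next()?;
    Some((c, c.len_utf8()))
}
```

Under these assumptions, decode_first(b"a\xFF") yields Some(('a', 1)) because only the first sequence is decoded, while the overlong bytes [0xC0, 0x80] and the surrogate encoding [0xED, 0xA0, 0x80] both yield None.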