Function regex::utf8::decode_utf8

pub fn decode_utf8(src: &[u8]) -> Option<(char, usize)>

Decode a single UTF-8 sequence from the beginning of src into a single Unicode codepoint.

If no valid UTF-8 sequence can be found at the start of src, then None is returned. Otherwise, the decoded codepoint and the number of bytes read are returned. For a valid UTF-8 sequence, the number of bytes read is guaranteed to be 1, 2, 3 or 4.
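The return contract can be sketched with only the standard library. This is a hedged illustration, not the regex crate's implementation: the names decode_utf8_sketch and decode_all are hypothetical, and the sketch assumes that std::str::from_utf8 (which already rejects malformed, surrogate, out-of-range and overlong sequences) is an acceptable stand-in validator. The helper decode_all shows how the returned byte count lets a caller walk a buffer one codepoint at a time.

```rust
/// Hypothetical stand-in for `decode_utf8`, built on the standard library's
/// strict UTF-8 validator rather than on the regex crate's own code.
fn decode_utf8_sketch(src: &[u8]) -> Option<(char, usize)> {
    // A single UTF-8 sequence is at most 4 bytes long.
    let prefix = &src[..src.len().min(4)];
    let valid = match std::str::from_utf8(prefix) {
        Ok(s) => s,
        // Keep only the leading bytes that form valid UTF-8.
        Err(e) => std::str::from_utf8(&prefix[..e.valid_up_to()]).ok()?,
    };
    // An empty valid prefix means the first sequence was invalid (or src was empty).
    let ch = valid.chars().next()?;
    Some((ch, ch.len_utf8()))
}

/// Hypothetical helper: decodes a whole buffer, advancing by the returned
/// byte count and substituting U+FFFD (skipping one byte) on failure.
fn decode_all(bytes: &[u8]) -> Vec<char> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match decode_utf8_sketch(&bytes[i..]) {
            Some((ch, n)) => {
                out.push(ch);
                i += n;
            }
            None => {
                out.push('\u{FFFD}');
                i += 1;
            }
        }
    }
    out
}
```

For example, decode_utf8_sketch(b"\xE2\x82\xAC") yields Some(('€', 3)), since U+20AC is encoded in three bytes.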

Note that a UTF-8 sequence is invalid if it is malformed, if it encodes a codepoint that is out of range (surrogate codepoints U+D800 through U+DFFF are out of range, as is anything above U+10FFFF), or if it is not the shortest possible UTF-8 sequence for that codepoint (an overlong encoding).
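The three rejection rules can be made explicit with a bit-level decoder. This is a hedged sketch under the assumption that it mirrors the documented behaviour; decode_utf8_manual is a hypothetical name, and the regex crate's actual code may be structured differently. The lead byte selects the sequence length and the smallest codepoint that legitimately needs that many bytes, continuation bytes are checked for the 0b10xx_xxxx pattern, and any value below that minimum is rejected as overlong.

```rust
/// Hypothetical bit-level decoder illustrating the three rejection rules:
/// malformed bytes, out-of-range codepoints, and overlong encodings.
fn decode_utf8_manual(src: &[u8]) -> Option<(char, usize)> {
    let b0 = *src.first()?;
    // (sequence length, payload bits from the lead byte, smallest valid codepoint)
    let (len, init, min) = match b0 {
        0x00..=0x7F => return Some((b0 as char, 1)),
        0xC0..=0xDF => (2, (b0 & 0x1F) as u32, 0x80),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32, 0x800),
        0xF0..=0xF7 => (4, (b0 & 0x07) as u32, 0x1_0000),
        // Stray continuation byte (0x80..=0xBF) or invalid lead byte (0xF8..=0xFF).
        _ => return None,
    };
    if src.len() < len {
        return None; // truncated sequence
    }
    let mut cp = init;
    for &b in &src[1..len] {
        // Every trailing byte must have the form 0b10xx_xxxx.
        if b & 0xC0 != 0x80 {
            return None;
        }
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    // Overlong: the codepoint would fit in a shorter sequence.
    if cp < min {
        return None;
    }
    // char::from_u32 rejects surrogates (U+D800..=U+DFFF) and values above U+10FFFF.
    char::from_u32(cp).map(|c| (c, len))
}
```

Delegating the surrogate and upper-bound checks to char::from_u32 keeps the explicit logic focused on byte structure and overlong detection.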