Enum regex::Regex
    pub enum Regex {
        // some variants omitted
    }
A compiled regular expression.
It is represented either as a sequence of bytecode instructions (dynamic) or as a specialized Rust function (native). It can be used to search, split or replace text. All searching is done with an implicit `.*?` at the beginning and end of an expression. To force an expression to match the whole string (or a prefix or a suffix), you must use an anchor like `^` or `$` (or `\A` and `\z`).
While this crate will handle Unicode strings (whether in the regular expression or in the search text), all positions returned are byte indices. Every byte index is guaranteed to be at a Unicode code point boundary.
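The standard library's `str::find` follows the same convention of returning byte offsets, so it can illustrate the difference between byte and character positions without the regex crate. The helper `byte_to_char_index` below is hypothetical. It is a sketch for the case where a character count is needed instead of a byte offset:

```rust
/// Hypothetical helper: converts a byte offset (as returned by this crate or by
/// `str::find`) into the number of Unicode scalar values that precede it.
/// Panics if `byte_idx` is not on a code point boundary.
fn byte_to_char_index(text: &str, byte_idx: usize) -> usize {
    text[..byte_idx].chars().count()
}

fn main() {
    let text = "héllo wörld";
    // `str::find` reports a byte offset: "h" (1) + "é" (2) + "llo " (4) = 7 bytes.
    let start = text.find("wörld").unwrap();
    assert_eq!(start, 7);
    // Only 6 characters precede that offset, because "é" takes two bytes in UTF-8.
    assert_eq!(byte_to_char_index(text, start), 6);
    // Offsets on a code point boundary can be used to slice the text directly.
    assert!(text.is_char_boundary(start));
    assert_eq!(&text[start..], "wörld");
    // Offset 2 falls inside "é", so it is not a valid boundary.
    assert!(!text.is_char_boundary(2));
}
```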
The lifetimes `'r` and `'t` in this crate correspond to the lifetime of a compiled regular expression and the text to search, respectively.
The only methods that allocate new strings are the string replacement methods. All other methods (searching and splitting) either return byte positions or borrow directly from the string given.
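Here is a minimal sketch of what borrowing into the searched text means in practice. The helper `spans_to_slices` is hypothetical and not part of this crate. It turns `(start, end)` byte pairs, such as those yielded by `find_iter`, into slices that share the lifetime `'t` of the text:

```rust
/// Hypothetical helper: turns `(start, end)` byte spans into slices of `text`.
/// The returned slices borrow from `text` (lifetime `'t`); no string data is copied.
fn spans_to_slices<'t>(text: &'t str, spans: &[(usize, usize)]) -> Vec<&'t str> {
    spans.iter().map(|&(start, end)| &text[start..end]).collect()
}

fn main() {
    let text = String::from("a111b222c");
    let slices = spans_to_slices(&text, &[(1, 4), (5, 8)]);
    assert_eq!(slices, vec!["111", "222"]);
    // Each slice points into `text`'s own buffer rather than a fresh allocation.
    assert_eq!(slices[0].as_ptr(), text[1..].as_ptr());
}
```

Only the `Vec` that holds the slices is allocated. The string data itself is never copied.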
Examples
Find the location of a US phone number:
    let re = Regex::new("[0-9]{3}-[0-9]{3}-[0-9]{4}").unwrap();
    assert_eq!(re.find("phone: 111-222-3333"), Some((7, 19)));
Using the `std::str::StrExt` methods with `Regex`
Since `Regex` implements `Pattern`, you can use regexes with methods defined on `std::str::StrExt`. For example, `is_match`, `find`, `find_iter` and `split` can be replaced with `StrExt::contains`, `StrExt::find`, `StrExt::match_indices` and `StrExt::split`, respectively.
Here are some examples:
    let re = Regex::new(r"\d+").unwrap();
    let haystack = "a111b222c";
    assert!(haystack.contains(&re));
    assert_eq!(haystack.find(&re), Some(1));
    assert_eq!(haystack.match_indices(&re).collect::<Vec<_>>(), vec![(1, 4), (5, 8)]);
    assert_eq!(haystack.split(&re).collect::<Vec<_>>(), vec!["a", "b", "c"]);
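Regex support for `Pattern` is specific to this crate. For comparison, the standard library also accepts a `char` predicate closure wherever a `Pattern` is expected. The sketch below uses only `std`. A closure matches one character at a time rather than a run like `\d+`, so `match_indices` reports every digit separately, and `split` produces empty pieces that have to be filtered out. The helper name `split_on_digits` is hypothetical.

```rust
/// Hypothetical helper: splits `haystack` on ASCII digits and drops the empty
/// pieces left between adjacent digits, approximating a split on `\d+`.
fn split_on_digits(haystack: &str) -> Vec<&str> {
    haystack
        .split(|c: char| c.is_ascii_digit())
        .filter(|piece| !piece.is_empty())
        .collect()
}

fn main() {
    let haystack = "a111b222c";
    // A non-capturing closure is `Copy`, so it can be passed to several methods.
    let is_digit = |c: char| c.is_ascii_digit();

    assert!(haystack.contains(is_digit));
    assert_eq!(haystack.find(is_digit), Some(1));

    // Each digit is reported separately, unlike the two runs matched by `\d+`.
    let starts: Vec<usize> = haystack.match_indices(is_digit).map(|(i, _)| i).collect();
    assert_eq!(starts, vec![1, 2, 3, 5, 6, 7]);

    assert_eq!(split_on_digits(haystack), vec!["a", "b", "c"]);
}
```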
Methods
impl Regex
fn new(re: &str) -> Result<Regex, Error>
Compiles a dynamic regular expression. Once compiled, it can be used repeatedly to search, split or replace text in a string.
If an invalid expression is given, then an error is returned.
fn is_match(&self, text: &str) -> bool
Returns true if and only if the regex matches the string given.
Example
Test if some text contains at least one word with exactly 13 characters:
    let text = "I categorically deny having triskaidekaphobia.";
    let matched = Regex::new(r"\b\w{13}\b").unwrap().is_match(text);
    assert!(matched);
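For intuition, here is a rough standard-library approximation of `\b\w{13}\b`. It splits the text on non-word characters and checks for a piece of exactly 13 characters. This sketch relies on one stated assumption: `is_word_char` treats letters, digits and `_` as word characters, which approximates this crate's definition of `\w` but is not guaranteed to match it. Both function names are hypothetical.

```rust
/// Approximates `\w` as letters, digits and `_` (an assumption).
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Hypothetical: true if `text` contains a word of exactly `len` characters,
/// roughly what `is_match` reports for `\b\w{len}\b`.
fn has_word_of_len(text: &str, len: usize) -> bool {
    text.split(|c: char| !is_word_char(c))
        .any(|word| word.chars().count() == len)
}

fn main() {
    let text = "I categorically deny having triskaidekaphobia.";
    assert!(has_word_of_len(text, 13)); // "categorically"
    assert!(!has_word_of_len(text, 12));
}
```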
fn find(&self, text: &str) -> Option<(usize, usize)>
Returns the start and end byte range of the leftmost-first match in `text`. If no match exists, then `None` is returned.
Note that this should only be used if you want to discover the position of the match. Testing the existence of a match is faster if you use `is_match`.
Example
Find the start and end location of the first word with exactly 13 characters:
    let text = "I categorically deny having triskaidekaphobia.";
    let pos = Regex::new(r"\b\w{13}\b").unwrap().find(text);
    assert_eq!(pos, Some((2, 15)));
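Below is a hand-written, standard-library-only sketch of the same search that returns a byte span, as `find` does. The function `find_word_of_len` is hypothetical. It approximates `\w` as letters, digits and `_`, which is an assumption. It scans words left to right and reports the first one of the requested length, which corresponds to the leftmost match.

```rust
/// Approximates `\w` as letters, digits and `_` (an assumption).
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Hypothetical: returns the byte span of the leftmost word with exactly `len`
/// characters, roughly what `find` reports for `\b\w{len}\b`.
fn find_word_of_len(text: &str, len: usize) -> Option<(usize, usize)> {
    let mut start: Option<usize> = None; // byte offset where the current word began
    let mut count = 0; // characters seen in the current word
    // A sentinel non-word character at the end closes a word that ends the text.
    let sentinel = std::iter::once((text.len(), ' '));
    for (i, c) in text.char_indices().chain(sentinel) {
        if is_word_char(c) {
            if start.is_none() {
                start = Some(i);
            }
            count += 1;
        } else if let Some(s) = start.take() {
            // The word ended at byte `i`; report it if it has the right length.
            if count == len {
                return Some((s, i));
            }
            count = 0;
        }
    }
    None
}

fn main() {
    let text = "I categorically deny having triskaidekaphobia.";
    assert_eq!(find_word_of_len(text, 13), Some((2, 15)));
    assert_eq!(find_word_of_len(text, 17), Some((28, 45)));
    assert_eq!(find_word_of_len(text, 12), None);
}
```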
fn find_iter<'r, 't>(&'r self, text: &'t str) -> FindMatches<'r, 't>
Returns an iterator over each successive non-overlapping match in `text`, yielding the start and end byte indices of each match with respect to `text`.
Example
Find the start and end location of every word with exactly 13 characters:
    let text = "Retroactively relinquishing remunerations is reprehensible.";
    for pos in Regex::new(r"\b\w{13}\b").unwrap().find_iter(text) {
        println!("{:?}", pos);
    }
    // Output:
    // (0, 13)
    // (14, 27)
    // (28, 41)
    // (45, 58)
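This sketch shows how a `find_iter`-style iterator can borrow its text and resume after each match. `WordSpans` and `word_spans` are hypothetical, built on the standard library only, and again approximate `\w` as letters, digits and `_`. Each search resumes at the end of the previous word, so the spans it yields never overlap.

```rust
/// Hypothetical iterator over the byte spans of non-overlapping words with exactly
/// `len` characters, in the spirit of `find_iter` with `\b\w{len}\b`.
/// `\w` is approximated as letters, digits and `_` (an assumption).
struct WordSpans<'t> {
    text: &'t str,
    pos: usize, // byte offset where the next search starts
    len: usize,
}

fn word_spans(text: &str, len: usize) -> WordSpans<'_> {
    WordSpans { text, pos: 0, len }
}

impl<'t> Iterator for WordSpans<'t> {
    type Item = (usize, usize);

    fn next(&mut self) -> Option<(usize, usize)> {
        let text = self.text;
        let is_word = |c: char| c.is_alphanumeric() || c == '_';
        // Jump from word to word, starting where the previous search stopped.
        while let Some(offset) = text[self.pos..].find(is_word) {
            let start = self.pos + offset;
            let end = text[start..]
                .find(|c: char| !is_word(c))
                .map_or(text.len(), |o| start + o);
            // Resume after this word, so yielded spans never overlap.
            self.pos = end;
            if text[start..end].chars().count() == self.len {
                return Some((start, end));
            }
        }
        self.pos = text.len();
        None
    }
}

fn main() {
    let text = "Retroactively relinquishing remunerations is reprehensible.";
    let spans: Vec<(usize, usize)> = word_spans(text, 13).collect();
    assert_eq!(spans, vec![(0, 13), (14, 27), (28, 41), (45, 58)]);
}
```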
fn captures<'t>(&self, text: &'t str) -> Option<Captures<'t>>
Returns the capture groups corresponding to the leftmost-first match in `text`. Capture group `0` always corresponds to the entire match. If no match is found, then `None` is returned.
You should only use `captures` if you need access to submatches. Otherwise, `find` is faster for discovering the location of the overall match.
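For a rough, standard-library-only picture of what capture groups return, the hypothetical function `movie_groups` below hand-matches text of the form `'Title' (yyyy)`. It returns slices ordered like the capture groups of `'([^']+)'\s+\((\d{4})\)`: the whole match first, then the title, then the year. It is a simplification and not how this crate matches. It only tries the first quote in the text and accepts only ASCII digits for the year.

```rust
/// Hypothetical sketch: finds the first `'Title' (yyyy)` in `text` and returns
/// `[group 0, group 1, group 2]` as slices borrowed from `text`.
fn movie_groups(text: &str) -> Option<[&str; 3]> {
    let open = text.find('\'')?; // opening quote
    let title_start = open + 1;
    let close = title_start + text[title_start..].find('\'')?; // closing quote
    if close == title_start {
        return None; // `[^']+` needs at least one character
    }
    let after_quote = close + 1;
    let rest = &text[after_quote..];
    let ws = rest.len() - rest.trim_start().len(); // bytes consumed by `\s+`
    if ws == 0 || !rest[ws..].starts_with('(') {
        return None;
    }
    let year_start = after_quote + ws + 1; // skip the `(`
    let year = text.get(year_start..year_start + 4)?; // `\d{4}`, ASCII digits only here
    if !year.bytes().all(|b| b.is_ascii_digit()) || !text[year_start + 4..].starts_with(')') {
        return None;
    }
    Some([&text[open..year_start + 5], &text[title_start..close], year])
}

fn main() {
    let text = "Not my favorite movie: 'Citizen Kane' (1941).";
    let [whole, title, year] = movie_groups(text).unwrap();
    assert_eq!(whole, "'Citizen Kane' (1941)"); // group 0: the entire match
    assert_eq!(title, "Citizen Kane"); // group 1: first opening parenthesis
    assert_eq!(year, "1941"); // group 2: second opening parenthesis
    assert_eq!(movie_groups("no movie here"), None);
}
```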
Examples
Say you have some text with movie names and their release years, like "'Citizen Kane' (1941)". It'd be nice if we could search for text looking like that, while also extracting the movie name and its release year separately.
    let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
    let text = "Not my favorite movie: 'Citizen Kane' (1941).";
    let caps = re.captures(text).unwrap();
    assert_eq!(caps.at(1), Some("Citizen Kane"));
    assert_eq!(caps.at(2), Some("1941"));
    assert_eq!(caps.at(0), Some("'Citizen Kane' (1941)"));
Note that the full match is at capture group `0`. Each subsequent capture group is indexed by the order of its opening parenthesis `(`.
We can make this example a bit clearer by using named capture groups:
    let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)").unwrap();
    let text = "Not my favorite movie: 'Citizen Kane' (1941).";
    let caps = re.captures(text).unwrap();
    assert_eq!(caps.name("title"), Some("Citizen Kane"));
    assert_eq!(caps.name("year"), Some("1941"));
    assert_eq!(caps.at(0), Some("'Citizen Kane' (1941)"));
Here we name the capture groups, which we can access with the `name` method. Note that the named capture groups are still accessible with `at`.
The `0`th capture group is always unnamed, so it must always be accessed with `at(0)`.
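One way to picture how `name` and `at` relate is to treat a named group as an ordinary numbered group with a name attached. A lookup by name then resolves to an index and defers to positional access. The `Caps` type below is a hypothetical, standard-library-only model of that idea, not this crate's `Captures` type.

```rust
use std::collections::HashMap;

/// Hypothetical model of capture results: groups by index, plus a name -> index map.
struct Caps<'t> {
    groups: Vec<Option<&'t str>>, // index 0 is the whole match
    names: HashMap<&'static str, usize>, // only named groups appear here
}

impl<'t> Caps<'t> {
    /// Positional access, like `at`.
    fn at(&self, i: usize) -> Option<&'t str> {
        self.groups.get(i).copied().flatten()
    }

    /// Named access resolves the name to an index and reuses `at`.
    fn name(&self, name: &str) -> Option<&'t str> {
        self.names.get(name).and_then(|&i| self.at(i))
    }
}

fn main() {
    let caps = Caps {
        groups: vec![Some("'Citizen Kane' (1941)"), Some("Citizen Kane"), Some("1941")],
        names: HashMap::from([("title", 1), ("year", 2)]),
    };
    assert_eq!(caps.name("title"), caps.at(1));
    assert_eq!(caps.name("year"), Some("1941"));
    // Group 0 has no name, so it is reachable only by index.
    assert_eq!(caps.at(0), Some("'Citizen Kane' (1941)"));
    assert_eq!(caps.name("missing"), None);
}
```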
fn captures_iter<'r, 't>(&'r self, text: &'t str) -> FindCaptures<'r, 't>
Returns an iterator over all the non-overlapping capture groups matched in `text`. This is operationally the same as `find_iter`, except it yields information about submatches.
Example
We can use this to find all movie titles and their release years in some text, where the movie is formatted like "'Title' (xxxx)":
    let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)").unwrap();
    let text = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
    for caps in re.captures_iter(text) {
        // `name` returns an Option; every match here has both groups, so unwrap is safe.
        println!("Movie: {}, Released: {}",
                 caps.name("title").unwrap(), caps.name("year").unwrap());
    }
    // Output:
    // Movie: Citizen Kane, Released: 1941
    // Movie: The Wizard of Oz, Released: 1939
    // Movie: M, Released: 1931
fn split<'r, 't>(&'r self, text: &'t str) -> RegexSplits<'r, 't>
Returns an iterator of substrings of `text` delimited by a match of the regular expression.
Namely, each element of the iterator corresponds to text that isn't matched by the regular expression.
This method will not copy the text given.
Example
To split a string delimited by arbitrary amounts of spaces or tabs:
    let re = Regex::new(r"[ \t]+").unwrap();
    let fields: Vec<&str> = re.split("a b \t c\td e").collect();
    assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);
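With only the standard library, a `char` predicate gives a similar split on spaces and tabs. The predicate matches a single character rather than a run like `[ \t]+`, so consecutive delimiters produce empty pieces, which this hypothetical `split_blanks` helper filters out. That filter also drops empty pieces at the very start or end of the text, so results can differ from a regex split on such inputs.

```rust
/// Hypothetical helper: splits on spaces and tabs, discarding empty pieces.
fn split_blanks(text: &str) -> Vec<&str> {
    text.split(|c: char| c == ' ' || c == '\t')
        .filter(|piece| !piece.is_empty())
        .collect()
}

fn main() {
    assert_eq!(split_blanks("a b \t c\td e"), vec!["a", "b", "c", "d", "e"]);
}
```

If any Unicode whitespace should count as a delimiter, `str::split_whitespace` does the same job without a custom predicate.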
fn splitn<'r, 't>(&'r self, text: &'t str, limit: usize) -> RegexSplitsN<'r, 't>
Returns an iterator of at most `limit` substrings of `text` delimited by a match of the regular expression. (A `limit` of `0` will return no substrings.)
Namely, each element of the iterator corresponds to text that isn't matched by the regular expression.
The remainder of the string that is not split will be the last element in the iterator.
This method will not copy the text given.
Example
Split off the first two words in some text, keeping the rest of it as the last element:
    let re = Regex::new(r"\W+").unwrap();
    let fields: Vec<&str> = re.splitn("Hey! How are you?", 3).collect();
    assert_eq!(fields, vec!["Hey", "How", "are you?"]);
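The standard library's `str::splitn` uses the same limit convention described above: at most `n` pieces, the unsplit remainder last, and nothing at all for a limit of `0`. Here is a small sketch with a plain comma delimiter. `split_limit` is a hypothetical wrapper.

```rust
/// Hypothetical wrapper: splits `text` on commas into at most `limit` pieces.
fn split_limit(text: &str, limit: usize) -> Vec<&str> {
    text.splitn(limit, ',').collect()
}

fn main() {
    // At most three pieces; the unsplit remainder becomes the last one.
    assert_eq!(split_limit("a,b,c,d", 3), vec!["a", "b", "c,d"]);
    // A limit of 1 returns the whole text as a single piece.
    assert_eq!(split_limit("a,b,c,d", 1), vec!["a,b,c,d"]);
    // A limit of 0 returns no pieces at all.
    assert!(split_limit("a,b,c,d", 0).is_empty());
}
```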
fn replace<R: Replacer>(&self, text: &str, rep: R) -> String
Replaces the leftmost-first match with the replacement provided.
The replacement can be a regular string (where `$N` and `$name` are expanded to match capture groups) or a function that takes the match's `Captures` and returns the replaced string.
If no match is found, then a copy of the string is returned unchanged.
Examples
Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:
    let re = Regex::new("[^01]+").unwrap();
    assert_eq!(re.replace("1078910", ""), "1010");
But anything satisfying the `Replacer` trait will work. For example, a closure of type `|&Captures| -> String` provides direct access to the captures corresponding to a match. This allows one to access submatches easily:
    use regex::Captures;
    let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
    let result = re.replace("Springsteen, Bruce", |caps: &Captures| {
        format!("{} {}", caps.at(2).unwrap_or(""), caps.at(1).unwrap_or(""))
    });
    assert_eq!(result, "Bruce Springsteen");
But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported that expands `$name` into the corresponding capture group. Here's the last example, but using this expansion technique with named capture groups:
    let re = Regex::new(r"(?P<last>[^,\s]+),\s+(?P<first>\S+)").unwrap();
    let result = re.replace("Springsteen, Bruce", "$first $last");
    assert_eq!(result, "Bruce Springsteen");
Note that using `$2` instead of `$first` or `$1` instead of `$last` would produce the same result. To write a literal `$`, use `$$`.
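Here is a simplified sketch of how such a template might be expanded. The function `expand` and its lookup closure are hypothetical. Its rules are assumptions that may differ from this crate in edge cases:

* A group reference is `$` followed by the longest run of ASCII letters, digits and `_`.
* That reference text, whether a number or a name, is passed to the lookup closure.
* Unknown or unmatched groups expand to the empty string.
* `$$` produces a literal `$`.

```rust
/// Hypothetical sketch: expands `$N`, `$name` and `$$` in `template`.
/// `group` maps a reference (a group number or name, as text) to the matched text.
fn expand<'a>(template: &str, group: impl Fn(&str) -> Option<&'a str>) -> String {
    let mut out = String::new();
    let mut chars = template.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // `$$` is an escaped, literal dollar sign.
        if let Some(&(_, '$')) = chars.peek() {
            chars.next();
            out.push('$');
            continue;
        }
        // Consume the longest run of identifier characters after `$`.
        let start = i + 1;
        let mut end = start;
        while let Some(&(j, d)) = chars.peek() {
            if d.is_ascii_alphanumeric() || d == '_' {
                end = j + d.len_utf8();
                chars.next();
            } else {
                break;
            }
        }
        if end == start {
            out.push('$'); // a lone `$` is kept as-is (an assumption)
        } else {
            out.push_str(group(&template[start..end]).unwrap_or(""));
        }
    }
    out
}

fn main() {
    // Groups for "Springsteen, Bruce" matched by (?P<last>[^,\s]+),\s+(?P<first>\S+).
    let lookup = |key: &str| match key {
        "0" => Some("Springsteen, Bruce"),
        "1" | "last" => Some("Springsteen"),
        "2" | "first" => Some("Bruce"),
        _ => None,
    };
    assert_eq!(expand("$first $last", lookup), "Bruce Springsteen");
    assert_eq!(expand("$2 $1", lookup), "Bruce Springsteen");
    assert_eq!(expand("costs $$5", lookup), "costs $5");
}
```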
Finally, sometimes you just want to replace a match with a literal string, with no submatch expansion. This can be done by wrapping the string with `NoExpand`:
    use regex::NoExpand;
    let re = Regex::new(r"(?P<last>[^,\s]+),\s+(\S+)").unwrap();
    let result = re.replace("Springsteen, Bruce", NoExpand("$2 $last"));
    assert_eq!(result, "$2 $last");
fn replace_all<R: Replacer>(&self, text: &str, rep: R) -> String
Replaces all non-overlapping matches in `text` with the replacement provided. This is the same as calling `replacen` with `limit` set to `0`.
See the documentation for `replace` for details on how to access submatches in the replacement string.
fn replacen<R: Replacer>(&self, text: &str, limit: usize, rep: R) -> String
Replaces at most `limit` non-overlapping matches in `text` with the replacement provided. If `limit` is `0`, then all non-overlapping matches are replaced.
See the documentation for `replace` for details on how to access submatches in the replacement string.
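This sketch illustrates the `limit` convention. It assumes the match spans have already been found, for example with `find_iter`. `replace_spans` is a hypothetical helper, not a function of this crate. The sketch also shows a pitfall: the standard library's `str::replacen` treats a count of `0` as "replace nothing", the opposite of `replacen` here.

```rust
/// Hypothetical helper: replaces at most `limit` of the given sorted, non-overlapping
/// byte spans with `rep`; `limit == 0` means replace all of them.
fn replace_spans(text: &str, spans: &[(usize, usize)], limit: usize, rep: &str) -> String {
    let take = if limit == 0 { spans.len() } else { limit.min(spans.len()) };
    let mut out = String::with_capacity(text.len());
    let mut last = 0;
    for &(start, end) in &spans[..take] {
        out.push_str(&text[last..start]); // text between matches is copied unchanged
        out.push_str(rep);
        last = end;
    }
    out.push_str(&text[last..]);
    out
}

fn main() {
    let text = "a111b222c";
    let spans = [(1, 4), (5, 8)]; // the spans of `\d+` in `text`
    assert_eq!(replace_spans(text, &spans, 1, "#"), "a#b222c");
    assert_eq!(replace_spans(text, &spans, 0, "#"), "a#b#c"); // like `replace_all`
    // By contrast, std's `str::replacen` with a count of 0 leaves the text unchanged.
    assert_eq!(text.replacen("1", "#", 0), text);
}
```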
fn as_str<'a>(&'a self) -> &'a str
Returns the original string of this regex.