347 lines
19 KiB
HTML
347 lines
19 KiB
HTML
|
<!DOCTYPE html>
|
|||
|
<html lang="en">
|
|||
|
<head>
|
|||
|
<meta charset="utf-8">
|
|||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|||
|
<meta name="generator" content="rustdoc">
|
|||
|
<meta name="description" content="API documentation for the Rust `bytes` mod in crate `regex`.">
|
|||
|
<meta name="keywords" content="rust, rustlang, rust-lang, bytes">
|
|||
|
|
|||
|
<title>regex::bytes - Rust</title>
|
|||
|
|
|||
|
<link rel="stylesheet" type="text/css" href="../../normalize.css">
|
|||
|
<link rel="stylesheet" type="text/css" href="../../rustdoc.css"
|
|||
|
id="mainThemeStyle">
|
|||
|
|
|||
|
<link rel="stylesheet" type="text/css" href="../../dark.css">
|
|||
|
<link rel="stylesheet" type="text/css" href="../../light.css" id="themeStyle">
|
|||
|
<script src="../../storage.js"></script>
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
</head>
|
|||
|
<body class="rustdoc mod">
|
|||
|
<!--[if lte IE 8]>
|
|||
|
<div class="warning">
|
|||
|
This old browser is unsupported and will most likely display funky
|
|||
|
things.
|
|||
|
</div>
|
|||
|
<![endif]-->
|
|||
|
|
|||
|
|
|||
|
|
|||
|
<nav class="sidebar">
|
|||
|
<div class="sidebar-menu">☰</div>
|
|||
|
|
|||
|
<p class='location'>Module bytes</p><div class="sidebar-elems"><div class="block items"><ul><li><a href="#structs">Structs</a></li><li><a href="#traits">Traits</a></li></ul></div><p class='location'><a href='../index.html'>regex</a></p><script>window.sidebarCurrent = {name: 'bytes', ty: 'mod', relpath: '../'};</script><script defer src="../sidebar-items.js"></script></div>
|
|||
|
</nav>
|
|||
|
|
|||
|
<div class="theme-picker">
|
|||
|
<button id="theme-picker" aria-label="Pick another theme!">
|
|||
|
<img src="../../brush.svg" width="18" alt="Pick another theme!">
|
|||
|
</button>
|
|||
|
<div id="theme-choices"></div>
|
|||
|
</div>
|
|||
|
<script src="../../theme.js"></script>
|
|||
|
<nav class="sub">
|
|||
|
<form class="search-form js-only">
|
|||
|
<div class="search-container">
|
|||
|
<input class="search-input" name="search"
|
|||
|
autocomplete="off"
|
|||
|
placeholder="Click or press ‘S’ to search, ‘?’ for more options…"
|
|||
|
type="search">
|
|||
|
</div>
|
|||
|
</form>
|
|||
|
</nav>
|
|||
|
|
|||
|
<section id='main' class="content"><h1 class='fqn'><span class='in-band'>Module <a href='../index.html'>regex</a>::<wbr><a class="mod" href=''>bytes</a></span><span class='out-of-band'><span id='render-detail'>
|
|||
|
<a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">
|
|||
|
[<span class='inner'>−</span>]
|
|||
|
</a>
|
|||
|
</span><a class='srclink' href='../../src/regex/lib.rs.html#642-648' title='goto source code'>[src]</a></span></h1><div class='docblock'><p>Match regular expressions on arbitrary bytes.</p>
|
|||
|
<p>This module provides a nearly identical API to the one found in the
|
|||
|
top-level of this crate. There are two important differences:</p>
|
|||
|
<ol>
|
|||
|
<li>Matching is done on <code>&[u8]</code> instead of <code>&str</code>. Additionally, <code>Vec<u8></code>
|
|||
|
is used where <code>String</code> would have been used.</li>
|
|||
|
<li>Unicode support can be disabled even when disabling it would result in
|
|||
|
matching invalid UTF-8 bytes.</li>
|
|||
|
</ol>
|
|||
|
<h1 id="example-match-null-terminated-string" class="section-header"><a href="#example-match-null-terminated-string">Example: match null terminated string</a></h1>
|
|||
|
<p>This shows how to find all null-terminated strings in a slice of bytes:</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?-u)(?P<cstr>[^\x00]+)\x00"</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">b"foo\x00bar\x00baz\x00"</span>;
|
|||
|
|
|||
|
<span class="comment">// Extract all of the strings without the null terminator from each match.</span>
|
|||
|
<span class="comment">// The unwrap is OK here since a match requires the `cstr` capture to match.</span>
|
|||
|
<span class="kw">let</span> <span class="ident">cstrs</span>: <span class="ident">Vec</span><span class="op"><</span><span class="kw-2">&</span>[<span class="ident">u8</span>]<span class="op">></span> <span class="op">=</span>
|
|||
|
<span class="ident">re</span>.<span class="ident">captures_iter</span>(<span class="ident">text</span>)
|
|||
|
.<span class="ident">map</span>(<span class="op">|</span><span class="ident">c</span><span class="op">|</span> <span class="ident">c</span>.<span class="ident">name</span>(<span class="string">"cstr"</span>).<span class="ident">unwrap</span>().<span class="ident">as_bytes</span>())
|
|||
|
.<span class="ident">collect</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="macro">vec</span><span class="macro">!</span>[<span class="kw-2">&</span><span class="string">b"foo"</span>[..], <span class="kw-2">&</span><span class="string">b"bar"</span>[..], <span class="kw-2">&</span><span class="string">b"baz"</span>[..]], <span class="ident">cstrs</span>);</pre>
|
|||
|
<h1 id="example-selectively-enable-unicode-support" class="section-header"><a href="#example-selectively-enable-unicode-support">Example: selectively enable Unicode support</a></h1>
|
|||
|
<p>This shows how to match an arbitrary byte pattern followed by a UTF-8 encoded
|
|||
|
string (e.g., to extract a title from a Matroska file):</p>
|
|||
|
|
|||
|
<pre class="rust rust-example-rendered">
|
|||
|
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(
|
|||
|
<span class="string">r"(?-u)\x7b\xa9(?:[\x80-\xfe]|[\x40-\xff].)(?u:(.*))"</span>
|
|||
|
).<span class="ident">unwrap</span>();
|
|||
|
<span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">b"\x12\xd0\x3b\x5f\x7b\xa9\x85\xe2\x98\x83\x80\x98\x54\x76\x68\x65"</span>;
|
|||
|
<span class="kw">let</span> <span class="ident">caps</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">captures</span>(<span class="ident">text</span>).<span class="ident">unwrap</span>();
|
|||
|
|
|||
|
<span class="comment">// Notice that despite the `.*` at the end, it will only match valid UTF-8</span>
|
|||
|
<span class="comment">// because Unicode mode was enabled with the `u` flag. Without the `u` flag,</span>
|
|||
|
<span class="comment">// the `.*` would match the rest of the bytes.</span>
|
|||
|
<span class="kw">let</span> <span class="ident">mat</span> <span class="op">=</span> <span class="ident">caps</span>.<span class="ident">get</span>(<span class="number">1</span>).<span class="ident">unwrap</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>((<span class="number">7</span>, <span class="number">10</span>), (<span class="ident">mat</span>.<span class="ident">start</span>(), <span class="ident">mat</span>.<span class="ident">end</span>()));
|
|||
|
|
|||
|
<span class="comment">// If there was a match, Unicode mode guarantees that `title` is valid UTF-8.</span>
|
|||
|
<span class="kw">let</span> <span class="ident">title</span> <span class="op">=</span> <span class="ident">str</span>::<span class="ident">from_utf8</span>(<span class="kw-2">&</span><span class="ident">caps</span>[<span class="number">1</span>]).<span class="ident">unwrap</span>();
|
|||
|
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="string">"☃"</span>, <span class="ident">title</span>);</pre>
|
|||
|
<p>In general, if the Unicode flag is enabled in a capture group and that capture
|
|||
|
is part of the overall match, then the capture is <em>guaranteed</em> to be valid
|
|||
|
UTF-8.</p>
|
|||
|
<h1 id="syntax" class="section-header"><a href="#syntax">Syntax</a></h1>
|
|||
|
<p>The supported syntax is pretty much the same as the syntax for Unicode
|
|||
|
regular expressions with a few changes that make sense for matching arbitrary
|
|||
|
bytes:</p>
|
|||
|
<ol>
|
|||
|
<li>The <code>u</code> flag can be disabled even when disabling it might cause the regex to
|
|||
|
match invalid UTF-8. When the <code>u</code> flag is disabled, the regex is said to be in
|
|||
|
"ASCII compatible" mode.</li>
|
|||
|
<li>In ASCII compatible mode, neither Unicode scalar values nor Unicode
|
|||
|
character classes are allowed.</li>
|
|||
|
<li>In ASCII compatible mode, Perl character classes (<code>\w</code>, <code>\d</code> and <code>\s</code>)
|
|||
|
revert to their typical ASCII definition. <code>\w</code> maps to <code>[[:word:]]</code>, <code>\d</code> maps
|
|||
|
to <code>[[:digit:]]</code> and <code>\s</code> maps to <code>[[:space:]]</code>.</li>
|
|||
|
<li>In ASCII compatible mode, word boundaries use the ASCII compatible <code>\w</code> to
|
|||
|
determine whether a byte is a word byte or not.</li>
|
|||
|
<li>Hexadecimal notation can be used to specify arbitrary bytes instead of
|
|||
|
Unicode codepoints. For example, in ASCII compatible mode, <code>\xFF</code> matches the
|
|||
|
literal byte <code>\xFF</code>, while in Unicode mode, <code>\xFF</code> is a Unicode codepoint that
|
|||
|
matches its UTF-8 encoding of <code>\xC3\xBF</code>. Similarly for octal notation when
|
|||
|
enabled.</li>
|
|||
|
<li><code>.</code> matches any <em>byte</em> except for <code>\n</code> instead of any Unicode scalar value.
|
|||
|
When the <code>s</code> flag is enabled, <code>.</code> matches any byte.</li>
|
|||
|
</ol>
|
|||
|
<h1 id="performance" class="section-header"><a href="#performance">Performance</a></h1>
|
|||
|
<p>In general, one should expect performance on <code>&[u8]</code> to be roughly similar to
|
|||
|
performance on <code>&str</code>.</p>
|
|||
|
</div><h2 id='structs' class='section-header'><a href="#structs">Structs</a></h2>
|
|||
|
<table>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.CaptureMatches.html"
|
|||
|
title='struct regex::bytes::CaptureMatches'>CaptureMatches</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An iterator that yields all non-overlapping capture groups matching a
|
|||
|
particular regular expression.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.CaptureNames.html"
|
|||
|
title='struct regex::bytes::CaptureNames'>CaptureNames</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An iterator over the names of all possible captures.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.Captures.html"
|
|||
|
title='struct regex::bytes::Captures'>Captures</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Captures represents a group of captured byte strings for a single match.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.Match.html"
|
|||
|
title='struct regex::bytes::Match'>Match</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Match represents a single match of a regex in a haystack.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.Matches.html"
|
|||
|
title='struct regex::bytes::Matches'>Matches</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An iterator over all non-overlapping matches for a particular string.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.NoExpand.html"
|
|||
|
title='struct regex::bytes::NoExpand'>NoExpand</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p><code>NoExpand</code> indicates literal byte string replacement.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.Regex.html"
|
|||
|
title='struct regex::bytes::Regex'>Regex</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>A compiled regular expression for matching arbitrary bytes.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.RegexBuilder.html"
|
|||
|
title='struct regex::bytes::RegexBuilder'>RegexBuilder</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>A configurable builder for a regular expression.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.RegexSet.html"
|
|||
|
title='struct regex::bytes::RegexSet'>RegexSet</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Match multiple (possibly overlapping) regular expressions in a single scan.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.RegexSetBuilder.html"
|
|||
|
title='struct regex::bytes::RegexSetBuilder'>RegexSetBuilder</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>A configurable builder for a set of regular expressions.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.ReplacerRef.html"
|
|||
|
title='struct regex::bytes::ReplacerRef'>ReplacerRef</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>By-reference adaptor for a <code>Replacer</code></p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.SetMatches.html"
|
|||
|
title='struct regex::bytes::SetMatches'>SetMatches</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>A set of matches returned by a regex set.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.SetMatchesIntoIter.html"
|
|||
|
title='struct regex::bytes::SetMatchesIntoIter'>SetMatchesIntoIter</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An owned iterator over the set of matches from a regex set.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.SetMatchesIter.html"
|
|||
|
title='struct regex::bytes::SetMatchesIter'>SetMatchesIter</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>A borrowed iterator over the set of matches from a regex set.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.Split.html"
|
|||
|
title='struct regex::bytes::Split'>Split</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Yields all substrings delimited by a regular expression match.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.SplitN.html"
|
|||
|
title='struct regex::bytes::SplitN'>SplitN</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Yields at most <code>N</code> substrings delimited by a regular expression match.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="struct" href="struct.SubCaptureMatches.html"
|
|||
|
title='struct regex::bytes::SubCaptureMatches'>SubCaptureMatches</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>An iterator that yields all capturing matches in the order in which they
|
|||
|
appear in the regex.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr></table><h2 id='traits' class='section-header'><a href="#traits">Traits</a></h2>
|
|||
|
<table>
|
|||
|
<tr class=' module-item'>
|
|||
|
<td><a class="trait" href="trait.Replacer.html"
|
|||
|
title='trait regex::bytes::Replacer'>Replacer</a></td>
|
|||
|
<td class='docblock-short'>
|
|||
|
<p>Replacer describes types that can be used to replace matches in a byte
|
|||
|
string.</p>
|
|||
|
|
|||
|
</td>
|
|||
|
</tr></table></section>
|
|||
|
<section id='search' class="content hidden"></section>
|
|||
|
|
|||
|
<section class="footer"></section>
|
|||
|
|
|||
|
<aside id="help" class="hidden">
|
|||
|
<div>
|
|||
|
<h1 class="hidden">Help</h1>
|
|||
|
|
|||
|
<div class="shortcuts">
|
|||
|
<h2>Keyboard Shortcuts</h2>
|
|||
|
|
|||
|
<dl>
|
|||
|
<dt><kbd>?</kbd></dt>
|
|||
|
<dd>Show this help dialog</dd>
|
|||
|
<dt><kbd>S</kbd></dt>
|
|||
|
<dd>Focus the search field</dd>
|
|||
|
<dt><kbd>↑</kbd></dt>
|
|||
|
<dd>Move up in search results</dd>
|
|||
|
<dt><kbd>↓</kbd></dt>
|
|||
|
<dd>Move down in search results</dd>
|
|||
|
<dt><kbd>↹</kbd></dt>
|
|||
|
<dd>Switch tab</dd>
|
|||
|
<dt><kbd>⏎</kbd></dt>
|
|||
|
<dd>Go to active search result</dd>
|
|||
|
<dt><kbd>+</kbd></dt>
|
|||
|
<dd>Expand all sections</dd>
|
|||
|
<dt><kbd>-</kbd></dt>
|
|||
|
<dd>Collapse all sections</dd>
|
|||
|
</dl>
|
|||
|
</div>
|
|||
|
|
|||
|
<div class="infos">
|
|||
|
<h2>Search Tricks</h2>
|
|||
|
|
|||
|
<p>
|
|||
|
Prefix searches with a type followed by a colon (e.g.
|
|||
|
<code>fn:</code>) to restrict the search to a given type.
|
|||
|
</p>
|
|||
|
|
|||
|
<p>
|
|||
|
Accepted types are: <code>fn</code>, <code>mod</code>,
|
|||
|
<code>struct</code>, <code>enum</code>,
|
|||
|
<code>trait</code>, <code>type</code>, <code>macro</code>,
|
|||
|
and <code>const</code>.
|
|||
|
</p>
|
|||
|
|
|||
|
<p>
|
|||
|
Search functions by type signature (e.g.
|
|||
|
<code>vec -> usize</code> or <code>* -> vec</code>)
|
|||
|
</p>
|
|||
|
</div>
|
|||
|
</div>
|
|||
|
</aside>
|
|||
|
|
|||
|
|
|||
|
|
|||
|
<script>
|
|||
|
window.rootPath = "../../";
|
|||
|
window.currentCrate = "regex";
|
|||
|
</script>
|
|||
|
<script src="../../main.js"></script>
|
|||
|
<script defer src="../../search-index.js"></script>
|
|||
|
</body>
|
|||
|
</html>
|