mentat/docs/apis/rust/regex/bytes/index.html
2018-06-22 12:08:32 +01:00

347 lines
No EOL
19 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="rustdoc">
<meta name="description" content="API documentation for the Rust `bytes` mod in crate `regex`.">
<meta name="keywords" content="rust, rustlang, rust-lang, bytes">
<title>regex::bytes - Rust</title>
<link rel="stylesheet" type="text/css" href="../../normalize.css">
<link rel="stylesheet" type="text/css" href="../../rustdoc.css"
id="mainThemeStyle">
<link rel="stylesheet" type="text/css" href="../../dark.css">
<link rel="stylesheet" type="text/css" href="../../light.css" id="themeStyle">
<script src="../../storage.js"></script>
</head>
<body class="rustdoc mod">
<!--[if lte IE 8]>
<div class="warning">
This old browser is unsupported and will most likely display funky
things.
</div>
<![endif]-->
<nav class="sidebar">
<div class="sidebar-menu">&#9776;</div>
<p class='location'>Module bytes</p><div class="sidebar-elems"><div class="block items"><ul><li><a href="#structs">Structs</a></li><li><a href="#traits">Traits</a></li></ul></div><p class='location'><a href='../index.html'>regex</a></p><script>window.sidebarCurrent = {name: 'bytes', ty: 'mod', relpath: '../'};</script><script defer src="../sidebar-items.js"></script></div>
</nav>
<div class="theme-picker">
<button id="theme-picker" aria-label="Pick another theme!">
<img src="../../brush.svg" width="18" alt="Pick another theme!">
</button>
<div id="theme-choices"></div>
</div>
<script src="../../theme.js"></script>
<nav class="sub">
<form class="search-form js-only">
<div class="search-container">
<input class="search-input" name="search"
autocomplete="off"
placeholder="Click or press S to search, ? for more options…"
type="search">
</div>
</form>
</nav>
<section id='main' class="content"><h1 class='fqn'><span class='in-band'>Module <a href='../index.html'>regex</a>::<wbr><a class="mod" href=''>bytes</a></span><span class='out-of-band'><span id='render-detail'>
<a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">
[<span class='inner'>&#x2212;</span>]
</a>
</span><a class='srclink' href='../../src/regex/lib.rs.html#642-648' title='goto source code'>[src]</a></span></h1><div class='docblock'><p>Match regular expressions on arbitrary bytes.</p>
<p>This module provides a nearly identical API to the one found in the
top-level of this crate. There are two important differences:</p>
<ol>
<li>Matching is done on <code>&amp;[u8]</code> instead of <code>&amp;str</code>. Additionally, <code>Vec&lt;u8&gt;</code>
is used where <code>String</code> would have been used.</li>
<li>Unicode support can be disabled even when disabling it would result in
matching invalid UTF-8 bytes.</li>
</ol>
<h1 id="example-match-null-terminated-string" class="section-header"><a href="#example-match-null-terminated-string">Example: match null terminated string</a></h1>
<p>This shows how to find all null-terminated strings in a slice of bytes:</p>
<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r&quot;(?-u)(?P&lt;cstr&gt;[^\x00]+)\x00&quot;</span>).<span class="ident">unwrap</span>();
<span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">b&quot;foo\x00bar\x00baz\x00&quot;</span>;
<span class="comment">// Extract all of the strings without the null terminator from each match.</span>
<span class="comment">// The unwrap is OK here since a match requires the `cstr` capture to match.</span>
<span class="kw">let</span> <span class="ident">cstrs</span>: <span class="ident">Vec</span><span class="op">&lt;</span><span class="kw-2">&amp;</span>[<span class="ident">u8</span>]<span class="op">&gt;</span> <span class="op">=</span>
<span class="ident">re</span>.<span class="ident">captures_iter</span>(<span class="ident">text</span>)
.<span class="ident">map</span>(<span class="op">|</span><span class="ident">c</span><span class="op">|</span> <span class="ident">c</span>.<span class="ident">name</span>(<span class="string">&quot;cstr&quot;</span>).<span class="ident">unwrap</span>().<span class="ident">as_bytes</span>())
.<span class="ident">collect</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="macro">vec</span><span class="macro">!</span>[<span class="kw-2">&amp;</span><span class="string">b&quot;foo&quot;</span>[..], <span class="kw-2">&amp;</span><span class="string">b&quot;bar&quot;</span>[..], <span class="kw-2">&amp;</span><span class="string">b&quot;baz&quot;</span>[..]], <span class="ident">cstrs</span>);</pre>
<h1 id="example-selectively-enable-unicode-support" class="section-header"><a href="#example-selectively-enable-unicode-support">Example: selectively enable Unicode support</a></h1>
<p>This shows how to match an arbitrary byte pattern followed by a UTF-8 encoded
string (e.g., to extract a title from a Matroska file):</p>
<pre class="rust rust-example-rendered">
<span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(
<span class="string">r&quot;(?-u)\x7b\xa9(?:[\x80-\xfe]|[\x40-\xff].)(?u:(.*))&quot;</span>
).<span class="ident">unwrap</span>();
<span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">b&quot;\x12\xd0\x3b\x5f\x7b\xa9\x85\xe2\x98\x83\x80\x98\x54\x76\x68\x65&quot;</span>;
<span class="kw">let</span> <span class="ident">caps</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">captures</span>(<span class="ident">text</span>).<span class="ident">unwrap</span>();
<span class="comment">// Notice that despite the `.*` at the end, it will only match valid UTF-8</span>
<span class="comment">// because Unicode mode was enabled with the `u` flag. Without the `u` flag,</span>
<span class="comment">// the `.*` would match the rest of the bytes.</span>
<span class="kw">let</span> <span class="ident">mat</span> <span class="op">=</span> <span class="ident">caps</span>.<span class="ident">get</span>(<span class="number">1</span>).<span class="ident">unwrap</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>((<span class="number">7</span>, <span class="number">10</span>), (<span class="ident">mat</span>.<span class="ident">start</span>(), <span class="ident">mat</span>.<span class="ident">end</span>()));
<span class="comment">// If there was a match, Unicode mode guarantees that `title` is valid UTF-8.</span>
<span class="kw">let</span> <span class="ident">title</span> <span class="op">=</span> <span class="ident">str</span>::<span class="ident">from_utf8</span>(<span class="kw-2">&amp;</span><span class="ident">caps</span>[<span class="number">1</span>]).<span class="ident">unwrap</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="string">&quot;&quot;</span>, <span class="ident">title</span>);</pre>
<p>In general, if the Unicode flag is enabled in a capture group and that capture
is part of the overall match, then the capture is <em>guaranteed</em> to be valid
UTF-8.</p>
<h1 id="syntax" class="section-header"><a href="#syntax">Syntax</a></h1>
<p>The supported syntax is pretty much the same as the syntax for Unicode
regular expressions with a few changes that make sense for matching arbitrary
bytes:</p>
<ol>
<li>The <code>u</code> flag can be disabled even when disabling it might cause the regex to
match invalid UTF-8. When the <code>u</code> flag is disabled, the regex is said to be in
&quot;ASCII compatible&quot; mode.</li>
<li>In ASCII compatible mode, neither Unicode scalar values nor Unicode
character classes are allowed.</li>
<li>In ASCII compatible mode, Perl character classes (<code>\w</code>, <code>\d</code> and <code>\s</code>)
revert to their typical ASCII definition. <code>\w</code> maps to <code>[[:word:]]</code>, <code>\d</code> maps
to <code>[[:digit:]]</code> and <code>\s</code> maps to <code>[[:space:]]</code>.</li>
<li>In ASCII compatible mode, word boundaries use the ASCII compatible <code>\w</code> to
determine whether a byte is a word byte or not.</li>
<li>Hexadecimal notation can be used to specify arbitrary bytes instead of
Unicode codepoints. For example, in ASCII compatible mode, <code>\xFF</code> matches the
literal byte <code>\xFF</code>, while in Unicode mode, <code>\xFF</code> is a Unicode codepoint that
matches its UTF-8 encoding of <code>\xC3\xBF</code>. Similarly for octal notation when
enabled.</li>
<li><code>.</code> matches any <em>byte</em> except for <code>\n</code> instead of any Unicode scalar value.
When the <code>s</code> flag is enabled, <code>.</code> matches any byte.</li>
</ol>
<h1 id="performance" class="section-header"><a href="#performance">Performance</a></h1>
<p>In general, one should expect performance on <code>&amp;[u8]</code> to be roughly similar to
performance on <code>&amp;str</code>.</p>
</div><h2 id='structs' class='section-header'><a href="#structs">Structs</a></h2>
<table>
<tr class=' module-item'>
<td><a class="struct" href="struct.CaptureMatches.html"
title='struct regex::bytes::CaptureMatches'>CaptureMatches</a></td>
<td class='docblock-short'>
<p>An iterator that yields all non-overlapping capture groups matching a
particular regular expression.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.CaptureNames.html"
title='struct regex::bytes::CaptureNames'>CaptureNames</a></td>
<td class='docblock-short'>
<p>An iterator over the names of all possible captures.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.Captures.html"
title='struct regex::bytes::Captures'>Captures</a></td>
<td class='docblock-short'>
<p>Captures represents a group of captured byte strings for a single match.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.Match.html"
title='struct regex::bytes::Match'>Match</a></td>
<td class='docblock-short'>
<p>Match represents a single match of a regex in a haystack.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.Matches.html"
title='struct regex::bytes::Matches'>Matches</a></td>
<td class='docblock-short'>
<p>An iterator over all non-overlapping matches for a particular string.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.NoExpand.html"
title='struct regex::bytes::NoExpand'>NoExpand</a></td>
<td class='docblock-short'>
<p><code>NoExpand</code> indicates literal byte string replacement.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.Regex.html"
title='struct regex::bytes::Regex'>Regex</a></td>
<td class='docblock-short'>
<p>A compiled regular expression for matching arbitrary bytes.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.RegexBuilder.html"
title='struct regex::bytes::RegexBuilder'>RegexBuilder</a></td>
<td class='docblock-short'>
<p>A configurable builder for a regular expression.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.RegexSet.html"
title='struct regex::bytes::RegexSet'>RegexSet</a></td>
<td class='docblock-short'>
<p>Match multiple (possibly overlapping) regular expressions in a single scan.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.RegexSetBuilder.html"
title='struct regex::bytes::RegexSetBuilder'>RegexSetBuilder</a></td>
<td class='docblock-short'>
<p>A configurable builder for a set of regular expressions.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.ReplacerRef.html"
title='struct regex::bytes::ReplacerRef'>ReplacerRef</a></td>
<td class='docblock-short'>
<p>By-reference adaptor for a <code>Replacer</code></p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.SetMatches.html"
title='struct regex::bytes::SetMatches'>SetMatches</a></td>
<td class='docblock-short'>
<p>A set of matches returned by a regex set.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.SetMatchesIntoIter.html"
title='struct regex::bytes::SetMatchesIntoIter'>SetMatchesIntoIter</a></td>
<td class='docblock-short'>
<p>An owned iterator over the set of matches from a regex set.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.SetMatchesIter.html"
title='struct regex::bytes::SetMatchesIter'>SetMatchesIter</a></td>
<td class='docblock-short'>
<p>A borrowed iterator over the set of matches from a regex set.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.Split.html"
title='struct regex::bytes::Split'>Split</a></td>
<td class='docblock-short'>
<p>Yields all substrings delimited by a regular expression match.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.SplitN.html"
title='struct regex::bytes::SplitN'>SplitN</a></td>
<td class='docblock-short'>
<p>Yields at most <code>N</code> substrings delimited by a regular expression match.</p>
</td>
</tr>
<tr class=' module-item'>
<td><a class="struct" href="struct.SubCaptureMatches.html"
title='struct regex::bytes::SubCaptureMatches'>SubCaptureMatches</a></td>
<td class='docblock-short'>
<p>An iterator that yields all capturing matches in the order in which they
appear in the regex.</p>
</td>
</tr></table><h2 id='traits' class='section-header'><a href="#traits">Traits</a></h2>
<table>
<tr class=' module-item'>
<td><a class="trait" href="trait.Replacer.html"
title='trait regex::bytes::Replacer'>Replacer</a></td>
<td class='docblock-short'>
<p>Replacer describes types that can be used to replace matches in a byte
string.</p>
</td>
</tr></table></section>
<section id='search' class="content hidden"></section>
<section class="footer"></section>
<aside id="help" class="hidden">
<div>
<h1 class="hidden">Help</h1>
<div class="shortcuts">
<h2>Keyboard Shortcuts</h2>
<dl>
<dt><kbd>?</kbd></dt>
<dd>Show this help dialog</dd>
<dt><kbd>S</kbd></dt>
<dd>Focus the search field</dd>
<dt><kbd></kbd></dt>
<dd>Move up in search results</dd>
<dt><kbd></kbd></dt>
<dd>Move down in search results</dd>
<dt><kbd></kbd></dt>
<dd>Switch tab</dd>
<dt><kbd>&#9166;</kbd></dt>
<dd>Go to active search result</dd>
<dt><kbd>+</kbd></dt>
<dd>Expand all sections</dd>
<dt><kbd>-</kbd></dt>
<dd>Collapse all sections</dd>
</dl>
</div>
<div class="infos">
<h2>Search Tricks</h2>
<p>
Prefix searches with a type followed by a colon (e.g.
<code>fn:</code>) to restrict the search to a given type.
</p>
<p>
Accepted types are: <code>fn</code>, <code>mod</code>,
<code>struct</code>, <code>enum</code>,
<code>trait</code>, <code>type</code>, <code>macro</code>,
and <code>const</code>.
</p>
<p>
Search functions by type signature (e.g.
<code>vec -> usize</code> or <code>* -> vec</code>)
</p>
</div>
</div>
</aside>
<script>
window.rootPath = "../../";
window.currentCrate = "regex";
</script>
<script src="../../main.js"></script>
<script defer src="../../search-index.js"></script>
</body>
</html>