mentat/parser-utils/src/lib.rs

216 lines
8 KiB
Rust
Raw Normal View History

// Copyright 2016 Mozilla
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use
// this file except in compliance with the License. You may obtain a copy of the
// License at http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software distributed
// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.
extern crate combine;
extern crate edn;
Improve parsing of nested `edn::ValueAndSpan` streams. r=rnewman (#393) * Pre: Expose more in edn. * Pre: Make it easier to work with ValueAndSpan. with_spans() is a temporary hack, needed only because I don't care to parse the bootstrap assertions from text right now. * Part 1a: Add `value_and_span` for parsing nested `edn::ValueAndSpan` instances. I wasn't able to abstract over `edn::Value` and `edn::ValueAndSpan`; there are multiple obstacles. I chose to roll with `edn::ValueAndSpan` since it exposes the additional span information that we will want to form good error messages in the future. * Part 1b: Add keyword_map() parsing an `edn::Value::Vector` into an `edn::Value::map`. * Part 1c: Add `Log`/`.log(...)` for logging parser progress. This is a terrible hack, but it sure helps to debug complicated nested parsers. I don't even know what a principled approach would look like; since our parser combinators are so frequently expressed in code, it's hard to imagine a data-driven interpreter that can help debug things. * Part 2: Use `value_and_span` apparatus in tx-parser/. I break an abstraction boundary by returning a value column `edn::ValueAndSpan` rather than just an `edn::Value`. That is, the transaction processor shouldn't care where the `edn::Value` it is processing arose -- even we care to track that information we should bake it into the `Entity` type. We do this because we need to dynamically parse the value column to support nested maps, and parsing requires a full `edn::ValueAndSpan`. Alternately, we could cheat and fake the spans when parsing nested maps, but that's potentially expensive. * Part 3: Use `value_and_span` apparatus in query-parser/. * Part 4: Use `value_and_span` apparatus in root crate. * Review comment: Make Span and SpanPosition Copy. * Review comment: nits. * Review comment: Make `or` be `or_exactly`. I baked the eof checking directly into the parser, rather than using the skip and eof parsers. I also took the time to restore some tests that were mistakenly commented out. * Review comment: Extract and use def_matches_* macros. * Review comment: .map() as late as possible.
2017-04-06 17:06:28 +00:00
use combine::{
ParseResult,
};
use combine::combinator::{
Expected,
FnParser,
};
pub mod log;
pub mod value_and_span;
pub use log::{
LogParsing,
};
2017-02-02 23:11:55 +00:00
/// A type definition for a function parser that either parses an `O` from an input stream of type
/// `I`, or fails with an "expected" failure.
/// See <https://docs.rs/combine/2.2.1/combine/trait.Parser.html#method.expected> for more
/// illumination.
/// Nothing about this is specific to the result type of the parser.
pub type ResultParser<O, I> = Expected<FnParser<I, fn(I) -> ParseResult<O, I>>>;
/// `assert_parses_to!` simplifies some of the boilerplate around running a
/// parser function against input and expecting a certain result.
#[macro_export]
macro_rules! assert_parses_to {
Improve parsing of nested `edn::ValueAndSpan` streams. r=rnewman (#393) * Pre: Expose more in edn. * Pre: Make it easier to work with ValueAndSpan. with_spans() is a temporary hack, needed only because I don't care to parse the bootstrap assertions from text right now. * Part 1a: Add `value_and_span` for parsing nested `edn::ValueAndSpan` instances. I wasn't able to abstract over `edn::Value` and `edn::ValueAndSpan`; there are multiple obstacles. I chose to roll with `edn::ValueAndSpan` since it exposes the additional span information that we will want to form good error messages in the future. * Part 1b: Add keyword_map() parsing an `edn::Value::Vector` into an `edn::Value::map`. * Part 1c: Add `Log`/`.log(...)` for logging parser progress. This is a terrible hack, but it sure helps to debug complicated nested parsers. I don't even know what a principled approach would look like; since our parser combinators are so frequently expressed in code, it's hard to imagine a data-driven interpreter that can help debug things. * Part 2: Use `value_and_span` apparatus in tx-parser/. I break an abstraction boundary by returning a value column `edn::ValueAndSpan` rather than just an `edn::Value`. That is, the transaction processor shouldn't care where the `edn::Value` it is processing arose -- even we care to track that information we should bake it into the `Entity` type. We do this because we need to dynamically parse the value column to support nested maps, and parsing requires a full `edn::ValueAndSpan`. Alternately, we could cheat and fake the spans when parsing nested maps, but that's potentially expensive. * Part 3: Use `value_and_span` apparatus in query-parser/. * Part 4: Use `value_and_span` apparatus in root crate. * Review comment: Make Span and SpanPosition Copy. * Review comment: nits. * Review comment: Make `or` be `or_exactly`. I baked the eof checking directly into the parser, rather than using the skip and eof parsers. I also took the time to restore some tests that were mistakenly commented out. * Review comment: Extract and use def_matches_* macros. * Review comment: .map() as late as possible.
2017-04-06 17:06:28 +00:00
( $parser: expr, $input: expr, $expected: expr ) => {{
let mut par = $parser();
Improve parsing of nested `edn::ValueAndSpan` streams. r=rnewman (#393) * Pre: Expose more in edn. * Pre: Make it easier to work with ValueAndSpan. with_spans() is a temporary hack, needed only because I don't care to parse the bootstrap assertions from text right now. * Part 1a: Add `value_and_span` for parsing nested `edn::ValueAndSpan` instances. I wasn't able to abstract over `edn::Value` and `edn::ValueAndSpan`; there are multiple obstacles. I chose to roll with `edn::ValueAndSpan` since it exposes the additional span information that we will want to form good error messages in the future. * Part 1b: Add keyword_map() parsing an `edn::Value::Vector` into an `edn::Value::map`. * Part 1c: Add `Log`/`.log(...)` for logging parser progress. This is a terrible hack, but it sure helps to debug complicated nested parsers. I don't even know what a principled approach would look like; since our parser combinators are so frequently expressed in code, it's hard to imagine a data-driven interpreter that can help debug things. * Part 2: Use `value_and_span` apparatus in tx-parser/. I break an abstraction boundary by returning a value column `edn::ValueAndSpan` rather than just an `edn::Value`. That is, the transaction processor shouldn't care where the `edn::Value` it is processing arose -- even we care to track that information we should bake it into the `Entity` type. We do this because we need to dynamically parse the value column to support nested maps, and parsing requires a full `edn::ValueAndSpan`. Alternately, we could cheat and fake the spans when parsing nested maps, but that's potentially expensive. * Part 3: Use `value_and_span` apparatus in query-parser/. * Part 4: Use `value_and_span` apparatus in root crate. * Review comment: Make Span and SpanPosition Copy. * Review comment: nits. * Review comment: Make `or` be `or_exactly`. I baked the eof checking directly into the parser, rather than using the skip and eof parsers. I also took the time to restore some tests that were mistakenly commented out. * Review comment: Extract and use def_matches_* macros. * Review comment: .map() as late as possible.
2017-04-06 17:06:28 +00:00
let result = par.parse($input.with_spans().into_atom_stream()).map(|x| x.0); // TODO: check remainder of stream.
assert_eq!(result, Ok($expected));
}}
}
/// `satisfy_unwrap!` makes it a little easier to implement a `satisfy_map`
/// body that matches a particular `Value` enum case, otherwise returning `None`.
#[macro_export]
macro_rules! satisfy_unwrap {
( $cas: path, $var: ident, $body: block ) => {
satisfy_map(|x: edn::Value| if let $cas($var) = x $body else { None })
}
}
/// Generate a `satisfy_map` expression that matches a `PlainSymbol`
/// value with the given name.
///
/// We do this rather than using `combine::token` so that we don't
/// need to allocate a new `String` inside a `PlainSymbol` inside a `Value`
/// just to match input.
#[macro_export]
macro_rules! matches_plain_symbol {
($name: expr, $input: ident) => {
satisfy_map(|x: edn::Value| {
if let edn::Value::PlainSymbol(ref s) = x {
if s.0.as_str() == $name {
return Some(());
}
}
return None;
}).parse_stream($input)
}
}
/// Define an `impl` body for the `$parser` type. The body will contain a parser
/// function called `$name`, consuming a stream of `$item_type`s. The parser's
/// result type will be `$result_type`.
///
/// The provided `$body` will be evaluated with `$input` bound to the input stream.
///
/// `$body`, when run, should return a `ParseResult` of the appropriate result type.
#[macro_export]
macro_rules! def_parser_fn {
( $parser: ident, $name: ident, $item_type: ty, $result_type: ty, $input: ident, $body: block ) => {
impl<I> $parser<I> where I: Stream<Item = $item_type> {
fn $name() -> ResultParser<$result_type, I> {
fn inner<I: Stream<Item = $item_type>>($input: I) -> ParseResult<$result_type, I> {
$body
}
Lookup refs, nested vector values, map notation. Fixes #180, fixes #183, fixes #284. (#382) r=rnewman * Pre: Fix error in parser macros. * Pre: Make test unwrapping more verbose. * Pre: Make lookup refs be (lookup-ref a v) in the entity position. This has the advantage of being explicit in all situations and unambiguous at parse-time. This choice agrees with the Clojure implementation but not with Datomic. Datomic treats [a v] as a lookup ref, is ambiguous at parse-time, and is disambiguated in ways I do not understand at transaction time. We mooted making lookup refs [[a v]] and outlawing nested value vectors in transactions, but after implementing that approach I decided it was better to handle lookup refs at parse time and therefore outlawing nested value vectors is not necessary. * Handle lookup refs in the entity and value columns. Fixes #183. * Pre 0a: Use a stack instead of into_iter. * Pre 0b: Dedent. * Pre 0c: Handle `e` after `v`. This allows to use the original `e` while handling `v`. * Explode value lists for :db.cardinality/many attributes. Fixes #284. * Parse and accept map notation. Fixes #180. * Pre: Modernize add() and retract() into one add_or_retract(). * Pre: Add is_collection and is_atom to edn::Value. * Pre: Differentiate atoms from lookup-refs in value position. Initially, I expected to accept arbitrary edn::Value instances in the value position, and to differentiate in the transactor. However, the implementation quickly became a two-stage parser, since we always wanted to parse the resulting value position into some other known thing using the tx-parser. To save calls into the parser and to allow the parser to move forward with a smaller API surface, I push as much of this parsing as possible into the initial parse. * Pre: Modernize entities(). * Pre: Quote edn::Value::Text in Display. * Review comment: Add and use edn::Value::into_atom. * Review comment: Use skip(eof()) throughout. * Review comment: VecDeque instead of Vec. * Review comment: Part 0: Rename TempId to TempIdHandle. * Review comment: Part 1: Differentiate internal and external tempids. This breaks an abstraction boundary by pushing the Internal/External split up to the Entity level in tx/ and tx-parser/. This just makes it easier to explode Entity map notation instances into Entity instances, taking an existing External tempid :db/id or generating a new Internal tempid as appropriate. To do this without breaking the abstraction boundary would require adding flexibility to the transaction processor: we'd need to be able to turn Entity instances into some internal enum and handle the two cases independently. It wouldn't be too hard, but this reduces the combinatorial type explosion.
2017-03-27 23:30:04 +00:00
parser(inner as fn(I) -> ParseResult<$result_type, I>).expected(stringify!($name))
Improve parsing of nested `edn::ValueAndSpan` streams. r=rnewman (#393) * Pre: Expose more in edn. * Pre: Make it easier to work with ValueAndSpan. with_spans() is a temporary hack, needed only because I don't care to parse the bootstrap assertions from text right now. * Part 1a: Add `value_and_span` for parsing nested `edn::ValueAndSpan` instances. I wasn't able to abstract over `edn::Value` and `edn::ValueAndSpan`; there are multiple obstacles. I chose to roll with `edn::ValueAndSpan` since it exposes the additional span information that we will want to form good error messages in the future. * Part 1b: Add keyword_map() parsing an `edn::Value::Vector` into an `edn::Value::map`. * Part 1c: Add `Log`/`.log(...)` for logging parser progress. This is a terrible hack, but it sure helps to debug complicated nested parsers. I don't even know what a principled approach would look like; since our parser combinators are so frequently expressed in code, it's hard to imagine a data-driven interpreter that can help debug things. * Part 2: Use `value_and_span` apparatus in tx-parser/. I break an abstraction boundary by returning a value column `edn::ValueAndSpan` rather than just an `edn::Value`. That is, the transaction processor shouldn't care where the `edn::Value` it is processing arose -- even we care to track that information we should bake it into the `Entity` type. We do this because we need to dynamically parse the value column to support nested maps, and parsing requires a full `edn::ValueAndSpan`. Alternately, we could cheat and fake the spans when parsing nested maps, but that's potentially expensive. * Part 3: Use `value_and_span` apparatus in query-parser/. * Part 4: Use `value_and_span` apparatus in root crate. * Review comment: Make Span and SpanPosition Copy. * Review comment: nits. * Review comment: Make `or` be `or_exactly`. I baked the eof checking directly into the parser, rather than using the skip and eof parsers. I also took the time to restore some tests that were mistakenly commented out. * Review comment: Extract and use def_matches_* macros. * Review comment: .map() as late as possible.
2017-04-06 17:06:28 +00:00
}
}
}
}
#[macro_export]
macro_rules! def_parser {
( $parser: ident, $name: ident, $result_type: ty, $body: block ) => {
impl $parser {
fn $name() -> ResultParser<$result_type, $crate::value_and_span::Stream> {
fn inner(input: $crate::value_and_span::Stream) -> ParseResult<$result_type, $crate::value_and_span::Stream> {
$body.parse_lazy(input).into()
}
parser(inner as fn($crate::value_and_span::Stream) -> ParseResult<$result_type, $crate::value_and_span::Stream>).expected(stringify!($name))
}
}
}
}
/// `def_value_parser_fn` is a short-cut to `def_parser_fn` with the input type
/// being `edn::Value`.
#[macro_export]
macro_rules! def_value_parser_fn {
( $parser: ident, $name: ident, $result_type: ty, $input: ident, $body: block ) => {
def_parser_fn!($parser, $name, edn::Value, $result_type, $input, $body);
}
}
/// `def_value_satisfy_parser_fn` is a short-cut to `def_parser_fn` with the input type
/// being `edn::Value` and the body being a call to `satisfy_map` with the given transformer.
///
/// In practice this allows you to simply pass a function that accepts an `&edn::Value` and
/// returns an `Option<$result_type>`: if a suitable value is at the front of the stream,
/// it will be converted and returned by the parser; otherwise, the parse will fail.
#[macro_export]
macro_rules! def_value_satisfy_parser_fn {
( $parser: ident, $name: ident, $result_type: ty, $transformer: path ) => {
def_value_parser_fn!($parser, $name, $result_type, input, {
satisfy_map(|x: edn::Value| $transformer(&x)).parse_stream(input)
});
}
}
Extract partial storage abstraction; use error-chain throughout. Fixes #328. r=rnewman (#341) * Pre: Drop unneeded tx0 from search results. * Pre: Don't require a schema in some of the DB code. The idea is to separate the transaction applying code, which is schema-aware, from the concrete storage code, which is just concerned with getting bits onto disk. * Pre: Only reference Schema, not DB, in debug module. This is part of a larger separation of the volatile PartitionMap, which is modified every transaction, from the stable Schema, which is infrequently modified. * Pre: Fix indentation. * Extract part of DB to new SchemaTypeChecking trait. * Extract part of DB to new PartitionMapping trait. * Pre: Don't expect :db.part/tx partition to advance when tx fails. This fails right now, because we allocate tx IDs even when we shouldn't. * Sketch a db interface without DB. * Add ValueParseError; use error-chain in tx-parser. This can be simplified when https://github.com/Marwes/combine/issues/86 makes it to a published release, but this unblocks us for now. This converts the `combine` error type `ParseError<&'a [edn::Value]>` to a type with owned `Vec<edn::Value>` collections, re-using `edn::Value::Vector` for making them `Display`. * Pre: Accept Borrow<Schema> instead of just &Schema in debug module. This makes it easy to use Rc<Schema> or Arc<Schema> without inserting &* sigils throughout the code. * Use error-chain in query-parser. There are a few things to point out here: - the fine grained error types have been flattened into one crate-wide error type; it's pretty easy to regain the granularity as needed. - edn::ParseError is automatically lifted to mentat_query_parser::errors::Error; - we use mentat_parser_utils::ValueParser to maintain parsing error information from `combine`. * Patch up top-level. * Review comment: Only `borrow()` once.
2017-02-24 23:32:41 +00:00
/// A `ValueParseError` is a `combine::primitives::ParseError`-alike that implements the `Debug`,
/// `Display`, and `std::error::Error` traits. In addition, it doesn't capture references, making
/// it possible to store `ValueParseError` instances in local links with the `error-chain` crate.
///
/// This is achieved by wrapping slices of type `&'a [edn::Value]` in an owning type that implements
/// `Display`; rather than introducing a newtype like `DisplayVec`, we re-use `edn::Value::Vector`.
#[derive(PartialEq)]
pub struct ValueParseError {
pub position: usize,
// Think of this as `Vec<Error<edn::Value, DisplayVec<edn::Value>>>`; see above.
pub errors: Vec<combine::primitives::Error<edn::Value, edn::Value>>,
}
impl std::fmt::Debug for ValueParseError {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f,
"ParseError {{ position: {:?}, errors: {:?} }}",
self.position,
self.errors)
}
}
impl std::fmt::Display for ValueParseError {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
try!(writeln!(f, "Parse error at {}", self.position));
combine::primitives::Error::fmt_errors(&self.errors, f)
}
}
impl std::error::Error for ValueParseError {
fn description(&self) -> &str {
"parse error parsing EDN values"
}
}
impl<'a> From<combine::primitives::ParseError<&'a [edn::Value]>> for ValueParseError {
fn from(e: combine::primitives::ParseError<&'a [edn::Value]>) -> ValueParseError {
ValueParseError {
position: e.position,
errors: e.errors.into_iter().map(|e| e.map_range(|r| {
let mut v = Vec::new();
v.extend_from_slice(r);
edn::Value::Vector(v)
})).collect(),
}
}
}
/// Allow to map the range types of combine::primitives::{Info, Error}.
trait MapRange<R, S> {
type Output;
fn map_range<F>(self, f: F) -> Self::Output where F: FnOnce(R) -> S;
}
impl<T, R, S> MapRange<R, S> for combine::primitives::Info<T, R> {
type Output = combine::primitives::Info<T, S>;
fn map_range<F>(self, f: F) -> combine::primitives::Info<T, S> where F: FnOnce(R) -> S {
use combine::primitives::Info::*;
match self {
Token(t) => Token(t),
Range(r) => Range(f(r)),
Owned(s) => Owned(s),
Borrowed(x) => Borrowed(x),
}
}
}
impl<T, R, S> MapRange<R, S> for combine::primitives::Error<T, R> {
type Output = combine::primitives::Error<T, S>;
fn map_range<F>(self, f: F) -> combine::primitives::Error<T, S> where F: FnOnce(R) -> S {
use combine::primitives::Error::*;
match self {
Unexpected(x) => Unexpected(x.map_range(f)),
Expected(x) => Expected(x.map_range(f)),
Message(x) => Message(x.map_range(f)),
Other(x) => Other(x),
}
}
}