Commit graph

24 commits

Author SHA1 Message Date
Nick Alexander
09f1d633b5 Part 4: Parse queries with rust-peg.
There's an unfortunate conflation here between implementing the query
parser in `rust-peg` and moving some validation that now happens at
parse time to happen later.  The result is that we introduce
`ParsedFindQuery` as a less-processed `FindQuery`, and that we only
use string errors (which is all `rust-peg` supports) instead of the
structured errors in query-parser's errors module.  The next commit
will address this, on the road to removing the `query-parser` module
entirely.
2018-06-04 15:04:39 -07:00
Nick Alexander
729fe59578 [edn] Pre: Rename keyword to namespaced_keyword.
The `Keyword` type evolved to become more general: we now use the one
type for both :regular and :name/spaced keywords.  This changes
reflects the new generality.
2018-06-04 14:52:51 -07:00
Richard Newman
b2e98f44f6
Generalize Entity by value type. (#701) (#691) r=rnewman
* Part 3: Parameterize Entity by value type.

This isn't quite right, because after parsing, we shouldn't care
about` `edn::ValueAndSpan`, we should care only about edn::Value.
However, I think we can drop `ValueAndSpan` entirely if we just use
`rust-peg` (and its simpler error messages) rather than a mix of
`rust-peg` and `combine`.

In any case, this paves the way to transacting `Entity<TypedValue>`,
which is a nice step towards building general entities.

* Part 1: Add AttributePlace.

* Part 2: Name other places EntityPlace and ValuePlace.

Now we're consistent and closer to self-documenting.  Both matter more
as we expose `Entity` as the thing to build for programmatic usage.

* Part 4: Allow Ident and TempId in ValuePlace.

The parser will never produce these, since determining whether an
integer/keyword or string is an ident or a tempid, respectively, in
the value place requires the schema.

But a builder that produces `Entity` instances directly will want to
produce these.
2018-05-15 00:43:07 -07:00
Richard Newman
3dc68bcd38 Combine NamespacedKeyword and Keyword. (#689) r=nalexander
* Make properties on NamespacedKeyword/NamespacedSymbol private

* Use only a single String for NamespacedKeyword/NamespacedSymbol

* Review comments.

* Remove unsafe code in namespaced_name.

Benchmarking shows approximately zero change.

* Allow the types of ns and name to differ when constructing a NamespacedName.

* Make symbol namespaces optional.

* Normalize names of keyword/symbol constructors.

This will make the subsequent refactor much less painful.

* Use expect not unwrap.

* Merge Keyword and NamespacedKeyword.
2018-05-11 09:52:17 -07:00
Nick Alexander
c8f74fa41b [edn] Round-trip instants. (#686) (#687) r=rnewman
First, the parser had a small grouping bug where-by it wouldn't parse
Z as timezone correctly.  Second, we weren't printing instants in the format
that we parse.
2018-05-11 02:11:04 -07:00
Nick Alexander
cbffe5e545 Use rust-peg for tx parsing.
There are few reasons to do this:

- it's difficult to add symbol interning to combine-based parsers like
  tx-parser -- literally every type changes to reflect the interner,
  and that means every convenience macro we've built needs to chagne.
  It's trivial to add interning to rust-peg-based parsers.

- combine has rolled forward to 3.2, and I spent a similar amount of
  time investigating how to upgrade tx-parser (to take advantage of
  the new parser! macros in combine that I think are necessary for
  adapting to changing types) as I did just converting to rust-peg.

- it's easy to improve the error messages in rust-peg, where-as I have
  tried twice to improve the nested error messages in combine and am
  stumped.

- it's roughly 4x faster to parse strings directly as opposed to
  edn::ValueAndSpan, and it'll be even better when we intern directly.
2018-05-10 10:24:05 -07:00
Richard Newman
df58de52f4
Correctly parse and unescape quotes etc. inside EDN strings. (#434) (#589) 2018-03-15 07:13:27 -07:00
Richard Newman
9b23cf3945
Speed up EDN parser (fixes #445) (#581) r=nalexander
Fixes from @kevinmehall.

* Prefer character sets over backtracking in the EDN parser.
* Avoid duplicate effort when parsing floats in the EDN parser.
* Clean up duplicate position tracking code.

This turns out to have little performance impact, but makes the grammar
much cleaner.

* Fix EDN work to pass tests with correct numeric precedence.
2018-03-05 20:33:51 -08:00
Richard Newman
c600152d78
Update some dependencies. (#492) r=etoop
* Update some dependencies.

* Update rusqlite to 0.12.

* Update error-chain to a forked version that implements Sync.

* Fix some compiler warnings.

* Remove unused imports in tests.

* Parse errors no longer naturally print with the expected symbol.
2017-11-21 16:24:08 +00:00
Richard Newman
daca8def57 UUIDs and instants. Fixes #44, #45, #426, #427. (#438) r=nalexander
* Pre: unused import in translate.rs.

* Part 2: take a dependency on rusqlite for query arguments.

* Part 1: flatten V2 schema into V1. Add UUID and URI.

Bump expected ident and bootstrap datom count in tests.

* Part 5: parse edn::Value::Uuid.

* Part 3: extend ValueType and TypedValue to include Uuid.

* Part 4: add Uuid to query arguments.

* Part 6: extend db to support Uuid.

* Part 8: add a tx-parser test for #f NaN and #uuid.

* Part 7: parse and algebrize UUIDs in queries.

* Part 1: parse #inst in EDN and throughout query engine.

* Part 3: handle instants in db.

* Part 2: instants never matches integers in queries.

* Part 4: use DateTime for tx_instants.

* Add a test for adding and querying UUIDs and instants.

* Review comments.
2017-04-28 20:11:55 -07:00
Nick Alexander
5369f03464 Improve parsing of nested edn::ValueAndSpan streams. r=rnewman (#393)
* Pre: Expose more in edn.

* Pre: Make it easier to work with ValueAndSpan.

with_spans() is a temporary hack, needed only because I don't care to
parse the bootstrap assertions from text right now.

* Part 1a: Add `value_and_span` for parsing nested `edn::ValueAndSpan` instances.

I wasn't able to abstract over `edn::Value` and `edn::ValueAndSpan`;
there are multiple obstacles.  I chose to roll with
`edn::ValueAndSpan` since it exposes the additional span information
that we will want to form good error messages in the future.

* Part 1b: Add keyword_map() parsing an `edn::Value::Vector` into an `edn::Value::map`.

* Part 1c: Add `Log`/`.log(...)` for logging parser progress.

This is a terrible hack, but it sure helps to debug complicated nested
parsers.  I don't even know what a principled approach would look
like; since our parser combinators are so frequently expressed in
code, it's hard to imagine a data-driven interpreter that can help
debug things.

* Part 2: Use `value_and_span` apparatus in tx-parser/.

I break an abstraction boundary by returning a value column
`edn::ValueAndSpan` rather than just an `edn::Value`.  That is, the
transaction processor shouldn't care where the `edn::Value` it is
processing arose -- even we care to track that information we should
bake it into the `Entity` type.  We do this because we need to
dynamically parse the value column to support nested maps, and parsing
requires a full `edn::ValueAndSpan`.  Alternately, we could cheat and
fake the spans when parsing nested maps, but that's potentially
expensive.

* Part 3: Use `value_and_span` apparatus in query-parser/.

* Part 4: Use `value_and_span` apparatus in root crate.

* Review comment: Make Span and SpanPosition Copy.

* Review comment: nits.

* Review comment: Make `or` be `or_exactly`.

I baked the eof checking directly into the parser, rather than using
the skip and eof parsers.  I also took the time to restore some tests
that were mistakenly commented out.

* Review comment: Extract and use def_matches_* macros.

* Review comment: .map() as late as possible.
2017-04-06 10:06:28 -07:00
Richard Newman
85f3b79f75 Support a limited set of '.'-prefixed non-keyword symbols. (#352) r=nalexander
This commit allows `.` and `...` to parse correctly as `PlainSymbol`.

Tests in edn, query-translator, and the top level have been added.
2017-03-06 15:01:19 -08:00
Victor Porof
896d7f8f88 Add a span component to edn::Value, r=ncalexan
Signed-off-by: Victor Porof <victor.porof@gmail.com>
2017-02-17 18:31:26 +01:00
Jordan Santell
4f5c94891a Add octal, hexadecimal, and arbitrary base integers to the EDN parser. Fixes #277. r=rnewman (#286) 2017-02-10 16:03:35 -08:00
Victor Porof
42580539b8 Properly handle whitespace for Infinity and NaN, r=rnewman (#246)
Signed-off-by: Victor Porof <vporof@mozilla.com>
2017-02-09 18:13:44 +01:00
Victor Porof
a627f532f0 Relax whitespace rules for edn vectors, lists, sets and maps
Signed-off-by: Victor Porof <vporof@mozilla.com>
2017-02-04 08:45:31 +01:00
Victor Porof
419db388da Relax whitespace rules for Infinity and NaN
Signed-off-by: Victor Porof <vporof@mozilla.com>
2017-02-04 08:45:02 +01:00
Jordan Santell
0b20d7691b Parse and display EDN values for NaN, +Infinity and -Infinity. Fixes #232 (#238) r=victorporof 2017-02-03 10:14:23 -08:00
Victor Porof
9ee0ac8e00 Unify and generalize keywords and symbols parsing
Signed-off-by: Victor Porof <vporof@mozilla.com>
2017-02-03 09:06:42 +01:00
Victor Porof
72da5722ae Update rustpeg to latest version and follow new syntax and formatting rules
Signed-off-by: Victor Porof <vporof@mozilla.com>
2017-02-03 09:06:42 +01:00
Nick Alexander
ab041291fb edn: Bound values by optional whitespace; treat comma as whitespace. 2017-01-18 08:34:27 -08:00
Nick Alexander
247035cc9b edn: Allow comments.
EDN supports only one type of comment: initiated by ; and lasting
until the end of the current line or the end of the input stream.
2017-01-18 08:34:27 -08:00
Richard Newman
a152e60040 Read EDN keywords and symbols as rich types. Fixes #154. r=nalexander 2017-01-12 09:09:48 -08:00
Joe Walker
c4735119c4 Implement a basic EDN parser. (#149) r=rnewman,bgrins,nalexander
The parser mostly works and has a decent test suite. It parses all the
queries issued by the Tofino UAS, with some caveats. Known flaws:

* No support for tagged elements, comments, discarded elements or "'".
* Incomplete support for escaped characters in strings and the range of
  characters that are allowed in keywords and symbols.
* Possible whitespace handling problems.
2017-01-11 13:03:04 -08:00