Expand the README to give a guide to the crates in the repo.

Richard Newman 2018-03-19 14:32:58 -07:00
parent 16a66517e4
commit f71b2b207e

README.md

@ -8,6 +8,7 @@ The first version of Project Mentat, named Datomish, [was written in ClojureScri
The Rust implementation gives us a smaller compiled output, better performance, more type safety, better tooling, and easier deployment into Firefox and mobile platforms.
---
## Motivation
@ -101,6 +102,8 @@ SQLite is a traditional SQL database in most respects: schemas conflate semantic
Mentat aims to offer many of the advantages of SQLite — single-file use, embeddability, and good performance — while building a more relaxed, reusable, and expressive data model on top.
---
## Contributing
Please note that this project is released with a Contributor Code of Conduct.
@ -112,6 +115,8 @@ This project is very new, so we'll probably revise these guidelines. Please
comment on an issue before putting significant effort in if you'd like to
contribute.
---
## Building
First, clone the project. We use [Cargo](https://crates.io/install) to build and test it.
@ -135,14 +140,100 @@ cargo test -p mentat_query_parser -- --nocapture
For most `cargo` commands you can pass the `-p` argument to run the command just on that package. So, `cargo build -p mentat_query_parser` will build just the "query-parser" folder.
## What are all of these crates?
We use multiple sub-crates for Mentat for four reasons:
1. To improve incremental build times.
2. To encourage encapsulation; writing `extern crate` feels worse than just `use mod`.
3. To simplify the creation of targets that don't use certain features: _e.g._, a build with no syncing, or with no query system.
4. To allow for reuse (_e.g._, the EDN parser is essentially a separate library).
So what are they?
### Building blocks
#### `edn`
Our EDN parser. It uses `rust-peg` to parse [EDN](https://github.com/edn-format/edn), which is Clojure/Datomic's richer alternative to JSON. `edn`'s dependencies are all either for representing rich values (`chrono`, `uuid`, `ordered-float`) or for parsing (`serde`, `peg`).
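To give a feel for what "rich values" means here, the following is a minimal sketch of an EDN-like value type and a printer for it. The `Value` enum and `to_edn` method are hypothetical simplifications for illustration, not the `edn` crate's actual definitions. Floats are left out so that the type can derive `Ord` using only the standard library.

```rust
use std::collections::BTreeMap;

/// A simplified, hypothetical model of EDN values (not the real `edn` types).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Value {
    Nil,
    Boolean(bool),
    Integer(i64),
    Text(String),
    /// A keyword such as `:db/ident`, stored as (optional namespace, name).
    Keyword(Option<String>, String),
    Vector(Vec<Value>),
    Map(BTreeMap<Value, Value>),
}

impl Value {
    /// Render the value as EDN-style text.
    fn to_edn(&self) -> String {
        match self {
            Value::Nil => "nil".to_string(),
            Value::Boolean(b) => b.to_string(),
            Value::Integer(i) => i.to_string(),
            // Rust's debug escaping approximates EDN string syntax for simple strings.
            Value::Text(s) => format!("{:?}", s),
            Value::Keyword(Some(ns), name) => format!(":{}/{}", ns, name),
            Value::Keyword(None, name) => format!(":{}", name),
            Value::Vector(items) => {
                let parts: Vec<String> = items.iter().map(Value::to_edn).collect();
                format!("[{}]", parts.join(" "))
            }
            Value::Map(entries) => {
                let parts: Vec<String> = entries
                    .iter()
                    .map(|(k, v)| format!("{} {}", k.to_edn(), v.to_edn()))
                    .collect();
                format!("{{{}}}", parts.join(", "))
            }
        }
    }
}
```

Unlike JSON, map keys can be any value (here, keywords), which is part of why EDN needs a richer in-memory representation.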
#### `mentat_core`
This is the lowest-level Mentat crate. It collects together the following things:
- Fundamental domain-specific data structures like `ValueType` and `TypedValue`.
- Fundamental SQL-related linkages like `SQLValueType`. These encode the mapping between Mentat's types and values and their representation in our SQLite format.
- Conversion to and from EDN types (_e.g._, `edn::NamespacedKeyword` to `TypedValue::Keyword`).
- Common utilities (some in the `util` module, and others that should be moved there or broken out) like `Either`, `InternSet`, and `RcCounter`.
- Reusable lazy namespaced keywords (_e.g._, `DB_TYPE_DOUBLE`) that are used by `mentat_db` and EDN serialization of core structs.
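The relationship between a value, its type, and its storage representation can be sketched roughly as follows. The names `ValueType` and `TypedValue` come from the list above, but the variants shown, the `SqlTypeTag` trait, and the tag numbers are all assumptions made up for illustration; they are not the real `mentat_core` definitions.

```rust
/// Simplified stand-in for the set of types a value can have.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueType {
    Boolean,
    Long,
    String,
    Keyword,
}

/// Simplified stand-in for a value that carries its type with it.
#[derive(Clone, Debug, PartialEq)]
enum TypedValue {
    Boolean(bool),
    Long(i64),
    String(String),
    Keyword(String),
}

impl TypedValue {
    /// Every value knows its own type.
    fn value_type(&self) -> ValueType {
        match self {
            TypedValue::Boolean(_) => ValueType::Boolean,
            TypedValue::Long(_) => ValueType::Long,
            TypedValue::String(_) => ValueType::String,
            TypedValue::Keyword(_) => ValueType::Keyword,
        }
    }
}

/// Hypothetical trait describing how a type is tagged in storage.
trait SqlTypeTag {
    fn type_tag(&self) -> i32;
}

impl SqlTypeTag for ValueType {
    fn type_tag(&self) -> i32 {
        // These tag numbers are invented for this sketch.
        match self {
            ValueType::Boolean => 1,
            ValueType::Long => 2,
            ValueType::String => 3,
            ValueType::Keyword => 4,
        }
    }
}
```

Keeping the SQL-facing mapping in a separate trait means the domain types stay usable by code that never touches the database.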
#### `mentat_parser_utils`
This is a utility library for writing `combine` parsers over streams of `edn::Value`/`edn::ValueAndSpan`.
### Types
#### `mentat_query`
This crate defines the structs and enums that are the output of the query parser and used by the translator and algebrizer. `SrcVar`, `NonIntegerConstant`, `FnArg`… these all live here.
#### `mentat_query_sql`
Similarly, this crate defines an abstract representation of a SQL query as understood by Mentat. This bridges between Mentat's types (_e.g._, `TypedValue`) and SQL concepts (`ColumnOrExpression`, `GroupBy`). It's produced by the algebrizer and consumed by the translator.
#### `mentat_tx`
Mentat has two main inputs: reads (queries) and writes (transacts). Just as `mentat_query` defines the types produced by the query parser, `mentat_tx` defines the types produced by the tx parser.
### Transact processing
#### `mentat_tx_parser`
This is a `combine` parser that turns a stream of EDN values into a representation suitable to be transacted.
### Query processing
#### `mentat_query_parser`
This is a `combine` parser that uses `mentat_parser_utils` and `mentat_query` to turn a stream of EDN values into a more usable representation of a query.
#### `mentat_query_algebrizer`
This is the biggest piece of the query engine. It takes a parsed query, which at this point is _independent of a database_, and combines it with the current state of the schema and data. This involves translating keywords into attributes, abstract values into concrete values with a known type, and producing an `AlgebraicQuery`, which is a representation of how a query's Datalog semantics can be satisfied as SQL table joins and constraints over Mentat's SQL schema. An algebrized query is tightly coupled with both the disk schema and the vocabulary present in the store when the work is done.
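As a rough illustration of one step described above, resolving a keyword into an attribute with a known type, here is a minimal sketch. Every type and function in it (`Schema`, `ParsedPattern`, `AlgebrizedPattern`, `algebrize`) is hypothetical and far simpler than the real algebrizer; it only shows why the result depends on the schema present at the time.

```rust
use std::collections::HashMap;

type Entid = i64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueType {
    Long,
    Text,
}

/// What the (hypothetical) schema knows about one attribute.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Attribute {
    entid: Entid,
    value_type: ValueType,
}

/// Maps attribute keywords such as `:person/age` to their definitions.
struct Schema {
    attributes: HashMap<String, Attribute>,
}

/// A parsed pattern like `[?e :person/age ?age]`, independent of any database.
struct ParsedPattern {
    attribute: String,
    variable: String,
}

/// The same pattern after resolution against a particular schema.
#[derive(Debug, PartialEq, Eq)]
struct AlgebrizedPattern {
    attribute_entid: Entid,
    variable: String,
    known_type: ValueType,
}

/// Resolve the keyword to an attribute; `None` means the schema does not
/// define it, so the pattern cannot match anything in this store.
fn algebrize(schema: &Schema, pattern: &ParsedPattern) -> Option<AlgebrizedPattern> {
    let attr = schema.attributes.get(&pattern.attribute)?;
    Some(AlgebrizedPattern {
        attribute_entid: attr.entid,
        variable: pattern.variable.clone(),
        known_type: attr.value_type,
    })
}
```

The same parsed query can therefore algebrize differently, or not at all, against two stores with different vocabularies.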
#### `mentat_query_projector`
A Datalog query _projects_ some of the variables in the query into data structures in the output. This crate takes an algebrized query and a projection list and figures out how to get values out of the running SQL query and into the right format for the consumer.
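The core idea of projection can be sketched in a few lines: given the columns of a SQL result row and the list of variables the query asks for, pick out the requested values in the requested order. The function `project_row` and its integer-only rows are assumptions for illustration; the real crate handles typed values and several output shapes.

```rust
/// Select the values for the variables in `find` from one result row.
/// `columns` names each position in `row`. Returns `None` if a requested
/// variable has no corresponding column.
fn project_row(columns: &[&str], row: &[i64], find: &[&str]) -> Option<Vec<i64>> {
    find.iter()
        .map(|var| {
            columns
                .iter()
                .position(|c| c == var)
                .and_then(|i| row.get(i).copied())
        })
        .collect()
}
```

For example, with columns `?e`, `?name_len`, `?age` and a find list of `?age`, `?e`, the output contains only those two values, in find-list order rather than column order.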
#### `mentat_query_translator`
This crate works with all of the above to turn the output of the algebrizer and projector into the data structures defined in `mentat_query_sql`.
#### `mentat_sql`
This simple crate turns those data structures into SQL text and bindings that can later be executed by `rusqlite`.
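A minimal sketch of the "SQL text and bindings" idea: walk a small constraint tree, emit `?` placeholders into the SQL string, and collect the bound values separately. The `QueryBuilder`, `Arg`, and `Constraint` types and the table and column names are hypothetical, chosen for this example rather than taken from `mentat_sql`.

```rust
/// A value to be bound to a `?` placeholder when the query is executed.
#[derive(Clone, Debug, PartialEq)]
enum Arg {
    Integer(i64),
    Text(String),
}

/// Accumulates SQL text and its bound arguments side by side.
#[derive(Default)]
struct QueryBuilder {
    sql: String,
    args: Vec<Arg>,
}

impl QueryBuilder {
    /// Append literal SQL text.
    fn push_sql(&mut self, fragment: &str) {
        self.sql.push_str(fragment);
    }

    /// Append a placeholder and remember the value bound to it.
    fn push_arg(&mut self, arg: Arg) {
        self.sql.push('?');
        self.args.push(arg);
    }

    fn finish(self) -> (String, Vec<Arg>) {
        (self.sql, self.args)
    }
}

/// A tiny abstract representation of a WHERE clause.
enum Constraint {
    Equals { column: String, value: Arg },
    And(Vec<Constraint>),
}

impl Constraint {
    /// Serialize this constraint into the builder.
    fn write_sql(&self, out: &mut QueryBuilder) {
        match self {
            Constraint::Equals { column, value } => {
                out.push_sql(column);
                out.push_sql(" = ");
                out.push_arg(value.clone());
            }
            Constraint::And(parts) => {
                for (i, part) in parts.iter().enumerate() {
                    if i > 0 {
                        out.push_sql(" AND ");
                    }
                    part.write_sql(out);
                }
            }
        }
    }
}
```

Keeping values out of the SQL string and in a separate argument list avoids quoting problems and lets the database driver bind them safely.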
### The data layer: `mentat_db`
This is a big one: it implements the core storage logic on top of SQLite. This crate is responsible for bootstrapping new databases, transacting new data, maintaining the attribute cache, and building and updating in-memory representations of the storage schema.
### The main crate
The top-level main crate of Mentat assembles these component crates into something useful. It wraps up a connection to a database file and the associated metadata into a `Store`, and encapsulates an in-progress transaction (`InProgress`). It provides modules for programmatically writing data (`entity_builder.rs`) and managing vocabulary (`vocabulary.rs`).
### Syncing
Sync code lives, for [referential reasons](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying), in a crate named `tolstoy`. This code is a work in progress.
### The command-line interface
This is under `tools/cli`. It's essentially an external consumer of the main `mentat` crate. This code is ugly, but it mostly works.
---
## SQLite dependencies
Mentat uses partial indices, which are available in SQLite 3.8.0 and higher. It relies on correlation between aggregate and non-aggregate columns in the output, which was added in SQLite 3.7.11.
It also uses FTS4, which is [a compile time option](http://www.sqlite.org/fts3.html#section_2).
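If you want to check a system SQLite against the partial-index requirement, a version comparison along these lines may help. This is a sketch only: the helper names are hypothetical, and it assumes you have already obtained the version string yourself (for example from `SELECT sqlite_version()`). It does not detect whether FTS4 was compiled in.

```rust
/// Minimum SQLite version for partial indices, per the text above.
const MIN_SQLITE_VERSION: (u32, u32, u32) = (3, 8, 0);

/// Parse the first three dotted components of a version string like "3.8.0".
fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().split('.').map(|p| p.parse::<u32>());
    let major = parts.next()?.ok()?;
    let minor = parts.next()?.ok()?;
    let patch = parts.next()?.ok()?;
    Some((major, minor, patch))
}

/// True if the given version is at least the minimum; unparseable
/// versions are treated as unsupported.
fn supports_partial_indices(version: &str) -> bool {
    // Tuples compare lexicographically, so (3, 7, 11) < (3, 8, 0).
    parse_version(version).map_or(false, |v| v >= MIN_SQLITE_VERSION)
}
```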
@ -156,3 +247,9 @@ version = "0.6"
# System sqlite is known to be new.
default-features = false
```
---
## License
Project Mentat is currently licensed under the Apache License v2.0. See the `LICENSE` file for details.