## The highest of high-level overviews
Transacting, or committing a transaction, follows this conceptual sequence:
1. EDN parsing.
2. Entity parsing: turning EDN into an abstract representation of a transaction as a series of entities to be applied to the database.
3. Entity expansion and rewriting: replacing keyword idents with numeric entids; expanding entity syntactic sugar (like map notation or embedded vector notation) into simpler entity forms.
4. Type checking: ensuring that all attribute/value pairs are well-typed.
5. *SQL transaction opened*
6. Resolving lookup-refs: ensuring all attribute/value lookup-refs map to a unique entid.
7. Resolving temp IDs: processing upserts; allocating new entids as necessary.
8. SQL evaluation: executing bulk `INSERT` and `DELETE` SQL statements to update the `transactions` and `datoms` tables in the SQL store.
9. Transaction report generation: extracting the transacted datoms from the SQL store; finalizing the temp ID map.
10. Schema evolution: interpreting any `:db.install/*` and `:db.alter/*` entities encountered.
11. Transaction finalization: updating the `idents`, `schema`, and `parts` materialized views in the SQL store.
12. *SQL transaction committed*
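To make the sequence concrete, here is a sketch of a small transaction (the `:person/*` attributes are invented for illustration, with `:person/email` assumed to be `:db/unique`):
```clojure
[{:db/id "alice"                      ;; temp ID: resolved or allocated in step 7.
  :person/email "alice@example.com"   ;; unique attribute (assumed): drives upsert resolution.
  :person/name "Alice"}               ;; map notation: expanded into :db/add entities in step 3.
 [:db/add [:person/email "bob@example.com"] :person/name "Bob"]]
;; The lookup-ref [:person/email "bob@example.com"] is resolved in step 6.
```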
## Parsing
A transaction usually arrives as a string. That string is parsed to EDN, and from there parsed to a vector of `Entity` elements. Each `Entity` element represents a transaction operation, one of:
* `:db/add`
* `:db/retract`
* `:db/retractEntity`
* `:db/retractAttribute`
This stage doesn't depend on the schema, but since almost all transactions include variable data, it's usually not sensible to preprocess or cache parsed transactions.
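For illustration, a transaction exercising all four operations might look like the following (placeholders in capitals, as in the examples below):
```clojure
[[:db/add              ENTID ATTRIBUTE VALUE]  ;; assert a single datom.
 [:db/retract          ENTID ATTRIBUTE VALUE]  ;; retract a single datom.
 [:db/retractAttribute ENTID ATTRIBUTE]        ;; retract all of ENTID's datoms with ATTRIBUTE.
 [:db/retractEntity    ENTID]]                 ;; retract all datoms mentioning ENTID.
```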
## Expansion, rewriting, and type checking
This is the point in the process at which the contents of the database — in particular, its schema and ident mappings — are first used.
The entities of the parsed transaction are walked to map keyword idents to numeric entids using the ident mappings. Syntactic sugar, like the map notation and the embedded vector notation, is expanded into multiple simpler entities. Attribute/value pairs are checked to ensure that they are well-typed.
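As a sketch (the `:user/*` attributes and all entids are invented for illustration), an entity in map notation like
```clojure
{:db/id 100 :user/name "Alice" :user/friends [200 300]}
```
expands and rewrites into the following (assuming `:user/name` maps to entid 65 and `:user/friends`, a `:db.cardinality/many` attribute, maps to 66):
```clojure
[[:db/add 100 65 "Alice"]   ;; entids invented for illustration.
 [:db/add 100 66 200]       ;; the embedded vector notation yields
 [:db/add 100 66 300]]      ;; one entity per value.
```
with each value then checked against its attribute's declared value type.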
## SQL transaction opened
This is the point in the process at which the contents of the SQL store are used. Therefore, we open a SQL write transaction to isolate our multi-step process from concurrent readers. (We assume that writes are serialized. See the Wiki notes on [[modeling the DB and connection in Rust|Thoughts:-modeling-db-conn-in-Rust]].)
## Resolution
Next, the set of attribute/value pairs requiring resolution is collected. There are two ways that an attribute/value pair can require resolution: it can be used in a _lookup ref_, like
```clojure
[[:db/add [:db/ident :user/unique-attribute] ATTRIBUTE VALUE]]
```
or it can be used in an _upsert_, like
```clojure
[[:db/add "tempid" :user/unique-attribute EXISTING-VALUE]
[:db/add "tempid" :user/other-attribute NEW-VALUE]]
```
Every lookup-ref must resolve to a unique entid already in the database or the transaction fails immediately. Upserted temp IDs, however, may or may not resolve. If any resolve to _multiple_ entids, the transaction fails immediately. Those that do not resolve will have new entids in the appropriate partition allocated for them.
The upsert resolution process is a multi-step algorithm that iteratively refines sets. See the Wiki notes on [[resolving upserts|Upsert-resolution-algorithm]].
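To sketch both outcomes for the upsert example above (entids invented for illustration): if `[:user/unique-attribute EXISTING-VALUE]` already identifies a unique entity in the database, `"tempid"` resolves to it; if it identifies nothing, `"tempid"` is allocated a fresh entid from the appropriate partition:
```clojure
;; "tempid" resolves to the existing entity, say 100:
[[:db/add 100 :user/unique-attribute EXISTING-VALUE]
 [:db/add 100 :user/other-attribute NEW-VALUE]]
;; ...or "tempid" is allocated a fresh entid, say 65537:
[[:db/add 65537 :user/unique-attribute EXISTING-VALUE]
 [:db/add 65537 :user/other-attribute NEW-VALUE]]
```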
## SQL evaluation
At this point, the transaction entities are in a streamlined form, like
```clojure
[[:db/add numeric-entid numeric-attr-entid well-typed-value]]
```
Using the properties of the relevant attribute, we generate bulk `INSERT` and `DELETE` SQL statements to update the `transactions` and `datoms` tables in the SQL store. The goal is to have SQLite do the work to look up and replace `:db.cardinality/one` datoms, to produce a minimal number of `DELETE` statements, to efficiently `INSERT` fulltext values and the corresponding datoms, etc.
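As a sketch of the cardinality-one case (entids and values invented for illustration):
```clojure
;; Suppose the datoms table already contains the datom:
[100 65 "old-value"]
;; Then transacting:
[[:db/add 100 65 "new-value"]]
;; must DELETE the old datom and INSERT the new one; the generated SQL lets
;; SQLite find the datom being replaced rather than querying for it first.
```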
The translation to SQL is not complicated, but there are several fiddly cases. Eventually we'll write Wiki notes on [[translating entities to bulk SQL|Entity-to-SQL-translation]].
## Transaction report generation
At this point, the `datoms` and `transactions` tables are updated but the transactor itself does not know the details of what has happened! For example, a `:db.cardinality/many` datom may have already existed and _not_ been transacted; or a new `:db/ident` may have been transacted. The transactor therefore queries the `transactions` table to find out what the actual SQL changes have been, for presentation to the `transact` consumer.
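For example (attribute invented for illustration), if `:user/tags` is `:db.cardinality/many` and the datom `[100 :user/tags "blue"]` already exists, then transacting
```clojure
[[:db/add 100 :user/tags "blue"]    ;; already present: not transacted.
 [:db/add 100 :user/tags "green"]]  ;; new: transacted.
```
yields a transaction report containing only the `"green"` datom.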
## Schema evolution and transaction finalization
Using the report generated in the previous step, the transactor interprets any `:db.install/*` and `:db.alter/*` entities encountered, along with any changes to the schema definitions.
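As a sketch, following Datomic-style conventions (the `:user/age` attribute is invented for illustration), installing a new attribute might look like
```clojure
[{:db/id "new-attr"                     ;; temp ID for the new attribute entity.
  :db/ident :user/age
  :db/valueType :db.type/long
  :db/cardinality :db.cardinality/one}
 [:db/add :db.part/db :db.install/attribute "new-attr"]]
```
and it is the resulting `:db.install/attribute` datom in the transaction report that triggers the schema interpretation here.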
Finally, the materialized views in the SQL store are updated: `idents` (if new `:db/ident` datoms were transacted), `schema` (if `:db.install/*` datoms were transacted), and `parts` (always, since we allocate a new `:db/tx` entid for each transaction).
## SQL transaction committed
At this point we've updated the data stores and the materialized metadata views, so the encompassing SQL transaction is committed. See the Wiki notes on [[modeling the DB and connection in Rust|Thoughts:-modeling-db-conn-in-Rust]].