mentat/db/src/upsert_resolution.rs

362 lines
16 KiB
Rust
Raw Normal View History

Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
// Copyright 2016 Mozilla
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use
// this file except in compliance with the License. You may obtain a copy of the
// License at http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software distributed
// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.
#![allow(dead_code)]
//! This module implements the upsert resolution algorithm described at
//! https://github.com/mozilla/mentat/wiki/Transacting:-upsert-resolution-algorithm.
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
use std::collections::{
BTreeMap,
BTreeSet,
};
use indexmap;
use petgraph::unionfind;
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
2018-06-06 01:23:59 +00:00
use errors::{
2018-06-27 00:17:01 +00:00
DbErrorKind,
2018-06-06 01:23:59 +00:00
Result,
};
2017-02-20 13:13:16 +00:00
use types::{
AVPair,
};
use internal_types::{
Population,
Lookup refs, nested vector values, map notation. Fixes #180, fixes #183, fixes #284. (#382) r=rnewman * Pre: Fix error in parser macros. * Pre: Make test unwrapping more verbose. * Pre: Make lookup refs be (lookup-ref a v) in the entity position. This has the advantage of being explicit in all situations and unambiguous at parse-time. This choice agrees with the Clojure implementation but not with Datomic. Datomic treats [a v] as a lookup ref, is ambiguous at parse-time, and is disambiguated in ways I do not understand at transaction time. We mooted making lookup refs [[a v]] and outlawing nested value vectors in transactions, but after implementing that approach I decided it was better to handle lookup refs at parse time and therefore outlawing nested value vectors is not necessary. * Handle lookup refs in the entity and value columns. Fixes #183. * Pre 0a: Use a stack instead of into_iter. * Pre 0b: Dedent. * Pre 0c: Handle `e` after `v`. This allows to use the original `e` while handling `v`. * Explode value lists for :db.cardinality/many attributes. Fixes #284. * Parse and accept map notation. Fixes #180. * Pre: Modernize add() and retract() into one add_or_retract(). * Pre: Add is_collection and is_atom to edn::Value. * Pre: Differentiate atoms from lookup-refs in value position. Initially, I expected to accept arbitrary edn::Value instances in the value position, and to differentiate in the transactor. However, the implementation quickly became a two-stage parser, since we always wanted to parse the resulting value position into some other known thing using the tx-parser. To save calls into the parser and to allow the parser to move forward with a smaller API surface, I push as much of this parsing as possible into the initial parse. * Pre: Modernize entities(). * Pre: Quote edn::Value::Text in Display. * Review comment: Add and use edn::Value::into_atom. * Review comment: Use skip(eof()) throughout. * Review comment: VecDeque instead of Vec. * Review comment: Part 0: Rename TempId to TempIdHandle. * Review comment: Part 1: Differentiate internal and external tempids. This breaks an abstraction boundary by pushing the Internal/External split up to the Entity level in tx/ and tx-parser/. This just makes it easier to explode Entity map notation instances into Entity instances, taking an existing External tempid :db/id or generating a new Internal tempid as appropriate. To do this without breaking the abstraction boundary would require adding flexibility to the transaction processor: we'd need to be able to turn Entity instances into some internal enum and handle the two cases independently. It wouldn't be too hard, but this reduces the combinatorial type explosion.
2017-03-27 23:30:04 +00:00
TempIdHandle,
2017-02-20 13:13:16 +00:00
TempIdMap,
Term,
TermWithoutTempIds,
TermWithTempIds,
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
TypedValueOr,
2017-02-20 13:13:16 +00:00
};
2017-06-14 21:42:34 +00:00
use mentat_core::util::Either::*;
use core_traits::{
Entid,
};
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
use mentat_core::{
attribute,
Attribute,
Schema,
TypedValue,
};
use edn::entities::OpType;
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
use schema::SchemaBuilding;
/// A "Simple upsert" that looks like [:db/add TEMPID a v], where a is :db.unique/identity.
#[derive(Clone,Debug,Eq,Hash,Ord,PartialOrd,PartialEq)]
Lookup refs, nested vector values, map notation. Fixes #180, fixes #183, fixes #284. (#382) r=rnewman * Pre: Fix error in parser macros. * Pre: Make test unwrapping more verbose. * Pre: Make lookup refs be (lookup-ref a v) in the entity position. This has the advantage of being explicit in all situations and unambiguous at parse-time. This choice agrees with the Clojure implementation but not with Datomic. Datomic treats [a v] as a lookup ref, is ambiguous at parse-time, and is disambiguated in ways I do not understand at transaction time. We mooted making lookup refs [[a v]] and outlawing nested value vectors in transactions, but after implementing that approach I decided it was better to handle lookup refs at parse time and therefore outlawing nested value vectors is not necessary. * Handle lookup refs in the entity and value columns. Fixes #183. * Pre 0a: Use a stack instead of into_iter. * Pre 0b: Dedent. * Pre 0c: Handle `e` after `v`. This allows to use the original `e` while handling `v`. * Explode value lists for :db.cardinality/many attributes. Fixes #284. * Parse and accept map notation. Fixes #180. * Pre: Modernize add() and retract() into one add_or_retract(). * Pre: Add is_collection and is_atom to edn::Value. * Pre: Differentiate atoms from lookup-refs in value position. Initially, I expected to accept arbitrary edn::Value instances in the value position, and to differentiate in the transactor. However, the implementation quickly became a two-stage parser, since we always wanted to parse the resulting value position into some other known thing using the tx-parser. To save calls into the parser and to allow the parser to move forward with a smaller API surface, I push as much of this parsing as possible into the initial parse. * Pre: Modernize entities(). * Pre: Quote edn::Value::Text in Display. * Review comment: Add and use edn::Value::into_atom. * Review comment: Use skip(eof()) throughout. * Review comment: VecDeque instead of Vec. * Review comment: Part 0: Rename TempId to TempIdHandle. * Review comment: Part 1: Differentiate internal and external tempids. This breaks an abstraction boundary by pushing the Internal/External split up to the Entity level in tx/ and tx-parser/. This just makes it easier to explode Entity map notation instances into Entity instances, taking an existing External tempid :db/id or generating a new Internal tempid as appropriate. To do this without breaking the abstraction boundary would require adding flexibility to the transaction processor: we'd need to be able to turn Entity instances into some internal enum and handle the two cases independently. It wouldn't be too hard, but this reduces the combinatorial type explosion.
2017-03-27 23:30:04 +00:00
struct UpsertE(TempIdHandle, Entid, TypedValue);
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
/// A "Complex upsert" that looks like [:db/add TEMPID a OTHERID], where a is :db.unique/identity
#[derive(Clone,Debug,Eq,Hash,Ord,PartialOrd,PartialEq)]
Lookup refs, nested vector values, map notation. Fixes #180, fixes #183, fixes #284. (#382) r=rnewman * Pre: Fix error in parser macros. * Pre: Make test unwrapping more verbose. * Pre: Make lookup refs be (lookup-ref a v) in the entity position. This has the advantage of being explicit in all situations and unambiguous at parse-time. This choice agrees with the Clojure implementation but not with Datomic. Datomic treats [a v] as a lookup ref, is ambiguous at parse-time, and is disambiguated in ways I do not understand at transaction time. We mooted making lookup refs [[a v]] and outlawing nested value vectors in transactions, but after implementing that approach I decided it was better to handle lookup refs at parse time and therefore outlawing nested value vectors is not necessary. * Handle lookup refs in the entity and value columns. Fixes #183. * Pre 0a: Use a stack instead of into_iter. * Pre 0b: Dedent. * Pre 0c: Handle `e` after `v`. This allows to use the original `e` while handling `v`. * Explode value lists for :db.cardinality/many attributes. Fixes #284. * Parse and accept map notation. Fixes #180. * Pre: Modernize add() and retract() into one add_or_retract(). * Pre: Add is_collection and is_atom to edn::Value. * Pre: Differentiate atoms from lookup-refs in value position. Initially, I expected to accept arbitrary edn::Value instances in the value position, and to differentiate in the transactor. However, the implementation quickly became a two-stage parser, since we always wanted to parse the resulting value position into some other known thing using the tx-parser. To save calls into the parser and to allow the parser to move forward with a smaller API surface, I push as much of this parsing as possible into the initial parse. * Pre: Modernize entities(). * Pre: Quote edn::Value::Text in Display. * Review comment: Add and use edn::Value::into_atom. * Review comment: Use skip(eof()) throughout. * Review comment: VecDeque instead of Vec. * Review comment: Part 0: Rename TempId to TempIdHandle. * Review comment: Part 1: Differentiate internal and external tempids. This breaks an abstraction boundary by pushing the Internal/External split up to the Entity level in tx/ and tx-parser/. This just makes it easier to explode Entity map notation instances into Entity instances, taking an existing External tempid :db/id or generating a new Internal tempid as appropriate. To do this without breaking the abstraction boundary would require adding flexibility to the transaction processor: we'd need to be able to turn Entity instances into some internal enum and handle the two cases independently. It wouldn't be too hard, but this reduces the combinatorial type explosion.
2017-03-27 23:30:04 +00:00
struct UpsertEV(TempIdHandle, Entid, TempIdHandle);
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
/// A generation collects entities into populations at a single evolutionary step in the upsert
/// resolution evolution process.
///
/// The upsert resolution process is only concerned with [:db/add ...] entities until the final
/// entid allocations. That's why we separate into special simple and complex upsert types
/// immediately, and then collect the more general term types for final resolution.
#[derive(Clone,Debug,Default,Eq,Hash,Ord,PartialOrd,PartialEq)]
pub(crate) struct Generation {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
/// "Simple upserts" that look like [:db/add TEMPID a v], where a is :db.unique/identity.
upserts_e: Vec<UpsertE>,
/// "Complex upserts" that look like [:db/add TEMPID a OTHERID], where a is :db.unique/identity
upserts_ev: Vec<UpsertEV>,
/// Entities that look like:
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
/// - [:db/add TEMPID b OTHERID]. b may be :db.unique/identity if it has failed to upsert.
/// - [:db/add TEMPID b v]. b may be :db.unique/identity if it has failed to upsert.
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
/// - [:db/add e b OTHERID].
allocations: Vec<TermWithTempIds>,
/// Entities that upserted and no longer reference tempids. These assertions are guaranteed to
/// be in the store.
upserted: Vec<TermWithoutTempIds>,
/// Entities that resolved due to other upserts and no longer reference tempids. These
/// assertions may or may not be in the store.
resolved: Vec<TermWithoutTempIds>,
}
#[derive(Clone,Debug,Default,Eq,Hash,Ord,PartialOrd,PartialEq)]
pub(crate) struct FinalPopulations {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
/// Upserts that upserted.
pub upserted: Vec<TermWithoutTempIds>,
/// Allocations that resolved due to other upserts.
pub resolved: Vec<TermWithoutTempIds>,
/// Allocations that required new entid allocations.
pub allocated: Vec<TermWithoutTempIds>,
}
impl Generation {
/// Split entities into a generation of populations that need to evolve to have their tempids
/// resolved or allocated, and a population of inert entities that do not reference tempids.
2018-06-06 01:23:59 +00:00
pub(crate) fn from<I>(terms: I, schema: &Schema) -> Result<(Generation, Population)> where I: IntoIterator<Item=TermWithTempIds> {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
let mut generation = Generation::default();
let mut inert = vec![];
2018-06-06 01:23:59 +00:00
let is_unique = |a: Entid| -> Result<bool> {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
let attribute: &Attribute = schema.require_attribute_for_entid(a)?;
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Ok(attribute.unique == Some(attribute::Unique::Identity))
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
};
for term in terms.into_iter() {
match term {
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Right(e), a, Right(v)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
if op == OpType::Add && is_unique(a)? {
generation.upserts_ev.push(UpsertEV(e, a, v));
} else {
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
generation.allocations.push(Term::AddOrRetract(op, Right(e), a, Right(v)));
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Right(e), a, Left(v)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
if op == OpType::Add && is_unique(a)? {
generation.upserts_e.push(UpsertE(e, a, v));
} else {
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
generation.allocations.push(Term::AddOrRetract(op, Right(e), a, Left(v)));
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Left(e), a, Right(v)) => {
generation.allocations.push(Term::AddOrRetract(op, Left(e), a, Right(v)));
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Left(e), a, Left(v)) => {
inert.push(Term::AddOrRetract(op, Left(e), a, Left(v)));
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
},
}
}
Ok((generation, inert))
}
/// Return true if it's possible to evolve this generation further.
///
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
/// Note that there can be complex upserts but no simple upserts to help resolve them, and in
/// this case, we cannot evolve further.
pub(crate) fn can_evolve(&self) -> bool {
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
!self.upserts_e.is_empty()
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
/// Evolve this generation one step further by rewriting the existing :db/add entities using the
/// given temporary IDs.
///
/// TODO: Considering doing this in place; the function already consumes `self`.
pub(crate) fn evolve_one_step(self, temp_id_map: &TempIdMap) -> Generation {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
let mut next = Generation::default();
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
// We'll iterate our own allocations to resolve more things, but terms that have already
// resolved stay resolved.
next.resolved = self.resolved;
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
for UpsertE(t, a, v) in self.upserts_e {
match temp_id_map.get(&*t) {
Some(&n) => next.upserted.push(Term::AddOrRetract(OpType::Add, n, a, v)),
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
None => next.allocations.push(Term::AddOrRetract(OpType::Add, Right(t), a, Left(v))),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
}
for UpsertEV(t1, a, t2) in self.upserts_ev {
match (temp_id_map.get(&*t1), temp_id_map.get(&*t2)) {
(Some(_), Some(&n2)) => {
// Even though we can resolve entirely, it's possible that the remaining upsert
// could conflict. Moving straight to resolved doesn't give us a chance to
// search the store for the conflict.
next.upserts_e.push(UpsertE(t1, a, TypedValue::Ref(n2.0)))
},
(None, Some(&n2)) => next.upserts_e.push(UpsertE(t1, a, TypedValue::Ref(n2.0))),
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
(Some(&n1), None) => next.allocations.push(Term::AddOrRetract(OpType::Add, Left(n1), a, Right(t2))),
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
(None, None) => next.upserts_ev.push(UpsertEV(t1, a, t2))
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
}
// There's no particular need to separate resolved from allocations right here and right
// now, although it is convenient.
for term in self.allocations {
// TODO: find an expression that destructures less? I still expect this to be efficient
// but it's a little verbose.
match term {
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Right(t1), a, Right(t2)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
match (temp_id_map.get(&*t1), temp_id_map.get(&*t2)) {
(Some(&n1), Some(&n2)) => next.resolved.push(Term::AddOrRetract(op, n1, a, TypedValue::Ref(n2.0))),
(None, Some(&n2)) => next.allocations.push(Term::AddOrRetract(op, Right(t1), a, Left(TypedValue::Ref(n2.0)))),
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
(Some(&n1), None) => next.allocations.push(Term::AddOrRetract(op, Left(n1), a, Right(t2))),
(None, None) => next.allocations.push(Term::AddOrRetract(op, Right(t1), a, Right(t2))),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Right(t), a, Left(v)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
match temp_id_map.get(&*t) {
Some(&n) => next.resolved.push(Term::AddOrRetract(op, n, a, v)),
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
None => next.allocations.push(Term::AddOrRetract(op, Right(t), a, Left(v))),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Left(e), a, Right(t)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
match temp_id_map.get(&*t) {
Some(&n) => next.resolved.push(Term::AddOrRetract(op, e, a, TypedValue::Ref(n.0))),
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
None => next.allocations.push(Term::AddOrRetract(op, Left(e), a, Right(t))),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(_, Left(_), _, Left(_)) => unreachable!(),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
}
next
}
// Collect id->[a v] pairs that might upsert at this evolutionary step.
pub(crate) fn temp_id_avs<'a>(&'a self) -> Vec<(TempIdHandle, AVPair)> {
Lookup refs, nested vector values, map notation. Fixes #180, fixes #183, fixes #284. (#382) r=rnewman * Pre: Fix error in parser macros. * Pre: Make test unwrapping more verbose. * Pre: Make lookup refs be (lookup-ref a v) in the entity position. This has the advantage of being explicit in all situations and unambiguous at parse-time. This choice agrees with the Clojure implementation but not with Datomic. Datomic treats [a v] as a lookup ref, is ambiguous at parse-time, and is disambiguated in ways I do not understand at transaction time. We mooted making lookup refs [[a v]] and outlawing nested value vectors in transactions, but after implementing that approach I decided it was better to handle lookup refs at parse time and therefore outlawing nested value vectors is not necessary. * Handle lookup refs in the entity and value columns. Fixes #183. * Pre 0a: Use a stack instead of into_iter. * Pre 0b: Dedent. * Pre 0c: Handle `e` after `v`. This allows to use the original `e` while handling `v`. * Explode value lists for :db.cardinality/many attributes. Fixes #284. * Parse and accept map notation. Fixes #180. * Pre: Modernize add() and retract() into one add_or_retract(). * Pre: Add is_collection and is_atom to edn::Value. * Pre: Differentiate atoms from lookup-refs in value position. Initially, I expected to accept arbitrary edn::Value instances in the value position, and to differentiate in the transactor. However, the implementation quickly became a two-stage parser, since we always wanted to parse the resulting value position into some other known thing using the tx-parser. To save calls into the parser and to allow the parser to move forward with a smaller API surface, I push as much of this parsing as possible into the initial parse. * Pre: Modernize entities(). * Pre: Quote edn::Value::Text in Display. * Review comment: Add and use edn::Value::into_atom. * Review comment: Use skip(eof()) throughout. * Review comment: VecDeque instead of Vec. * Review comment: Part 0: Rename TempId to TempIdHandle. * Review comment: Part 1: Differentiate internal and external tempids. This breaks an abstraction boundary by pushing the Internal/External split up to the Entity level in tx/ and tx-parser/. This just makes it easier to explode Entity map notation instances into Entity instances, taking an existing External tempid :db/id or generating a new Internal tempid as appropriate. To do this without breaking the abstraction boundary would require adding flexibility to the transaction processor: we'd need to be able to turn Entity instances into some internal enum and handle the two cases independently. It wouldn't be too hard, but this reduces the combinatorial type explosion.
2017-03-27 23:30:04 +00:00
let mut temp_id_avs: Vec<(TempIdHandle, AVPair)> = vec![];
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
// TODO: map/collect.
for &UpsertE(ref t, ref a, ref v) in &self.upserts_e {
// TODO: figure out how to make this less expensive, i.e., don't require
// clone() of an arbitrary value.
temp_id_avs.push((t.clone(), (*a, v.clone())));
}
temp_id_avs
}
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
/// Evolve potential upserts that haven't resolved into allocations.
2018-06-06 01:23:59 +00:00
pub(crate) fn allocate_unresolved_upserts(&mut self) -> Result<()> {
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
let mut upserts_ev = vec![];
::std::mem::swap(&mut self.upserts_ev, &mut upserts_ev);
self.allocations.extend(upserts_ev.into_iter().map(|UpsertEV(t1, a, t2)| Term::AddOrRetract(OpType::Add, Right(t1), a, Right(t2))));
Ok(())
}
/// After evolution is complete, yield the set of tempids that require entid allocation.
///
/// Some of the tempids may be identified, so we also provide a map from tempid to a dense set
/// of contiguous integer labels.
2018-06-06 01:23:59 +00:00
pub(crate) fn temp_ids_in_allocations(&self, schema: &Schema) -> Result<BTreeMap<TempIdHandle, usize>> {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
assert!(self.upserts_e.is_empty(), "All upserts should have been upserted, resolved, or moved to the allocated population!");
assert!(self.upserts_ev.is_empty(), "All upserts should have been upserted, resolved, or moved to the allocated population!");
Lookup refs, nested vector values, map notation. Fixes #180, fixes #183, fixes #284. (#382) r=rnewman * Pre: Fix error in parser macros. * Pre: Make test unwrapping more verbose. * Pre: Make lookup refs be (lookup-ref a v) in the entity position. This has the advantage of being explicit in all situations and unambiguous at parse-time. This choice agrees with the Clojure implementation but not with Datomic. Datomic treats [a v] as a lookup ref, is ambiguous at parse-time, and is disambiguated in ways I do not understand at transaction time. We mooted making lookup refs [[a v]] and outlawing nested value vectors in transactions, but after implementing that approach I decided it was better to handle lookup refs at parse time and therefore outlawing nested value vectors is not necessary. * Handle lookup refs in the entity and value columns. Fixes #183. * Pre 0a: Use a stack instead of into_iter. * Pre 0b: Dedent. * Pre 0c: Handle `e` after `v`. This allows to use the original `e` while handling `v`. * Explode value lists for :db.cardinality/many attributes. Fixes #284. * Parse and accept map notation. Fixes #180. * Pre: Modernize add() and retract() into one add_or_retract(). * Pre: Add is_collection and is_atom to edn::Value. * Pre: Differentiate atoms from lookup-refs in value position. Initially, I expected to accept arbitrary edn::Value instances in the value position, and to differentiate in the transactor. However, the implementation quickly became a two-stage parser, since we always wanted to parse the resulting value position into some other known thing using the tx-parser. To save calls into the parser and to allow the parser to move forward with a smaller API surface, I push as much of this parsing as possible into the initial parse. * Pre: Modernize entities(). * Pre: Quote edn::Value::Text in Display. * Review comment: Add and use edn::Value::into_atom. * Review comment: Use skip(eof()) throughout. * Review comment: VecDeque instead of Vec. * Review comment: Part 0: Rename TempId to TempIdHandle. * Review comment: Part 1: Differentiate internal and external tempids. This breaks an abstraction boundary by pushing the Internal/External split up to the Entity level in tx/ and tx-parser/. This just makes it easier to explode Entity map notation instances into Entity instances, taking an existing External tempid :db/id or generating a new Internal tempid as appropriate. To do this without breaking the abstraction boundary would require adding flexibility to the transaction processor: we'd need to be able to turn Entity instances into some internal enum and handle the two cases independently. It wouldn't be too hard, but this reduces the combinatorial type explosion.
2017-03-27 23:30:04 +00:00
let mut temp_ids: BTreeSet<TempIdHandle> = BTreeSet::default();
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
let mut tempid_avs: BTreeMap<(Entid, TypedValueOr<TempIdHandle>), Vec<TempIdHandle>> = BTreeMap::default();
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
for term in self.allocations.iter() {
match term {
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
&Term::AddOrRetract(OpType::Add, Right(ref t1), a, Right(ref t2)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
temp_ids.insert(t1.clone());
temp_ids.insert(t2.clone());
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
let attribute: &Attribute = schema.require_attribute_for_entid(a)?;
if attribute.unique == Some(attribute::Unique::Identity) {
tempid_avs.entry((a, Right(t2.clone()))).or_insert(vec![]).push(t1.clone());
}
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
},
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
&Term::AddOrRetract(OpType::Add, Right(ref t), a, ref x @ Left(_)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
temp_ids.insert(t.clone());
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
let attribute: &Attribute = schema.require_attribute_for_entid(a)?;
if attribute.unique == Some(attribute::Unique::Identity) {
tempid_avs.entry((a, x.clone())).or_insert(vec![]).push(t.clone());
}
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
&Term::AddOrRetract(OpType::Add, Left(_), _, Right(ref t)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
temp_ids.insert(t.clone());
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
&Term::AddOrRetract(OpType::Add, Left(_), _, Left(_)) => unreachable!(),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
&Term::AddOrRetract(OpType::Retract, _, _, _) => {
// [:db/retract ...] entities never allocate entids; they have to resolve due to
// other upserts (or they fail the transaction).
},
}
}
Add type checking and constraint checking to the transactor. (#663, #532, #679) This should address #663, by re-inserting type checking in the transactor stack after the entry point used by the term builder. Before this commit, we were using an SQLite UNIQUE index to assert that no `[e a]` pair, with `a` a cardinality one attribute, was asserted more than once. However, that's not in line with Datomic, which treats transaction inputs as a set and allows a single datom like `[e a v]` to appear multiple times. It's both awkward and not particularly efficient to look for _distinct_ repetitions in SQL, so we accept some runtime cost in order to check for repetitions in the transactor. This will allow us to address #532, which is really about whether we treat inputs as sets. A side benefit is that we can provide more helpful error messages when the transactor does detect that the input truly violates the cardinality constraints of the schema. This commit builds a trie while error checking and collecting final terms, which should be fairly efficient. It also allows a simpler expression of input-provided :db/txInstant datoms, which in turn uncovered a small issue with the transaction watcher, where-by the watcher would not see non-input-provided :db/txInstant datoms. This transition to Datomic-like input-as-set semantics allows us to address #532. Previously, two tempids that upserted to the same entid would produce duplicate datoms, and that would have been rejected by the transactor -- correctly, since we did not allow duplicate datoms under the input-as-list semantics. With input-as-set semantics, duplicate datoms are allowed; and that means that we must allow tempids to be equivalent, i.e., to resolve to the same tempid. To achieve this, we: - index the set of tempids - identify tempid indices that share an upsert - map tempids to a dense set of contiguous integer labels We use the well-known union-find algorithm, as implemented by petgraph, to efficiently manage the set of equivalent tempids. Along the way, I've fixed and added tests for two small errors in the transactor. First, don't drop datoms resolved by upsert (#679). Second, ensure that complex upserts are allocated. I don't know quite what happened here. The Clojure implementation correctly kept complex upserts that hadn't resolved as complex upserts (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L436) and then allocated complex upserts if they didn't resolve (see https://github.com/mozilla/mentat/blob/9a9dfb502acf5e4cdb1059d4aac831d7603063c8/src/common/datomish/transact.cljc#L509). Based on the code comments, I think the Rust implementation must have incorrectly tried to optimize by handling all complex upserts in at most a single generation of evolution, and that's just not correct. We're effectively implementing a topological sort, using very specific domain knowledge, and its not true that a node in a topological sort can be considered only once!
2018-04-30 22:16:05 +00:00
// Now we union-find all the known tempids. Two tempids are unioned if they both appear as
// the entity of an `[a v]` upsert, including when the value column `v` is itself a tempid.
let mut uf = unionfind::UnionFind::new(temp_ids.len());
// The union-find implementation from petgraph operates on contiguous indices, so we need to
// maintain the map from our tempids to indices ourselves.
let temp_ids: BTreeMap<TempIdHandle, usize> = temp_ids.into_iter().enumerate().map(|(i, tempid)| (tempid, i)).collect();
debug!("need to label tempids aggregated using tempid_avs {:?}", tempid_avs);
for vs in tempid_avs.values() {
vs.first().and_then(|first| temp_ids.get(first)).map(|&first_index| {
for tempid in vs {
temp_ids.get(tempid).map(|&i| uf.union(first_index, i));
}
});
}
debug!("union-find aggregation {:?}", uf.clone().into_labeling());
// Now that we have aggregated tempids, we need to label them using the smallest number of
// contiguous labels possible.
let mut tempid_map: BTreeMap<TempIdHandle, usize> = BTreeMap::default();
let mut dense_labels: indexmap::IndexSet<usize> = indexmap::IndexSet::default();
// We want to produce results that are as deterministic as possible, so we allocate labels
// for tempids in sorted order. This has the effect of making "a" allocate before "b",
// which is pleasant for testing.
for (tempid, tempid_index) in temp_ids {
let rep = uf.find_mut(tempid_index);
dense_labels.insert(rep);
dense_labels.get_full(&rep).map(|(dense_index, _)| tempid_map.insert(tempid.clone(), dense_index));
}
debug!("labeled tempids using {} labels: {:?}", dense_labels.len(), tempid_map);
Ok(tempid_map)
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
/// After evolution is complete, use the provided allocated entids to segment `self` into
/// populations, each with no references to tempids.
2018-06-06 01:23:59 +00:00
pub(crate) fn into_final_populations(self, temp_id_map: &TempIdMap) -> Result<FinalPopulations> {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
assert!(self.upserts_e.is_empty());
assert!(self.upserts_ev.is_empty());
let mut populations = FinalPopulations::default();
populations.upserted = self.upserted;
populations.resolved = self.resolved;
for term in self.allocations {
let allocated = match term {
// TODO: consider require implementing require on temp_id_map.
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Right(t1), a, Right(t2)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
match (op, temp_id_map.get(&*t1), temp_id_map.get(&*t2)) {
(op, Some(&n1), Some(&n2)) => Term::AddOrRetract(op, n1, a, TypedValue::Ref(n2.0)),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
(OpType::Add, _, _) => unreachable!(), // This is a coding error -- every tempid in a :db/add entity should resolve or be allocated.
2018-06-27 00:17:01 +00:00
(OpType::Retract, _, _) => bail!(DbErrorKind::NotYetImplemented(format!("[:db/retract ...] entity referenced tempid that did not upsert: one of {}, {}", t1, t2))),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Right(t), a, Left(v)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
match (op, temp_id_map.get(&*t)) {
(op, Some(&n)) => Term::AddOrRetract(op, n, a, v),
(OpType::Add, _) => unreachable!(), // This is a coding error.
2018-06-27 00:17:01 +00:00
(OpType::Retract, _) => bail!(DbErrorKind::NotYetImplemented(format!("[:db/retract ...] entity referenced tempid that did not upsert: {}", t))),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(op, Left(e), a, Right(t)) => {
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
match (op, temp_id_map.get(&*t)) {
(op, Some(&n)) => Term::AddOrRetract(op, e, a, TypedValue::Ref(n.0)),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
(OpType::Add, _) => unreachable!(), // This is a coding error.
2018-06-27 00:17:01 +00:00
(OpType::Retract, _) => bail!(DbErrorKind::NotYetImplemented(format!("[:db/retract ...] entity referenced tempid that did not upsert: {}", t))),
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
}
},
Schema alteration. Fixes #294 and #295. (#370) r=rnewman * Pre: Don't retract :db/ident in test. Datomic (and eventually Mentat) don't allow to retract :db/ident in this way, so this runs afoul of future work to support mutating metadata. * Pre: s/VALUETYPE/VALUE_TYPE/. This is consistent with the capitalization (which is "valueType") and the other identifier. * Pre: Remove some single quotes from error output. * Part 1: Make materialized views be uniform [e a v value_type_tag]. This looks ahead to a time when we could support arbitrary user-defined materialized views. For now, the "idents" materialized view is those datoms of the form [e :db/ident :namespaced/keyword] and the "schema" materialized view is those datoms of the form [e a v] where a is in a particular set of attributes that will become clear in the following commits. This change is not backwards compatible, so I'm removing the open current (really, v2) test. It'll be re-instated when we get to https://github.com/mozilla/mentat/issues/194. * Pre: Map TypedValue::Ref to TypedValue::Keyword in debug output. * Part 3: Separate `schema_to_mutate` from the `schema` used to interpret. This is just to keep track of the expected changes during bootstrapping. I want bootstrap metadata mutations to flow through the same code path as metadata mutations during regular transactions; by differentiating the schema used for interpretation from the schema that will be updated I expect to be able to apply bootstrap metadata mutations to an empty schema and have things like materialized views created (using the regular code paths). This commit has been re-ordered for conceptual clarity, but it won't compile because it references the metadata module. It's possible to make it compile -- the functionality is there in the schema module -- but it's not worth the rebasing effort until after review (and possibly not even then, since we'll squash down to a single commit to land). * Part 2: Maintain entids separately from idents. In order to support historical idents, we need to distinguish the "current" map from entid -> ident from the "complete historical" map ident -> entid. This is what Datomic does; in Datomic, an ident is never retracted (although it can be replaced). This approach is an important part of allowing multiple consumers to share a schema fragment as it migrates forward. This fixes a limitation of the Clojure implementation, which did not handle historical idents across knowledge base close and re-open. The "entids" materialized view is naturally a slice of the "datoms" table. The "idents" materialized view is a slice of the "transactions" table. I hope that representing in this way, and casting the problem in this light, might generalize to future materialized views. * Pre: Add DiffSet. * Part 4: Collect mutations to a `Schema`. I haven't taken your review comment about consuming AttributeBuilder during each fluent function. If you read my response and still want this, I'm happy to do it in review. * Part 5: Handle :db/ident and :db.{install,alter}/attribute. This "loops" the committed datoms out of the SQL store and back through the metadata (schema, but in future also partition map) processor. The metadata processor updates the schema and produces a report of what changed; that report is then used to update the SQL store. That update includes: - the materialized views ("entids", "idents", and "schema"); - if needed, a subset of the datoms themselves (as flags change). I've left a TODO for handling attribute retraction in the cases that it makes sense. I expect that to be straight-forward. * Review comment: Rename DiffSet to AddRetractAlterSet. Also adds a little more commentary and a simple test. * Review comment: Use ToIdent trait. * Review comment: partially revert "Part 2: Maintain entids separately from idents." This reverts commit 23a91df9c35e14398f2ddbd1ba25315821e67401. Following our discussion, this removes the "entids" materialized view. The next commit will remove historical idents from the "idents" materialized view. * Post: Use custom Either rather than std::result::Result. This is not necessary, but it was suggested that we might be paying an overhead creating Err instances while using error_chain. That seems not to be the case, but this change shows that we don't actually use any of the Result helper methods, so there's no reason to overload Result. This change might avoid some future confusion, so I'm going to land it anyway. Signed-off-by: Nick Alexander <nalexander@mozilla.com> * Review comment: Don't preserve historical idents. * Review comment: More prepared statements when updating materialized views. * Post: Test altering :db/cardinality and :db/unique. These tests fail due to a Datomic limitation, namely that the marker flag :db.alter/attribute can only be asserted once for an attribute! That is, [:db.part/db :db.alter/attribute :attribute] will only be transacted at most once. Since older versions of Datomic required the :db.alter/attribute flag, I can only imagine they either never wrote :db.alter/attribute to the store, or they handled it specially. I'll need to remove the marker flag system from Mentat in order to address this fundamental limitation. * Post: Remove some more single quotes from error output. * Post: Add assert_transact! macro to unwrap safely. I was finding it very difficult to track unwrapping errors while making changes, due to an underlying Mac OS X symbolication issue that makes running tests with RUST_BACKTRACE=1 so slow that they all time out. * Post: Don't expect or recognize :db.{install,alter}/attribute. I had this all working... except we will never see a repeated `[:db.part/db :db.alter/attribute :attribute]` assertion in the store! That means my approach would let you alter an attribute at most one time. It's not worth hacking around this; it's better to just stop expecting (and recognizing) the marker flags. (We have all the data to distinguish the various cases that we need without the marker flags.) This brings Mentat in line with the thrust of newer Datomic versions, but isn't compatible with Datomic, because (if I understand correctly) Datomic automatically adds :db.{install,alter}/attribute assertions to transactions. I haven't purged the corresponding :db/ident and schema fragments just yet: - we might want them back - we might want them in order to upgrade v1 and v2 databases to the new on-disk layout we're fleshing out (v3?). * Post: Don't make :db/unique :db.unique/* imply :db/index true. This patch avoids a potential bug with the "schema" materialized view. If :db/unique :db.unique/value implies :db/index true, then what happens when you _retract_ :db.unique/value? I think Datomic defines this in some way, but I really want the "schema" materialized view to be a slice of "datoms" and not have these sort of ambiguities and persistent effects. Therefore, to ensure that we don't retract a schema characteristic and accidentally change more than we intended to, this patch stops having any schema characteristic imply any other schema characteristic(s). To achieve that, I added an Option<Unique::{Value,Identity}> type to Attribute; this helps with this patch, and also looks ahead to when we allow to retract :db/unique attributes. * Post: Allow to retract :db/ident. * Post: Include more details about invalid schema changes. The tests use strings, so they hide the chained errors which do in fact provide more detail. * Review comment: Fix outdated comment. * Review comment: s/_SET/_SQL_LIST/. * Review comment: Use a sub-select for checking cardinality. This might be faster in practice. * Review comment: Put `attribute::Unique` into its own namespace.
2017-03-20 20:18:59 +00:00
Term::AddOrRetract(_, Left(_), _, Left(_)) => unreachable!(), // This is a coding error -- these should not be in allocations.
Implement upsert resolution algorithm. (#186, #283). r=rnewman, f=jsantell * Pre: Implement batch [a v] pair lookup. * Pre: Add InternSet for sharing ref-counted handles to large values. * Pre: Derive more for Entity. * Pre: Return DB from creating; return TxReport from transact. I explicitly am not supporting opening existing databases yet, let alone upgrading databases from earlier versions. That can follow fast once basic transactions are supported. * Pre: Parse string temporary ID entities; remove ValueOrLookupRef. This adds TempId entities, but we can't disambiguate String temporary IDs from values without the use of the schema, so there's no new value branch. Similarly, we can't disambiguate lookup-ref values from two element list values without a schema, so we remove this entirely. We'll handle the ambiguity later in the transactor. * Persist partitions to SQL store; allocate transaction ID. (#186) * Post: Test upserting with vectors. This converts an existing test to EDN: https://github.com/mozilla/mentat/blob/84a80f40f5c888f8452d07bd15f3b5fba49d3963/test/datomish/db_test.cljc#L193. * Implement tempid upsert resolution algorithm. (#184) * Post: Separate Tx out of DB. This is very preliminary, since we don't have a real connection type to manage transactions and their metadata yet. * Post: Comment on implementation choices in the transactor. * Review comment: Put long use lists on separate lines. * Review comment: Accept String: Borrow<S> instead of just String. * Review comment: Address nits.
2017-02-15 00:50:40 +00:00
};
populations.allocated.push(allocated);
}
Ok(populations)
}
}