mentat/project.clj
Nick Alexander badec36aaa Completely rewrite main transaction logic to be faster.
This is almost complete; it passes the test suite save for retracting
fulltext datoms correctly.

There's a lot to say about this approach, but I don't have time to give
too many details.  The broad outline is as follows.  We collect datoms
to add and retract in a tx_lookup table.  Depending on flags ("search
value" sv and "search value type tag" svalue_type_tag) we "complete" the
tx_lookup table by joining matching datoms.  This allows us to find
datoms that are already present (and so should not be added again,
should be retracted, or should be replaced as part of the
transaction).  We complete the tx_lookup (in place!) in two separate
INSERTs to avoid a quadratic two-table walk (run EXPLAIN on the
queries to observe that each INSERT walks the lookup table once and
then uses the datoms indexes to complete the matching values).
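
To make the completion step concrete, here is a minimal sketch in
Clojure over clojure.java.jdbc (which the dev profile below already
pulls in).  Every table and column name (tx_lookup, datoms,
e0/a0/v0/tx0, added0, sv, rid) and the added0 encoding are
assumptions reconstructed from this description, not the real
Datomish schema.

    (require '[clojure.java.jdbc :as jdbc])

    (defn complete-tx-lookup!
      "Sketch: complete tx_lookup in place.  Each INSERT walks the
      incomplete lookup rows once and probes the datoms indexes for
      matches, marking completed copies by bumping added0."
      [db]
      ;; Rows with a search value: match existing datoms on (e, a, sv).
      (jdbc/execute! db
        ["INSERT INTO tx_lookup (e0, a0, v0, tx0, added0, sv, rid)
          SELECT t.e0, t.a0, t.v0, t.tx0, t.added0 + 2, t.sv, d.rowid
          FROM tx_lookup AS t
          JOIN datoms AS d ON d.e = t.e0 AND d.a = t.a0 AND d.v = t.sv
          WHERE t.added0 < 2 AND t.sv IS NOT NULL"])
      ;; Rows without a search value: match on (e, a) alone, finding
      ;; the existing :db.cardinality/one datom that an assertion will
      ;; replace.
      (jdbc/execute! db
        ["INSERT INTO tx_lookup (e0, a0, v0, tx0, added0, sv, rid)
          SELECT t.e0, t.a0, t.v0, t.tx0, t.added0 + 2, t.sv, d.rowid
          FROM tx_lookup AS t
          JOIN datoms AS d ON d.e = t.e0 AND d.a = t.a0
          WHERE t.added0 < 2 AND t.sv IS NULL"]))

EXPLAIN QUERY PLAN over either statement should report a single scan
of tx_lookup plus index searches into datoms, which is the linear
behaviour claimed above.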

We could simplify the code by using multiple lookup tables, both for the
two cases of search parameters (eav vs. ea) and for the incomplete and
completed rows.  Right now we differentiate the former with NULL checks,
and the latter by incrementing the added0 column.  It performs well
enough, so I haven't tried to understand the performance of separating
these things.

After the tx_lookup table is completed, we build the transaction
from it and update the datoms materialized view table as well.
Observe the
careful handling of the "search value" sv parameters to handle replacing
:db.cardinality/one datoms.
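
The shape of that step might be something like the following sketch,
carrying over the assumed names and added0 encoding from the sketch
above (2 = completed retraction, 3 = completed assertion); the sv
checks are what distinguish a plain assertion from a
:db.cardinality/one replacement.

    (defn build-transaction!
      "Sketch: update the datoms materialized view from the completed
      lookup rows.  All names and encodings are assumptions."
      [db]
      ;; Completed retractions delete the datom they matched.
      (jdbc/execute! db
        ["DELETE FROM datoms WHERE rowid IN
            (SELECT rid FROM tx_lookup WHERE added0 = 2)"])
      ;; A completed assertion with a NULL sv was matched on (e, a)
      ;; alone: drop the old :db.cardinality/one value it replaces.
      (jdbc/execute! db
        ["DELETE FROM datoms WHERE rowid IN
            (SELECT rid FROM tx_lookup WHERE added0 = 3 AND sv IS NULL)"])
      ;; Assert every addition that is not already present, i.e. any
      ;; assertion row lacking a completed (e, a, v) match.
      (jdbc/execute! db
        ["INSERT INTO datoms (e, a, v, tx)
          SELECT t.e0, t.a0, t.v0, t.tx0
          FROM tx_lookup AS t
          WHERE t.added0 = 1
            AND NOT EXISTS
              (SELECT 1 FROM tx_lookup AS c
               WHERE c.added0 = 3 AND c.sv IS NOT NULL
                 AND c.e0 = t.e0 AND c.a0 = t.a0 AND c.v0 = t.v0)"]))
    ;; (Writing the same rows into a transactions log is analogous
    ;; and elided here.)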

Finally, we read the processed transaction back to produce the
transaction report for the API.  This is strictly to match the
Datomic API; we might allow consumers to skip this step, since many
will not want to stream the report over the wire.
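
The read-back itself could be as small as the following sketch; the
transactions table name and its columns are again assumptions:

    (defn tx-report
      "Sketch: read the processed transaction back out so the API can
      return a Datomic-style report.  Assumes a transactions table
      keyed by tx."
      [db tx]
      (jdbc/query db
        ["SELECT e, a, v, tx, added FROM transactions
          WHERE tx = ? ORDER BY e, a" tx]))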

Rough timings show the transactor processing a single >50k datom
transaction in about 3.5s, of which less than 0.5s is spent in the
expensive joins.  Further, reprocessing the same transaction takes
only about 3.5s again!  That's the worst possible case for the
joins, since every single inserted datom will already be present in
the database, making the most expensive join match every row.
2016-08-19 12:40:11 -07:00

(defproject datomish "0.1.0-SNAPSHOT"
  :description "A persistent, embedded knowledge base inspired by Datomic and DataScript."
  :url "https://github.com/mozilla/datomish"
  :license {:name "Mozilla Public License Version 2.0"
            :url "https://github.com/mozilla/datomish/blob/master/LICENSE"}
  :dependencies [[org.clojure/clojurescript "1.9.89"]
                 [org.clojure/clojure "1.8.0"]
                 [org.clojure/core.async "0.2.385"]
                 [datascript "0.15.1"]
                 [honeysql "0.8.0"]
                 [com.datomic/datomic-free "0.9.5359"]
                 [com.taoensso/tufte "1.0.2"]
                 [jamesmacaulay/cljs-promises "0.1.0"]]
  :cljsbuild {:builds
              {:release {:source-paths ["src"]
                         :assert false
                         :compiler {:output-to "release-js/datomish.bare.js"
                                    :optimizations :advanced
                                    :pretty-print false
                                    :elide-asserts true
                                    :output-wrapper false
                                    :parallel-build true}
                         :notify-command ["release-js/wrap_bare.sh"]}
               :advanced {:source-paths ["src"]
                          :compiler {:output-to "target/advanced/datomish.js"
                                     :optimizations :advanced
                                     :source-map "target/advanced/datomish.js.map"
                                     :pretty-print true
                                     :recompile-dependents true
                                     :parallel-build true}}
               :test {:source-paths ["src" "test"]
                      :compiler {:output-to "target/test/datomish.js"
                                 :output-dir "target/test"
                                 :main datomish.test
                                 :optimizations :none
                                 :source-map true
                                 :recompile-dependents true
                                 :parallel-build true
                                 :target :nodejs}}}}
  :profiles {:dev {:dependencies [[cljsbuild "1.1.3"]
                                  [tempfile "0.2.0"]
                                  [com.cemerick/piggieback "0.2.1"]
                                  [org.clojure/tools.nrepl "0.2.10"]
                                  [org.clojure/java.jdbc "0.6.2-alpha1"]
                                  [org.xerial/sqlite-jdbc "3.8.11.2"]]
                   :jvm-opts ["-Xss4m"]
                   :repl-options {:nrepl-middleware [cemerick.piggieback/wrap-cljs-repl]}
                   :plugins [[lein-cljsbuild "1.1.3"]
                             [lein-doo "0.1.6"]]}}
  :doo {:build "test"}
  :clean-targets ^{:protect false} ["target"
                                    "release-js/datomish.bare.js"
                                    "release-js/datomish.js"])