[doc] Clarify unification semantics #160

Open
opened 2020-08-06 16:55:56 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:55:56 +00:00 (Migrated from github.com)

Query variables unify. This occurs at two levels: down in SQLite (which does all the work of a query engine, of course), and also deliberately within Mentat, as we decide how to construct SQL queries that implement Datalog semantics.

Datomic doesn't explicitly document its unification behavior. Clojure itself offers several kinds of equality comparison:

  • identical?, which tests whether two objects are the same. Primitives (with the exception of some boxed JVM fixnums) are almost never identical?.
  • =, which compares collections type-agnostically, and otherwise is the usual value-space equality comparison you might expect.
  • ==, which is numeric equality, and works across numeric types.

The following REPL session should be illuminating:

user=> (identical? 5.0 5.0)     ; Primitives are rarely identical.
false
user=> (== 5.0 5.0)             ; == compares primitive numbers correctly.
true
user=> (== 5.0 5)               ; == does numeric coercion.
true
user=> (= 5.0 5)                ; = doesn't do numeric coercion.
false
user=> (= '(1 2 3) [1 2 3])     ; = compares collection contents.
true

Without experimenting, I'd bet that Datomic unifies with =, and thus these two queries are identical:

[:find ?x ?y :where [?x :foo/bar ?z] [?z :foo/baz ?y]]
[:find ?x ?y :where [?x :foo/bar ?z] [?a :foo/baz ?y] [(= ?z ?a)]]

Notably this means that keywords don't unify with strings, and doubles don't unify with longs. That makes sense for Mentat, too: TypedValue::Keyword(":foo/bar") shouldn't unify with TypedValue::String(":foo/bar") despite having the same database representation, and arguably TypedValue::Double(5.0) and TypedValue::Long(5) shouldn't unify either.

We should do the following:

  • Experiment with Datomic and DataScript to define their behavior.

  • Decide on and document ours. A few good illustrations would be

    [:find ?x [?x _ 5]]
    [:find ?x [?x :some/double 5]]    ; Do we coerce in this case?
    [:find ?x [?x :some/long 5.0]]    ; Do we coerce in this one? We definitely wouldn't for 5.1!
    [:find ?x :in ?y [?x _ ?y]],   ?y = TypedValue::Long(5)
    [:find ?x [(ground 5) [?y]] [?x _ ?y]]    ; ground has to pick a single value to bind, no?
    [:find ?x [:my/statics :static/five ?y] [?x _ ?y]]
    

    Whether these match long-valued, double-valued, or both kinds of attributes of ?x is down to the chosen unification semantics and also the representation of literal values in queries and in inputs.

  • Implement = and == to allow for queries to be written that implement numeric or exact equality without being phrased as patterns.

  • Continue to correctly implement type tag matching clauses in order to follow our defined unification behavior.

Query variables _unify_. This occurs at two levels: down in SQLite (which does all the work of a query engine, of course), and also deliberately within Mentat, as we decide how to construct SQL queries that implement Datalog semantics. Datomic doesn't explicitly document its unification behavior. Clojure itself offers several kinds of equality comparison: - `identical?`, which tests whether two objects are the same. Primitives (with the exception of some boxed JVM fixnums) are almost never `identical?`. - `=`, which compares collections type-agnostically, and otherwise is the usual value-space equality comparison you might expect. - `==`, which is numeric equality, and works across numeric types. The following REPL session should be illuminating: ``` user=> (identical? 5.0 5.0) ; Primitives are rarely identical. false user=> (== 5.0 5.0) ; == compares primitive numbers correctly. true user=> (== 5.0 5) ; == does numeric coercion. true user=> (= 5.0 5) ; = doesn't do numeric coercion. false user=> (= '(1 2 3) [1 2 3]) ; = compares collection contents. true ``` Without experimenting, I'd bet that Datomic unifies with `=`, and thus these two queries are identical: ``` [:find ?x ?y :where [?x :foo/bar ?z] [?z :foo/baz ?y]] ``` ``` [:find ?x ?y :where [?x :foo/bar ?z] [?a :foo/baz ?y] [(= ?z ?a)]] ``` Notably this means that keywords don't unify with strings, and doubles don't unify with longs. That makes sense for Mentat, too: `TypedValue::Keyword(":foo/bar")` shouldn't unify with `TypedValue::String(":foo/bar")` despite having the same database representation, and arguably `TypedValue::Double(5.0)` and `TypedValue::Long(5)` shouldn't unify either. We should do the following: - Experiment with Datomic and DataScript to define their behavior. - Decide on and document ours. A few good illustrations would be ``` [:find ?x [?x _ 5]] [:find ?x [?x :some/double 5]] ; Do we coerce in this case? [:find ?x [?x :some/long 5.0]] ; Do we coerce in this one? We definitely wouldn't for 5.1! [:find ?x :in ?y [?x _ ?y]], ?y = TypedValue::Long(5) [:find ?x [(ground 5) [?y]] [?x _ ?y]] ; ground has to pick a single value to bind, no? [:find ?x [:my/statics :static/five ?y] [?x _ ?y]] ``` Whether these match long-valued, double-valued, or both kinds of attributes of `?x` is down to the chosen unification semantics and also the representation of literal values in queries and in inputs. - Implement `=` and `==` to allow for queries to be written that implement numeric or exact equality without being phrased as patterns. - Continue to correctly implement type tag matching clauses in order to follow our defined unification behavior.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: greg/mentat#160
No description provided.