Consider an alternative disk representation for datom flags #69

New issue

Open

opened 2020-08-06 16:54:24 +00:00 by gburd · 0 comments

gburd commented

2020-08-06 16:54:24 +00:00

(Migrated from github.com)

We have four fields in datoms: index_vaet, index_avet, index_fulltext and unique_value.

These fields appear in datoms and in the indices idx_datoms_eavt and idx_datoms_aevt.

Some rows also appear in idx_datoms_avet, idx_datoms_vaet, idx_datoms_fulltext, and idx_datoms_unique_value.

That is: each datom is responsible for anywhere from 12 to 28 TINYINTs in the DB. Sometimes these won't contribute to space (if they're zero, I've read that sqlite should pack them down to nothing), but assuming one byte per field, half a million datoms will add up to 14MB to the DB just for these flags.

These flags are entirely derived from the schema: a datom's attribute is the sole determinant (AFAICS) of whether these flags are 1 or 0 for a row.

Now, partial indexes cannot refer to other tables or call functions, so we can't simply join against the schema in the CREATE INDEX … WHERE clause. But we could use a more complex operator-driven expression, including bitmasks, to compress these flags.

We could also consider approaches to simply removing columns:

index_vaet and index_fulltext are mutually exclusive, so they could be two integer values in a single field.
index_vaet corresponds to :db/valueType :db.type/ref, which is already implicitly represented by a value_type_tag of 0, so we can filter on that instead.

Finally, we could consider direct schema creation as an approach: rather than having idx_datoms_fulltext, for example, we could create per-attribute indices when we register a schema fragment that includes :fulltext true:

CREATE INDEX idx_datoms_fulltext_page_title ON datoms (value_type_tag, v, a, e) WHERE a = 65538 -- The interned attribute.

I don't know if sqlite will happily use a collection of such indices (perhaps it'd be quicker for real-world queries!), but it's worth exploring.

We have four fields in `datoms`: `index_vaet`, `index_avet`, `index_fulltext` and `unique_value`. These fields appear in `datoms` and in the indices `idx_datoms_eavt` and `idx_datoms_aevt`. Some rows also appear in `idx_datoms_avet`, `idx_datoms_vaet`, `idx_datoms_fulltext`, and `idx_datoms_unique_value`. That is: each datom is responsible for anywhere from 12 to 28 `TINYINT`s in the DB. Sometimes these won't contribute to space (if they're zero, I've read that sqlite should pack them down to nothing), but assuming one byte per field, half a million datoms will add up to 14MB to the DB just for these flags. These flags are entirely derived from the schema: a datom's attribute is the sole determinant (AFAICS) of whether these flags are `1` or `0` for a row. Now, partial indexes cannot refer to other tables or call functions, so we can't simply join against the schema in the `CREATE INDEX … WHERE` clause. But we could use a more complex operator-driven expression, including bitmasks, to compress these flags. We could also consider approaches to simply removing columns: - `index_vaet` and `index_fulltext` are mutually exclusive, so they could be two integer values in a single field. - `index_vaet` corresponds to `:db/valueType :db.type/ref`, which is already implicitly represented by a `value_type_tag` of 0, so we can filter on that instead. Finally, we could consider direct schema creation as an approach: rather than having `idx_datoms_fulltext`, for example, we could create per-attribute indices when we register a schema fragment that includes `:fulltext true`: `CREATE INDEX idx_datoms_fulltext_page_title ON datoms (value_type_tag, v, a, e) WHERE a = 65538 -- The interned attribute.` I don't know if sqlite will happily use a collection of such indices (perhaps it'd be quicker for real-world queries!), but it's worth exploring.