Consider an alternative disk representation for datom flags #69
Labels
No labels
A-build
A-cli
A-core
A-design
A-edn
A-ffi
A-query
A-sdk
A-sdk-android
A-sdk-ios
A-sync
A-transact
A-views
A-vocab
P-Android
P-desktop
P-iOS
bug
correctness
dependencies
dev-ergonomics
discussion
documentation
duplicate
enhancement
enquiry
good first bug
good first issue
help wanted
hygiene
in progress
invalid
question
ready
size
speed
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: greg/mentat#69
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
We have four fields in
datoms
:index_vaet
,index_avet
,index_fulltext
andunique_value
.These fields appear in
datoms
and in the indicesidx_datoms_eavt
andidx_datoms_aevt
.Some rows also appear in
idx_datoms_avet
,idx_datoms_vaet
,idx_datoms_fulltext
, andidx_datoms_unique_value
.That is: each datom is responsible for anywhere from 12 to 28
TINYINT
s in the DB. Sometimes these won't contribute to space (if they're zero, I've read that sqlite should pack them down to nothing), but assuming one byte per field, half a million datoms will add up to 14MB to the DB just for these flags.These flags are entirely derived from the schema: a datom's attribute is the sole determinant (AFAICS) of whether these flags are
1
or0
for a row.Now, partial indexes cannot refer to other tables or call functions, so we can't simply join against the schema in the
CREATE INDEX … WHERE
clause. But we could use a more complex operator-driven expression, including bitmasks, to compress these flags.We could also consider approaches to simply removing columns:
index_vaet
andindex_fulltext
are mutually exclusive, so they could be two integer values in a single field.index_vaet
corresponds to:db/valueType :db.type/ref
, which is already implicitly represented by avalue_type_tag
of 0, so we can filter on that instead.Finally, we could consider direct schema creation as an approach: rather than having
idx_datoms_fulltext
, for example, we could create per-attribute indices when we register a schema fragment that includes:fulltext true
:CREATE INDEX idx_datoms_fulltext_page_title ON datoms (value_type_tag, v, a, e) WHERE a = 65538 -- The interned attribute.
I don't know if sqlite will happily use a collection of such indices (perhaps it'd be quicker for real-world queries!), but it's worth exploring.