Consider alternative disk representations to minimize database fragmentation #96
Reference: greg/mentat#96
Our write approach goes something like this:

1. Write to `fulltext_values`.
2. Write to the `transactions` table.
3. Write to the `datoms` table.
4. Read back from `transactions` to build a transaction report.
5. Update the `schema`, `idents`, and `parts` tables.

All of this happens in one database transaction.
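To make the interleaving concrete, here is a minimal sketch of that write sequence using Python's built-in `sqlite3` module. The table layouts, the `open_store` and `transact` helpers, and the rule that every string value goes into `fulltext_values` are hypothetical simplifications for illustration, not mentat's actual schema or code; the `schema` and `idents` updates are omitted.

```python
import sqlite3

# Hypothetical, heavily simplified tables; mentat's real schema differs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fulltext_values (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS transactions (e INTEGER, a INTEGER, v, tx INTEGER, added INTEGER);
CREATE TABLE IF NOT EXISTS datoms (e INTEGER, a INTEGER, v, tx INTEGER);
CREATE TABLE IF NOT EXISTS parts (part TEXT PRIMARY KEY, idx INTEGER);
"""


def open_store(path=":memory:"):
    # isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def transact(conn, tx, assertions):
    """Apply (e, a, v) assertions as transaction `tx`; return the report rows."""
    conn.execute("BEGIN")
    try:
        for e, a, v in assertions:
            if isinstance(v, str):
                # Simplification: every string is stored in fulltext_values
                # and referenced from the log and datoms by its row id.
                conn.execute("INSERT OR IGNORE INTO fulltext_values (text) VALUES (?)", (v,))
                v = conn.execute(
                    "SELECT id FROM fulltext_values WHERE text = ?", (v,)
                ).fetchone()[0]
            # Log writes and materialized writes alternate. As the file grows,
            # new pages for both tables come from the same file, so their
            # pages tend to end up interleaved.
            conn.execute("INSERT INTO transactions VALUES (?, ?, ?, ?, 1)", (e, a, v, tx))
            conn.execute("INSERT INTO datoms VALUES (?, ?, ?, ?)", (e, a, v, tx))
        # Read the log back to build the transaction report.
        report = conn.execute(
            "SELECT e, a, v FROM transactions WHERE tx = ? ORDER BY rowid", (tx,)
        ).fetchall()
        # Bookkeeping: advance the (hypothetical) tx partition counter.
        conn.execute("INSERT OR REPLACE INTO parts VALUES ('tx', ?)", (tx + 1,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return report
```

For example, `transact(open_store(), 1000, [(1, 10, "hello"), (1, 11, 42)])` touches four tables in one transaction and returns the report rows for tx 1000.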
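One concern about all of this is that there will be significant fragmentation: in each `transact` we write to `datoms` and `transactions`, interleaving changes (and writing to other tables, too). Even with WAL, this is a long way from an optimized append-only workflow.

SQLite allows for attached databases. When the main database is a file and the journal mode is not WAL, a transaction that writes to several attached databases is atomic across all of them. In WAL mode, each database file is still updated atomically, but a crash mid-commit can leave some files with the changes and others without them.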
We could thus split our data, literally making our materialized `datoms` table and its indices a separate materialized store. (We could even do this via our transaction listener, if we weren't so concerned about atomicity and speed.)

So long as the write to the transaction log database commits first, we can replay transacted datoms into the datoms database if we need to recover from a crash.
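The commit-the-log-first rule could look something like the following sketch: two separate connections, the log committed before the datoms store, and a recovery pass that replays whatever the store is missing. The `applied` watermark table, the helper names, and the `crash_after_log` flag used to simulate a crash are all hypothetical; for brevity it only handles assertions, whereas a real datoms store would also have to apply retractions.

```python
import sqlite3


def init(log, store):
    log.execute("CREATE TABLE IF NOT EXISTS transactions (e, a, v, tx INTEGER)")
    store.execute("CREATE TABLE IF NOT EXISTS datoms (e, a, v, tx INTEGER)")
    # Hypothetical one-row watermark: the last tx materialized into `datoms`.
    store.execute(
        "CREATE TABLE IF NOT EXISTS applied (id INTEGER PRIMARY KEY CHECK (id = 0), tx INTEGER)"
    )
    with store:
        store.execute("INSERT OR IGNORE INTO applied VALUES (0, -1)")


def materialize(store, tx, rows):
    # Datoms and the watermark commit together, in one store transaction.
    with store:
        store.executemany("INSERT INTO datoms VALUES (?, ?, ?, ?)", rows)
        store.execute("UPDATE applied SET tx = ? WHERE id = 0", (tx,))


def transact(log, store, tx, assertions, crash_after_log=False):
    rows = [(e, a, v, tx) for e, a, v in assertions]
    with log:  # The transaction log commits first: it is the source of truth.
        log.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    if crash_after_log:
        return  # Simulated crash between the two commits.
    materialize(store, tx, rows)  # Then the datoms store catches up.


def recover(log, store):
    """Replay every logged tx newer than the watermark; return how many."""
    last = store.execute("SELECT tx FROM applied WHERE id = 0").fetchone()[0]
    rows = log.execute(
        "SELECT e, a, v, tx FROM transactions WHERE tx > ? ORDER BY tx, rowid", (last,)
    ).fetchall()
    txs = sorted({row[3] for row in rows})
    for tx in txs:
        materialize(store, tx, [row for row in rows if row[3] == tx])
    return len(txs)


def count_datoms(store):
    return store.execute("SELECT COUNT(*) FROM datoms").fetchone()[0]


def demo_recovery():
    log, store = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    init(log, store)
    transact(log, store, 1000, [(1, 10, "a")])
    transact(log, store, 1001, [(2, 10, "b"), (2, 11, "c")], crash_after_log=True)
    before = count_datoms(store)
    replayed = recover(log, store)
    return before, replayed, count_datoms(store), recover(log, store)
```

Because the watermark is updated in the same store transaction as the datoms it describes, a crash cannot leave the two out of step, so recovery is idempotent: a second `recover` call replays nothing.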
This approach would allow each database to be tuned independently, would prevent the writer's readback of the transaction log from interfering with the datoms-oriented page cache used by readers, and could help us scale better.
The transactor would hold a single connection with both databases attached. Readers would typically connect only to the datoms database.
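A sketch of the two kinds of connection, using Python's `sqlite3` module; the table layouts and the `open_transactor`, `open_reader`, and `demo_reader_isolation` helpers are hypothetical. The transactor attaches the datoms file to its log connection and can tune each database separately with schema-qualified pragmas; a reader opens only the datoms file, read-only, so its page cache only ever holds datoms pages. The pragma values are illustrative, not recommendations.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_transactor(log_path, datoms_path):
    """Writer connection: the log is `main`, the datoms file is attached as `datoms`."""
    conn = sqlite3.connect(log_path, isolation_level=None)
    conn.execute("ATTACH DATABASE ? AS datoms", (datoms_path,))
    # Schema-qualified pragma: applies to the log file only (illustrative value).
    conn.execute("PRAGMA main.synchronous = FULL")
    conn.execute("CREATE TABLE IF NOT EXISTS main.transactions (e, a, v, tx INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS datoms.datoms (e, a, v, tx INTEGER)")
    return conn


def open_reader(datoms_path):
    """Reader connection: only the datoms file, opened read-only via a URI."""
    uri = Path(datoms_path).absolute().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    # Negative cache_size is in KiB: about 64 MiB for this reader's page cache.
    conn.execute("PRAGMA cache_size = -65536")
    return conn


def demo_reader_isolation():
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "log.db")
        datoms_path = os.path.join(d, "datoms.db")
        writer = open_transactor(log_path, datoms_path)
        writer.execute("BEGIN")
        writer.execute("INSERT INTO main.transactions VALUES (1, 10, 'a', 1000)")
        writer.execute("INSERT INTO datoms.datoms VALUES (1, 10, 'a', 1000)")
        writer.execute("COMMIT")
        reader = open_reader(datoms_path)
        rows = reader.execute("SELECT e, a, v, tx FROM datoms").fetchall()
        # The reader's schema contains only the datoms table, not the log.
        tables = [r[0] for r in reader.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        try:
            reader.execute("INSERT INTO datoms VALUES (2, 10, 'b', 1001)")
            writable = True
        except sqlite3.DatabaseError:  # "attempt to write a readonly database"
            writable = False
        reader.close()
        writer.close()
        return rows, tables, writable
```

The reader sees the committed datom, sees no `transactions` table at all, and cannot write.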