Consider alternative disk representations to minimize database fragmentation #96

New issue

Open

opened 2020-08-06 16:54:50 +00:00 by gburd · 0 comments

gburd commented

2020-08-06 16:54:50 +00:00

(Migrated from github.com)

Our write approach goes something like this:

Do some reads to figure out upserts etc.
Do some writes to temporary tables and fulltext_values.
Flush changes into the transactions table.
Flush changes into the datoms table.
Drop the temporary tables.
Read back from transactions to build a transaction report.
Apply changes — parts, idents, and attributes — to the schema, idents, and parts tables.

All of this happens in one database transaction.

One concern about all of this is that there will be significant fragmentation: in each transact we write to datoms and transactions, interleaving changes (and to other tables, too). Even with WAL, this is a long way from an optimized append-only workflow

SQLite allows for attached databases, and writes to attached databases are atomic if not using WAL, and atomic but not crash-safe otherwise.

We could thus split our data, literally making our materialized datoms table and its indices a separate materialized store. (And we could do this via our transaction listener if we weren't so concerned about atomicity and speed.)

So long as the transaction to the transaction log database completes first, we can replay transacted datoms into the datoms database in the event of necessary crash recovery.

This approach would allow each database to be tuned independently, would prevent the writer's readback of the transaction log from impacting the datoms-oriented page cache used by readers, and potentially allow us to scale better.

The transactor would have a connection to both DBs, attached. Readers would typically only use a connection to the datoms database.

Our write approach goes something like this: - Do some reads to figure out upserts etc. - Do some writes to temporary tables and `fulltext_values`. - Flush changes into the `transactions` table. - Flush changes into the `datoms` table. - Drop the temporary tables. - Read back from `transactions` to build a transaction report. - Apply changes — parts, idents, and attributes — to the `schema`, `idents`, and `parts` tables. All of this happens in one database transaction. One concern about all of this is that there will be significant fragmentation: in each `transact` we write to `datoms` and `transactions`, interleaving changes (and to other tables, too). Even with WAL, this is a long way from an optimized append-only workflow SQLite allows for [attached databases](https://www.sqlite.org/lang_attach.html), and writes to attached databases are atomic if not using WAL, and atomic but not crash-safe otherwise. We could thus split our data, _literally_ making our materialized `datoms` table and its indices a separate materialized store. (And we could do this via our transaction listener if we weren't so concerned about atomicity and speed.) So long as the transaction to the transaction log database completes first, we can replay transacted datoms into the datoms database in the event of necessary crash recovery. This approach would allow each database to be tuned independently, would prevent the writer's readback of the transaction log from impacting the datoms-oriented page cache used by readers, and potentially allow us to scale better. The transactor would have a connection to both DBs, attached. Readers would typically only use a connection to the datoms database.