Maximize throughput by minimizing time spent in transact critical section #86

Open
opened 2020-08-06 16:54:41 +00:00 by gburd · 0 comments
gburd commented 2020-08-06 16:54:41 +00:00 (Migrated from github.com)

Our writes are serialized, for good and obvious reasons.

However, transact operations involve some amount of non-write processing: turning a semi-structured map into an array of `:db/add` operations, turning keywords into entids, performing validation, etc.

Throughput can be increased by allowing other writes to be partially preprocessed while the current write is occurring against the DB.

(This is analogous to forcing shoppers to get out their credit cards while they're standing in line, rather than waiting until they get to the register to begin hunting in pockets and bags.)
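To make the idea concrete, here's a minimal sketch of moving preprocessing out of the critical section. The types and the `resolve` closure are hypothetical stand-ins, not Mentat's actual pipeline:

```rust
use std::sync::Mutex;

// Hypothetical simplified types; Mentat's real transact pipeline is richer.
struct RawTx(Vec<(String, String)>);   // (keyword attribute, value) pairs
struct PreparedTx(Vec<(u64, String)>); // (entid, value) pairs

// Work that needs no write lock: resolve keywords to entids, validate, expand.
fn preprocess(raw: &RawTx, resolve: impl Fn(&str) -> u64) -> PreparedTx {
    PreparedTx(raw.0.iter().map(|(k, v)| (resolve(k.as_str()), v.clone())).collect())
}

fn main() {
    let db = Mutex::new(Vec::new());
    let raw = RawTx(vec![("db/ident".into(), "x".into())]);

    // Step 1: transform outside the critical section (stand-in resolver).
    let prepared = preprocess(&raw, |k| k.len() as u64);

    // Step 2: hold the lock only for the actual write.
    db.lock().unwrap().extend(prepared.0);
    assert_eq!(db.lock().unwrap().len(), 1);
}
```

The point is only the shape: the expensive transformation happens before the lock is taken, so the serialized section shrinks to "apply the prepared write".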

Theoretical maximum throughput would mean queuing up only raw SQL+bindings in the transact queue itself: no transact work other than "run this SQL". That's unrealistic, because transacts (particularly upserts) depend on the transacts before them.

However, realistic opportunities exist.

  • Most transacts do not modify the schema. By queuing up something like `[input transformed schema-counter]`, we can do preprocessing that depends on the schema, redoing the work in the unlikely event that a transact earlier in the queue changes the schema.
  • Some schema parts aren't modifiable. Those are either safe to rely on, or can be reduced to a simpler check.
  • Regardless, some structural transformation (expanding maps) and validation can be performed early.
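The schema-counter idea above can be sketched as optimistic preprocessing with a commit-time check. All names here are hypothetical, chosen only to illustrate the check-and-redo logic:

```rust
// A queued transact carries the precomputed work and the schema counter
// it was computed against (the `[input transformed schema-counter]` tuple).
struct Queued<I, T> {
    input: I,
    transformed: T,
    schema_counter: u64,
}

struct Db {
    schema_counter: u64, // bumped whenever a transact changes the schema
}

impl Db {
    fn commit<I, T>(&self, q: Queued<I, T>, retransform: impl Fn(&I) -> T) -> T {
        if q.schema_counter == self.schema_counter {
            q.transformed // common case: schema unchanged, reuse the work
        } else {
            retransform(&q.input) // rare case: an earlier transact changed the schema
        }
    }
}

fn main() {
    let db = Db { schema_counter: 2 };
    let q = Queued { input: 10u32, transformed: 20u32, schema_counter: 1 };
    // Counter mismatch, so the work is redone with the stand-in transform.
    assert_eq!(db.commit(q, |i| i * 3), 30);
}
```

Because schema changes are rare, the retransform branch almost never runs, and the common case pays only an integer comparison inside the critical section.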

We should do two kinds of measurement:

  • Firstly, measure how much time is currently spent in preprocessing transact statements. This is the time spent per operation within the critical section prior to running SQL.
  • Secondly, figure out the contention for the transact queue in loaded scenarios (e.g., lots of user-driven writes on a large database on a slow disk). We are particularly interested in these, because they're the only scenarios in which throughput is a concern!
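The first measurement can start as simple wall-clock instrumentation around the two phases of a transact. This is a generic sketch with stand-in workloads, not Mentat's actual transactor:

```rust
use std::time::Instant;

fn main() {
    // Time the preprocessing phase (stand-in work: summing a range).
    let start = Instant::now();
    let prepared: u64 = (0..10_000u64).sum();
    let prep_time = start.elapsed();

    // Time the "run this SQL" phase (stand-in work).
    let start = Instant::now();
    let written = prepared + 1;
    let write_time = start.elapsed();

    // The ratio prep_time / (prep_time + write_time) bounds the possible win:
    // only the preprocessing share can be moved out of the critical section.
    println!("prep: {:?}, write: {:?}", prep_time, write_time);
    assert!(written > prepared);
}
```

If the preprocessing share turns out to be small, or the queue never backs up, the pipelining work isn't worth doing, which is exactly the decision rule below.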

If we don't find contention in practice — the queue never fills — then there's no point doing this work unless we think it would make the code easier to understand or is reasonable future-proofing.

If we don't find much opportunity for time savings, then there's little point in doing this work now.

Reference: greg/mentat#86