Commit graph

61 commits

Author SHA1 Message Date
Gregory Burd
2ddf0da53e Use malloc/free rather than enif_alloc/enif_free so as to avoid BEAM allocator
overhead (bytes and time).  Create static references to commonly used Erlang
atoms to avoid overhead re-creating them on each request cycle.
2013-08-21 12:20:19 -04:00
Gregory Burd
e67da86a9b Change backpressure method from EAGAIN to bump_reductions so as not to block Riak/KV vnode processes when queues back up. 2013-08-19 13:32:58 -04:00
Gregory Burd
ee904b4769 Lower the queue size to shrink potential for latency in queue. Remove earlier idea that more queues would lead to more even worker progress, just have 1 queue per Erlang-scheduler thread (generally, 1 per CPU core available). Also change the way worker threads decide when to cond_wait or migrate to other queues looking for work. 2013-07-31 15:06:28 -04:00
Gregory Burd
c9a4ab8325 Revert changes to async_nif and re-enable stats. Fixed selective recv. 2013-07-31 09:41:36 -04:00
Gregory Burd
4418a74183 Increase the number of queues for work to reside. Worker threads, once started, don't exit until shutdown. 2013-07-30 14:21:26 -04:00
Gregory Burd
1623d5293c Increase the max queue size. 2013-07-30 13:30:43 -04:00
Gregory Burd
cce163db9f Fix potential to use uninitialized value when branching. 2013-07-26 20:08:49 -04:00
Gregory Burd
9a5defd8c9 Merge remote-tracking branch 'origin/master' into gsb-workers-migrate
Conflicts:
	c_src/async_nif.h
2013-07-26 10:31:23 -04:00
Gregory Burd
452d7694a6 Added some sanity checking of key/value sizes. Check for EAGAIN/INVAL/NOMEM when starting worker threads. Switch back to the 1.6.3 release branch of WT. 2013-07-26 10:27:21 -04:00
Gregory Burd
3627ff8690 Ensure that on EAGAIN we continue to try to spawn a worker. When workers finish with a queue have them migrate to the other queues looking for work. 2013-07-25 13:29:16 -04:00
Gregory Burd
bbadc81d53 Queue depth and num workers can race, so make sure that we start at least one worker when there are none active for that queue. 2013-07-15 16:51:08 -04:00
Gregory Burd
bd0323af7a Update to WiredTiger 1.6.3. Fix a condition where a mutex was unlocked twice on eagain when queues were all full. 2013-07-15 12:21:10 -04:00
Gregory Burd
bc0f5dbfc7 Evict older half of items in the cache by removing items from the end of the list, don't waste cycles computing timestamps. 2013-07-02 22:23:32 -04:00
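The eviction policy described in this message can be sketched roughly as follows. This is a minimal illustration, assuming a doubly linked list whose head holds the most recently used item; the struct and function names are hypothetical and are not taken from wterl. No timestamps are consulted: list position alone decides which items are "older".

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry; the real cache stores session/cursor contexts. */
struct cache_item {
    struct cache_item *prev, *next;
    int id;
};

struct cache {
    struct cache_item *head; /* most recently used */
    struct cache_item *tail; /* least recently used */
    size_t count;
};

/* Insert a new item at the front. Returns 0 on success, -1 if malloc fails. */
int cache_add(struct cache *c, int id)
{
    struct cache_item *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->id = id;
    it->prev = NULL;
    it->next = c->head;
    if (c->head)
        c->head->prev = it;
    else
        c->tail = it;
    c->head = it;
    c->count++;
    return 0;
}

/* Evict the older half by popping from the tail. Returns the number evicted. */
size_t cache_evict_older_half(struct cache *c)
{
    size_t to_evict = c->count / 2;
    for (size_t i = 0; i < to_evict; i++) {
        struct cache_item *victim = c->tail;
        c->tail = victim->prev;
        if (c->tail)
            c->tail->next = NULL;
        else
            c->head = NULL;
        free(victim);
        c->count--;
    }
    return to_evict;
}
```

Because new and recently touched items sit at the head, removing from the tail discards the least recently used entries without reading the clock.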
Gregory Burd
b727538162 Fix shutdown 2013-07-02 22:07:34 -04:00
Gregory Burd
2672bab3ea Stats overhead due to hitting the clock and pulling a mutex caused a massive slowdown so now work is assigned to a new queue only when the candidate queue is deeper than the average of the other queues and threads are created only when the depth of the queue is larger than the number of threads working on that queue. 2013-07-02 19:58:00 -04:00
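The two rules in this message (move to another queue only when the candidate is deeper than the average of the others; create a thread only when depth exceeds the worker count) might look like the sketch below. All names are hypothetical, and the round-robin fallback to the next queue is an assumption; the message does not say how the alternative queue is chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-queue bookkeeping. */
struct work_queue {
    unsigned depth;       /* requests currently waiting */
    unsigned num_workers; /* threads currently serving this queue */
};

/* Return 1 if the candidate is deeper than the mean depth of the other queues. */
int deeper_than_others(const struct work_queue *q, size_t n, size_t candidate)
{
    unsigned long sum = 0;
    assert(n > 0 && candidate < n);
    for (size_t i = 0; i < n; i++)
        if (i != candidate)
            sum += q[i].depth;
    /* depth > sum / (n - 1), rewritten to avoid integer division */
    return (unsigned long)q[candidate].depth * (n - 1) > sum;
}

/* Start at the preferred queue; move on only while the candidate is too deep. */
size_t pick_queue(const struct work_queue *q, size_t n, size_t preferred)
{
    size_t candidate = preferred % n;
    for (size_t tries = 0; tries < n; tries++) {
        if (!deeper_than_others(q, n, candidate))
            return candidate;
        candidate = (candidate + 1) % n; /* assumption: try the next queue */
    }
    return preferred % n;
}

/* Create a thread only when more work is queued than there are workers. */
int should_spawn_worker(const struct work_queue *q)
{
    return q->depth > q->num_workers;
}
```

Both checks use only counters that are already maintained, so neither needs a clock read on the enqueue path.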
Gregory Burd
00e5889ac9 Changed conditions for worker thread creation. 2013-07-02 16:46:04 -04:00
Gregory Burd
4300b3036f Working on triggers that start/stop worker threads. 2013-07-01 21:09:21 -04:00
Gregory Burd
c7b45a7c2b Still ironing out stats. 2013-06-27 10:57:41 -04:00
Gregory Burd
c41e411a92 Worker threads come and go as needed with a lower bound of 2 and an
upper bound of ASYNC_NIF_MAX_WORKERS.  Stats were improved to use
thread local storage for measures.  With stats working again wterl
uses them to determine who to evict.  Wterl's signature calculation
for an operation wasn't correct and so the cache wasn't efficient at
all, this has been fixed.
2013-06-25 13:31:43 -04:00
Gregory Burd
a3c54b1610 Cleanup a bit. 2013-06-19 14:54:27 -04:00
Gregory Burd
34e88c9234 Add some debugging output. 2013-06-18 13:12:10 -04:00
Gregory Burd
53307e8c01 A great deal of cleanup. EUnit and EQC tests pass. 2013-06-14 16:57:53 -04:00
Gregory Burd
ff7d1d6e20 WIP: further simplifying context cache 2013-06-14 10:52:45 -04:00
Gregory Burd
7952358781 WIP: cache wasn't returning items found 2013-06-12 09:08:09 -04:00
Gregory Burd
4460434db1 WIP: remove potential for infinite loops with CAS and fix a few issues in async 2013-06-12 08:09:51 -04:00
Gregory Burd
110b482962 Some paranoia and a few fixes 2013-06-11 12:13:06 -04:00
Gregory Burd
2a4b8ee7d2 WIP: simplify the cache from hash-of-lists to list; use a CAS() operation to protect the most-recently-used (mru) list. 2013-06-10 14:31:59 -04:00
Gregory Burd
0fef28de92 WIP: basho_bench tests are running fine now, need more work to ensure cache is functioning properly. 2013-06-05 11:41:41 -04:00
Gregory Burd
f1b7d8322d WIP: replace the kbtree with khash, we don't need the tree features (yet, if ever) and hash is faster; add a most-recently-used stash for contexts as it's highly likely that worker threads will do many operations with the same shape/signature of session/cursors/tables/config and that path can be lock-free as well making it much faster (one would hope); somewhere something is stepping on read-only ErlNifBinary data and so a crc check is failing and causing the runtime to abort, that's the latest item to find/fix. 2013-06-04 14:45:23 -04:00
Gregory Burd
15fbc71ea7 WIP: pieces in place, need to work out the kinks now. 2013-05-30 14:21:34 -04:00
Gregory Burd
a2cd1d562c WIP: devising a better way to cache/reuse session/cursor pairs. 2013-05-28 16:14:19 -04:00
Gregory Burd
786142ce73 Add a bit of statistics tracking for two reasons, a) to help inform
where a request should be enqueued and b) to track request latency.
2013-05-01 22:02:37 -04:00
Gregory Burd
ae64a5e26f Move async nif struct definition back to where it belongs. 2013-05-01 22:02:21 -04:00
Gregory Burd
eafee02865 Only start 2 * num_queues worker threads initially. num_queues is generally
equal to the number of cores reported by Erlang (info.scheduler_threads) which
is either determined automatically by the Erlang BEAM runtime or via the +S
flag.  The minimum num_queues is 2, so the minimum number of workers is 4.  The
maximum number of workers is ASYNC_NIF_MAX_WORKER_QUEUE_SIZE (currently set to
128), but that would only happen if there were 64 cores (or you set +S 64:64 at
startup).
2013-04-26 10:15:15 -04:00
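The sizing arithmetic in this message can be written out as a small sketch. The function names and the WORKER_LIMIT macro are hypothetical; the value 128 is the one quoted above, and clamping to it is an assumption about how the limit is enforced.

```c
/* Hypothetical name for the upper bound quoted in the message (128). */
#define WORKER_LIMIT 128u

/* One queue per scheduler thread, but never fewer than 2 queues. */
unsigned initial_num_queues(unsigned scheduler_threads)
{
    return scheduler_threads < 2 ? 2 : scheduler_threads;
}

/* Start 2 workers per queue; the clamp to WORKER_LIMIT is an assumption. */
unsigned initial_num_workers(unsigned scheduler_threads)
{
    unsigned workers = 2 * initial_num_queues(scheduler_threads);
    return workers > WORKER_LIMIT ? WORKER_LIMIT : workers;
}
```

With one scheduler thread this yields 2 queues and 4 workers; with 64 scheduler threads it yields 128 workers, matching the figures in the message.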
Gregory Burd
422dcfda89 Return 'eagain' when request queue is full and then try the request again.
In the worst case the request queue remains full and we loop between
the NIF and Erlang forever trying over and over to enqueue the request. If
that happens we shouldn't take schedulers offline as the NIF calls are fast
and we shouldn't run out of memory as that is bounded.  CPU will show a lot
of activity, but progress will continue in Erlang.
2013-04-25 15:18:23 -04:00
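A rough model of this retry protocol, written entirely in C for illustration. In the commit the retry loop runs on the Erlang side of the NIF boundary; here a callback stands in for "the rest of the system makes progress". The queue type, its capacity, and all function names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAPACITY 4 /* hypothetical bound; memory use stays fixed */

struct bounded_queue {
    int items[QUEUE_CAPACITY];
    size_t count;
};

/* Never blocks: returns 0 on success or EAGAIN when the queue is full. */
int try_enqueue(struct bounded_queue *q, int req)
{
    if (q->count == QUEUE_CAPACITY)
        return EAGAIN;
    q->items[q->count++] = req;
    return 0;
}

/* Stand-in for a worker finishing the oldest request. Returns it, or -1 if empty. */
int complete_one(struct bounded_queue *q)
{
    int req;
    if (q->count == 0)
        return -1;
    req = q->items[0];
    memmove(&q->items[0], &q->items[1], (q->count - 1) * sizeof q->items[0]);
    q->count--;
    return req;
}

/* Retry until accepted, letting `progress` run between attempts.
 * Returns how many attempts were rejected with EAGAIN. */
unsigned enqueue_with_retry(struct bounded_queue *q, int req,
                            int (*progress)(struct bounded_queue *))
{
    unsigned rejected = 0;
    while (try_enqueue(q, req) == EAGAIN) {
        rejected++;
        (void)progress(q);
    }
    return rejected;
}
```

Each attempt returns immediately, so the caller spends CPU on retries but is never parked inside the enqueue call, which is the trade-off the message describes.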
Gregory Burd
6b393ac47c Keep allocated req and ErlNifEnv around for reuse rather than re-alloc'ing them on each request should save us some overhead on the hot path. 2013-04-25 11:30:11 -04:00
Gregory Burd
652771003e WIP: a good start, I need to switch over wterl_event_handler to be a
gen_server and I need to add a way to set the pid of the message handler
process to the NIF API.
2013-04-22 09:52:21 -04:00
Gregory Burd
fae6831580 Ensure that the env is clear when signaling shutdown. 2013-04-21 11:11:17 -04:00
Gregory Burd
bfe56136d8 * Be sure to release the reqs mutex on shutdown. 2013-04-20 08:28:38 -04:00
Gregory Burd
8d8ceecc8b enif_get_string can return < 1 when it copies less than the buffer size
you pass into it, that'd result in a non-zero (aka true) test when in fact
it's a problem if the argument isn't passed completely (however unlikely
that is).

enif_alloc_env() requires that later you enif_free_env() which I wasn't doing,
this seems to keep memory steady in test runs.
2013-04-19 09:11:41 -04:00
Gregory Burd
60dd048b7e Move the FIFO Queue implementation into its own file (fifo_q.h). Work
on the nif_unload path.  Free up resources owned by wterl.c when
unloading.  Continue to evolve the build script.  Add to khash the ability
to create a hash that maps from a pointer to a value. There is still a segv
due to a race in wterl.c:do_unload() which needs to be addressed.
2013-04-18 10:37:36 -04:00
Gregory Burd
db953f5b39 Moved num_queue estimate earlier so as to ensure that the amount of
memory allocated, zero'ed and free'd was consistent.  Skip free'ing
async environment as Erlang will free that for us when we no longer
reference it.  Fix the memory leak (or at least one of them) by no
longer copying the Uri into the hash table.
2013-04-17 18:26:59 -04:00
Gregory Burd
1ae8e5698f Ensure that the ratio of workers to queues is 2:1 and that there are at
least 2 queues regardless.  Fix a few race conditions (h/t Sue from
WiredTiger for some nice work) and cherry pick (for now) a commit that
fixes a bug I triggered and Keith fixed (in < 10min from report) related
to WiredTiger stats.  Ensure that my guesstimate for session_max is no
larger than WiredTiger can manage.  Continue to fiddle with the build
script.
2013-04-17 16:48:23 -04:00
Gregory Burd
123dfa600e Simplified the worker loop function. Added ability to pick block
compressor in config, default is snappy, off is {block_compressor, none}.
2013-04-17 13:19:06 -04:00
Gregory Burd
87f70d75a1 Inline the fifo_q functions to speed them up and silence compiler warnings
for unused API calls.  Add a fifo_q_full call to hide the details of that.
Alloc work queues along with the async_nif at the end of that memory block.
Fix a few places where things should be free'd and were not.  Change enqueue
to return 0 when shutting down.  Fix a race related to shutdown.  When I use
gdb eunit calls ?cmd() seem to fail, so I've created rmdir:path() to replace
?cmd("rm -rf path") calls.
2013-04-17 11:17:13 -04:00
Gregory Burd
1913e7fdf5 Continue to iterate on the build system to accommodate shared libs. 2013-04-16 21:46:53 -04:00
Gregory Burd
3dab6a2dc5 Clean up 2013-04-16 17:09:34 -04:00
Gregory Burd
ba41dd7fb6 Use the table name in get/put/delete calls to form an "affinity" with a
worker queue so that we spread work around and make it more likely that
work for a given table goes first to a given set of worker threads.
2013-04-15 18:46:06 -04:00
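One way to form such an affinity is to hash the table name and reduce it modulo the number of queues, as in the sketch below. The use of FNV-1a and the function names are assumptions made for illustration; the message does not say which hash the code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u; /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;       /* FNV prime */
    }
    return h;
}

/* Map a table name to a queue index; the same name always maps to the same queue. */
size_t queue_for_table(const char *table_name, size_t num_queues)
{
    assert(num_queues > 0);
    return fnv1a(table_name) % num_queues;
}
```

Requests for one table land on one queue, so its workers tend to reuse the same sessions and cursors, while different tables spread across the available queues.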
Gregory Burd
371779d14e Return to alloc'ed requests because there may be many more in flight
than those in the various queues.  Reenable the (still failing)
truncate tests (because they don't SEGV anymore).  Still might be
a memory leak, next up is valgrind.
2013-04-15 17:37:14 -04:00
Gregory Burd
668109de25 Added match/gt/lt atom return for cursor_search_near() call. Changed
the request queue over to a simple fifo queue which could (if needed)
be made lock-free.  Cursor searches can optionally now specify that
they are mid-scan so as not to have their cursor handles reset every
call.
2013-04-15 15:22:12 -04:00
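A simple FIFO of the kind mentioned here could be a fixed-size ring buffer, sketched below. The names and capacity are hypothetical and this is not the code from the commit. The sketch is neither thread-safe nor lock-free; a caller would hold a mutex around each call.

```c
#include <stddef.h>

#define RING_CAPACITY 8 /* hypothetical fixed size */

struct ring {
    void *slots[RING_CAPACITY];
    size_t head;  /* index of the oldest item */
    size_t tail;  /* index of the next free slot */
    size_t count; /* number of items stored */
};

int ring_full(const struct ring *r)  { return r->count == RING_CAPACITY; }
int ring_empty(const struct ring *r) { return r->count == 0; }

/* Append at the tail. Returns 0 on success, -1 if the ring is full. */
int ring_put(struct ring *r, void *item)
{
    if (ring_full(r))
        return -1;
    r->slots[r->tail] = item;
    r->tail = (r->tail + 1) % RING_CAPACITY;
    r->count++;
    return 0;
}

/* Remove from the head. Returns the oldest item, or NULL if the ring is empty. */
void *ring_get(struct ring *r)
{
    void *item;
    if (ring_empty(r))
        return NULL;
    item = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count--;
    return item;
}
```

Items come out in the order they went in, and a full ring rejects new items instead of growing.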