Per-append overhead is too high #62

Open
opened 2016-03-29 11:07:08 +00:00 by slfritchie · 0 comments
slfritchie commented 2016-03-29 11:07:08 +00:00 (Migrated from github.com)

On a chain with a single FLU, the per-append overhead is far too high for production use. The current write-once enforcement and checksum management tests all pass, but basho_bench performance measurements show that serialization, in both the process structure and the eleveldb iterator usage, is too costly.

For example:

```
cd /path/to/machi/source/repo/machi
make clean
make stagedevrel
rm -f /tmp/setup
cat <<EOF > /tmp/setup
{host, "localhost", []}.
{flu,f1,"localhost",20401,[]}.
{chain,c1,[f1],[]}.
EOF
./dev/dev1/bin/machi start
sleep 5
./dev/dev1/bin/machi-admin quick-admin-apply /tmp/setup localhost
```
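The fixed `sleep 5` above can be flaky on slower machines. A hedged alternative is to poll the FLU's TCP port until it accepts connections; `wait_for_port` is a hypothetical helper (bash-only, since it relies on bash's `/dev/tcp` redirection), and 20401 matches the f1 port in `/tmp/setup`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a TCP port on localhost accepts
# connections, checking once per second for up to $2 seconds (default 30).
# Returns nonzero if the port never comes up.
wait_for_port() {
    local port=$1 tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>
        if (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
# Usage: wait_for_port 20401 30 before running machi-admin quick-admin-apply
```

With this, `wait_for_port 20401 30 && ./dev/dev1/bin/machi-admin quick-admin-apply /tmp/setup localhost` replaces the `sleep 5` step.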

Then use this basho_bench config, which uses 25 concurrent clients to append 4 KByte chunks to a single file prefix (i.e., the worst case for serialization):

```
%% Mandatory: adjust this code path to the top of your compiled Machi source distro
{code_paths, ["/path/to/machi/source/repo/machi"]}.
{driver, machi_basho_bench_driver}.

%% Choose your maximum rate (per worker proc, see 'concurrent' below)
%{mode, {rate,10}}.
%{mode, {rate,20}}.
{mode, max}.

%% Runtime & reporting interval
{duration, 10}.         % minutes
{report_interval, 1}.   % seconds

%% Choose your number of worker procs
%{concurrent, 1}.
%{concurrent, 5}.
%{concurrent, 10}.
{concurrent, 25}.
%{concurrent, 100}.

%% Here's a chain of (up to) length 3, all on localhost.
%% Note: if any servers are down, and your OS/TCP stack has an
%% ICMP response limit such as OS X's "net.inet.icmp.icmplim" setting,
%% and that setting is very low (e.g., OS X's limit is 50), then
%% ICMP/RST responses can be delayed and interact *very* badly
%% with your test.
%% For OS X, fix this with "sudo sysctl -w net.inet.icmp.icmplim=9999"
{machi_server_info,
 [
  {p_srvr,f1,machi_flu1_client,"localhost",20401,[]}
 ]}.
{machi_ets_key_tab_type, set}.   % set | ordered_set

%% Workload-specific definitions follow....

%% 10 parts 'append' operation + 0 parts anything else = 100% 'append' ops
{operations, [{append, 10}]}.
%{operations, [{read, 10}]}.

%% For append, the key = Machi file prefix name
{key_generator, {to_binstr, "prefix~w", {uniform_int, 1}}}.
%{key_generator, {to_binstr, "prefix~w", {uniform_int, 30}}}.
%{key_generator, {to_binstr, "prefix~w", {uniform_int, 200}}}.
%{key_generator, {uniform_int, 3333222111}}.

%% Increase value_generator_source_size if the value_generator is big!
{value_generator_source_size, 2111000}.
{value_generator, {fixed_bin, 4096}}.
%{value_generator, {fixed_bin, 32768}}.   %  32 KB
%{value_generator, {fixed_bin, 256000}}.
%{value_generator, {fixed_bin, 1011011}}.
```

Here are the results, run against a TRIM-enabled external Thunderbolt+SSD on my MacBook. Units are seconds (the elapsed & window columns) or microseconds (all other columns).

```
% head tests/current/append_latencies.csv
elapsed, window, n, min, mean, median, 95th, 99th, 99_9th, max, errors
1.000938, 1.000938, 1164, 16611, 21263.9, 18841, 26650, 93227, 103340, 104752, 0
2.002061, 1.001123, 1249, 16509, 19753.0, 18629, 27024, 32586, 34494, 34910, 0
3.000928, 0.998867, 1165, 16303, 20899.6, 19478, 29138, 32377, 33866, 33902, 0
4.001929, 1.001001, 1256, 16303, 20746.9, 19403, 28808, 31151, 35461, 35519, 0
5.001932, 1.000003, 1218, 16423, 20174.0, 19031, 26810, 34556, 39735, 40749, 0
6.001932, 1.0, 1199, 17108, 20681.5, 19256, 29062, 40275, 43867, 44428, 0
7.001991, 1.000059, 1272, 17720, 20282.8, 19305, 25013, 40179, 43867, 44428, 0
8.001922, 0.999931, 1258, 17520, 19852.3, 19050, 25236, 26580, 27807, 27860, 0
9.001923, 1.000001, 1315, 17520, 19395.7, 18472, 25271, 26381, 28369, 28691, 0
```

Using 1 MByte chunks, the same load, except with a key generator spread over 30 file prefixes, sustains about 340 MByte/sec of throughput. That is a happy result, since it is roughly the maximum throughput of the Thunderbolt+SSD device combination, but that workload also avoids the main serialization bottleneck(s) exercised by the single-prefix workload described above.
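Back-of-the-envelope math from the 4 KByte run underlines the overhead: roughly 1,230 appends/sec works out to only about 5 MByte/sec, and the wall-clock time per append implies the appends are effectively serialized at under a millisecond apiece. A sketch that derives these numbers from the window rows shown above (the CSV layout is basho_bench's; `CHUNK=4096` matches the `{fixed_bin, 4096}` value generator):

```shell
# Sum the 'n' (appends completed) and 'window' (seconds) columns of the
# basho_bench latency rows above, then derive throughput figures.
awk -F', *' -v CHUNK=4096 '
    NR > 1 { appends += $3; seconds += $2 }
    END {
        rate = appends / seconds
        printf "%.0f appends/sec = %.2f MByte/sec\n", rate, rate * CHUNK / 1e6
        printf "%.0f us of wall-clock per append (serialized)\n", seconds / appends * 1e6
    }
' <<'EOF'
elapsed, window, n, min, mean, median, 95th, 99th, 99_9th, max, errors
1.000938, 1.000938, 1164, 16611, 21263.9, 18841, 26650, 93227, 103340, 104752, 0
2.002061, 1.001123, 1249, 16509, 19753.0, 18629, 27024, 32586, 34494, 34910, 0
3.000928, 0.998867, 1165, 16303, 20899.6, 19478, 29138, 32377, 33866, 33902, 0
4.001929, 1.001001, 1256, 16303, 20746.9, 19403, 28808, 31151, 35461, 35519, 0
5.001932, 1.000003, 1218, 16423, 20174.0, 19031, 26810, 34556, 39735, 40749, 0
6.001932, 1.0, 1199, 17108, 20681.5, 19256, 29062, 40275, 43867, 44428, 0
7.001991, 1.000059, 1272, 17720, 20282.8, 19305, 25013, 40179, 43867, 44428, 0
8.001922, 0.999931, 1258, 17520, 19852.3, 19050, 25236, 26580, 27807, 27860, 0
9.001923, 1.000001, 1315, 17520, 19395.7, 18472, 25271, 26381, 28369, 28691, 0
EOF
# Prints:
#   1233 appends/sec = 5.05 MByte/sec
#   811 us of wall-clock per append (serialized)
```

Compare with the ~340 MByte/sec the same hardware sustains with 1 MByte chunks: per-append overhead, not disk bandwidth, dominates the 4 KByte case.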
