From b44c88fb973dffdbd06e61e9f75707544e58aedd Mon Sep 17 00:00:00 2001
From: Scott Lystig Fritchie
Date: Wed, 20 May 2015 21:03:51 +0900
Subject: [PATCH] Add README.basho_bench.md

---
 README.basho_bench.md                  | 258 +++++++++++++++++++++++++
 TODO-shortterm.org                     |   2 +-
 priv/basho_bench.append-example.config |  10 +-
 priv/basho_bench.read-example.config   |   6 +-
 4 files changed, 266 insertions(+), 10 deletions(-)
 create mode 100644 README.basho_bench.md

diff --git a/README.basho_bench.md b/README.basho_bench.md
new file mode 100644
index 0000000..d083c8e
--- /dev/null
+++ b/README.basho_bench.md
@@ -0,0 +1,258 @@
+
+# Using basho_bench to twiddle with Machi
+
+"Twiddle"? Really, is that a word? (Yes, it is a real English word.)
+
+## Benchmarking Machi's performance ... no, don't do it.
+
+Machi isn't ready for benchmark testing. Its public-facing API isn't
+finished yet. Its internal APIs aren't quite finished yet either. So
+the results of any "benchmarking" effort will have even less value
+**N** months from now than the usual benchmarking effort.
+
+However, there are uses for a benchmark tool. For example, one of my
+favorites is to put **stress** on a system. I don't care about
+average or 99th-percentile latencies, but I **might** care very much
+about behavior.
+
+* What happens if a Machi system is under moderate load, and then I
+  stop one of the servers?
+    * How quickly do the chain managers react?
+    * How quickly do the client libraries within Machi react?
+    * How quickly do the external client API libraries react?
+
+* What happens if a Machi system is under heavy load, for example,
+  100% CPU load? Not all of that load might come from the Machi
+  services. Some CPU consumption might come from the load generator,
+  such as `basho_bench` itself running on the same machine as a Machi
+  server. Or perhaps it's a tiny C program that I wrote:
+
+    int main(void)
+    { while (1) { ; } }
+
+## An example of how adding moderate stress can find weird bugs
+
+The driver/plug-in module for `basho_bench` is only a few hours old.
+(I'm writing on Wednesday, 2015-05-20.) But just now, I configured my
+basho_bench config file to try to contact a Machi cluster of three
+nodes ... but really, only one was running. The client library,
+`machi_cr_client.erl`, has **an extremely simple** method for dealing
+with failed servers. I know it's simple and dumb, but that's OK in
+many cases.
+
+However, `basho_bench` and the `machi_cr_client.erl` were acting very,
+very badly. I couldn't figure it out until I took a peek at my OS's
+`dmesg` output, namely: `dmesg | tail`. It said things like this:
+
+    Limiting closed port RST response from 690 to 50 packets per second
+    Limiting closed port RST response from 367 to 50 packets per second
+    Limiting closed port RST response from 101 to 50 packets per second
+    Limiting closed port RST response from 682 to 50 packets per second
+    Limiting closed port RST response from 467 to 50 packets per second
+
+Well, isn't that interesting?
+
+This system was running on a single OS X machine: my MacBook Pro
+laptop, running OS X 10.10 (Yosemite). I have seen that error
+before. And I know how to fix it.
+
+* **Option 1**: Change the client library config to ignore the Machi
+  servers that I know will always be down during my experiment.
+* **Option 2**: Use the following to change my OS's TCP stack RST
+  behavior. (If a TCP port is not being listened to, the OS will
+  send a RST packet to signal "connection refused".)
+
+On OS X, the limit for RST packets is 50/second. The
+`machi_cr_client.erl` client can generate far more than 50/second, as
+the `Limiting closed port RST response...` messages above show. So, I
+used some brute force to change the environment:
+
+    sudo sysctl -w net.inet.icmp.icmplim=20000
+
+... and the problem disappeared.
+
+## Starting with basho_bench: a step-by-step tutorial
+
+First, clone the `basho_bench` source code, then compile it. You will
+need Erlang/OTP version R16B or later to compile. I recommend using
+Erlang/OTP 17.x, because I've been doing my Machi development using
+17.x.
+
+    cd /some/nice/dev/place
+    git clone https://github.com/basho/basho_bench.git
+    cd basho_bench
+    make
+
+In order to create graphs of `basho_bench` output, you'll need
+one of the following installed:
+
+* R (the statistics package)
+* gnuplot
+
+If you don't have either available on the machine(s) you're testing,
+but you do have R (or gnuplot) on some other machine **Y**, then you can
+copy the output files to machine **Y** and generate the graphs there.
+
+## Compiling the Machi source
+
+First, clone the Machi source code, then compile it. You will
+need Erlang/OTP version 17.x to compile.
+
+    cd /some/nice/dev/place
+    git clone https://github.com/basho/machi.git
+    cd machi
+    make
+
+## Creating a basho_bench test configuration file.
+
+There are a couple of example `basho_bench` configuration files in the
+Machi `priv` directory.
+
+* [basho_bench.append-example.config](priv/basho_bench.append-example.config),
+  an example for writing Machi files.
+* [basho_bench.read-example.config](priv/basho_bench.read-example.config),
+  an example for reading Machi files.
+
+If you want a test to do both reading & writing ... well, the
+driver/plug-in is not mature enough to do it **well**. If you really
+want to, refer to the `basho_bench` docs for how to use the
+`operations` config option.
+
+The `basho_bench` config file is configured in Erlang term format.
+Each configuration item is a 2-tuple followed by a period. Comments
+begin with a `%` character and continue to the end-of-line.
+
+    %% Mandatory: adjust this code path to top of your compiled Machi source distro
+    {code_paths, ["/Users/fritchie/b/src/machi"]}.
+    {driver, machi_basho_bench_driver}.
+
+    %% Choose your maximum rate (per worker proc, see 'concurrent' below)
+    {mode, {rate, 25}}.
+
+    %% Runtime & reporting interval
+    {duration, 10}.         % minutes
+    {report_interval, 1}.   % seconds
+
+    %% Choose your number of worker procs
+    {concurrent, 5}.
+
+    %% Here's a chain of (up to) length 3, all on localhost
+    {machi_server_info,
+     [
+      {p_srvr,a,machi_flu1_client,"localhost",4444,[]},
+      {p_srvr,b,machi_flu1_client,"localhost",4445,[]},
+      {p_srvr,c,machi_flu1_client,"localhost",4446,[]}
+     ]}.
+    {machi_ets_key_tab_type, set}.   % 'set' or 'ordered_set'
+
+    %% Workload-specific definitions follow....
+
+    %% 10 parts 'append' operation + 0 parts anything else = 100% 'append' ops
+    {operations, [{append, 10}]}.
+
+    %% For append, key = Machi file prefix name
+    {key_generator, {concat_binary, <<"prefix">>,
+                     {to_binstr, "~w", {uniform_int, 30}}}}.
+
+    %% Increase size of value_generator_source_size if value_generator is big!!
+    {value_generator_source_size, 2111000}.
+    {value_generator, {fixed_bin, 32768}}.   % 32 KB
+
+In summary:
+
+* Yes, you really need to change `code_paths` to be the same as your
+  `/some/nice/dev/place/machi` directory ... and that directory
+  must be on the same machine(s) where you intend to run `basho_bench`.
+* Each worker process will have a rate limit of 25 ops/sec.
+* The test will run for 10 minutes and report stats every 1 second.
+* There are 5 concurrent worker processes. Each worker will
+  concurrently issue commands from the `operations` list, within the
+  workload throttle limit.
+* The Machi cluster is a collection of three servers, all on
+  "localhost", and using TCP ports 4444-4446.
+* Don't change the `machi_ets_key_tab_type`.
+* Our workload operation mix is 100% `append` operations.
+* The key generator for the `append` operation specifies the file
+  prefix that will be chosen (at pseudo-random). In this case, we'll
+  choose uniformly at random between file prefixes `prefix0` and
+  `prefix29`.
+* The values that we append will have a fixed length of 32 KB, but they
+  will be chosen from a random byte string of 2,111,000 bytes.
+
+There are many other options for `basho_bench`, especially for the
+`key_generator` and `value_generator` options. Please see the
+`basho_bench` docs for further information.
+
+## Running basho_bench
+
+You can run `basho_bench` using the command:
+
+    /some/nice/dev/place/basho_bench/basho_bench /path/to/config/file
+
+... where `/path/to/config/file` is the path to your config file. (If
+you use an example from the `priv` dir, we recommend that you make a
+copy elsewhere, edit the copy, and then use the copy to run
+`basho_bench`.)
+
+`basho_bench` will create a stats output directory, called `tests`, in
+the current working directory. (Add `{results_dir, "/some/output/dir"}.`
+to change the default!)
+
+Each time `basho_bench` is run, a new output stats directory is
+created in the `tests` directory. The symbolic link `tests/current`
+will always point to the last `basho_bench` run's output. But all
+prior results are always accessible! Take a look in this directory
+for all of the output.
+
+## Generating some pretty graphs
+
+If you are using R, then the following command will create a graph:
+
+    Rscript --vanilla /some/nice/dev/place/basho_bench/priv/summary.r -i $PWD/tests/current
+
+If the `tests` directory is not in your current working dir (i.e. not
+in `$PWD`), then please alter the command accordingly.
+
+R will create the final results graph in `$PWD/tests/current/summary.png`.
+
+If you are using gnuplot, please look at
+`/some/nice/dev/place/basho_bench/Makefile` to see how to
+use gnuplot to create the final results graph.
+
+## An example graph
+
+So, without a lot of context about the **Machi system** or about the
+**basho_bench system** or about the ops being performed, here is an
+example graph that was created by R:
+
+![](basho.github.io/machi/images/basho_bench.example0.png)
+
+**Without context??** How do I remember the context?
+
+My recommendation is: always keep the `.config` file together with the
+graph file. In the `tests` directory, `basho_bench` will always make
+a copy of the config file used to generate the test data.
+
+This config tells you very little about the environment of the load
+generator machine or the Machi cluster, but ... you need to maintain
+that documentation yourself, please! You'll thank me for that advice,
+someday, 11 months from now when you can't remember the details of
+that important test that you ran so very long ago.
+
+## Conclusion
+
+Really, we don't recommend using `basho_bench` for any serious
+performance measurement of Machi yet: Machi needs more maturity before
+it's reasonable to measure & judge its performance. But stress
+testing is indeed useful for reasons other than measuring
+Nth-percentile latency of operation `flarfbnitz`. We hope that this
+tutorial has been helpful!
+
+If you encounter any difficulty with this tutorial or with Machi,
+please open an issue/ticket at [GH Issues for
+Machi](https://github.com/basho/machi/issues) ... use the green "New
+issue" button. There are bugs and misfeatures in the `basho_bench`
+plugin, sorry, but please help us fix them.
+
+> -Scott Lystig Fritchie
+> Machi Team @ Basho
diff --git a/TODO-shortterm.org b/TODO-shortterm.org
index 985ebf1..dbb6ee4 100644
--- a/TODO-shortterm.org
+++ b/TODO-shortterm.org
@@ -39,7 +39,7 @@ func, and pattern match Erlang style in that func.
** DONE Adapt the projection-aware, CR-implementing client from demo-day ** DONE Add major comment sections to the CR-impl client -** TODO Simple basho_bench driver, put some unscientific chalk on the benchtop +** DONE Simple basho_bench driver, put some unscientific chalk on the benchtop ** TODO Create parallel PULSE test for basic API plus chain manager repair ** TODO Add client-side vs. server-side checksum type, expand client API? ** TODO Add gproc and get rid of registered name rendezvous diff --git a/priv/basho_bench.append-example.config b/priv/basho_bench.append-example.config index 01cd6e8..2cd2084 100644 --- a/priv/basho_bench.append-example.config +++ b/priv/basho_bench.append-example.config @@ -7,9 +7,9 @@ %{mode, {rate,20}}. {mode, max}. -%% Runtime & reporting interval (seconds) -{duration, 10}. -{report_interval, 1}. +%% Runtime & reporting interval +{duration, 10}. % minutes +{report_interval, 1}. % seconds %% Choose your number of worker procs %{concurrent, 1}. @@ -37,11 +37,9 @@ {operations, [{append, 10}]}. %% For append, key = Machi file prefix name -{key_generator, {concat_binary, <<"prefix">>, - {to_binstr, "~w", {uniform_int, 30}}}}. +{key_generator, {to_binstr, "prefix~w", {uniform_int, 30}}}. %% Increase size of value_generator_source_size if value_generator is big!! {value_generator_source_size, 2111000}. {value_generator, {fixed_bin, 32768}}. % 32 KB -%{value_generator, {fixed_bin, 1048576}}. % 1024 KB diff --git a/priv/basho_bench.read-example.config b/priv/basho_bench.read-example.config index 1e4eb35..1ddcede 100644 --- a/priv/basho_bench.read-example.config +++ b/priv/basho_bench.read-example.config @@ -7,9 +7,9 @@ %{mode, {rate,20}}. {mode, max}. -%% Runtime & reporting interval (seconds) -{duration, 10}. -{report_interval, 1}. +%% Runtime & reporting interval +{duration, 10}. % minutes +{report_interval, 1}. % seconds %% Choose your number of worker procs %{concurrent, 1}.
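
The README in this patch recommends keeping notes about the load generator's environment next to each run's results, since the copied config file records very little of that. One way this might be done is sketched below. The function name `save_run_context` and the file name `ENVIRONMENT.txt` are hypothetical, chosen only for illustration; neither is part of `basho_bench` or Machi.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of basho_bench or Machi: write a short
# description of the load-generator machine into a results directory.
# Usage: save_run_context RESULTS_DIR "free-form notes"
save_run_context() {
    results_dir="$1"    # e.g. tests/current
    notes="$2"          # e.g. "3-node chain, only server a running"
    out="$results_dir/ENVIRONMENT.txt"   # assumed file name

    # Refuse to guess: the results directory must already exist.
    [ -d "$results_dir" ] || {
        echo "no such directory: $results_dir" >&2
        return 1
    }

    {
        echo "date:  $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        echo "host:  $(uname -n)"
        echo "os:    $(uname -sr)"
        echo "notes: $notes"
    } > "$out"
}
```

Calling `save_run_context tests/current "3-node chain, only server a running"` right after a run would leave the notes beside the copied config and the graph. Details of the Machi cluster itself still have to be written into the notes by hand.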