Add README.basho_bench.md

2015-05-20 21:03:51 +09:00 · 2015-05-20 21:03:51 +09:00 · b44c88fb97
commit b44c88fb97
parent 9e41162e65
4 changed files with 266 additions and 10 deletions
--- a/README.basho_bench.md
+++ b/README.basho_bench.md
@@ -0,0 +1,258 @@
+
+# Using basho_bench to twiddle with Machi
+
+"Twiddle"?  Really, is that a word?  (Yes, it is a real English word.)
+
+## Benchmarking Machi's performance ... no, don't do it.
+
+Machi isn't ready for benchmark testing.  Its public-facing API isn't
+finished yet.  Its internal APIs aren't quite finished yet either.  So
+the results of any "benchmarking" effort will have even less value
+**N** months from now than the usual benchmarking results do.
+
+However, there are uses for a benchmark tool.  For example, one of my
+favorites is to put **stress** on a system.  I don't care about
+average or 99th-percentile latencies, but I **might** care very much
+about behavior.
+
+* What happens if a Machi system is under moderate load, and then I
+  stop one of the servers?
+    * How quickly do the chain managers react?
+    * How quickly do the client libraries within Machi react?
+    * How quickly do the external client API libraries react?
+
+* What happens if a Machi system is under heavy load, for example,
+  100% CPU load?  Not all of that 100% need come from the Machi
+  services.  Some CPU consumption might be from the load generator,
+  like `basho_bench` itself running on the same machine as a Machi
+  server.  Or perhaps it's a tiny C program that I wrote:
+
+    int main(void)
+    { while (1) { ; } }
+
+## An example of how adding moderate stress can find weird bugs
+
+The driver/plug-in module for `basho_bench` is only a few hours old.
+(I'm writing on Wednesday, 2015-05-20.)  But just now, I configured my
+basho_bench config file to try to contact a Machi cluster of three
+nodes ... but really, only one was running.  The client library,
+`machi_cr_client.erl`, has **an extremely simple** method for dealing
+with failed servers.  I know it's simple and dumb, but that's OK in
+many cases.
+
+However, `basho_bench` and the `machi_cr_client.erl` were acting very,
+very badly.  I couldn't figure it out until I took a peek at my OS's
+`dmesg` output, namely: `dmesg | tail`.  It said things like this:
+
+    Limiting closed port RST response from 690 to 50 packets per second
+    Limiting closed port RST response from 367 to 50 packets per second
+    Limiting closed port RST response from 101 to 50 packets per second
+    Limiting closed port RST response from 682 to 50 packets per second
+    Limiting closed port RST response from 467 to 50 packets per second
+
+Well, isn't that interesting?
+
+This system was running on a single OS X machine: my MacBook Pro
+laptop, running OS X 10.10 (Yosemite).  I have seen that error
+before.  And I know how to fix it.
+
+* **Option 1**: Change the client library config to ignore the Machi
+    servers that I know will always be down during my experiment.
+* **Option 2**: Use the following to change my OS's TCP stack RST
+    behavior.  (If a TCP port is not being listened to, the OS will
+    send a RST packet to signal "connection refused".)
+
+On OS X, the limit for RST packets is 50/second.  The
+`machi_cr_client.erl` client can generate far more than 50/second, as
+the `Limiting closed port RST response...` messages above show.  So, I
+used some brute force to change the environment:
+
+    sudo sysctl -w net.inet.icmp.icmplim=20000
+
+... and the problem disappeared.
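+
+For comparison, **Option 1** might look like the following fragment of
+a `basho_bench` config file (the format is explained later in this
+document).  This is only a sketch: it assumes that server `a` on TCP
+port 4444 is the one server that is running, and that listing only
+that server is enough to keep the client library away from the
+others.  That assumption is untested here.
+
+    %% Sketch only: list just the server that is really running
+    {machi_server_info,
+     [
+      {p_srvr,a,machi_flu1_client,"localhost",4444,[]}
+     ]}.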
+
+## Starting with basho_bench: a step-by-step tutorial
+
+First, clone the `basho_bench` source code, then compile it.  You will
+need Erlang/OTP version R16B or later to compile.  I recommend using
+Erlang/OTP 17.x, because I've been doing my Machi development using
+17.x.
+
+    cd /some/nice/dev/place
+    git clone https://github.com/basho/basho_bench.git
+    cd basho_bench
+    make
+
+In order to create graphs of `basho_bench` output, you'll need one of
+the following installed:
+
+* R (the statistics package)
+* gnuplot
+
+If you don't have either available on the machine(s) you're testing,
+but you do have R (or gnuplot) on some other machine **Y**, then you can
+copy the output files to machine **Y** and generate the graphs there.
+
+## Compiling the Machi source
+
+First, clone the Machi source code, then compile it.  You will
+need Erlang/OTP version 17.x to compile.
+
+    cd /some/nice/dev/place
+    git clone https://github.com/basho/machi.git
+    cd machi
+    make
+
+## Creating a basho_bench test configuration file.
+
+There are a couple of example `basho_bench` configuration files in the
+Machi `priv` directory.
+
+* [basho_bench.append-example.config](priv/basho_bench.append-example.config),
+  an example for writing Machi files.
+* [basho_bench.read-example.config](priv/basho_bench.read-example.config),
+  an example for reading Machi files.
+
+If you want a test to do both reading & writing ... well, the
+driver/plug-in is not mature enough to do it **well**.  If you really
+want to, refer to the `basho_bench` docs for how to use the
+`operations` config option.
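+
+As a rough sketch, a mixed workload might weight the two operations
+like this.  The `read` operation name is an assumption (it is meant to
+match whatever the read example config uses), and the 3:7 weights are
+arbitrary:
+
+    %% Sketch only: 3 parts 'append' + 7 parts 'read' = 30% appends, 70% reads
+    {operations, [{append, 3}, {read, 7}]}.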
+
+The `basho_bench` config file is written in Erlang term format.
+Each configuration item is a 2-tuple followed by a period.  Comments
+begin with a `%` character and continue to the end-of-line.
+
+    %% Mandatory: adjust this code path to top of your compiled Machi source distro
+    {code_paths, ["/Users/fritchie/b/src/machi"]}.
+    {driver, machi_basho_bench_driver}.
+    
+    %% Choose your maximum rate (per worker proc, see 'concurrent' below)
+    {mode, {rate, 25}}.
+    
+    %% Runtime & reporting interval 
+    {duration, 10}.         % minutes
+    {report_interval, 1}.   % seconds
+    
+    %% Choose your number of worker procs
+    {concurrent, 5}.
+    
+    %% Here's a chain of (up to) length 3, all on localhost
+    {machi_server_info,
+     [
+      {p_srvr,a,machi_flu1_client,"localhost",4444,[]},
+      {p_srvr,b,machi_flu1_client,"localhost",4445,[]},
+      {p_srvr,c,machi_flu1_client,"localhost",4446,[]}
+     ]}.
+    {machi_ets_key_tab_type, set}.   % 'set' or 'ordered_set'
+    
+    %% Workload-specific definitions follow....
+    
+    %% 10 parts 'append' operation + 0 parts anything else = 100% 'append' ops
+    {operations, [{append, 10}]}.
+    
+    %% For append, key = Machi file prefix name
+    {key_generator, {concat_binary, <<"prefix">>,
+                                    {to_binstr, "~w", {uniform_int, 30}}}}.
+    
+    %% Increase size of value_generator_source_size if value_generator is big!!
+    {value_generator_source_size, 2111000}.
+    {value_generator, {fixed_bin, 32768}}.   %  32 KB
+
+In summary:
+
+* Yes, you really need to change `code_paths` to point to the top of
+  your compiled Machi source, e.g. `/some/nice/dev/place/machi` ... and
+  that directory must be on the machine(s) where you intend to run
+  `basho_bench`.
+* Each worker process will have a rate limit of 25 ops/sec.
+* The test will run for 10 minutes and report stats every 1 second.
+* There are 5 concurrent worker processes.  Each worker will
+  concurrently issue commands from the `operations` list, within the
+  workload throttle limit.
+* The Machi cluster is a collection of three servers, all on
+  "localhost", and using TCP ports 4444-4446.
+* Don't change the `machi_ets_key_tab_type`.
+* Our workload operation mix is 100% `append` operations.
+* The key generator for the `append` operation specifies the file
+  prefix that will be chosen pseudo-randomly.  In this case, we'll
+  choose uniformly at random among the file prefixes `prefix0` through
+  `prefix29`.
+* The values that we append will be fixed 32KB length, but they will
+  be chosen from a random byte string of 2,111,000 bytes.
+
+There are many other options for `basho_bench`, especially for the
+`key_generator` and `value_generator` options.  Please see the
+`basho_bench` docs for further information.
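+
+For instance, the example configs in `priv` differ from the file above
+in at least two small ways.  They run in `max` mode instead of at a
+fixed rate, and the append example builds the same `prefixN` keys with
+a single `to_binstr` format string:
+
+    %% Run each worker as fast as possible instead of at a fixed rate
+    {mode, max}.
+
+    %% Same keys as the concat_binary form above, written more compactly
+    {key_generator, {to_binstr, "prefix~w", {uniform_int, 30}}}.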
+
+## Running basho_bench
+
+You can run `basho_bench` using the command:
+
+    /some/nice/dev/place/basho_bench/basho_bench /path/to/config/file
+
+... where `/path/to/config/file` is the path to your config file. (If
+you use an example from the `priv` dir, we recommend that you make a
+copy elsewhere, edit the copy, and then use the copy to run
+`basho_bench`.)
+
+`basho_bench` will create a stats output directory, called `tests`, in
+the current working directory. (Add `{results_dir, "/some/output/dir"}.`
+to the config file to change the default!)
+
+Each time `basho_bench` is run, a new output stats directory is
+created in the `tests` directory.  The symbolic link `tests/current`
+will always point to the last `basho_bench` run's output.  But all
+prior results are always accessible!  Take a look in this directory
+for all of the output.
+
+## Generating some pretty graphs
+
+If you are using R, then the following command will create a graph:
+
+    Rscript --vanilla /some/nice/dev/place/basho_bench/priv/summary.r -i $PWD/tests/current
+
+If the `tests` directory is not in your current working dir (i.e. not
+in `$PWD`), then please alter the command accordingly.
+
+R will create the final results graph in `$PWD/tests/current/summary.png`.
+
+If you are using gnuplot, please look at
+`/some/nice/dev/place/basho_bench/Makefile` to see how to
+use gnuplot to create the final results graph.
+
+## An example graph
+
+So, without a lot of context about the **Machi system** or about the
+**basho_bench system** or about the ops being performed, here is an
+example graph that was created by R:
+
+![](https://basho.github.io/machi/images/basho_bench.example0.png)
+
+**Without context??**  How do I remember the context?
+
+My recommendation is: always keep the `.config` file together with the
+graph file.  In the `tests` directory, `basho_bench` will always make
+a copy of the config file used to generate the test data.
+
+This config tells you very little about the environment of the load
+generator machine or the Machi cluster, but ... you need to maintain
+that documentation yourself, please!  You'll thank me for that advice,
+someday, 11 months from now when you can't remember the details of
+that important test that you ran so very long ago.
+
+## Conclusion
+
+Really, we don't recommend using `basho_bench` for any serious
+performance measurement of Machi yet: Machi needs more maturity before
+it's reasonable to measure & judge its performance.  But stress
+testing is indeed useful for reasons other than measuring
+Nth-percentile latency of operation `flarfbnitz`.  We hope that this
+tutorial has been helpful!
+
+If you encounter any difficulty with this tutorial or with Machi,
+please open an issue/ticket at [GH Issues for
+Machi](https://github.com/basho/machi/issues) ... use the green "New
+issue" button.  There are bugs and misfeatures in the `basho_bench`
+plugin, sorry, but please help us fix them.
+
+> -Scott Lystig Fritchie
+> Machi Team @ Basho
--- a/TODO-shortterm.org
+++ b/TODO-shortterm.org
@@ -39,7 +39,7 @@ func, and pattern match Erlang style in that func.

 ** DONE Adapt the projection-aware, CR-implementing client from demo-day
 ** DONE Add major comment sections to the CR-impl client
-** TODO Simple basho_bench driver, put some unscientific chalk on the benchtop
+** DONE Simple basho_bench driver, put some unscientific chalk on the benchtop
 ** TODO Create parallel PULSE test for basic API plus chain manager repair
 ** TODO Add client-side vs. server-side checksum type, expand client API?
 ** TODO Add gproc and get rid of registered name rendezvous
--- a/priv/basho_bench.append-example.config
+++ b/priv/basho_bench.append-example.config
@@ -7,9 +7,9 @@
 %{mode, {rate,20}}.
 {mode, max}.

-%% Runtime & reporting interval (seconds)
-{duration, 10}.
-{report_interval, 1}.
+%% Runtime & reporting interval
+{duration, 10}.         % minutes
+{report_interval, 1}.   % seconds

 %% Choose your number of worker procs
 %{concurrent, 1}.
@@ -37,11 +37,9 @@
 {operations, [{append, 10}]}.

 %% For append, key = Machi file prefix name
-{key_generator, {concat_binary, <<"prefix">>,
-                                {to_binstr, "~w", {uniform_int, 30}}}}.
+{key_generator, {to_binstr, "prefix~w", {uniform_int, 30}}}.

 %% Increase size of value_generator_source_size if value_generator is big!!
 {value_generator_source_size, 2111000}.
 {value_generator, {fixed_bin, 32768}}.   %  32 KB
-%{value_generator, {fixed_bin, 1048576}}. % 1024 KB

--- a/priv/basho_bench.read-example.config
+++ b/priv/basho_bench.read-example.config
@@ -7,9 +7,9 @@
 %{mode, {rate,20}}.
 {mode, max}.

-%% Runtime & reporting interval (seconds)
-{duration, 10}.
-{report_interval, 1}.
+%% Runtime & reporting interval
+{duration, 10}.         % minutes
+{report_interval, 1}.   % seconds

 %% Choose your number of worker procs
 %{concurrent, 1}.