Add README.basho_bench.md

2015-05-20 21:03:51 +09:00 · 2015-05-20 21:03:51 +09:00 · b44c88fb97
commit b44c88fb97
parent 9e41162e65
4 changed files with 266 additions and 10 deletions
--- a/README.basho_bench.md
+++ b/README.basho_bench.md
@ -0,0 +1,258 @@
 # Using basho_bench to twiddle with Machi
 "Twiddle"?  Really, is that a word?  (Yes, it is a real English word.)
 ## Benchmarking Machi's performance ... no, don't do it.
 Machi isn't ready for benchmark testing.  Its public-facing API isn't
 finished yet.  Its internal APIs aren't quite finished yet either.  So
 the results of any "benchmarking" effort will have even less value
 **N** months from now than the results of the usual benchmarking effort.
 However, there are uses for a benchmark tool.  For example, one of my
 favorites is to put **stress** on a system.  I don't care about
 average or 99th-percentile latencies, but I **might** care very much
 about behavior.
 * What happens if a Machi system is under moderate load, and then I
  stop one of the servers?
    * How quickly do the chain managers react?
    * How quickly do the client libraries within Machi react?
    * How quickly do the external client API libraries react?
 * What happens if a Machi system is under heavy load, for example,
  100% CPU load?  Not all of that 100% has to come from the Machi
  services.  Some CPU consumption might be from the load generator,
  like `basho_bench` itself, running on the same machine as a Machi
  server.  Or perhaps it's a tiny C program that I wrote:
    int main(void)
    { while (1) { ; } }
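 If no C compiler is handy, the same kind of CPU pressure can be
 approximated from the shell.  This is only a rough sketch: the
 function name `burn_cpu` is made up for illustration, and it assumes
 a POSIX-style shell on the machine under test.

```shell
# Hypothetical helper: run N busy loops for SECS seconds, then stop them.
# Usage: burn_cpu N SECS
burn_cpu() {
    n=$1; secs=$2
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        ( while :; do :; done ) &      # one busy loop per background subshell
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$secs"
    kill $pids 2>/dev/null || true     # stop every burner we started
    wait $pids 2>/dev/null || true     # reap them so none linger
    echo "stopped $n burner(s) after ${secs}s"
}
```

 For example, `burn_cpu 4 60` keeps four cores busy for a minute while
 a `basho_bench` run is in progress.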
 ## An example of how adding moderate stress can find weird bugs
 The driver/plug-in module for `basho_bench` is only a few hours old.
 (I'm writing on Wednesday, 2015-05-20.)  But just now, I configured my
 basho_bench config file to try to contact a Machi cluster of three
 nodes ... but really, only one was running.  The client library,
 `machi_cr_client.erl`, has **an extremely simple** method for dealing
 with failed servers.  I know it's simple and dumb, but that's OK in
 many cases.
 However, `basho_bench` and the `machi_cr_client.erl` were acting very,
 very badly.  I couldn't figure it out until I took a peek at my OS's
 `dmesg` output, namely: `dmesg | tail`.  It said things like this:
    Limiting closed port RST response from 690 to 50 packets per second
    Limiting closed port RST response from 367 to 50 packets per second
    Limiting closed port RST response from 101 to 50 packets per second
    Limiting closed port RST response from 682 to 50 packets per second
    Limiting closed port RST response from 467 to 50 packets per second
 Well, isn't that interesting?
 This system was running on a single OS X machine: my MacBook Pro
 laptop, running OS X 10.10 (Yosemite).  I have seen that error
 before.  And I know how to fix it.
 * **Option 1**: Change the client library config to ignore the Machi
    servers that I know will always be down during my experiment.
 * **Option 2**: Use the following to change my OS's TCP stack RST
    behavior.  (If a TCP port is not being listened to, the OS will
    send a RST packet to signal "connection refused".)
 On OS X, the limit for RST packets is 50/second.  The
 `machi_cr_client.erl` client can generate far more than 50/second, as
 the `Limiting closed port RST response...` messages above show.  So, I
 used some brute force to change the environment:
    sudo sysctl -w net.inet.icmp.icmplim=20000
 ... and the problem disappeared.
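 To see how far over the limit the client was pushing, the `dmesg`
 lines above can be summarized with a little `awk`.  This is a minimal
 sketch that assumes the exact message format shown above; the
 function name `rst_summary` is made up for illustration.

```shell
# Hypothetical helper: summarize "Limiting closed port RST response" lines.
# Reads dmesg-style text on stdin and prints the event count and peak rate.
rst_summary() {
    awk '/Limiting closed port RST response from/ {
             # The attempted rate is the word right after "from".
             for (i = 1; i < NF; i++)
                 if ($i == "from") { rate = $(i + 1) + 0; break }
             n++
             if (rate > max) max = rate
         }
         END { printf "events=%d peak=%d/sec\n", n, max }'
}
```

 Usage: `dmesg | rst_summary`.  For the five lines quoted above it
 prints `events=5 peak=690/sec`.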
 ## Starting with basho_bench: a step-by-step tutorial
 First, clone the `basho_bench` source code, then compile it.  You will
 need Erlang/OTP version R16B or later to compile.  I recommend using
 Erlang/OTP 17.x, because I've been doing my Machi development using
 17.x.
    cd /some/nice/dev/place
    git clone https://github.com/basho/basho_bench.git
    cd basho_bench
    make
 In order to create graphs of `basho_bench` output, you'll need one of
 the following installed:
 * R (the statistics package)
 * gnuplot
 If you don't have either available on the machine(s) you're testing,
 but you do have R (or gnuplot) on some other machine **Y**, then you can
 copy the output files to machine **Y** and generate the graphs there.
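 Before going further, it can save time to check that the tools
 mentioned above are actually on your `PATH`.  A minimal sketch; the
 function names `have_cmd` and `check_tools` are made up for
 illustration, and the check says nothing about which Erlang/OTP
 version is installed.

```shell
# Hypothetical helpers: report which of the tools named above are on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

check_tools() {
    for tool in git make erl; do
        if have_cmd "$tool"; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
        fi
    done
    # Only one of the two graphing tools is needed.
    if have_cmd Rscript || have_cmd gnuplot; then
        echo "ok       graphing (Rscript or gnuplot)"
    else
        echo "MISSING  graphing (install R or gnuplot, or copy results to a machine that has one)"
    fi
}
```

 Run `check_tools` on the load generator machine before compiling.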
 ## Compiling the Machi source
 First, clone the Machi source code, then compile it.  You will
 need Erlang/OTP version 17.x to compile.
    cd /some/nice/dev/place
    git clone https://github.com/basho/machi.git
    cd machi
    make
 ## Creating a basho_bench test configuration file
 There are a couple of example `basho_bench` configuration files in the
 Machi `priv` directory.
 * [basho_bench.append-example.config](priv/basho_bench.append-example.config),
  an example for writing Machi files.
 * [basho_bench.read-example.config](priv/basho_bench.read-example.config),
  an example for reading Machi files.
 If you want a test to do both reading & writing ... well, the
 driver/plug-in is not mature enough to do it **well**.  If you really
 want to, refer to the `basho_bench` docs for how to use the
 `operations` config option.
 The `basho_bench` config file is written in Erlang term format.
 Each configuration item is a 2-tuple followed by a period.  Comments
 begin with a `%` character and continue to the end-of-line.
    %% Mandatory: adjust this code path to top of your compiled Machi source distro
    {code_paths, ["/Users/fritchie/b/src/machi"]}.
    {driver, machi_basho_bench_driver}.
    %% Choose your maximum rate (per worker proc, see 'concurrent' below)
    {mode, {rate, 25}}.
    %% Runtime & reporting interval 
    {duration, 10}.         % minutes
    {report_interval, 1}.   % seconds
    %% Choose your number of worker procs
    {concurrent, 5}.
    %% Here's a chain of (up to) length 3, all on localhost
    {machi_server_info,
     [
      {p_srvr,a,machi_flu1_client,"localhost",4444,[]},
      {p_srvr,b,machi_flu1_client,"localhost",4445,[]},
      {p_srvr,c,machi_flu1_client,"localhost",4446,[]}
     ]}.
    {machi_ets_key_tab_type, set}.   % 'set' or 'ordered_set'
    %% Workload-specific definitions follow....
    %% 10 parts 'append' operation + 0 parts anything else = 100% 'append' ops
    {operations, [{append, 10}]}.
    %% For append, key = Machi file prefix name
    {key_generator, {concat_binary, <<"prefix">>,
                                    {to_binstr, "~w", {uniform_int, 30}}}}.
    %% Increase size of value_generator_source_size if value_generator is big!!
    {value_generator_source_size, 2111000}.
    {value_generator, {fixed_bin, 32768}}.   %  32 KB
 In summary:
 * Yes, you really need to change `code_paths` to point at the top of
  your compiled Machi source (`/some/nice/dev/place/machi` in the steps
  above) ... and that directory must exist on the machine(s) where you
  intend to run `basho_bench`.
 * Each worker process will have a rate limit of 25 ops/sec.
 * The test will run for 10 minutes and report stats every 1 second.
 * There are 5 concurrent worker processes.  Each worker will
  concurrently issue commands from the `operations` list, within the
  workload throttle limit.
 * The Machi cluster is a collection of three servers, all on
  "localhost", and using TCP ports 4444-4446.
 * Don't change the `machi_ets_key_tab_type`.
 * Our workload operation mix is 100% `append` operations.
 * The key generator for the `append` operation specifies the file
  prefix, which is chosen pseudo-randomly.  In this case, we'll
  choose uniformly at random between file prefix `prefix0` and
  `prefix29`.
 * The values that we append will have a fixed length of 32 KB, but
  they will be chosen from a random byte string of 2,111,000 bytes.
 There are many other options for `basho_bench`, especially for the
 `key_generator` and `value_generator` options.  Please see the
 `basho_bench` docs for further information.
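 It helps to know the ceiling that this config implies before looking
 at any graph.  The arithmetic below uses only the numbers from the
 example config above, and it assumes that every worker actually
 sustains its rate limit, so treat the results as upper bounds rather
 than predictions.

```shell
# Values copied from the example config above.
rate_per_worker=25      # {mode, {rate, 25}}
workers=5               # {concurrent, 5}
duration_min=10         # {duration, 10}
value_bytes=32768       # {value_generator, {fixed_bin, 32768}}

ops_per_sec=$((rate_per_worker * workers))
total_ops=$((ops_per_sec * duration_min * 60))
bytes_per_sec=$((ops_per_sec * value_bytes))

echo "max ops/sec:    $ops_per_sec"
echo "max total ops:  $total_ops"
echo "max bytes/sec:  $bytes_per_sec"
```

 That works out to at most 125 appends/sec, 75,000 appends over the
 whole run, and 4,096,000 bytes/sec of appended data.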
 ## Running basho_bench
 You can run `basho_bench` using the command:
    /some/nice/dev/place/basho_bench/basho_bench /path/to/config/file
 ... where `/path/to/config/file` is the path to your config file. (If
 you use an example from the `priv` dir, we recommend that you make a
 copy elsewhere, edit the copy, and then use the copy to run
 `basho_bench`.)
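 The copy-then-edit step can be scripted.  This is a minimal sketch:
 the function name `make_config` is made up for illustration, and it
 assumes the example config contains an uncommented `{code_paths, ...}.`
 line on a single line, as in the listing earlier in this document.

```shell
# Hypothetical helper: copy an example config and point code_paths at a Machi checkout.
# Usage: make_config SRC DST MACHI_DIR
make_config() {
    src=$1; dst=$2; machi_dir=$3
    # Rewrite only the uncommented code_paths line; every other line passes through.
    # Assumes MACHI_DIR contains no '|', '&', or backslash (all special to sed here).
    sed "s|^[[:space:]]*{code_paths,.*|{code_paths, [\"$machi_dir\"]}.|" "$src" > "$dst"
}
```

 For example:

```shell
make_config /some/nice/dev/place/machi/priv/basho_bench.append-example.config \
            /tmp/my-append.config /some/nice/dev/place/machi
```

 Then pass `/tmp/my-append.config` to `basho_bench` as shown above.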
 `basho_bench` will create a stats output directory, called `tests`,
 in the current working directory. (Add
 `{results_dir, "/some/output/dir"}.` to your config file to change
 the default!)
 Each time `basho_bench` is run, a new output stats directory is
 created in the `tests` directory.  The symbolic link `tests/current`
 will always point to the last `basho_bench` run's output.  But all
 prior results are always accessible!  Take a look in this directory
 for all of the output.
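 To see which run `tests/current` points at, and which earlier runs
 are still around, something like the following works.  It is a
 sketch only: the function name `show_runs` is made up for
 illustration, and it assumes the layout described above (one
 directory per run plus a `current` symlink).

```shell
# Hypothetical helper: show what tests/current points at, then list all runs.
# Usage: show_runs [TESTS_DIR]   (defaults to ./tests)
show_runs() {
    tests_dir=${1:-tests}
    if [ ! -L "$tests_dir/current" ]; then
        echo "no 'current' symlink in $tests_dir" >&2
        return 1
    fi
    echo "current -> $(readlink "$tests_dir/current")"
    for d in "$tests_dir"/*/; do
        d=${d%/}
        if [ -L "$d" ]; then
            continue                   # skip the 'current' symlink itself
        fi
        if [ -d "$d" ]; then
            echo "run: ${d##*/}"
        fi
    done
    return 0
}
```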
 ## Generating some pretty graphs
 If you are using R, then the following command will create a graph:
    Rscript --vanilla /some/nice/dev/place/basho_bench/priv/summary.r -i $PWD/tests/current
 If the `tests` directory is not in your current working dir (i.e. not
 in `$PWD`), then please alter the command accordingly.
 R will create the final results graph in `$PWD/tests/current/summary.png`.
 If you are using gnuplot, please look at
 `/some/nice/dev/place/basho_bench/Makefile` to see how to
 use gnuplot to create the final results graph.
 ## An example graph
 So, without a lot of context about the **Machi system** or about the
 **basho_bench system** or about the ops being performed, here is an
 example graph that was created by R:
 ![](basho.github.io/machi/images/basho_bench.example0.png)
 **Without context??**  How do I remember the context?
 My recommendation is: always keep the `.config` file together with the
 graph file.  In the `tests` directory, `basho_bench` will always make
 a copy of the config file used to generate the test data.
 This config tells you very little about the environment of the load
 generator machine or the Machi cluster, but ... you need to maintain
 that documentation yourself, please!  You'll thank me for that advice,
 someday, 11 months from now when you can't remember the details of
 that important test that you ran so very long ago.
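 One way to follow that advice is to snapshot each interesting run
 together with a short note.  This is a minimal sketch: the function
 name `archive_run` and the file name `ENVIRONMENT.txt` are made up
 for illustration, and the script records only what the load
 generator machine can report about itself.  Details of the Machi
 cluster still have to go into the note by hand.

```shell
# Hypothetical helper: copy a finished run and record a note about the environment.
# Usage: archive_run RUN_DIR DEST_DIR "free-form note"
archive_run() {
    run_dir=$1; dest=$2; note=$3
    mkdir -p "$dest" || return 1
    # The trailing "/." copies the directory's contents, including the saved config.
    cp -R "$run_dir/." "$dest/" || return 1
    {
        echo "archived: $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        echo "host:     $(uname -a)"
        echo "note:     $note"
    } > "$dest/ENVIRONMENT.txt"
}
```

 For example:
 `archive_run tests/current ~/machi-runs/one-of-three-up "3-node chain on localhost, only server a running"`.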
 ## Conclusion
 Really, we don't recommend using `basho_bench` for any serious
 performance measurement of Machi yet: Machi needs more maturity before
 it's reasonable to measure & judge its performance.  But stress
 testing is indeed useful for reasons other than measuring
 Nth-percentile latency of operation `flarfbnitz`.  We hope that this
 tutorial has been helpful!
 If you encounter any difficulty with this tutorial or with Machi,
 please open an issue/ticket at [GH Issues for
 Machi](https://github.com/basho/machi/issues) ... use the green "New
 issue" button.  There are bugs and misfeatures in the `basho_bench`
 plugin, sorry, but please help us fix them.
 > -Scott Lystig Fritchie
 > Machi Team @ Basho
--- a/TODO-shortterm.org
+++ b/TODO-shortterm.org
@ -39,7 +39,7 @@ func, and pattern match Erlang style in that func.
 ** DONE Adapt the projection-aware, CR-implementing client from demo-day
 ** DONE Add major comment sections to the CR-impl client
-** TODO Simple basho_bench driver, put some unscientific chalk on the benchtop
+** DONE Simple basho_bench driver, put some unscientific chalk on the benchtop
 ** TODO Create parallel PULSE test for basic API plus chain manager repair
 ** TODO Add client-side vs. server-side checksum type, expand client API?
 ** TODO Add gproc and get rid of registered name rendezvous
--- a/priv/basho_bench.append-example.config
+++ b/priv/basho_bench.append-example.config
@ -7,9 +7,9 @@
 %{mode, {rate,20}}.
 {mode, max}.
-%% Runtime & reporting interval (seconds)
+%% Runtime & reporting interval
-{duration, 10}.
+{duration, 10}.         % minutes
-{report_interval, 1}.
+{report_interval, 1}.   % seconds
 %% Choose your number of worker procs
 %{concurrent, 1}.
@ -37,11 +37,9 @@
 {operations, [{append, 10}]}.
 %% For append, key = Machi file prefix name
-{key_generator, {concat_binary, <<"prefix">>,
+{key_generator, {to_binstr, "prefix~w", {uniform_int, 30}}}.
                                {to_binstr, "~w", {uniform_int, 30}}}}.
 %% Increase size of value_generator_source_size if value_generator is big!!
 {value_generator_source_size, 2111000}.
 {value_generator, {fixed_bin, 32768}}.   %  32 KB
 %{value_generator, {fixed_bin, 1048576}}. % 1024 KB
--- a/priv/basho_bench.read-example.config
+++ b/priv/basho_bench.read-example.config
@ -7,9 +7,9 @@
 %{mode, {rate,20}}.
 {mode, max}.
-%% Runtime & reporting interval (seconds)
+%% Runtime & reporting interval
-{duration, 10}.
+{duration, 10}.         % minutes
-{report_interval, 1}.
+{report_interval, 1}.   % seconds
 %% Choose your number of worker procs
 %{concurrent, 1}.