machi/doc/Using-Basho-Bench.md
2015-05-21 15:12:46 +09:00

Using basho_bench to twiddle with Machi

"Twiddle"? Really, is that a word? (Yes, it is a real English word.)

Benchmarking Machi's performance ... no, don't do it.

Machi isn't ready for benchmark testing. Its public-facing API isn't finished yet. Its internal APIs aren't quite finished yet either. So the results of any "benchmarking" effort will have even less value N months from now than the results of the usual benchmarking effort.

However, there are uses for a benchmark tool. For example, one of my favorites is to put stress on a system. I don't care about average or 99th-percentile latencies, but I might care very much about behavior.

  • What happens if a Machi system is under moderate load, and then I stop one of the servers?

    • How quickly do the chain managers react?
    • How quickly do the client libraries within Machi react?
    • How quickly do the external client API libraries react?
  • What happens if a Machi system is under heavy load, for example, 100% CPU load? Not all of that 100% needs to come from the Machi services. Some CPU consumption might come from the load generator, such as basho_bench itself running on the same machine as a Machi server. Or perhaps it comes from a tiny C program that I wrote:

    int main(void) { while (1) { ; } }

An example of how adding moderate stress can find weird bugs

The driver/plug-in module for basho_bench is only a few hours old. (I'm writing on Wednesday, 2015-05-20.) But just now, I configured my basho_bench config file to try to contact a Machi cluster of three nodes ... but really, only one was running. The client library, machi_cr_client.erl, has an extremely simple method for dealing with failed servers. I know it's simple and dumb, but that's OK in many cases.

However, basho_bench and the machi_cr_client.erl were acting very, very badly. I couldn't figure it out until I took a peek at my OS's dmesg output, namely: dmesg | tail. It said things like this:

Limiting closed port RST response from 690 to 50 packets per second
Limiting closed port RST response from 367 to 50 packets per second
Limiting closed port RST response from 101 to 50 packets per second
Limiting closed port RST response from 682 to 50 packets per second
Limiting closed port RST response from 467 to 50 packets per second

Well, isn't that interesting?

This system was running on a single OS X machine: my MacBook Pro laptop, running OS X 10.10 (Yosemite). I have seen that error before. And I know how to fix it.

  • Option 1: Change the client library config to ignore the Machi servers that I know will always be down during my experiment.
  • Option 2: Use the following to change my OS's TCP stack RST behavior. (If a TCP port is not being listened to, the OS will send a RST packet to signal "connection refused".)

On OS X, the limit for RST packets is 50/second. The machi_cr_client.erl client can generate far more than 50/second, as the "Limiting closed port RST response..." messages above show. So I used some brute force to change the environment:

sudo sysctl -w net.inet.icmp.icmplim=20000

... and the problem disappeared.
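
If you want a quick number for how far over the limit the client is going, the rates can be pulled out of those kernel messages. Here is a minimal shell sketch, assuming the message format shown above; the function name peak_rst_rate is made up for this example and is not part of Machi or basho_bench.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the highest
# "from N" rate found in "Limiting closed port RST response" messages.
# Assumes the message format shown in the dmesg output above.
# Prints nothing if no such message is present.
peak_rst_rate() {
    sed -n 's/.*Limiting closed port RST response from \([0-9][0-9]*\) to .*/\1/p' |
        sort -n |
        tail -n 1
}
```

For example, dmesg | peak_rst_rate prints the largest rate that the kernel reported; for the five messages above, that is 690. Running sysctl net.inet.icmp.icmplim without -w prints the current limit, which is worth writing down before you change it so that you can restore it afterward.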

Starting with basho_bench: a step-by-step tutorial

First, clone the basho_bench source code, then compile it. You will need Erlang/OTP version R16B or later to compile. I recommend using Erlang/OTP 17.x, because I've been doing my Machi development using 17.x.

cd /some/nice/dev/place
git clone https://github.com/basho/basho_bench.git
cd basho_bench
make

In order to create graphs of basho_bench output, you'll need one of the following installed:

  • R (the statistics package)
  • gnuplot

If you don't have either available on the machine(s) you're testing, but you do have R (or gnuplot) on some other machine Y, then you can copy the output files to machine Y and generate the graphs there.

Compiling the Machi source

First, clone the Machi source code, then compile it. You will need Erlang/OTP version 17.x to compile.

cd /some/nice/dev/place
git clone https://github.com/basho/machi.git
cd machi
make

Creating a basho_bench test configuration file

There are a couple of example basho_bench configuration files in the Machi priv directory.

If you want a test to do both reading & writing ... well, the driver/plug-in is not mature enough to do it well. If you really want to, refer to the basho_bench docs for how to use the operations config option.

The basho_bench config file is configured in Erlang term format. Each configuration item is a 2-tuple followed by a period. Comments begin with a % character and continue to the end-of-line.

%% Mandatory: adjust this code path to top of your compiled Machi source distro
{code_paths, ["/Users/fritchie/b/src/machi"]}.
{driver, machi_basho_bench_driver}.

%% Choose your maximum rate (per worker proc, see 'concurrent' below)
{mode, {rate, 25}}.

%% Runtime & reporting interval 
{duration, 10}.         % minutes
{report_interval, 1}.   % seconds

%% Choose your number of worker procs
{concurrent, 5}.

%% Here's a chain of (up to) length 3, all on localhost
{machi_server_info,
 [
  {p_srvr,a,machi_flu1_client,"localhost",4444,[]},
  {p_srvr,b,machi_flu1_client,"localhost",4445,[]},
  {p_srvr,c,machi_flu1_client,"localhost",4446,[]}
 ]}.
{machi_ets_key_tab_type, set}.   % 'set' or 'ordered_set'

%% Workload-specific definitions follow....

%% 10 parts 'append' operation + 0 parts anything else = 100% 'append' ops
{operations, [{append, 10}]}.

%% For append, key = Machi file prefix name
{key_generator, {concat_binary, <<"prefix">>,
                                {to_binstr, "~w", {uniform_int, 30}}}}.

%% Increase size of value_generator_source_size if value_generator is big!!
{value_generator_source_size, 2111000}.
{value_generator, {fixed_bin, 32768}}.   %  32 KB

In summary:

  • Yes, you really need to change code_paths to point to the top of your compiled Machi source, e.g. your /some/nice/dev/place/machi directory ... and that directory must be on the same machine(s) on which you intend to run basho_bench.
  • Each worker process will have a rate limit of 25 ops/sec.
  • The test will run for 10 minutes and report stats every 1 second.
  • There are 5 concurrent worker processes. Each worker will concurrently issue commands from the operations list, within the workload throttle limit.
  • The Machi cluster is a collection of three servers, all on "localhost", and using TCP ports 4444-4446.
  • Don't change the machi_ets_key_tab_type.
  • Our workload operation mix is 100% append operations.
  • The key generator for the append operation specifies the file prefix that will be chosen (pseudo-randomly). In this case, we'll choose uniformly at random among the file prefixes prefix0 through prefix29.
  • The values that we append will have a fixed length of 32 KB, but they will be chosen from a random byte string of 2,111,000 bytes.
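
The numbers in that summary multiply out to a rough upper bound on the load that this config can generate. Here is a small shell sketch of the arithmetic, using only values from the example config above; the variable names are mine, and it takes the config's own description of rate as a per-worker maximum at face value.

```shell
# Values copied from the example config above.
concurrent=5        # {concurrent, 5}: number of worker processes
rate=25             # {mode, {rate, 25}}: max ops/sec per worker
duration_min=10     # {duration, 10}: minutes
value_bytes=32768   # {value_generator, {fixed_bin, 32768}}

# Aggregate ceilings across all workers.
# (The last product needs a shell with 64-bit arithmetic.)
max_ops_per_sec=$((concurrent * rate))
max_bytes_per_sec=$((max_ops_per_sec * value_bytes))
max_total_ops=$((max_ops_per_sec * duration_min * 60))
max_total_bytes=$((max_total_ops * value_bytes))

echo "max ops/sec:     $max_ops_per_sec"
echo "max bytes/sec:   $max_bytes_per_sec"
echo "max total ops:   $max_total_ops"
echo "max total bytes: $max_total_bytes"
```

That works out to at most 125 appends/sec and 4,096,000 bytes/sec of appended data, or 75,000 appends and about 2.46 GB over the 10 minute run. These are ceilings: a slow or partly failed cluster will deliver less.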

There are many other options for basho_bench, especially for the key_generator and value_generator options. Please see the basho_bench docs for further information.

Running basho_bench

You can run basho_bench using the command:

/some/nice/dev/place/basho_bench/basho_bench /path/to/config/file

... where /path/to/config/file is the path to your config file. (If you use an example from the priv dir, we recommend that you make a copy elsewhere, edit the copy, and then use the copy to run basho_bench.)

basho_bench will create a stats output directory, called tests, in the current working directory. (Add {results_dir, "/some/output/dir"}. to your config file to change the default!)

Each time basho_bench is run, a new output stats directory is created in the tests directory. The symbolic link tests/current will always point to the last basho_bench run's output. But all prior results are always accessible! Take a look in this directory for all of the output.
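
Because tests/current moves every time you start a new run, it helps to resolve the link to the real directory before you archive or refer to a result. Here is a minimal sketch, assuming the symbolic link layout described above; latest_run_dir is a hypothetical helper name, not something that basho_bench provides.

```shell
# Hypothetical helper: print the physical directory that <results>/current
# points at. $1 is the results directory (default: ./tests).
# Prints nothing and returns non-zero if the link cannot be followed.
latest_run_dir() {
    results_dir=${1:-tests}
    # cd -P resolves the symlink; pwd -P prints the resolved path.
    # The subshell keeps the caller's working directory unchanged.
    (cd -P "$results_dir/current" 2>/dev/null && pwd -P)
}
```

Running latest_run_dir in the directory where you started basho_bench prints the full path of the most recent run's output directory, which will keep pointing at the same results even after later runs move the current link.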

Generating some pretty graphs

If you are using R, then the following command will create a graph:

Rscript --vanilla /some/nice/dev/place/basho_bench/priv/summary.r -i $PWD/tests/current

If the tests directory is not in your current working dir (i.e. not in $PWD), then please alter the command accordingly.

R will create the final results graph in $PWD/tests/current/summary.png.

If you are using gnuplot, please look at /some/nice/dev/place/basho_bench/Makefile to see how to use gnuplot to create the final results graph.

An example graph

So, without a lot of context about the Machi system or about the basho_bench system or about the ops being performed, here is an example graph that was created by R:

Without context?? How do I remember the context?

My recommendation is: always keep the .config file together with the graph file. In the tests directory, basho_bench will always make a copy of the config file used to generate the test data.

This config tells you very little about the environment of the load generator machine or the Machi cluster, but ... you need to maintain that documentation yourself, please! You'll thank me for that advice, someday, 11 months from now when you can't remember the details of that important test that you ran so very long ago.
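
One low-effort way to follow that advice is to drop a small notes file into the results directory right after each run. Here is a minimal sketch; the record_env function and the ENVIRONMENT.txt file name are both made up for this example, and you will likely want to record more than this (Machi git revision, which servers were up, and so on).

```shell
# Hypothetical helper: save a few environment facts next to the test results.
# $1 is a basho_bench results directory, such as tests/current.
# $2 is an optional free-form note about the run.
# The ENVIRONMENT.txt file name is an assumption of this example.
record_env() {
    out="$1/ENVIRONMENT.txt"
    {
        echo "recorded: $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        echo "host:     $(uname -n)"
        echo "os:       $(uname -sr)"
        echo "notes:    ${2:-none}"
    } > "$out"
}
```

For example, record_env tests/current "3-node chain, only server a running" writes the note, the host name, the OS name and release, and a UTC timestamp into the same directory as the config copy and the graph.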

Conclusion

Really, we don't recommend using basho_bench for any serious performance measurement of Machi yet: Machi needs more maturity before it's reasonable to measure & judge its performance. But stress testing is indeed useful for reasons other than measuring Nth-percentile latency of operation flarfbnitz. We hope that this tutorial has been helpful!

If you encounter any difficulty with this tutorial or with Machi, please open an issue/ticket at GitHub Issues for Machi ... use the green "New issue" button. There are bugs and misfeatures in the basho_bench plugin, sorry, but please help us fix them.

-Scott Lystig Fritchie, Machi Team @ Basho