Add README.basho_bench.md

Scott Lystig Fritchie 2015-05-20 21:03:51 +09:00
parent 9e41162e65
commit b44c88fb97
4 changed files with 266 additions and 10 deletions

README.basho_bench.md Normal file

@@ -0,0 +1,258 @@
# Using basho_bench to twiddle with Machi
"Twiddle"? Really, is that a word? (Yes, it is a real English word.)
## Benchmarking Machi's performance ... no, don't do it.
Machi isn't ready for benchmark testing. Its public-facing API isn't
finished yet, and its internal APIs aren't quite finished either. So
the results of any "benchmarking" effort will have even less value
**N** months from now than benchmarking results usually do.
However, there are uses for a benchmark tool. For example, one of my
favorites is to put **stress** on a system. I don't care about
average or 99th-percentile latencies, but I **might** care very much
about how the system behaves under that stress. For example:
* What happens if a Machi system is under moderate load and I then
stop one of the servers?
* How quickly do the chain managers react?
* How quickly do the client libraries within Machi react?
* How quickly do the external client API libraries react?
* What happens if a Machi system is under heavy load, for example
100% CPU load? Not all of that 100% might be the Machi services.
Some CPU consumption might come from the load generator, such as
`basho_bench` itself running on the same machine as a Machi server.
Or perhaps it's a tiny C program that I wrote:

        int main(void)           /* burn one CPU core, forever */
        { while (1) { ; } }
## An example of how adding moderate stress can find weird bugs
The driver/plug-in module for `basho_bench` is only a few hours old.
(I'm writing on Wednesday, 2015-05-20.) But just now, I configured my
basho_bench config file to try to contact a Machi cluster of three
nodes ... but really, only one was running. The client library,
`machi_cr_client.erl`, has **an extremely simple** method for dealing
with failed servers. I know it's simple and dumb, but that's OK in
many cases.
However, `basho_bench` and `machi_cr_client.erl` were behaving very,
very badly. I couldn't figure out why until I took a peek at my OS's
`dmesg` output (via `dmesg | tail`). It said things like this:
Limiting closed port RST response from 690 to 50 packets per second
Limiting closed port RST response from 367 to 50 packets per second
Limiting closed port RST response from 101 to 50 packets per second
Limiting closed port RST response from 682 to 50 packets per second
Limiting closed port RST response from 467 to 50 packets per second
Well, isn't that interesting?
This system was running on a single OS X machine: my MacBook Pro
laptop, running OS X 10.10 (Yosemite). I have seen that error
before. And I know how to fix it.
* **Option 1**: Change the client library config to ignore the Machi
servers that I know will always be down during my experiment.
* **Option 2**: Use the following to change my OS's TCP stack RST
behavior. (If a TCP port is not being listened to, the OS will
send a RST packet to signal "connection refused".)
On OS X, the limit for RST packets is 50 per second. The
`machi_cr_client.erl` client can trigger far more than 50 per second,
as the `Limiting closed port RST response...` messages above show. So
I used some brute force to change the environment:
sudo sysctl -w net.inet.icmp.icmplim=20000
... and the problem disappeared.
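
If you take the brute-force route, it is worth noting the original
setting so you can restore it afterward. A minimal sketch (the dmesg
messages above suggest the limit on this machine was 50, but check
yours rather than assuming):

    # Note the current limit so you can restore it later
    sysctl net.inet.icmp.icmplim
    # Raise the limit for the duration of the experiment
    sudo sysctl -w net.inet.icmp.icmplim=20000
    # ...run the experiment, then restore whatever the first command printed, e.g.:
    sudo sysctl -w net.inet.icmp.icmplim=50
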
## Starting with basho_bench: a step-by-step tutorial
First, clone the `basho_bench` source code, then compile it. You will
need Erlang/OTP version R16B or later to compile. I recommend using
Erlang/OTP 17.x, because I've been doing my Machi development using
17.x.
cd /some/nice/dev/place
git clone https://github.com/basho/basho_bench.git
cd basho_bench
make
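
If you are not sure which Erlang/OTP release is first on your `PATH`,
one quick way to check is to ask the runtime directly:

    erl -noshell -eval 'io:format("~s~n", [erlang:system_info(otp_release)]), halt().'

It prints something like `R16B03` or `17`.
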
In order to create graphs of `basho_bench` output, you'll need one of
the following installed:
* R (the statistics package)
* gnuplot
If you don't have either available on the machine(s) you're testing,
but you do have R (or gnuplot) on some other machine **Y**, then you can
copy the output files to machine **Y** and generate the graphs there.
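
For example, something like the following copies the most recent run's
output to another machine for graphing. (The host name
`grapher.example.com` and destination directory are only placeholders.)

    # Copy the "current" results directory to ~/bench-results on the other machine
    scp -r tests/current grapher.example.com:bench-results
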
## Compiling the Machi source
Next, clone the Machi source code, then compile it. You will need
Erlang/OTP version 17.x to compile.
cd /some/nice/dev/place
git clone https://github.com/basho/machi.git
cd machi
make
## Creating a basho_bench test configuration file
There are a couple of example `basho_bench` configuration files in the
Machi `priv` directory.
* [basho_bench.append-example.config](priv/basho_bench.append-example.config),
an example for writing Machi files.
* [basho_bench.read-example.config](priv/basho_bench.read-example.config),
an example for reading Machi files.
If you want a test to do both reading & writing ... well, the
driver/plug-in is not mature enough to do it **well**. If you really
want to, refer to the `basho_bench` docs for how to use the
`operations` config option.
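
(For what it's worth, `basho_bench` weights each entry in the
`operations` list by its integer, so something like
`{operations, [{append, 5}, {read, 5}]}.` would split the load roughly
50/50 ... assuming the read operation really is named `read`; check the
read example config for the actual operation name rather than trusting
this guess.)
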
The `basho_bench` config file is configured in Erlang term format.
Each configuration item is a 2-tuple followed by a period. Comments
begin with a `%` character and continue to the end-of-line.
%% Mandatory: adjust this code path to top of your compiled Machi source distro
{code_paths, ["/Users/fritchie/b/src/machi"]}.
{driver, machi_basho_bench_driver}.
%% Choose your maximum rate (per worker proc, see 'concurrent' below)
{mode, {rate, 25}}.
%% Runtime & reporting interval
{duration, 10}. % minutes
{report_interval, 1}. % seconds
%% Choose your number of worker procs
{concurrent, 5}.
%% Here's a chain of (up to) length 3, all on localhost
{machi_server_info,
[
{p_srvr,a,machi_flu1_client,"localhost",4444,[]},
{p_srvr,b,machi_flu1_client,"localhost",4445,[]},
{p_srvr,c,machi_flu1_client,"localhost",4446,[]}
]}.
{machi_ets_key_tab_type, set}. % 'set' or 'ordered_set'
%% Workload-specific definitions follow....
%% 10 parts 'append' operation + 0 parts anything else = 100% 'append' ops
{operations, [{append, 10}]}.
%% For append, key = Machi file prefix name
{key_generator, {concat_binary, <<"prefix">>,
{to_binstr, "~w", {uniform_int, 30}}}}.
%% Increase size of value_generator_source_size if value_generator is big!!
{value_generator_source_size, 2111000}.
{value_generator, {fixed_bin, 32768}}. % 32 KB
In summary:
* Yes, you really need to change `code_paths` to point to your compiled
Machi source directory, e.g. `/some/nice/dev/place/machi` ... and that
directory must be on the same machine(s) where you intend to run
`basho_bench`.
* Each worker process will have a rate limit of 25 ops/sec.
* The test will run for 10 minutes and report stats every 1 second.
* There are 5 concurrent worker processes. Each worker will
concurrently issue commands from the `operations` list, within the
workload throttle limit.
* The Machi cluster is a collection of three servers, all on
"localhost", and using TCP ports 4444-4446.
* Don't change the `machi_ets_key_tab_type` setting.
* Our workload operation mix is 100% `append` operations.
* The key generator for the `append` operation specifies the file
prefix that will be chosen (pseudo-randomly). In this case, we'll
choose uniformly at random from file prefixes `prefix0` through
`prefix29`.
* The values that we append will have a fixed length of 32 KB, and they
will be chosen from a random byte string of 2,111,000 bytes. (See the
quick arithmetic just after this list.)
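
A quick back-of-the-envelope check on those settings: 5 workers × 25
ops/sec each is an offered load of at most 125 appends/sec, and 125 ×
32 KB is roughly 4 MB/sec of append traffic ... assuming, of course,
that the workers actually reach the throttle limit. That is a handy
number to keep in mind when you look at the throughput graph later.
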
There are many other options for `basho_bench`, especially for the
`key_generator` and `value_generator` options. Please see the
`basho_bench` docs for further information.
## Running basho_bench
You can run `basho_bench` using the command:
/some/nice/dev/place/basho_bench/basho_bench /path/to/config/file
... where `/path/to/config/file` is the path to your config file. (If
you use an example from the `priv` dir, we recommend that you make a
copy elsewhere, edit the copy, and then use the copy to run
`basho_bench`.)
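
Concretely, a run using the append example from `priv` might look like
this. (The file name `my-append-test.config` is just an arbitrary
choice.)

    cp /some/nice/dev/place/machi/priv/basho_bench.append-example.config my-append-test.config
    # Edit my-append-test.config (at minimum, the code_paths entry), then:
    /some/nice/dev/place/basho_bench/basho_bench my-append-test.config
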
Running `basho_bench` creates a stats output directory, called `tests`,
in the current working directory. (Add `{results_dir, "/some/output/dir"}.`
to your config to change the default!)
Each time `basho_bench` is run, a new output stats directory is
created in the `tests` directory. The symbolic link `tests/current`
will always point to the last `basho_bench` run's output. But all
prior results are always accessible! Take a look in this directory
for all of the output.
## Generating some pretty graphs
If you are using R, then the following command will create a graph:
Rscript --vanilla /some/nice/dev/place/basho_bench/priv/summary.r -i $CWD/tests/current
If the `tests` directory is not in your current working dir (i.e. not
in `$CWD`), then please alter the command accordingly.
R will create the final results graph in `$CWD/tests/current/summary.png`.
If you are using gnuplot, please look at
`/some/nice/dev/place/basho_bench/Makefile` to see how to use gnuplot
to create the final results graph.
## An example graph
So, without a lot of context about the **Machi system** or about the
**basho_bench system** or about the ops being performed, here is an
example graph that was created by R:
![](basho.github.io/machi/images/basho_bench.example0.png)
**Without context??** How do I remember the context?
My recommendation is: always keep the `.config` file together with the
graph file. In the `tests` directory, `basho_bench` will always make
a copy of the config file used to generate the test data.
This config tells you very little about the environment of the load
generator machine or the Machi cluster, but ... you need to maintain
that documentation yourself, please! You'll thank me for that advice,
someday, 11 months from now when you can't remember the details of
that important test that you ran so very long ago.
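
One low-tech way to do that, with purely illustrative paths and file
names: after each interesting run, write a short notes file and bundle
it with the output directory (which already contains the config copy).

    mkdir -p ~/bench-archive
    cd tests/current
    # Write notes.txt by hand: OS version, hardware, Machi git hash, chain layout, etc.
    tar czf ~/bench-archive/2015-05-20-append-stress.tgz .
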
## Conclusion
Really, we don't recommend using `basho_bench` for any serious
performance measurement of Machi yet: Machi needs more maturity before
it's reasonable to measure & judge its performance. But stress
testing is indeed useful for reasons other than measuring
Nth-percentile latency of operation `flarfbnitz`. We hope that this
tutorial has been helpful!
If you encounter any difficulty with this tutorial or with Machi,
please open an issue/ticket at [GH Issues for
Machi](https://github.com/basho/machi/issues) ... use the green "New
issue" button. There are bugs and misfeatures in the `basho_bench`
plugin, sorry, but please help us fix them.
> -Scott Lystig Fritchie
> Machi Team @ Basho


@@ -39,7 +39,7 @@ func, and pattern match Erlang style in that func.
 ** DONE Adapt the projection-aware, CR-implementing client from demo-day
 ** DONE Add major comment sections to the CR-impl client
-** TODO Simple basho_bench driver, put some unscientific chalk on the benchtop
+** DONE Simple basho_bench driver, put some unscientific chalk on the benchtop
 ** TODO Create parallel PULSE test for basic API plus chain manager repair
 ** TODO Add client-side vs. server-side checksum type, expand client API?
 ** TODO Add gproc and get rid of registered name rendezvous


@@ -7,9 +7,9 @@
 %{mode, {rate,20}}.
 {mode, max}.
-%% Runtime & reporting interval (seconds)
-{duration, 10}.
-{report_interval, 1}.
+%% Runtime & reporting interval
+{duration, 10}. % minutes
+{report_interval, 1}. % seconds
 %% Choose your number of worker procs
 %{concurrent, 1}.
@@ -37,11 +37,9 @@
 {operations, [{append, 10}]}.
 %% For append, key = Machi file prefix name
-{key_generator, {concat_binary, <<"prefix">>,
-                 {to_binstr, "~w", {uniform_int, 30}}}}.
+{key_generator, {to_binstr, "prefix~w", {uniform_int, 30}}}.
 %% Increase size of value_generator_source_size if value_generator is big!!
 {value_generator_source_size, 2111000}.
 {value_generator, {fixed_bin, 32768}}. % 32 KB
-%{value_generator, {fixed_bin, 1048576}}. % 1024 KB


@@ -7,9 +7,9 @@
 %{mode, {rate,20}}.
 {mode, max}.
-%% Runtime & reporting interval (seconds)
-{duration, 10}.
-{report_interval, 1}.
+%% Runtime & reporting interval
+{duration, 10}. % minutes
+{report_interval, 1}. % seconds
 %% Choose your number of worker procs
 %{concurrent, 1}.