Add file0_test.escript (and big squash)

Small cleanups

Small cleanups

Refactoring argnames & order for more consistency

Add server-side-calculated MD5 checksum + logging

file:consult() style checksum management, too slow! 513K csums = 105 seconds, ouch

Much faster checksum recording

Add checksum_list. Alas, line-by-line I/O is slow, neh?

Much faster checksum listing

Add file0_verify_checksums.escript and supporting code

Adjust escript +A and -smp flags

Add file0_compare_filelists.escript

First draft of file0_repair_server.escript

First draft of file0_repair_server.escript, part 2

WIP of file0_repair_server.escript, part 3

WIP of file0_repair_server.escript, part 4

Basic repair works, it seems, hooray!

When checksum file ordering is different, try a cheap(?) 'cmp' on sorted results instead

Add README.md

Initial import of szone_chash.erl

Add file0_cc_make_projection.escript and supporting code

Add file0_cc_map_prefix.escript and supporting code

Change think-o: hash output is a chain, silly boy

Add file0_cc_1file_write_redundant.escript and support

Add file0_cc_read_client.escript and supporting code

Add examples/servers.map & file0_start_servers.escript

WIP: working on file0_cc_migrate_files.escript

File migration finished, works, yay!

Add basic 'what am I' docs to each script

Add file0_server_daemon.escript

Minor fixes

Fix broken unit test

Add basho_bench run() commands for append & read ops with projection

Add to examples dir

WIP: erasure coding hack, part 1

Fix broken unit test

WIP: erasure coding hack, part 2

WIP: erasure coding hack, part 3, EC data write is finished!

WIP: erasure coding hack, part 4, EC data read still in progress

WIP: erasure coding hack, part 5, EC data read still in progress

WIP: erasure coding hack, part 5b, EC data read still in progress

WIP: erasure coding hack, EC data read finished!

README update, part 1

README update, part 2

Oops, put back the printed ouput for file-write-client and 1file-write-redundant-client

README update, part 3

Fix 'user' output bug in list-client

Ugly hacks to get output/no-output from write clients

Clean up minor output bugs

Clean up minor output bugs, part 2

README update, part 4

Clean up minor output bugs, part 3

Clean up minor output bugs, part 5

Clean up minor output bugs, part 6

README update, part 6

README update, part 7

README update, part 7

README update, part 8

Final edits/fixes for demo day

Fix another oops in the README/demo day script
This commit is contained in:
Scott Lystig Fritchie 2014-12-21 17:52:38 +09:00
parent ed762b71b3
commit 29868678a4
35 changed files with 2799 additions and 145 deletions

View file

@ -0,0 +1,233 @@
## Basic glossary/summary/review for the impatient
* **Machi**: An immutable, append-only distributed file storage service.
* For more detail, see: [https://github.com/basho/internal_wiki/wiki/RFCs#riak-cs-rfcs](https://github.com/basho/internal_wiki/wiki/RFCs#riak-cs-rfcs)
* **Machi cluster**: A small cluster of machines, e.g. 2 or 3, which form a single Machi cluster.
* All cluster data is replicated via Chain Replication.
* In nominal/full-repair state, all servers store exactly the same copies of all files.
* **Strong Consistency** and **CP-style operation**: Not here, please look elsewhere. This is an eventual consistency joint.
* **Sequencer**: Logical component that specifies where any new data will be stored.
* For each chunk of data written to a Machi server, the sequencer is solely responsible for assigning a file name and byte offset within that file.
* [http://martinfowler.com/bliki/TwoHardThings.html](http://martinfowler.com/bliki/TwoHardThings.html)
* **Server**: A mostly-dumb file server, with fewer total functions than an NFS version 2 file server.
* Compare with [https://tools.ietf.org/html/rfc1094#section-2.2](https://tools.ietf.org/html/rfc1094#section-2.2)
* **Chunk**: The basic storage unit of Machi: an ordered sequence of bytes. Each chunk is stored together with its MD5 checksum.
* Think of a single Erlang binary term...
* **File**: A strictly-ordered collection of chunks.
* **Chain Replication**: A variation of master/slave replication where writes and reads are strictly ordered in a "chain-like" manner
* **Chain Repair**: Identify chunks that are missing between servers in a chain and then fix them.
* **Cluster of clusters**: A desirable rainbow unicorn with purple fur and files automagically distributed across many Machi clusters, scalable to infinite size within a single data center.
* **Projection**: A data structure which specifies file distribution and file migration across a cluster of clusters.
* Also used by a single Machi cluster to determine the current membership & status of the cluster & its chain.
* **Erasure Coding**: A family of techniques of encoding data with redundant data. The redundant data is smaller than the original data, yet can be used to reconstruct the original data if some original parts are lost/destroyed.
## Present in this POC
* Basic sequencer and file server
* Sequencer and file server are combined for efficiency
* Basic ops: append chunk to file (with implicit sequencer assignment), read chunk, list files, fetch chunk checksums (by file)
* Chain replication ops: write chunk (location already determined by sequencer), delete file (cluster-of-clusters file migration use only), truncate file (cluster-of-clusters + erasure coding use only)
* Basic file clients
* Write chunks to a single server (via sequencer) and read from a single server.
* Write and read chunks to multiple servers, using both projection and chain replication
* All chain replication and projection features are implemented on the client side.
* Basic cluster-of-cluster functions
* Hibari style consistent hashing for file uploads & downloads
* There is no ring.
* Arbitrary weighted distribution of files across multiple clusters
* Append/write/read ops are automatically routed to the correct chain/cluster.
* File migration/rebalancing
* Administrative features
* Verify chunk checksums in a file.
* Find and fix missing chunks between any two servers in a single cluster.
* Normal cluster operations are uninterrupted.
* Create new projection definitions
* Demonstrate file -> cluster mapping via projection
* Identify files in current projection that must be moved in the new projection
* ... and move them to the new projection
* Normal cluster operations are uninterrupted.
* Erasure coding implementation
* Inefficient & just barely enough to work
* No NIF or Erlang port integration
* Reed-Solomon 10,4 encoding only
* 10 independent sub-chunks of original data
* 4 independent sub-chunks of parity data
* Use any 10 sub-chunks to reconstruct all original and parity data
* Encode an entire file and write sub-chunks to a secondary projection of 14 servers
* Download chunks automatically via regular projection (for replicated chunks) or a secondary projection (for EC-coded sub-chunks).
## Missing from this POC
* Chunk immutability is not enforced
* Read-repair ... no good reason, I just ran out of 2014 to implement it
* Per-chunk metadata
* Repair of erasure coded missing blocks: all major pieces are there, but I ran out of 2014 here, also.
* Automation: all chain management functions are manual, no health/status monitoring, etc.
## Simple examples
Run a server:
./file0_server.escript file0_server 7071 /tmp/path-to-data-dir
Upload chunks of a local file in 1 million byte chunks:
./file0_write_client.escript localhost 7071 1000000 someprefix /path/to/some/local/file
To fetch the chunks, we need a list of filenames, offsets, and chunk
sizes. The easiest way to get such a list is to save the output from
`./file0_write_client.escript` when uploading the file in the first
place. For example, save the following command's output into some
file e.g. `/tmp/input`. Then:
./file0_read_client.escript localhost 7071 /tmp/input
## More details
### Server protocol
All negative server responses start with `ERROR` unless otherwise noted.
* No distributed Erlang message passing ... "just ASCII on a socket".
* `A LenHex Prefix` ... Append a chunk of data to a file with name prefix `Prefix`. File name & offset will be assigned by server-side sequencer.
* `OK OffsetHex Filename`
* `R OffsetHex LenHex Filename` ... Read a chunk from `OffsetHex` for length `Lenhex` from file `FileName`.
* `OK [chunk follows]`
* `L` ... List all files stored on the server, one per line
* `OK [list of files & file sizes, ending with .]`
* `C Filename` ... Fetch chunk offset & checksums
* `OK [list of chunk offset, lengths, and checksums, ending with .]`
* `QUIT` ... Close connection
* `OK`
* `W-repl OffsetHex LenHex Filename` ... Write/replicate a chunk of data to a file with an `OffsetHex` that has already been assigned by a sequencer.
* `OK`
* `TRUNC-hack--- Filename` ... Truncate a file (after erasure encoding is successful).
* `OK`
### Start a server, upload & download chunks to a single server
Mentioned above.
### Upload chunks to a cluster (members specified manually)
./file0_1file_write_redundant.escript 1000000 otherprefix /path/to/local/file localhost 7071 localhost 7072
### List files (and file sizes) on a server
./file0_list.escript localhost 7071
### List all chunk sizes & checksums for a single file
setenv M_FILE otherprefix.KCL2CC9.1
./file0_checksum_list.escript localhost 7071 $M_FILE
### Verify checksums in a file
./file0_verify_checksums.escript localhost 7071 $M_FILE
### Compare two servers and file all missing files
TODO script is broken, but is just a proof-of-concept for early repair work.
$ ./file0_compare_filelists.escript localhost 7071 localhost 7072
{legend, {file, list_of_servers_without_file}}.
{all, [{"localhost","7071"},{"localhost","7072"}]}.
### Repair server A -> server B, replicating all missing data
./file0_repair_server.escript localhost 7071 localhost 7072 verbose check
./file0_repair_server.escript localhost 7071 localhost 7072 verbose repair
./file0_repair_server.escript localhost 7071 localhost 7072 verbose repair
And then repair in the reverse direction:
./file0_repair_server.escript localhost 7072 localhost 7071 verbose check
./file0_repair_server.escript localhost 7072 localhost 7071 verbose repair
./file0_repair_server.escript localhost 7072 localhost 7071 verbose repair
### Projections: how to create & use them
* Easiest use: all projection data is stored in a single directory; use path to the directory for all PoC scripts
* Files in the projection directory:
* `chains.map`
* Define all members of all chains at the current time
* `*.weight`
* Define chain capacity weights
* `*.proj`
* Define a projection (based on previous projection file + weight file)
#### Sample chains.map file
%% Chain names (key): please use binary
%% Chain members (value): two-tuples of binary hostname and integer TCP port
{<<"chain1">>, [{<<"localhost">>, 7071}]}.
{<<"chain10">>, [{<<"localhost">>, 7071}, {<<"localhost">>, 7072}]}.
#### Sample weight file
%% Please use binaries for chain names
[
{<<"chain1">>, 1000}
].
#### Sample projection file
%% Created by:
% ./file0_cc_make_projection.escript new examples/weight_map1.weight examples/1.proj
{projection,1,0,[{<<"chain1">>,1.0}],undefined,undefined,undefined,undefined}.
### Create the `examples/1.proj` file
./file0_cc_make_projection.escript new examples/weight_map_ec1.weight examples/ec1.proj
### Demo consistent hashing of file prefixes: what chain do we map to?
./file0_cc_map_prefix.escript examples/1.proj foo-prefix
./file0_cc_map_prefix.escript examples/3.proj foo-prefix
./file0_cc_map_prefix.escript examples/56.proj foo-prefix
### Write replica chunks to a chain via projection
./file0_cc_1file_write_redundant.escript 4444 some-prefix /path/to/local/file ./examples/1.proj
### Read replica chunks to a chain via projection
./file0_cc_read_client.escript examples/1.proj /tmp/input console
### Change of projection: migrate 1 server's files from old to new projection
./file0_cc_migrate_files.escript examples/2.proj localhost 7071 verbose check
./file0_cc_migrate_files.escript examples/2.proj localhost 7071 verbose repair delete-source
### Erasure encode a file, then delete the original replicas, then fetch via EC sub-chunks
The current prototype fetches the file directly from the file system, rather than fetching the file over the network via the `RegularProjection` projection definition.
WARNING: For this part of demo to work correctly, `$M_PATH` must be a
file that exists on `localhost 7071` and not on `localhost 7072` or
any other server.
setenv M_PATH /tmp/SAM1/seq-tests/data1/otherprefix.77Z34TQ.1
./file0_cc_ec_encode.escript examples/1.proj $M_PATH /tmp/zooenc examples/ec1.proj verbose check nodelete-tmp
less $M_PATH
./file0_cc_ec_encode.escript examples/1.proj $M_PATH /tmp/zooenc examples/ec1.proj verbose repair progress delete-source
less $M_PATH
./file0_read_client.escript localhost 7071 /tmp/input
./file0_read_client.escript localhost 7072 /tmp/input
./file0_cc_read_client.escript examples/1.proj /tmp/input console examples/55.proj

View file

@ -0,0 +1,26 @@
#!/bin/sh
trap "rm -f $1.tmp $2.tmp" 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
if [ $# -ne 6 ]; then
echo usage: $0 1. 2. 3. 4. 5. 6.
exit 1
fi
sort -u $1 > $1.tmp
sort -u $2 > $2.tmp
cmp -s $1.tmp $2.tmp
if [ $? -eq 0 ]; then
## Output is identical.
touch $6
exit 0
fi
# print only lines joinable on field 1 (offset)
join -1 1 -2 1 $1.tmp $2.tmp > $3
# print only lines field 1 (offset) present only in file 1
join -v 1 -1 1 -2 1 $1.tmp $2.tmp > $4
# print only lines field 1 (offset) present only in file 2
join -v 2 -1 1 -2 1 $1.tmp $2.tmp > $5

View file

@ -0,0 +1,33 @@
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-record(projection, {
%% hard state
epoch :: non_neg_integer(),
last_epoch :: non_neg_integer(),
float_map,
last_float_map,
%% soft state
migrating :: boolean(),
tree,
last_tree
}).
-define(SHA_MAX, (1 bsl (20*8))).

View file

@ -0,0 +1,4 @@
%% Created by:
% ./file0_cc_make_projection.escript new examples/weight_map1.weight examples/1.proj
{projection,1,0,[{<<"chain1">>,1.0}],undefined,undefined,undefined,undefined}.

View file

@ -0,0 +1,7 @@
% Created by:
% ./file0_cc_make_projection.escript examples/1.proj examples/weight_map2.weight examples/2.proj
{projection,2,1,
[{<<"chain1">>,0.5},{<<"chain2">>,0.5}],
[{<<"chain1">>,1.0}],
undefined,undefined,undefined}.

View file

@ -0,0 +1,10 @@
% Created by:
% ./file0_cc_make_projection.escript examples/2.proj examples/weight_map3.weight examples/3.proj
{projection,3,2,
[{<<"chain1">>,0.40816326530612246},
{<<"chain3">>,0.09183673469387754},
{<<"chain2">>,0.40816326530612246},
{<<"chain3">>,0.09183673469387754}],
[{<<"chain1">>,0.5},{<<"chain2">>,0.5}],
undefined,undefined,undefined}.

View file

@ -0,0 +1 @@
{projection,1,0,[{<<"chain10">>,1.0}],undefined,undefined,undefined,undefined}.

View file

@ -0,0 +1,5 @@
{projection,2,1,
[{<<"chain10">>,0.9090909090909091},
{<<"chain11">>,0.09090909090909094}],
[{<<"chain10">>,1.0}],
undefined,undefined,undefined}.

View file

@ -0,0 +1,27 @@
%% Chain names (key): please use binary
%% Chain members (value): two-tuples of binary hostname and integer TCP port
{<<"chain1">>, [{<<"localhost">>, 7071}]}.
{<<"chain2">>, [{<<"localhost">>, 7072}]}.
{<<"chain3">>, [{<<"localhost">>, 7073}]}.
{<<"chain4">>, [{<<"localhost">>, 7074}]}.
{<<"chain5">>, [{<<"localhost">>, 7075}]}.
{<<"chain6">>, [{<<"localhost">>, 7076}]}.
{<<"chain10">>, [{<<"localhost">>, 7071}, {<<"localhost">>, 7072}]}.
{<<"chain11">>, [{<<"localhost">>, 7073}, {<<"localhost">>, 7074}]}.
{<<"chain12">>, [{<<"localhost">>, 7075}, {<<"localhost">>, 7076}]}.
%% HACK ALERT
%% I'm being lazy here -- normally all members of a chain contain identical
%% data. In case of erasure coding, I'm using this hack to demonstrate
%% placement policy. The **caller** will interpret the chain membership
%% differently: for EC chains, the call assumes that each server in the
%% chain list will store one copy of one of the data/parity stripes.
{<<"ec-1-rs-10-4">>, [{<<"localhost">>, 7072}, {<<"localhost">>, 7072},
{<<"localhost">>, 7072}, {<<"localhost">>, 7072},
{<<"localhost">>, 7072}, {<<"localhost">>, 7072},
{<<"localhost">>, 7072}, {<<"localhost">>, 7072},
{<<"localhost">>, 7072}, {<<"localhost">>, 7072},
{<<"localhost">>, 7072}, {<<"localhost">>, 7072},
{<<"localhost">>, 7072}, {<<"localhost">>, 7072}]}.

View file

@ -0,0 +1,6 @@
%% Created by:
%% ./file0_cc_make_projection.escript new examples/weight_map_ec1.weight examples/ec1.proj
{projection,1,0,
[{<<"ec-1-rs-10-4">>,1.0}],
undefined,undefined,undefined,undefined}.

View file

@ -0,0 +1,9 @@
%% Server names (key): please use {binary(), integer()}
%% Proplist (value): data_dir is mandatory
{ {<<"localhost">>, 7071}, [{data_dir, "/tmp/file0/server1"}] }.
{ {<<"localhost">>, 7072}, [{data_dir, "/tmp/file0/server2"}] }.
{ {<<"localhost">>, 7073}, [{data_dir, "/tmp/file0/server3"}] }.
{ {<<"localhost">>, 7074}, [{data_dir, "/tmp/file0/server4"}] }.
{ {<<"localhost">>, 7075}, [{data_dir, "/tmp/file0/server5"}] }.
{ {<<"localhost">>, 7076}, [{data_dir, "/tmp/file0/server6"}] }.

View file

@ -0,0 +1,5 @@
%% Please use binaries for chain names
[
{<<"chain1">>, 1000}
].

View file

@ -0,0 +1,6 @@
%% Please use binaries for chain names
[
{<<"chain1">>, 1000},
{<<"chain2">>, 1000}
].

View file

@ -0,0 +1,7 @@
%% Please use binaries for chain names
[
{<<"chain1">>, 1000},
{<<"chain2">>, 1000},
{<<"chain3">>, 450}
].

View file

@ -0,0 +1,5 @@
%% Please use binaries for chain names
[
{<<"chain10">>, 1000}
].

View file

@ -0,0 +1,6 @@
%% Please use binaries for chain names
[
{<<"chain10">>, 1000},
{<<"chain11">>, 100}
].

View file

@ -0,0 +1,5 @@
%% Please use binaries for chain names
[
{<<"ec-1-rs-10-4">>, 1000}
].

View file

@ -1,30 +1,40 @@
%% {mode, {rate,10}}.
{mode, max}.
{mode, {rate,50}}.
%% {mode, max}.
{duration, 1}.
%{duration, 15}.
{report_interval, 1}.
{concurrent, 46}.
%% {concurrent, 1}.
{concurrent, 32}.
{code_paths, ["/tmp"]}.
{driver, file0}.
% {file0_start_listener, 7071}.
{file0_ip_list, [{{127,0,0,1}, 7071}]}.
%%% {file0_start_listener, {7071, "/tmp/SAM1/seq-tests/data"}}.
{value_generator_source_size, 16001001}.
{key_generator, <<"32k-archive">>}.
%{key_generator, <<"32k-archive">>}.
{key_generator, {to_binstr, "32k-prefix-~w", {uniform_int, 1000}}}.
{file0_projection_path, "/Users/fritchie/b/src/misc/sequencer/examples/1.proj"}.
%% variations of put
{operations, [{keygen_valuegen_then_null, 1}]}.
%{operations, [{keygen_valuegen_then_null, 1}]}.
{value_generator, {fixed_bin, 32768}}.
%{operations, [{append_local_server, 1}]}.
{operations, [{append_remote_server, 1}]}.
%{operations, [{append_remote_server, 1}]}.
%{operations, [{cc_append_remote_server, 1}]}.
{operations, [{append_remote_server, 1}, {cc_append_remote_server, 1}]}.
%{operations, [{append_local_server, 1}, {append_remote_server, 1}]}.
%% variations of get
%% %% perl -e 'srand(time); $chunksize = 32768 ; foreach $file (@ARGV) { @fi = stat($file); $size = $fi[7]; $base = $file; $base =~ s|[^/]*/||g; for ($i = 0; $i < $size; $i += $chunksize) { printf "%d R %016x %08x %s\n", rand(999999), $i, $chunksize, $base; } }' data/* | sort -n | awk '{print $2, $3, $4, $5}' > /tmp/input.txt
%% perl -e 'srand(time); $chunksize = 32768 ; foreach $file (@ARGV) { @fi = stat($file); $size = $fi[7]; $base = $file; $base =~ s|[^/]*/||g; for ($i = 0; $i < $size - $chunksize; $i += $chunksize) { printf "%d R %016x %08x %s\n", rand(999999), $i, $chunksize, $base; } }' data/*.* | sort -n | awk '{print $2, $3, $4, $5}' > /tmp/input.txt
%% {key_generator, {file_line_bin, "/tmp/input.txt"}}.
%% {value_generator_source_size, 1}.
%% {operations, [{read_raw_line_local, 1}]}.
%% {key_generator, {file_line_bin, "/tmp/input.txt"}}.
%% {value_generator_source_size, 1}.
%% %{operations, [{cc_read_raw_line_local, 1}]}.
%% {operations, [{read_raw_line_local, 1}, {cc_read_raw_line_local, 1}]}.

File diff suppressed because it is too large Load diff

View file

@ -24,6 +24,7 @@
-module(file0_1file_write_redundant_client).
-compile(export_all).
-mode(compile).
-define(NO_MODULE, true).
-include("./file0.erl").

View file

@ -0,0 +1,32 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_cc_1file_write_redundant_client).
-compile(export_all).
-define(NO_MODULE, true).
-include("./file0.erl").
main(Args) ->
main2(["cc-1file-write-redundant-client" | Args]).

View file

@ -0,0 +1,176 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell -pz .
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_cc_ec_encode).
-compile(export_all).
-mode(compile). % for escript use
-define(NO_MODULE, true).
-include("./file0.erl").
-include_lib("kernel/include/file.hrl").
-define(ENCODER, "./jerasure.`uname -s`/bin/enc-dec-wrapper.sh encoder").
-define(ENCODER_RS_10_4_ARGS, "10 4 cauchy_good 8 10240 102400").
main([]) ->
io:format("Use: Erasure code a file and store all sub-chunks to a different projection (specifically for EC use)\n"),
io:format("Args: RegularProjection LocalDataPath EncodingTmpDir EC_Projection [verbose | check | repair | progress | delete-source | nodelete-tmp ]\n"),
erlang:halt(1);
main([RegularProjection, LocalDataPath, TmpDir0, EcProjection|Args]) ->
Ps = make_repair_props(Args),
TmpDir = TmpDir0 ++ "/mytmp." ++ os:getpid(),
%% Hash on entire file name, *not* prefix
FileStr = filename:basename(LocalDataPath),
{_Chain, StripeMembers0} = calc_chain(write, EcProjection, FileStr),
StripeMembers = [{binary_to_list(Host), integer_to_list(Port)} ||
{Host, Port} <- StripeMembers0],
verb("Stripe members: ~p\n", [StripeMembers]),
os:cmd("rm -rf " ++ TmpDir),
filelib:ensure_dir(TmpDir ++ "/unused"),
try
{Ks, Ms, Summary} = encode_rs_10_4(TmpDir, LocalDataPath),
verb("Work directory: ~s\n", [TmpDir]),
verb("Summary header:\n~s\n", [iolist_to_binary(Summary)]),
verb("Data files:\n"),
[verb(" ~s\n", [F]) || F <- Ks],
verb("Parity files:\n"),
[verb(" ~s\n", [F]) || F <- Ms],
case proplists:get_value(mode, Ps) of
repair ->
verb("Writing stripes to remote servers: "),
ok = write_rs_10_4_stripes(lists:zip(Ks ++ Ms, StripeMembers),
Ps),
verb("done\n"),
File = list_to_binary(filename:basename(LocalDataPath)),
ok = write_ec_summary(RegularProjection, File, Summary, Ps);
_ ->
ok
end
after
case proplists:get_value(delete_tmp, Ps) of
false ->
io:format("Not deleting data in dir ~s\n",
[TmpDir]);
_ ->
os:cmd("rm -rf " ++ TmpDir)
end
end,
erlang:halt(0).
write_rs_10_4_stripes([], _Ps) ->
ok;
write_rs_10_4_stripes([{Path, {Host, PortStr}}|T], Ps) ->
verb("."),
Sock = escript_connect(Host, PortStr),
try
File = list_to_binary(filename:basename(Path)),
ok = write_stripe_file(Path, File, Sock),
write_rs_10_4_stripes(T, Ps)
after
ok = gen_tcp:close(Sock)
end.
write_stripe_file(Path, File, Sock) ->
{ok, FH} = file:open(Path, [raw, binary, read]),
try
ok = write_stripe_file(0, file:read(FH, 1024*1024),
FH, Path, File, Sock)
after
file:close(FH)
end.
write_stripe_file(Offset, {ok, Chunk}, FH, Path, File, Sock) ->
%% OffsetHex = bin_to_hexstr(<<Offset:64/big>>),
%% Len = byte_size(Chunk),
%% LenHex = bin_to_hexstr(<<Len:32/big>>),
{_,_,_} = upload_chunk_write(Sock, Offset, File, Chunk),
write_stripe_file(Offset + byte_size(Chunk), file:read(FH, 1024*1024),
FH, Path, File, Sock);
write_stripe_file(_Offset, eof, _FH, _Path, _File, _Sock) ->
ok.
encode_rs_10_4(TmpDir, LocalDataPath) ->
Cmd = ?ENCODER ++ " " ++ TmpDir ++ " " ++ LocalDataPath ++ " " ++
?ENCODER_RS_10_4_ARGS,
case os:cmd(Cmd) of
"0\n" ->
Ks = [TmpDir ++ "/" ++ X || X <- filelib:wildcard("*_k??", TmpDir)],
Ms = [TmpDir ++ "/" ++ X || X <- filelib:wildcard("*_m??", TmpDir)],
[Meta] = filelib:wildcard("*_meta.txt", TmpDir),
{ok, MetaBin} = file:read_file(TmpDir ++ "/" ++ Meta),
Summary = make_ec_summary(rs_10_4_v1, MetaBin, Ks, Ms),
{Ks, Ms, Summary};
Else ->
io:format("XX ~s\n", [Else]),
{error, Else}
end.
make_ec_summary(rs_10_4_v1, MetaBin, Ks, _Ms) ->
[Meta1Path, OrigFileLenStr|Metas] =
string:tokens(binary_to_list(MetaBin), [10]),
Meta1 = filename:basename(Meta1Path),
OrigFileLen = list_to_integer(OrigFileLenStr),
{ok, FI} = file:read_file_info(hd(Ks)),
StripeWidth = FI#file_info.size,
Body = iolist_to_binary([string:join([Meta1,OrigFileLenStr|Metas], "\n"),
"\n"]),
BodyLen = byte_size(Body),
Hdr1 = iolist_to_binary(["a ", bin_to_hexstr(<<BodyLen:16/big>>),
" ",
bin_to_hexstr(<<StripeWidth:64/big>>),
" ",
bin_to_hexstr(<<OrigFileLen:64/big>>),
" ",
"rs_10_4_v1"
]),
SpacePadding = 80 - 1 - byte_size(Hdr1),
Hdr2 = lists:duplicate(SpacePadding, 32),
[Hdr1, Hdr2, 10, Body].
write_ec_summary(ProjectionPath, File, Summary, _Ps) ->
Prefix = re:replace(File, "\\..*", "", [{return, binary}]),
{_Chain, RawHPs} = calc_chain(write, ProjectionPath, Prefix),
HP_tuples = [{binary_to_list(Host), integer_to_list(Port)} ||
{Host, Port} <- RawHPs],
verb("Writing coding summary records: "),
[begin
verb("<"),
Sock = escript_connect(Host, Port),
try
{_,_,_} = upload_chunk_write(Sock, 0, File,
iolist_to_binary(Summary)),
ok = gen_tcp:send(Sock, [<<"TRUNC-hack--- ">>, File, <<"\n">>]),
inet:setopts(Sock, [packet, line]),
{ok, <<"OK\n">>} = gen_tcp:recv(Sock, 0)
after
verb(">"),
gen_tcp:close(Sock)
end
end || {Host, Port} <- HP_tuples],
verb(" done\n"),
ok.

View file

@ -0,0 +1,56 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell -pz .
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_cc_make_projection).
-compile(export_all).
-mode(compile). % for escript use
-define(NO_MODULE, true).
-include("./file0.erl").
%% LastProjFile :: "new" : "/path/to/projection/file.proj"
%% NewWeightMapFile :: "/path/to/weight/map/file.weight"
%% NewProjectionPath :: Output file path, ".proj" suffix is recommended
main([]) ->
io:format("Use: Make a projection description file.\n"),
io:format("Args: 'new'|ProjectionPath File.weight NewProjectionPath\n"),
erlang:halt(1);
main([LastProjPath, NewWeightMapPath, NewProjectionPath]) ->
LastP = read_projection_file(LastProjPath),
LastFloatMap = get_float_map(LastP),
NewWeightMap = read_weight_map_file(NewWeightMapPath),
io:format("LastFloatMap: ~p\n", [LastFloatMap]),
NewFloatMap = if LastFloatMap == undefined ->
szone_chash:make_float_map(NewWeightMap);
true ->
szone_chash:make_float_map(LastFloatMap, NewWeightMap)
end,
io:format("NewFloatMap: ~p\n", [NewFloatMap]),
NewP = #projection{epoch=LastP#projection.epoch + 1,
last_epoch=LastP#projection.epoch,
float_map=NewFloatMap,
last_float_map=LastFloatMap},
ok = write_projection(NewP, NewProjectionPath).

View file

@ -0,0 +1,54 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell -pz .
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_cc_map_prefix).
-compile(export_all).
-mode(compile). % for escript use
-define(NO_MODULE, true).
-include("./file0.erl").
%% Map a prefix into a projection.
%% Return a list of host/ip pairs, one per line. If migrating,
%% and if the current & last projections do not match, then the
%% servers for new, old, & new will be given.
%% The # of output lines can be limited with an optional 3rd arg.
main([]) ->
io:format("Use: Map a file prefix to a chain via projection.\n"),
io:format("Args: ProjectionPath Prefix [MaxReturnResults]\n"),
erlang:halt(1);
main([ProjectionPathOrDir, Prefix]) ->
main([ProjectionPathOrDir, Prefix, "3"]);
main([ProjectionPathOrDir, Prefix, MaxNumStr]) ->
P = read_projection_file(ProjectionPathOrDir),
Chains = lists:sublist(hash_and_query(Prefix, P),
list_to_integer(MaxNumStr)),
ChainMap = read_chain_map_file(ProjectionPathOrDir),
[io:format("~s ~s ~w\n", [Chain, Host, Port]) ||
Chain <- Chains,
{Host, Port} <- orddict:fetch(Chain, ChainMap)
].

View file

@ -0,0 +1,100 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_cc_migrate_files).
-compile(export_all).
-mode(compile).
-define(NO_MODULE, true).
-include("./file0.erl").
main([]) ->
io:format("Use: Migrate files from old chain to new chain via projection.\n"),
io:format("Args: ProjectionPath Host Port [ verbose check repair delete-source noverbose nodelete-source ]\n"),
erlang:halt(1);
main([ProjectionPathOrDir, Host, PortStr|Args]) ->
Ps = make_repair_props(Args),
P = read_projection_file(ProjectionPathOrDir),
ChainMap = read_chain_map_file(ProjectionPathOrDir),
SrcS = escript_connect(Host, PortStr),
SrcS2 = escript_connect(Host, PortStr),
Items = escript_list2(SrcS, fun(_) -> ok end),
process_files(Items, SrcS, SrcS2, P, ChainMap, Ps).
process_files([], _SrcS, _SrcS2, _P, _ChainMap, _Ps) ->
verb("Done\n"),
ok;
process_files([Line|Items], SrcS, SrcS2, P, ChainMap, Ps) ->
FileLen = byte_size(Line) - 16 - 1 - 1,
<<SizeHex:16/binary, " ", File:FileLen/binary, _/binary>> = Line,
Size = binary_to_integer(SizeHex, 16),
Prefix = re:replace(File, "\\..*", "", [{return, binary}]),
%% verb(Ps, "File ~s, prefix ~s\n", [File, Prefix]),
verb("File ~s\n", [File]),
verb(" ~s MBytes, ", [mbytes(Size)]),
case calc_chain(read, P, ChainMap, Prefix) of
{[OldChain], _} ->
verb("remain in ~s\n", [OldChain]);
{[NewChain,OldChain|_], _} ->
verb("move ~s -> ~s\n", [OldChain, NewChain]),
case proplists:get_value(mode, Ps) of
repair ->
DestRawHPs = orddict:fetch(NewChain, ChainMap),
ok = migrate_a_file(SrcS, SrcS2, File, DestRawHPs, Ps),
case proplists:get_value(delete_source, Ps) of
true ->
verb(" delete source ~s\n", [File]),
ok = escript_delete(SrcS, File),
ok;
_ ->
verb(" skipping delete of source ~s\n", [File])
end;
_ ->
ok
end
end,
process_files(Items, SrcS, SrcS2, P, ChainMap, Ps).
migrate_a_file(SrcS, SrcS2, File, DestRawHPs, Ps) ->
SrcName = "hack-src",
DstName = "hack-dst",
[begin
[HostStr, PortStr] = convert_raw_hps([RawHP]),
DstS = get_cached_sock(HostStr, PortStr),
X = escript_compare_servers(SrcS, DstS,
SrcName, DstName,
fun(FName) when FName == File -> true;
(_) -> false end,
[null]),
CheckV = proplists:get_value(mode, Ps, check),
VerboseV = proplists:get_value(verbose, Ps, t),
[ok = repair(File, Size, MissingList,
CheckV, VerboseV,
SrcS, SrcS2, DstS, dst2_unused, SrcName) ||
{_File, {Size, MissingList}} <- X]
end || RawHP <- DestRawHPs],
ok.

View file

@ -0,0 +1,33 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_read_client).
-compile(export_all).
-mode(compile).
-define(NO_MODULE, true).
-include("./file0.erl").
main(Args) ->
main2(["cc-chunk-read-client" | Args]).

View file

@ -0,0 +1,37 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_checksum_list).
-compile(export_all).
-define(NO_MODULE, true).
-include("./file0.erl").
main(["line-by-line"|Args]) ->
%% This is merely a demo to show the cost of line-by-line I/O.
%% For a checksum list of 90K lines, line-by-line takes about 3 seconds.
%% Bulk I/O (used by the following clause) takes about 0.2 seconds.
main2(["checksum-list-client-line-by-line" | Args]);
main(Args) ->
main2(["checksum-list-client" | Args]).

View file

@ -0,0 +1,41 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_compare_servers).
-compile(export_all).
-mode(compile). % for escript use
-define(NO_MODULE, true).
-include("./file0.erl").
main([]) ->
io:format("Use: Compare file lists on two servers and calculate missing files.\n"),
io:format("Args: Host1, Port1, Host2, Port2 [<no arg> | 'null' | OutputPath]\n"),
erlang:halt(1);
main([Host1, PortStr1, Host2, PortStr2|Args]) ->
Sock1 = escript_connect(Host1, PortStr1),
Sock2 = escript_connect(Host2, PortStr2),
escript_compare_servers(Sock1, Sock2, {Host1, PortStr1}, {Host2, PortStr2},
Args).

View file

@ -0,0 +1,69 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_repair_server).
-compile(export_all).
-mode(compile). % for escript use
-define(NO_MODULE, true).
-include("./file0.erl").
main([]) ->
io:format("Use: Repair a server, *uni-directionally* from source -> destination.\n"),
io:format("Args: SrcHost SrcPort DstHost DstPort [ verbose check repair delete-source noverbose nodelete-source ]\n"),
erlang:halt(1);
main([SrcHost, SrcPortStr, DstHost, DstPortStr|Args]) ->
Src = {SrcHost, SrcPortStr},
SockSrc = escript_connect(SrcHost, SrcPortStr),
%% TODO is SockSrc2 necessary? Is SockDst2 necessary? If not, delete!
SockSrc2 = escript_connect(SrcHost, SrcPortStr),
ok = inet:setopts(SockSrc2, [{packet, raw}]),
SockDst = escript_connect(DstHost, DstPortStr),
SockDst2 = escript_connect(DstHost, DstPortStr),
Ps = make_repair_props(Args),
case proplists:get_value(mode, Ps) of
undefined -> io:format("NOTICE: default mode = check\n"),
timer:sleep(2*1000);
_ -> ok
end,
case proplists:get_value(verbose, Ps) of
true ->
io:format("Date & Time: ~p ~p\n", [date(), time()]),
io:format("Src: ~s ~s\n", [SrcHost, SrcPortStr]),
io:format("Dst: ~s ~s\n", [DstHost, DstPortStr]),
io:format("\n");
_ ->
ok
end,
%% Dst = {DstHost, DstPortStr},
X = escript_compare_servers(SockSrc, SockDst,
{SrcHost, SrcPortStr}, {DstHost, DstPortStr},
fun(_FileName) -> true end,
[null]),
[repair(File, Size, MissingList,
proplists:get_value(mode, Ps, check),
proplists:get_value(verbose, Ps, t),
SockSrc, SockSrc2, SockDst, SockDst2, Src) ||
{File, {Size, MissingList}} <- X].

View file

@ -0,0 +1,33 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 253 -smp enable -noinput -noshell -detached
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_server).
-compile(export_all).
-mode(compile). % for escript use
-define(NO_MODULE, true).
-include("./file0.erl").
main(Args) ->
main2(["server" | Args]).

View file

@ -0,0 +1,43 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_start_servers).
-compile(export_all).
-define(NO_MODULE, true).
-include("./file0.erl").
main([]) ->
io:format("Use: Demonstrate the commands required to start all servers on a host.\n"),
io:format("Args: ServerMapPath Host\n"),
erlang:halt(1);
main([ServerMapPath, HostStr]) ->
Host = list_to_binary(HostStr),
{ok, Map} = file:consult(ServerMapPath),
io:format("Run the following commands to start all servers:\n\n"),
[begin
DataDir = proplists:get_value(data_dir, Ps),
io:format(" file0_server.escript file0_server ~w ~s\n",
[Port, DataDir])
end || {{HostX, Port}, Ps} <- Map, HostX == Host].

View file

@ -0,0 +1,107 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_test).
-compile(export_all).
-mode(compile).
-define(NO_MODULE, true).
-include("./file0.erl").
main([]) ->
io:format("Use: unit test script.\n"),
Port1 = "7061",
Port2 = "7062",
Dir1 = "/tmp/file0_test1",
Dir2 = "/tmp/file0_test2",
Pfx1 = "testing-prefix1",
Pfx2 = "testing-prefix2",
application:set_env(kernel, verbose, false),
os:cmd("rm -rf " ++ Dir1),
ok = start_server(["file0_test", Port1, Dir1]),
os:cmd("rm -rf " ++ Dir2),
ok = start_server(["server2", Port2, Dir2]),
timer:sleep(250),
%% Pattern match assumes that /etc/hosts exists and is at least 11 bytes.
[_,_|_] = Chunks1 =
test_file_write_client(["localhost", Port1, "5", Pfx1, "/etc/hosts", "/dev/null"]),
io:format("write: pass\n"),
[_,_|_] = test_chunk_read_client(["localhost", Port1], Chunks1),
[{error,_}|_] = (catch test_chunk_read_client(["localhost", Port2], Chunks1)),
io:format("read: pass\n"),
[_] = test_list_client(["localhost", Port1, "/dev/null"]),
[] = test_list_client(["localhost", Port2, "/dev/null"]),
io:format("list: pass\n"),
%% Pattern match assumes that /etc/hosts exists and is at least 11 bytes.
[_,_,_|_] = _Chunks2 =
test_1file_write_redundant_client(
["5", Pfx2, "/etc/hosts", "silent", "localhost", Port1, "localhost", Port2]),
[_,_] = test_list_client(["localhost", Port1, "/dev/null"]),
[_] = test_list_client(["localhost", Port2, "/dev/null"]),
io:format("write-redundant: pass\n"),
[PoorChunk] = test_list_client(["localhost", Port2, "/dev/null"]),
PoorFile = re:replace(PoorChunk, ".* ", "", [{return,binary}]),
ok = test_delete_client(["localhost", Port2, PoorFile]),
error = test_delete_client(["localhost", Port2, PoorFile]),
io:format("delete: pass\n"),
ok.
start_server([_RegNameStr,_PortStr,_Dir] = Args) ->
spawn(fun() -> main2(["server" | Args]) end),
ok.
test_file_write_client([_Host,_PortStr,_ChunkSizeStr,_PrefixStr,_LocalPath,_Output] = Args) ->
main2(["file-write-client" | Args]).
test_1file_write_redundant_client([_ChunkSizeStr,_PrefixStr,_LocalPath|_] = Args) ->
main2(["1file-write-redundant-client" | Args]).
test_chunk_read_client([_Host,_PortStr] = Args, Chunks) ->
ChunkFile = "/tmp/chunkfile." ++ os:getpid(),
{ok, FH} = file:open(ChunkFile, [write]),
[begin
OffsetHex = bin_to_hexstr(<<Offset:64/big>>),
SizeHex = list_to_binary(bin_to_hexstr(<<Size:32/big>>)),
io:format(FH, "~s ~s ~s\n", [OffsetHex, SizeHex, File])
end || {Offset, Size, File} <- Chunks],
file:close(FH),
try
main2(["chunk-read-client" | Args] ++ [ChunkFile, "/dev/null"])
after
file:delete(ChunkFile)
end.
test_list_client([_Host,_PortStr,_OutputFile] = Args) ->
main2(["list-client" | Args]).
test_delete_client([_Host,_PortStr,_File] = Args) ->
main2(["delete-client" | Args]).

View file

@ -0,0 +1,83 @@
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +A 0 -smp disable -noinput -noshell
%% -------------------------------------------------------------------
%%
%% Copyright (c) 2007-2014 Basho Technologies, Inc. All Rights Reserved.
%%
%% This file is provided to you under the Apache License,
%% Version 2.0 (the "License"); you may not use this file
%% except in compliance with the License. You may obtain
%% a copy of the License at
%%
%% http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing,
%% software distributed under the License is distributed on an
%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
%% KIND, either express or implied. See the License for the
%% specific language governing permissions and limitations
%% under the License.
%%
%% -------------------------------------------------------------------
-module(file0_server).
-compile(export_all).
-mode(compile). % for escript use
-define(NO_MODULE, true).
-include("./file0.erl").
main([]) ->
io:format("Use: Verify all chunk checksums of all files on a server.\n"),
io:format("Args: Host Port File\n"),
erlang:halt(1);
main([Host, PortStr, File]) ->
Sock1 = escript_connect(Host, PortStr),
Sock2 = escript_connect(Host, PortStr),
TmpFile = "/tmp/verify-checksums." ++ os:getpid(),
try
check_checksums(Sock1, Sock2, File, TmpFile)
after
_ = (catch file:delete(TmpFile))
end.
check_checksums(Sock1, Sock2, File, _TmpFile) ->
FileBin = list_to_binary(File),
put(count, 0),
Proc = fun(<<Offset:16/binary, _:1/binary, Len:8/binary, _:1/binary,
CSumHex:32/binary, "\n">>) ->
Where = <<Offset/binary, " ", Len/binary, " ",
FileBin/binary, "\n">>,
ChunkProc =
fun(Chunk2) when is_binary(Chunk2) ->
CSum = hexstr_to_bin(CSumHex),
CSum2 = checksum(Chunk2),
if CSum == CSum2 ->
put(count, get(count) + 1),
ok; %% io:format(".");
true ->
CSumNow = bin_to_hexstr(CSum2),
put(status, failed),
io:format("~s ~s ~s CHECKSUM-ERROR ~s ~s\n",
[Offset, Len, File,
CSumHex, CSumNow])
end;
(_Else) ->
ok % io:format("Chunk ~P\n", [Else, 10])
end,
escript_download_chunks(Sock2, {{{Where}}}, ChunkProc);
(<<"OK", _/binary>>) ->
ok; %% io:format("top:");
(<<".\n">>) ->
ok %% io:format("bottom\n")
end,
escript_checksum_list(Sock1, FileBin, line_by_line, Proc),
case get(status) of
ok ->
io:format("OK, ~w chunks are good\n", [get(count)]),
erlang:halt();
_ ->
erlang:halt(1)
end.

View file

@ -0,0 +1,20 @@
#!/bin/sh
ENCDEC=$1 # Should be 'encoder' or 'decoder'
shift
WORK_DIR=$1
shift
case $0 in
*/*)
MY_DIR=`dirname $0`
;;
*)
TMP=`which $0`
MY_DIR=`dirname $TMP`
;;
esac
cd $MY_DIR
env CODING_DIR=$WORK_DIR ./$ENCDEC.bin "$@" > /dev/null 2>&1
echo $?

View file

@ -0,0 +1,376 @@
%%%-------------------------------------------------------------------
%%% Copyright (c) 2007-2011 Gemini Mobile Technologies, Inc. All rights reserved.
%%% Copyright (c) 2013-2014 Basho Technologies, Inc. All rights reserved.
%%%
%%% Licensed under the Apache License, Version 2.0 (the "License");
%%% you may not use this file except in compliance with the License.
%%% You may obtain a copy of the License at
%%%
%%% http://www.apache.org/licenses/LICENSE-2.0
%%%
%%% Unless required by applicable law or agreed to in writing, software
%%% distributed under the License is distributed on an "AS IS" BASIS,
%%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
%%% See the License for the specific language governing permissions and
%%% limitations under the License.
%%%
%%%-------------------------------------------------------------------
-module(szone_chash).
-define(SMALLEST_SIGNIFICANT_FLOAT_SIZE, 0.1e-12).
%% -compile(export_all).
-export([make_float_map/1, make_float_map/2,
sum_map_weights/1,
make_tree/1,
make_tree/2,
query_tree/2,
hash_string/2,
pretty_with_integers/2,
pretty_with_integers/3]).
-export([make_demo_map1/0, make_demo_map2/0]).
-type map_name() :: term().
-type float_map() :: [{map_name(), float()}].
-export_type([float_map/0]).
make_float_map(NewChainWeights) ->
make_float_map([], NewChainWeights).
make_float_map([], NewChainWeights) ->
Sum = add_all_weights(NewChainWeights),
DiffMap = [{Ch, Wt/Sum} || {Ch, Wt} <- NewChainWeights],
make_float_map2([{unused, 1.0}], DiffMap, NewChainWeights);
make_float_map(OldFloatMap, NewChainWeights) ->
NewSum = add_all_weights(NewChainWeights),
%% Normalize to unit interval
%% NewChainWeights2 = [{Ch, Wt / NewSum} || {Ch, Wt} <- NewChainWeights],
%% Reconstruct old chain weights (will be normalized to unit interval)
SumOldFloatsDict =
lists:foldl(fun({Ch, Wt}, OrdDict) ->
orddict:update_counter(Ch, Wt, OrdDict)
end, orddict:new(), OldFloatMap),
OldChainWeights = orddict:to_list(SumOldFloatsDict),
OldSum = add_all_weights(OldChainWeights),
OldChs = [Ch || {Ch, _} <- OldChainWeights],
NewChs = [Ch || {Ch, _} <- NewChainWeights],
OldChsOnly = OldChs -- NewChs,
%% Mark any space in by a deleted chain as unused.
OldFloatMap2 = lists:map(
fun({Ch, Wt} = ChWt) ->
case lists:member(Ch, OldChsOnly) of
true ->
{unused, Wt};
false ->
ChWt
end
end, OldFloatMap),
%% Create a diff map of changing chains and added chains
DiffMap = lists:map(fun({Ch, NewWt}) ->
case orddict:find(Ch, SumOldFloatsDict) of
{ok, OldWt} ->
{Ch, (NewWt / NewSum) -
(OldWt / OldSum)};
error ->
{Ch, NewWt / NewSum}
end
end, NewChainWeights),
make_float_map2(OldFloatMap2, DiffMap, NewChainWeights).
make_float_map2(OldFloatMap, DiffMap, _NewChainWeights) ->
FloatMap = apply_diffmap(DiffMap, OldFloatMap),
XX = combine_neighbors(collapse_unused_in_float_map(FloatMap)),
XX.
apply_diffmap(DiffMap, FloatMap) ->
SubtractDiff = [{Ch, abs(Diff)} || {Ch, Diff} <- DiffMap, Diff < 0],
AddDiff = [D || {_Ch, Diff} = D <- DiffMap, Diff > 0],
TmpFloatMap = iter_diffmap_subtract(SubtractDiff, FloatMap),
iter_diffmap_add(AddDiff, TmpFloatMap).
add_all_weights(ChainWeights) ->
lists:foldl(fun({_Ch, Weight}, Sum) -> Sum + Weight end, 0.0, ChainWeights).
iter_diffmap_subtract([{Ch, Diff}|T], FloatMap) ->
iter_diffmap_subtract(T, apply_diffmap_subtract(Ch, Diff, FloatMap));
iter_diffmap_subtract([], FloatMap) ->
FloatMap.
iter_diffmap_add([{Ch, Diff}|T], FloatMap) ->
iter_diffmap_add(T, apply_diffmap_add(Ch, Diff, FloatMap));
iter_diffmap_add([], FloatMap) ->
FloatMap.
apply_diffmap_subtract(Ch, Diff, [{Ch, Wt}|T]) ->
if Wt == Diff ->
[{unused, Wt}|T];
Wt > Diff ->
[{Ch, Wt - Diff}, {unused, Diff}|T];
Wt < Diff ->
[{unused, Wt}|apply_diffmap_subtract(Ch, Diff - Wt, T)]
end;
apply_diffmap_subtract(Ch, Diff, [H|T]) ->
[H|apply_diffmap_subtract(Ch, Diff, T)];
apply_diffmap_subtract(_Ch, _Diff, []) ->
[].
apply_diffmap_add(Ch, Diff, [{unused, Wt}|T]) ->
if Wt == Diff ->
[{Ch, Wt}|T];
Wt > Diff ->
[{Ch, Diff}, {unused, Wt - Diff}|T];
Wt < Diff ->
[{Ch, Wt}|apply_diffmap_add(Ch, Diff - Wt, T)]
end;
apply_diffmap_add(Ch, Diff, [H|T]) ->
[H|apply_diffmap_add(Ch, Diff, T)];
apply_diffmap_add(_Ch, _Diff, []) ->
[].
combine_neighbors([{Ch, Wt1}, {Ch, Wt2}|T]) ->
combine_neighbors([{Ch, Wt1 + Wt2}|T]);
combine_neighbors([H|T]) ->
[H|combine_neighbors(T)];
combine_neighbors([]) ->
[].
collapse_unused_in_float_map([{Ch, Wt1}, {unused, Wt2}|T]) ->
collapse_unused_in_float_map([{Ch, Wt1 + Wt2}|T]);
collapse_unused_in_float_map([{unused, _}] = L) ->
L; % Degenerate case only
collapse_unused_in_float_map([H|T]) ->
[H|collapse_unused_in_float_map(T)];
collapse_unused_in_float_map([]) ->
[].
chash_float_map_to_nextfloat_list(FloatMap) when length(FloatMap) > 0 ->
%% QuickCheck found a bug ... need to weed out stuff smaller than
%% ?SMALLEST_SIGNIFICANT_FLOAT_SIZE here.
FM1 = [P || {_X, Y} = P <- FloatMap, Y > ?SMALLEST_SIGNIFICANT_FLOAT_SIZE],
{_Sum, NFs0} = lists:foldl(fun({Name, Amount}, {Sum, List}) ->
{Sum+Amount, [{Sum+Amount, Name}|List]}
end, {0, []}, FM1),
lists:reverse(NFs0).
chash_nextfloat_list_to_gb_tree([]) ->
gb_trees:balance(gb_trees:from_orddict([]));
chash_nextfloat_list_to_gb_tree(NextFloatList) ->
{_FloatPos, Name} = lists:last(NextFloatList),
%% QuickCheck found a bug ... it really helps to add a catch-all item
%% at the far "right" of the list ... 42.0 is much greater than 1.0.
NFs = NextFloatList ++ [{42.0, Name}],
gb_trees:balance(gb_trees:from_orddict(orddict:from_list(NFs))).
chash_gb_next(X, {_, GbTree}) ->
chash_gb_next1(X, GbTree).
chash_gb_next1(X, {Key, Val, Left, _Right}) when X < Key ->
case chash_gb_next1(X, Left) of
nil ->
{Key, Val};
Res ->
Res
end;
chash_gb_next1(X, {Key, _Val, _Left, Right}) when X >= Key ->
chash_gb_next1(X, Right);
chash_gb_next1(_X, nil) ->
nil.
%% @type float_map() = list({brick(), float()}). A float_map is a
%% definition of brick assignments over the unit interval [0.0, 1.0].
%% The sum of all floats must be 1.0.
%% For example, [{{br1, nd1}, 0.25}, {{br2, nd1}, 0.5}, {{br3, nd1}, 0.25}].
%% @type nextfloat_list() = list({float(), brick()}). A nextfloat_list
%% differs from a float_map in two respects: 1) nextfloat_list contains
%% tuples with the brick name in 2nd position, 2) the float() at each
%% position I_n > I_m, for all n, m such that n > m.
%% For example, a nextfloat_list of the float_map example above,
%% [{0.25, {br1, nd1}}, {0.75, {br2, nd1}}, {1.0, {br3, nd1}].
%% @doc Not used directly, but can give a developer an idea of how well
%% chash_float_map_to_nextfloat_list will do for a given value of Max.
%%
%% For example:
%% <verbatim>
%% NewFloatMap = make_float_map([{unused, 1.0}],
%% [{a,100}, {b, 100}, {c, 10}]),
%% ChashMap = chash_scale_to_int_interval(NewFloatMap, 100),
%% io:format("QQQ: int int = ~p\n", [ChashIntInterval]),
%% -> [{a,1,47},{b,48,94},{c,94,100}]
%% </verbatim>
%%
%% Interpretation: out of the 100 slots:
%% <ul>
%% <li> 'a' uses the slots 1-47 </li>
%% <li> 'b' uses the slots 48-94 </li>
%% <li> 'c' uses the slots 95-100 </li>
%% </ul>
chash_scale_to_int_interval(NewFloatMap, Max) ->
chash_scale_to_int_interval(NewFloatMap, 0, Max).
chash_scale_to_int_interval([{Ch, _Wt}], Cur, Max) ->
[{Ch, Cur, Max}];
chash_scale_to_int_interval([{Ch, Wt}|T], Cur, Max) ->
Int = trunc(Wt * Max),
[{Ch, Cur + 1, Cur + Int}|chash_scale_to_int_interval(T, Cur + Int, Max)].
%%%%%%%%%%%%%
pretty_with_integers(Map, Scale) ->
chash_scale_to_int_interval(Map, Scale).
pretty_with_integers(OldWeights, NewWeights, Scale) ->
chash_scale_to_int_interval(
make_float_map(make_float_map(OldWeights),
NewWeights),
Scale).
make_tree(Map) ->
chash_nextfloat_list_to_gb_tree(
chash_float_map_to_nextfloat_list(Map)).
make_tree(OldWeights, NewWeights) ->
chash_nextfloat_list_to_gb_tree(
chash_float_map_to_nextfloat_list(
make_float_map(make_float_map(OldWeights), NewWeights))).
query_tree(Val, Tree) when is_float(Val), 0.0 =< Val, Val =< 1.0 ->
chash_gb_next(Val, Tree).
make_demo_map1() ->
{_, Res} = make_demo_map1_i(),
Res.
make_demo_map1_i() ->
Fail1 = {b, 100},
L1 = [{a, 100}, Fail1, {c, 100}],
L2 = L1 ++ [{d, 100}, {e, 100}],
L3 = L2 -- [Fail1],
L4 = L3 ++ [{giant, 300}],
{L4, lists:foldl(fun(New, Old) -> make_float_map(Old, New) end,
make_float_map(L1), [L2, L3, L4])}.
make_demo_map2() ->
{L0, _} = make_demo_map1_i(),
L1 = L0 ++ [{h, 100}],
L2 = L1 ++ [{i, 100}],
L3 = L2 ++ [{j, 100}],
lists:foldl(fun(New, Old) -> make_float_map(Old, New) end,
make_demo_map1(), [L1, L2, L3]).
sum_map_weights(Map) ->
L = sum_map_weights(lists:sort(Map), undefined, 0.0) -- [{undefined,0.0}],
WeightSum = lists:sum([Weight || {_, Weight} <- L]),
{{per_szone, L}, {weight_sum, WeightSum}}.
sum_map_weights([{SZ, Weight}|T], SZ, SZ_total) ->
sum_map_weights(T, SZ, SZ_total + Weight);
sum_map_weights([{SZ, Weight}|T], LastSZ, LastSZ_total) ->
[{LastSZ, LastSZ_total}|sum_map_weights(T, SZ, Weight)];
sum_map_weights([], LastSZ, LastSZ_total) ->
[{LastSZ, LastSZ_total}].
-define(SHA_MAX, (1 bsl (20*8))).
%% Optimization is certainly possible here.
hash_string(Key, Map) ->
catch crypto:start(),
Tree = make_tree(Map),
<<Int:(20*8)/unsigned>> = crypto:hash(sha, Key),
Float = Int / ?SHA_MAX,
query_tree(Float, Tree).
%%%%% Examples
%% %% Make a map. See the code for make_demo_map1() for the order of
%% %% additions & deletions. Here's a brief summary of the 4 steps.
%% %%
%% %% * 'a' through 'e' are weighted @ 100.
%% %% * 'giant' is weighted @ 300.
%% %% * 'b' is removed at step #3.
%% 40> M1 = szone_chash:make_demo_map1().
%% [{a,0.09285714285714286},
%% {giant,0.10714285714285715},
%% {d,0.026190476190476153},
%% {giant,0.10714285714285715},
%% {a,0.04999999999999999},
%% {giant,0.04999999999999999},
%% {d,0.04999999999999999},
%% {giant,0.050000000000000044},
%% {d,0.06666666666666671},
%% {e,0.009523809523809434},
%% {giant,0.05714285714285716},
%% {c,0.14285714285714285},
%% {giant,0.05714285714285716},
%% {e,0.13333333333333341}]
%% %% Map M1 onto the interval of integers 0-10,1000
%% %%
%% %% output = list({SZ_name::term(), Start::integer(), End::integer()})
%% 41> szone_chash:pretty_with_integers(M1, 10*1000).
%% [{a,1,928},
%% {giant,929,1999},
%% {d,2000,2260},
%% {giant,2261,3331},
%% {a,3332,3830},
%% {giant,3831,4329},
%% {d,4330,4828},
%% {giant,4829,5328},
%% {d,5329,5994},
%% {e,5995,6089},
%% {giant,6090,6660},
%% {c,6661,8088},
%% {giant,8089,8659},
%% {e,8659,10000}]
%% %% Sum up all of the weights, make sure it's what we expect:
%% 55> szone_chash:sum_map_weights(M1).
%% {{per_szone,[{a,0.14285714285714285},
%% {c,0.14285714285714285},
%% {d,0.14285714285714285},
%% {e,0.14285714285714285},
%% {giant,0.42857142857142866}]},
%% {weight_sum,1.0}}
%% %% Make a tree, then query it
%% %% (Hash::float(), tree()) -> {NextLargestBoundary::float(), szone()}
%% 58> T1 = szone_chash:make_tree(M1).
%% 59> szone_chash:query_tree(0.2555, T1).
%% {0.3333333333333333,giant}
%% 60> szone_chash:query_tree(0.3555, T1).
%% {0.3833333333333333,a}
%% 61> szone_chash:query_tree(0.4555, T1).
%% {0.4833333333333333,d}
%% %% How about hashing a bunch of strings and see what happens?
%% 74> Key1 = "Hello, world!".
%% "Hello, world!"
%% 75> [{K, element(2, szone_chash:hash_string(K, M1))} || K <- [lists:sublist(Key1, X) || X <- lists:seq(1, length(Key1))]].
%% [{"H",giant},
%% {"He",giant},
%% {"Hel",giant},
%% {"Hell",e},
%% {"Hello",e},
%% {"Hello,",giant},
%% {"Hello, ",e},
%% {"Hello, w",e},
%% {"Hello, wo",giant},
%% {"Hello, wor",d},
%% {"Hello, worl",giant},
%% {"Hello, world",e},
%% {"Hello, world!",d}]