Compare commits


No commits in common. "master" and "gsb-async-nif" have entirely different histories.

27 changed files with 4502 additions and 8072 deletions

.gitignore (8 lines changed)

@@ -1,8 +0,0 @@
*.beam
.eunit
ebin
c_src/*.o
deps/
priv/
*~
.rebar

3rdLibs (8 lines changed)

@@ -0,0 +1,8 @@
o MDB: http://highlandsun.com/hyc/mdb/
source : from OpenLDAP git repository
commit : 7333b6bdc90c49c332d4728c62c7904e64c95077
license: OpenLDAP Public License
o UTHash: http://uthash.sourceforge.net/
version: 1.9.6
license: BSD revised

Makefile

@@ -1,4 +1,6 @@
MODULE = lmdb
# See LICENSE for licensing information.
MODULE = emdb
DIALYZER = dialyzer
REBAR = rebar
@@ -54,10 +56,10 @@ compile-for-triq:
@$(REBAR) -D QC -D QC_TRIQ compile triq compile_only=true
plt: compile
@$(DIALYZER) --build_plt --output_plt .$(TARGET).plt -pa --apps kernel stdlib
@$(DIALYZER) --build_plt --output_plt .$(TARGET).plt -pa deps/lager/ebin --apps kernel stdlib
analyze: compile
@$(DIALYZER) --plt .$(TARGET).plt
@$(DIALYZER) --plt .$(TARGET).plt -pa deps/lager/ebin ebin
repl:
@$(ERL) exec erl -pa $PWD/ebin -pa +B
@$(ERL) -pa ebin -pz deps/lager/ebin

README.md (130 lines changed)

@@ -1,28 +1,14 @@
An Erlang NIF for LMDB
======================
EMDB
====
This is an Erlang NIF for OpenLDAP's Lightning Memory-Mapped Database (LMDB) library. LMDB is an ultra-fast, ultra-compact key-value data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases, and it is limited only by the size of the virtual address space, not by the size of physical RAM. LMDB was originally called MDB, but was renamed to avoid confusion with other software associated with that name.
EMDB is a NIF library for the [Memory-Mapped Database](http://highlandsun.com/hyc/mdb/), a.k.a. MDB.
Quick Overview
--------------
LMDB is a tiny database with some excellent properties:
* Ordered-map interface (keys are always sorted, supports range lookups)
* Reader/writer transactions: readers don't block writers and writers don't block readers. Writers are fully serialized, so writes are always deadlock-free.
* Read transactions are extremely cheap and can be performed with no mallocs or other blocking calls.
* Supports multi-thread and multi-process concurrency; environments may be opened by multiple processes on the same host.
* Multiple sub-databases may be created with transactions covering all sub-databases.
* Memory-mapped, allowing for zero-copy lookup and iteration.
* Maintenance-free, no external process or background cleanup/compaction required.
* No application-level caching. LMDB fully exploits the operating system's buffer cache.
* 32KB of object code and 6KLOC of C.
The main purpose of this integration is to provide Erlang programmers access to this excellent, open-source-friendly BTREE implementation.
The main purpose of this package is to provide an Erlang API for this excellent BTREE implementation. Secondly, to build an alternative Riak/KV [backend](http://wiki.basho.com/Storage-Backends.html) and an alternative for Riak's AAE feature on top of it. Finally, it would be nice to build an ETS-compatible API (à la "lets" for LevelDB) to ease adoption in places where DETS is just not up to the task.
Requirements
------------
* Erlang R14B04+
* Clang, GCC 4.2+ or MS VisualStudio 2010+
* GCC 4.2+ or MS VisualStudio 2010+
Build
-----
@@ -33,12 +19,14 @@ $ make
API
---
* `open/1`: equivalent to `lmdb:open(DirName, 10485760)`.
* `open/2`: equivalent to `lmdb:open(DirName, 10485760, 0)`.
The following functions were implemented:
* `open/1`: equivalent to `emdb:open(DirName, 10485760)`.
* `open/2`: equivalent to `emdb:open(DirName, 10485760, 0)`.
* `open/3`: creates a new MDB database. This call also re-opens an already existing one. Arguments are:
* DirName: database directory name
* MapSize: database map size (see [mdb.h](http://gitorious.org/mdb/mdb/blobs/master/libraries/libmdb/mdb.h))
* EnvFlags: database environment flags (see [mdb.h](http://gitorious.org/mdb/mdb/blobs/master/libraries/libmdb/mdb.h)). The possible values are defined in **lmdb.hrl**.
* EnvFlags: database environment flags (see [mdb.h](http://gitorious.org/mdb/mdb/blobs/master/libraries/libmdb/mdb.h)). The possible values are defined in **emdb.hrl**.
* `close/2`: closes the database
* `put/2`: inserts Key with value Val into the database. Assumes the key is not present; `key_exist` is returned otherwise.
* `get/1`: retrieves the value stored with Key in the database.
@@ -53,60 +41,64 @@ Usage
```
$ make
$ ./start.sh
%% create a new database
1> {ok, Handle} = lmdb:open("/tmp/lmdb1").
%% create a new database
1> {ok, Handle} = emdb:open("/tmp/emdb1").
%% insert the key <<"a">> with value <<"1">>
2> ok = lmdb:put(Handle, <<"a">>, <<"1">>).
%% insert the key <<"a">> with value <<"1">>
2> ok = emdb:put(Handle, <<"a">>, <<"1">>).
%% try to re-insert the same key <<"a">>
3> key_exist = lmdb:put(Handle, <<"a">>, <<"2">>).
%% try to re-insert the same key <<"a">>
3> key_exist = emdb:put(Handle, <<"a">>, <<"2">>).
%% add a new key-value pair
4> ok = lmdb:put(Handle, <<"b">>, <<"2">>).
%% add a new key-value pair
4> ok = emdb:put(Handle, <<"b">>, <<"2">>).
%% search a non-existing key <<"c">>
5> none = lmdb:get(Handle, <<"c">>).
%% search a non-existing key <<"c">>
5> none = emdb:get(Handle, <<"c">>).
%% retrieve the value for key <<"b">>
6> {ok, <<"2">>} = lmdb:get(Handle, <<"b">>).
%% retrieve the value for key <<"b">>
6> {ok, <<"2">>} = emdb:get(Handle, <<"b">>).
%% retrieve the value for key <<"a">>
7> {ok, <<"1">>} = lmdb:get(Handle, <<"a">>).
%% retrieve the value for key <<"a">>
7> {ok, <<"1">>} = emdb:get(Handle, <<"a">>).
%% delete key <<"b">>
8> ok = lmdb:del(Handle, <<"b">>).
%% delete key <<"b">>
8> ok = emdb:del(Handle, <<"b">>).
%% search a non-existing key <<"b">>
9> none = lmdb:get(Handle, <<"b">>).
%% search a non-existing key <<"b">>
9> none = emdb:get(Handle, <<"b">>).
%% delete a non-existing key <<"z">>
10> none = lmdb:del(Handle, <<"z">>).
%% delete a non-existing key <<"z">>
10> none = emdb:del(Handle, <<"z">>).
%% ensure key <<"a">>'s value is still <<"1">>
11> {ok, <<"1">>} = lmdb:get(Handle, <<"a">>).
%% ensure key <<"a">>'s value is still <<"1">>
11> {ok, <<"1">>} = emdb:get(Handle, <<"a">>).
%% update the value for key <<"a">>
12> ok = lmdb:update(Handle, <<"a">>, <<"7">>).
%% update the value for key <<"a">>
12> ok = emdb:update(Handle, <<"a">>, <<"7">>).
%% check the new value for key <<"a">>
13> {ok, <<"7">>} = lmdb:get(Handle, <<"a">>).
%% check the new value for key <<"a">>
13> {ok, <<"7">>} = emdb:get(Handle, <<"a">>).
%% delete all key-value pairs in the database
14> ok = lmdb:drop(Handle).
%% delete all key-value pairs in the database
14> ok = emdb:drop(Handle).
%% try to retrieve key <<"a">> value
15> none = lmdb:get(Handle, <<"a">>).
%% try to retrieve key <<"a">> value
15> none = emdb:get(Handle, <<"a">>).
%% close the database
16> ok = lmdb:close(Handle).
%% close the database
16> ok = emdb:close(Handle).
...
17> q().
```
#### Note:
The code below creates a new database with an **80GB** MapSize (85899345920 bytes = 80 × 1024³), **avoids fsync** after each commit (an "ACI" but not "D" database: we trade durability for speed), and uses the experimental **MDB_FIXEDMAP**.
```
{ok, Handle} = lmdb:open("/tmp/lmdb2", 85899345920, ?MDB_NOSYNC bor ?MDB_FIXEDMAP).
{ok, Handle} = emdb:open("/tmp/emdb2", 85899345920, ?MDB_NOSYNC bor ?MDB_FIXEDMAP).
```
Performance
@@ -117,10 +109,9 @@ See the [microbench](http://highlandsun.com/hyc/mdb/microbench/) against:
* SQLite3
* Kyoto TreeDB
* BerkeleyDB 5.x
* btree2n.c (?)
* WiredTiger (?)
MDB will mmap the entire database into memory, which means that if your dataset is larger than 2^32 bytes you'll need to run on a 64-bit arch system.
MDB mmap's the database, so unless your dataset is < 2^32 bytes you'll need to run on a 64-bit arch system.
Supported Operating Systems
@@ -129,8 +120,8 @@ Supported Operating Systems
Should work on:
* Linux
* Mac OS/X
* *BSD
* OSX
* FreeBSD
* Windows
TODO
@@ -143,26 +134,25 @@ TODO
* basho_bench driver
* EQC, PULSE testing
* Key expiry
* renaming
* emdb -> lmdb
* emdb.c -> lmdb_nif.c
* improve stats
* txn API
* cursor API
* config
* use async_nif affinity
Other Ideas
-----------
* Create a Riak/KV [backend](http://wiki.basho.com/Storage-Backends.html)
* use dups for siblings
* use txn for 2i
* Create a Riak/KV AAE (riak_kv/src/hashtree.erl) alternative using LMDB
* riak_kv backend
* use dups
* 2i
* aae alternative
Status
------
Work in progress, not production quality and not supported by Basho Technologies. This is an experiment at this time, nothing more. You are encouraged to contribute code, tests, etc. as you see fit.
Work in progress, not production quality and not supported by Basho Technologies.
LICENSE
-------
LMDB is Copyright (C) 2012 by Aleph Archives and (C) 2013 by Basho Technologies, Inc., and released under the [OpenLDAP](http://www.OpenLDAP.org/license.html) License.
EMDB is Copyright (C) 2012-2013 by Aleph Archives and Basho Technologies, Inc., and released under the [OpenLDAP](http://www.OpenLDAP.org/license.html) License.

c_src/async_nif.h

@@ -4,16 +4,18 @@
* Copyright (c) 2012 Basho Technologies, Inc. All Rights Reserved.
* Author: Gregory Burd <greg@basho.com> <greg@burd.me>
*
* This file is provided to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
* This file is provided to you under the Apache License,
* Version 2.0 (the "License"); you may not use this file
* except in compliance with the License. You may obtain
* a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
@@ -25,26 +27,18 @@ extern "C" {
#endif
#include <assert.h>
#include "fifo_q.h"
#include "stats.h"
#include "queue.h"
#ifndef UNUSED
#define UNUSED(v) ((void)(v))
#ifndef __UNUSED
#define __UNUSED(v) ((void)(v))
#endif
#define ASYNC_NIF_MAX_WORKERS 1024
#define ASYNC_NIF_MIN_WORKERS 2
#define ASYNC_NIF_WORKER_QUEUE_SIZE 8192
#define ASYNC_NIF_MAX_QUEUED_REQS ASYNC_NIF_WORKER_QUEUE_SIZE * ASYNC_NIF_MAX_WORKERS
/* Atoms (initialized in on_load) */
static ERL_NIF_TERM ATOM_EAGAIN;
static ERL_NIF_TERM ATOM_ENOMEM;
static ERL_NIF_TERM ATOM_ENQUEUED;
static ERL_NIF_TERM ATOM_ERROR;
static ERL_NIF_TERM ATOM_OK;
static ERL_NIF_TERM ATOM_SHUTDOWN;
#define ASYNC_NIF_MAX_WORKERS 128
#define ASYNC_NIF_WORKER_QUEUE_SIZE 500
#define ASYNC_NIF_MAX_QUEUED_REQS 1000 * ASYNC_NIF_MAX_WORKERS
STAT_DECL(qwait, 1000);
struct async_nif_req_entry {
ERL_NIF_TERM ref;
@@ -53,17 +47,14 @@ struct async_nif_req_entry {
void *args;
void (*fn_work)(ErlNifEnv*, ERL_NIF_TERM, ErlNifPid*, unsigned int, void *);
void (*fn_post)(void *);
STAILQ_ENTRY(async_nif_req_entry) entries;
};
DECL_FIFO_QUEUE(reqs, struct async_nif_req_entry);
struct async_nif_work_queue {
unsigned int num_workers;
unsigned int depth;
STAT_DEF(qwait);
ErlNifMutex *reqs_mutex;
ErlNifCond *reqs_cnd;
struct async_nif_work_queue *next;
STAILQ_HEAD(reqs, async_nif_req_entry) reqs;
FIFO_QUEUE_TYPE(reqs) reqs;
};
struct async_nif_worker_entry {
@@ -71,17 +62,16 @@ struct async_nif_worker_entry {
unsigned int worker_id;
struct async_nif_state *async_nif;
struct async_nif_work_queue *q;
SLIST_ENTRY(async_nif_worker_entry) entries;
};
struct async_nif_state {
STAT_DEF(qwait);
unsigned int shutdown;
ErlNifMutex *we_mutex;
unsigned int we_active;
SLIST_HEAD(joining, async_nif_worker_entry) we_joining;
unsigned int num_workers;
struct async_nif_worker_entry worker_entries[ASYNC_NIF_MAX_WORKERS];
unsigned int num_queues;
unsigned int next_q;
STAILQ_HEAD(recycled_reqs, async_nif_req_entry) recycled_reqs;
FIFO_QUEUE_TYPE(reqs) recycled_reqs;
unsigned int num_reqs;
ErlNifMutex *recycled_req_mutex;
struct async_nif_work_queue queues[];
@@ -90,43 +80,39 @@ struct async_nif_state {
#define ASYNC_NIF_DECL(decl, frame, pre_block, work_block, post_block) \
struct decl ## _args frame; \
static void fn_work_ ## decl (ErlNifEnv *env, ERL_NIF_TERM ref, ErlNifPid *pid, unsigned int worker_id, struct decl ## _args *args) { \
UNUSED(worker_id); \
DPRINTF("async_nif: calling \"%s\"", __func__); \
__UNUSED(worker_id); \
do work_block while(0); \
DPRINTF("async_nif: returned from \"%s\"", __func__); \
} \
static void fn_post_ ## decl (struct decl ## _args *args) { \
UNUSED(args); \
DPRINTF("async_nif: calling \"fn_post_%s\"", #decl); \
__UNUSED(args); \
do post_block while(0); \
DPRINTF("async_nif: returned from \"fn_post_%s\"", #decl); \
} \
static ERL_NIF_TERM decl(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv_in[]) { \
struct decl ## _args on_stack_args; \
struct decl ## _args *args = &on_stack_args; \
struct decl ## _args *copy_of_args; \
struct async_nif_req_entry *req = NULL; \
unsigned int affinity = 0; \
const char *affinity = NULL; \
ErlNifEnv *new_env = NULL; \
/* argv[0] is a ref used for selective recv */ \
const ERL_NIF_TERM *argv = argv_in + 1; \
argc -= 1; \
/* Note: !!! this assumes that the first element of priv_data is ours */ \
struct async_nif_state *async_nif = *(struct async_nif_state**)enif_priv_data(env); \
if (async_nif->shutdown) \
return enif_make_tuple2(env, ATOM_ERROR, ATOM_SHUTDOWN); \
if (async_nif->shutdown) \
return enif_make_tuple2(env, enif_make_atom(env, "error"), \
enif_make_atom(env, "shutdown")); \
req = async_nif_reuse_req(async_nif); \
if (!req) \
return enif_make_tuple2(env, ATOM_ERROR, ATOM_ENOMEM); \
new_env = req->env; \
DPRINTF("async_nif: calling \"%s\"", __func__); \
if (!req) \
return enif_make_tuple2(env, enif_make_atom(env, "error"), \
enif_make_atom(env, "eagain")); \
do pre_block while(0); \
DPRINTF("async_nif: returned from \"%s\"", __func__); \
copy_of_args = (struct decl ## _args *)malloc(sizeof(struct decl ## _args)); \
copy_of_args = (struct decl ## _args *)enif_alloc(sizeof(struct decl ## _args)); \
if (!copy_of_args) { \
fn_post_ ## decl (args); \
async_nif_recycle_req(req, async_nif); \
return enif_make_tuple2(env, ATOM_ERROR, ATOM_ENOMEM); \
return enif_make_tuple2(env, enif_make_atom(env, "error"), \
enif_make_atom(env, "enomem")); \
} \
memcpy(copy_of_args, args, sizeof(struct decl ## _args)); \
req->ref = enif_make_copy(new_env, argv_in[0]); \
@@ -136,13 +122,13 @@ struct async_nif_state {
req->fn_post = (void (*)(void *))fn_post_ ## decl; \
int h = -1; \
if (affinity) \
h = ((unsigned int)affinity) % async_nif->num_queues; \
h = async_nif_str_hash_func(affinity) % async_nif->num_queues; \
ERL_NIF_TERM reply = async_nif_enqueue_req(async_nif, req, h); \
if (!reply) { \
fn_post_ ## decl (args); \
async_nif_recycle_req(req, async_nif); \
free(copy_of_args); \
return enif_make_tuple2(env, ATOM_ERROR, ATOM_EAGAIN); \
enif_free(copy_of_args); \
return enif_make_tuple2(env, enif_make_atom(env, "error"), \
enif_make_atom(env, "shutdown")); \
} \
return reply; \
}
@@ -150,16 +136,16 @@ struct async_nif_state {
#define ASYNC_NIF_INIT(name) \
static ErlNifMutex *name##_async_nif_coord = NULL;
#define ASYNC_NIF_LOAD(name, env, priv) do { \
#define ASYNC_NIF_LOAD(name, priv) do { \
if (!name##_async_nif_coord) \
name##_async_nif_coord = enif_mutex_create("nif_coord load"); \
name##_async_nif_coord = enif_mutex_create(NULL); \
enif_mutex_lock(name##_async_nif_coord); \
priv = async_nif_load(env); \
priv = async_nif_load(); \
enif_mutex_unlock(name##_async_nif_coord); \
} while(0);
#define ASYNC_NIF_UNLOAD(name, env, priv) do { \
if (!name##_async_nif_coord) \
name##_async_nif_coord = enif_mutex_create("nif_coord unload"); \
name##_async_nif_coord = enif_mutex_create(NULL); \
enif_mutex_lock(name##_async_nif_coord); \
async_nif_unload(env, priv); \
enif_mutex_unlock(name##_async_nif_coord); \
@@ -168,7 +154,7 @@ struct async_nif_state {
} while(0);
#define ASYNC_NIF_UPGRADE(name, env) do { \
if (!name##_async_nif_coord) \
name##_async_nif_coord = enif_mutex_create("nif_coord upgrade"); \
name##_async_nif_coord = enif_mutex_create(NULL); \
enif_mutex_lock(name##_async_nif_coord); \
async_nif_upgrade(env); \
enif_mutex_unlock(name##_async_nif_coord); \
@@ -193,26 +179,26 @@ async_nif_reuse_req(struct async_nif_state *async_nif)
ErlNifEnv *env = NULL;
enif_mutex_lock(async_nif->recycled_req_mutex);
if (STAILQ_EMPTY(&async_nif->recycled_reqs)) {
if (fifo_q_empty(reqs, async_nif->recycled_reqs)) {
if (async_nif->num_reqs < ASYNC_NIF_MAX_QUEUED_REQS) {
req = malloc(sizeof(struct async_nif_req_entry));
req = enif_alloc(sizeof(struct async_nif_req_entry));
if (req) {
memset(req, 0, sizeof(struct async_nif_req_entry));
env = enif_alloc_env();
if (env) {
req->env = env;
__sync_fetch_and_add(&async_nif->num_reqs, 1);
} else {
free(req);
if (!env) {
enif_free(req);
req = NULL;
} else {
req->env = env;
async_nif->num_reqs++;
}
}
}
} else {
req = STAILQ_FIRST(&async_nif->recycled_reqs);
STAILQ_REMOVE(&async_nif->recycled_reqs, req, async_nif_req_entry, entries);
req = fifo_q_get(reqs, async_nif->recycled_reqs);
}
enif_mutex_unlock(async_nif->recycled_req_mutex);
STAT_TICK(async_nif, qwait);
return req;
}
@@ -226,59 +212,27 @@ async_nif_reuse_req(struct async_nif_state *async_nif)
void
async_nif_recycle_req(struct async_nif_req_entry *req, struct async_nif_state *async_nif)
{
ErlNifEnv *env = NULL;
STAT_TOCK(async_nif, qwait);
enif_mutex_lock(async_nif->recycled_req_mutex);
enif_clear_env(req->env);
env = req->env;
memset(req, 0, sizeof(struct async_nif_req_entry));
req->env = env;
STAILQ_INSERT_TAIL(&async_nif->recycled_reqs, req, entries);
fifo_q_put(reqs, async_nif->recycled_reqs, req);
enif_mutex_unlock(async_nif->recycled_req_mutex);
}
static void *async_nif_worker_fn(void *);
/**
* Start up a worker thread.
* A string hash function.
*
* A basic hash function for strings of characters used during the
* affinity association.
*
* s a NULL terminated set of bytes to be hashed
* -> an integer hash encoding of the bytes
*/
static int
async_nif_start_worker(struct async_nif_state *async_nif, struct async_nif_work_queue *q)
static inline unsigned int
async_nif_str_hash_func(const char *s)
{
struct async_nif_worker_entry *we;
if (0 == q)
return EINVAL;
enif_mutex_lock(async_nif->we_mutex);
we = SLIST_FIRST(&async_nif->we_joining);
while(we != NULL) {
struct async_nif_worker_entry *n = SLIST_NEXT(we, entries);
SLIST_REMOVE(&async_nif->we_joining, we, async_nif_worker_entry, entries);
void *exit_value = 0; /* We ignore the thread_join's exit value. */
enif_thread_join(we->tid, &exit_value);
free(we);
async_nif->we_active--;
we = n;
}
if (async_nif->we_active == ASYNC_NIF_MAX_WORKERS) {
enif_mutex_unlock(async_nif->we_mutex);
return EAGAIN;
}
we = malloc(sizeof(struct async_nif_worker_entry));
if (!we) {
enif_mutex_unlock(async_nif->we_mutex);
return ENOMEM;
}
memset(we, 0, sizeof(struct async_nif_worker_entry));
we->worker_id = async_nif->we_active++;
we->async_nif = async_nif;
we->q = q;
enif_mutex_unlock(async_nif->we_mutex);
return enif_thread_create(NULL,&we->tid, &async_nif_worker_fn, (void*)we, 0);
unsigned int h = (unsigned int)*s;
if (h) for (++s ; *s; ++s) h = (h << 5) - h + (unsigned int)*s;
return h;
}
/**
@@ -291,9 +245,9 @@ static ERL_NIF_TERM
async_nif_enqueue_req(struct async_nif_state* async_nif, struct async_nif_req_entry *req, int hint)
{
/* Identify the most appropriate worker for this request. */
unsigned int i, last_qid, qid = 0;
unsigned int qid = 0;
struct async_nif_work_queue *q = NULL;
double avg_depth = 0.0;
unsigned int n = async_nif->num_queues;
/* Either we're choosing a queue based on some affinity/hinted value or we
need to select the next queue in the rotation and atomically update that
@@ -301,76 +255,53 @@ async_nif_enqueue_req(struct async_nif_state* async_nif, struct async_nif_req_en
if (hint >= 0) {
qid = (unsigned int)hint;
} else {
do {
last_qid = __sync_fetch_and_add(&async_nif->next_q, 0);
qid = (last_qid + 1) % async_nif->num_queues;
} while (!__sync_bool_compare_and_swap(&async_nif->next_q, last_qid, qid));
qid = async_nif->next_q;
qid = (qid + 1) % async_nif->num_queues;
async_nif->next_q = qid;
}
/* Now we inspect and iterate across the set of queues trying to select one
that isn't too full or too slow. */
for (i = 0; i < async_nif->num_queues; i++) {
/* Compute the average queue depth not counting queues which are empty or
the queue we're considering right now. */
unsigned int j, n = 0;
for (j = 0; j < async_nif->num_queues; j++) {
if (j != qid && async_nif->queues[j].depth != 0) {
n++;
avg_depth += async_nif->queues[j].depth;
}
}
if (avg_depth) avg_depth /= n;
/* Lock this queue under consideration, then check for shutdown. While
we hold this lock either a) we're shutting down so exit now or b) this
queue will be valid until we release the lock. */
do {
q = &async_nif->queues[qid];
enif_mutex_lock(q->reqs_mutex);
/* Try not to enqueue a request into a queue that isn't keeping up with
the request volume. */
if (q->depth <= avg_depth) break;
else {
/* Now that we hold the lock, check for shutdown. As long as we hold
this lock either a) we're shutting down so exit now or b) this queue
will be valid until we release the lock. */
if (async_nif->shutdown) {
enif_mutex_unlock(q->reqs_mutex);
return 0;
}
double await = STAT_MEAN_LOG2_SAMPLE(async_nif, qwait);
double await_inthisq = STAT_MEAN_LOG2_SAMPLE(q, qwait);
if (fifo_q_full(reqs, q->reqs) || await_inthisq > await) {
enif_mutex_unlock(q->reqs_mutex);
qid = (qid + 1) % async_nif->num_queues;
q = &async_nif->queues[qid];
} else {
break;
}
}
// TODO: at some point add in work sheading/stealing
} while(n-- > 0);
/* If the for loop finished then we didn't find a suitable queue for this
request, meaning we're backed up, so trigger eagain. Note that if we left
the loop in this way we hold no lock. */
if (i == async_nif->num_queues) return 0;
/* Add the request to the queue. */
STAILQ_INSERT_TAIL(&q->reqs, req, entries);
__sync_fetch_and_add(&q->depth, 1);
/* We've selected a queue for this new request; now check to make sure there are
enough workers actively processing requests on this queue. */
while (q->depth > q->num_workers) {
switch(async_nif_start_worker(async_nif, q)) {
case EINVAL: case ENOMEM: default: return 0;
case EAGAIN: continue;
case 0: __sync_fetch_and_add(&q->num_workers, 1); goto done;
}
}done:;
/* We hold the queue's lock, and we've selected a reasonable queue for this
new request, so add the request. */
STAT_TICK(q, qwait);
fifo_q_put(reqs, q->reqs, req);
/* Build the term before releasing the lock so as not to race on the use of
the req pointer (which will soon become invalid in another thread
performing the request). */
double pct_full = (double)avg_depth / (double)ASYNC_NIF_WORKER_QUEUE_SIZE;
ERL_NIF_TERM reply = enif_make_tuple2(req->env, ATOM_OK,
enif_make_tuple2(req->env, ATOM_ENQUEUED,
enif_make_double(req->env, pct_full)));
enif_cond_signal(q->reqs_cnd);
ERL_NIF_TERM reply = enif_make_tuple2(req->env, enif_make_atom(req->env, "ok"),
enif_make_atom(req->env, "enqueued"));
enif_mutex_unlock(q->reqs_mutex);
enif_cond_signal(q->reqs_cnd);
return reply;
}
/**
* Worker threads execute this function. Here each worker pulls requests off
* its queue, executes the work, and continues doing so until it sees that
* the shutdown flag is set, at which point it exits.
* TODO:
*/
static void *
async_nif_worker_fn(void *arg)
@@ -380,7 +311,6 @@ async_nif_worker_fn(void *arg)
struct async_nif_state *async_nif = we->async_nif;
struct async_nif_work_queue *q = we->q;
struct async_nif_req_entry *req = NULL;
unsigned int tries = async_nif->num_queues;
for(;;) {
/* Examine the request queue, are there things to be done? */
@@ -390,40 +320,26 @@ async_nif_worker_fn(void *arg)
enif_mutex_unlock(q->reqs_mutex);
break;
}
if (STAILQ_EMPTY(&q->reqs)) {
if (fifo_q_empty(reqs, q->reqs)) {
/* Queue is empty so we wait for more work to arrive. */
enif_mutex_unlock(q->reqs_mutex);
if (tries == 0 && q == we->q) {
if (q->num_workers > ASYNC_NIF_MIN_WORKERS) {
/* At this point we've tried to find/execute work on all queues
* and there are at least MIN_WORKERS on this queue so we
* leaving this loop (break) which leads to a thread exit/join. */
break;
} else {
enif_mutex_lock(q->reqs_mutex);
enif_cond_wait(q->reqs_cnd, q->reqs_mutex);
goto check_again_for_work;
}
} else {
tries--;
__sync_fetch_and_add(&q->num_workers, -1);
q = q->next;
__sync_fetch_and_add(&q->num_workers, 1);
continue; // try next queue
}
STAT_RESET(q, qwait);
enif_cond_wait(q->reqs_cnd, q->reqs_mutex);
goto check_again_for_work;
} else {
assert(fifo_q_size(reqs, q->reqs) > 0);
assert(fifo_q_size(reqs, q->reqs) < fifo_q_capacity(reqs, q->reqs));
/* At this point the next req is ours to process and we hold the
reqs_mutex lock. Take the request off the queue. */
req = STAILQ_FIRST(&q->reqs);
STAILQ_REMOVE(&q->reqs, req, async_nif_req_entry, entries);
__sync_fetch_and_add(&q->depth, -1);
/* Wake up other worker thread watching this queue to help process work. */
enif_cond_signal(q->reqs_cnd);
req = fifo_q_get(reqs, q->reqs);
enif_mutex_unlock(q->reqs_mutex);
/* Ensure that there is at least one other worker thread watching this
queue. */
enif_cond_signal(q->reqs_cnd);
/* Perform the work. */
req->fn_work(req->env, req->ref, &req->pid, worker_id, req->args);
STAT_TOCK(q, qwait);
/* Now call the post-work cleanup function. */
req->fn_post(req->args);
@@ -432,16 +348,13 @@ async_nif_worker_fn(void *arg)
req->ref = 0;
req->fn_work = 0;
req->fn_post = 0;
free(req->args);
enif_free(req->args);
req->args = NULL;
enif_clear_env(req->env);
async_nif_recycle_req(req, async_nif);
req = NULL;
}
}
enif_mutex_lock(async_nif->we_mutex);
SLIST_INSERT_HEAD(&async_nif->we_joining, we, entries);
enif_mutex_unlock(async_nif->we_mutex);
__sync_fetch_and_add(&q->num_workers, -1);
enif_thread_exit(0);
return 0;
}
@@ -453,44 +366,41 @@ async_nif_unload(ErlNifEnv *env, struct async_nif_state *async_nif)
unsigned int num_queues = async_nif->num_queues;
struct async_nif_work_queue *q = NULL;
struct async_nif_req_entry *req = NULL;
struct async_nif_worker_entry *we = NULL;
UNUSED(env);
__UNUSED(env);
/* Signal the worker threads, stop what you're doing and exit. To ensure
that we don't race with the enqueue() process we first lock all the worker
queues, then set shutdown to true, then unlock. The enqueue function will
take the queue mutex, then test for shutdown condition, then enqueue only
if not shutting down. */
STAT_PRINT(async_nif, qwait, "wterl");
/* Signal the worker threads, stop what you're doing and exit. To
ensure that we don't race with the enqueue() process we first
lock all the worker queues, then set shutdown to true, then
unlock. The enqueue function will take the queue mutex, then
test for shutdown condition, then enqueue only if not shutting
down. */
for (i = 0; i < num_queues; i++) {
q = &async_nif->queues[i];
enif_mutex_lock(q->reqs_mutex);
}
/* Set the shutdown flag so that worker threads will not continue
executing requests. */
async_nif->shutdown = 1;
for (i = 0; i < num_queues; i++) {
q = &async_nif->queues[i];
enif_cond_broadcast(q->reqs_cnd);
enif_mutex_unlock(q->reqs_mutex);
}
/* Join for the now exiting worker threads. */
while(async_nif->we_active > 0) {
for (i = 0; i < num_queues; i++)
enif_cond_broadcast(async_nif->queues[i].reqs_cnd);
enif_mutex_lock(async_nif->we_mutex);
we = SLIST_FIRST(&async_nif->we_joining);
while(we != NULL) {
struct async_nif_worker_entry *n = SLIST_NEXT(we, entries);
SLIST_REMOVE(&async_nif->we_joining, we, async_nif_worker_entry, entries);
void *exit_value = 0; /* We ignore the thread_join's exit value. */
enif_thread_join(we->tid, &exit_value);
free(we);
async_nif->we_active--;
we = n;
}
enif_mutex_unlock(async_nif->we_mutex);
for (i = 0; i < async_nif->num_workers; ++i) {
void *exit_value = 0; /* We ignore the thread_join's exit value. */
enif_thread_join(async_nif->worker_entries[i].tid, &exit_value);
}
enif_mutex_destroy(async_nif->we_mutex);
/* Free req structures sitting on the recycle queue. */
enif_mutex_lock(async_nif->recycled_req_mutex);
req = NULL;
fifo_q_foreach(reqs, async_nif->recycled_reqs, req, {
enif_free_env(req->env);
enif_free(req);
});
fifo_q_free(reqs, async_nif->recycled_reqs);
/* Cleanup in-flight requests, mutexes and conditions in each work queue. */
for (i = 0; i < num_queues; i++) {
@@ -498,44 +408,32 @@ async_nif_unload(ErlNifEnv *env, struct async_nif_state *async_nif)
/* Worker threads are stopped, now toss anything left in the queue. */
req = NULL;
req = STAILQ_FIRST(&q->reqs);
while(req != NULL) {
struct async_nif_req_entry *n = STAILQ_NEXT(req, entries);
fifo_q_foreach(reqs, q->reqs, req, {
enif_clear_env(req->env);
enif_send(NULL, &req->pid, req->env,
enif_make_tuple2(req->env, ATOM_ERROR, ATOM_SHUTDOWN));
enif_make_tuple2(req->env, enif_make_atom(req->env, "error"),
enif_make_atom(req->env, "shutdown")));
req->fn_post(req->args);
enif_free_env(req->env);
free(req->args);
free(req);
req = n;
}
enif_free(req->args);
enif_free(req);
});
fifo_q_free(reqs, q->reqs);
enif_mutex_destroy(q->reqs_mutex);
enif_cond_destroy(q->reqs_cnd);
}
/* Free any req structures sitting unused on the recycle queue. */
enif_mutex_lock(async_nif->recycled_req_mutex);
req = NULL;
req = STAILQ_FIRST(&async_nif->recycled_reqs);
while(req != NULL) {
struct async_nif_req_entry *n = STAILQ_NEXT(req, entries);
enif_free_env(req->env);
free(req);
req = n;
}
enif_mutex_unlock(async_nif->recycled_req_mutex);
enif_mutex_destroy(async_nif->recycled_req_mutex);
memset(async_nif, 0, sizeof(struct async_nif_state) + (sizeof(struct async_nif_work_queue) * async_nif->num_queues));
free(async_nif);
enif_free(async_nif);
}
static void *
async_nif_load(ErlNifEnv *env)
async_nif_load()
{
static int has_init = 0;
unsigned int i, num_queues;
unsigned int i, j, num_queues;
ErlNifSysInfo info;
struct async_nif_state *async_nif;
@@ -543,14 +441,6 @@ async_nif_load(ErlNifEnv *env)
if (has_init) return 0;
else has_init = 1;
/* Init some static references to commonly used atoms. */
ATOM_EAGAIN = enif_make_atom(env, "eagain");
ATOM_ENOMEM = enif_make_atom(env, "enomem");
ATOM_ENQUEUED = enif_make_atom(env, "enqueued");
ATOM_ERROR = enif_make_atom(env, "error");
ATOM_OK = enif_make_atom(env, "ok");
ATOM_SHUTDOWN = enif_make_atom(env, "shutdown");
/* Find out how many schedulers there are. */
enif_system_info(&info, sizeof(ErlNifSysInfo));
@@ -568,28 +458,62 @@ async_nif_load(ErlNifEnv *env)
}
/* Init our portion of priv_data's module-specific state. */
async_nif = malloc(sizeof(struct async_nif_state) +
sizeof(struct async_nif_work_queue) * num_queues);
async_nif = enif_alloc(sizeof(struct async_nif_state) +
sizeof(struct async_nif_work_queue) * num_queues);
if (!async_nif)
return NULL;
memset(async_nif, 0, sizeof(struct async_nif_state) +
sizeof(struct async_nif_work_queue) * num_queues);
sizeof(struct async_nif_work_queue) * num_queues);
async_nif->num_queues = num_queues;
async_nif->we_active = 0;
async_nif->num_workers = 2 * num_queues;
async_nif->next_q = 0;
async_nif->shutdown = 0;
STAILQ_INIT(&async_nif->recycled_reqs);
async_nif->recycled_req_mutex = enif_mutex_create("recycled_req");
async_nif->we_mutex = enif_mutex_create("we");
SLIST_INIT(&async_nif->we_joining);
async_nif->recycled_reqs = fifo_q_new(reqs, ASYNC_NIF_MAX_QUEUED_REQS);
async_nif->recycled_req_mutex = enif_mutex_create(NULL);
STAT_INIT(async_nif, qwait);
for (i = 0; i < async_nif->num_queues; i++) {
struct async_nif_work_queue *q = &async_nif->queues[i];
STAILQ_INIT(&q->reqs);
q->reqs_mutex = enif_mutex_create("reqs");
q->reqs_cnd = enif_cond_create("reqs");
q->next = &async_nif->queues[(i + 1) % num_queues];
q->reqs = fifo_q_new(reqs, ASYNC_NIF_WORKER_QUEUE_SIZE);
q->reqs_mutex = enif_mutex_create(NULL);
q->reqs_cnd = enif_cond_create(NULL);
STAT_INIT(q, qwait);
}
/* Setup the thread pool management. */
memset(async_nif->worker_entries, 0, sizeof(struct async_nif_worker_entry) * ASYNC_NIF_MAX_WORKERS);
/* Start the worker threads. */
for (i = 0; i < async_nif->num_workers; i++) {
struct async_nif_worker_entry *we = &async_nif->worker_entries[i];
we->async_nif = async_nif;
we->worker_id = i;
we->q = &async_nif->queues[i % async_nif->num_queues];
if (enif_thread_create(NULL, &async_nif->worker_entries[i].tid,
&async_nif_worker_fn, (void*)we, NULL) != 0) {
async_nif->shutdown = 1;
for (j = 0; j < async_nif->num_queues; j++) {
struct async_nif_work_queue *q = &async_nif->queues[j];
enif_cond_broadcast(q->reqs_cnd);
}
while(i-- > 0) {
void *exit_value = 0; /* Ignore this. */
enif_thread_join(async_nif->worker_entries[i].tid, &exit_value);
}
for (j = 0; j < async_nif->num_queues; j++) {
struct async_nif_work_queue *q = &async_nif->queues[j];
enif_mutex_destroy(q->reqs_mutex);
enif_cond_destroy(q->reqs_cnd);
}
memset(async_nif->worker_entries, 0, sizeof(struct async_nif_worker_entry) * ASYNC_NIF_MAX_WORKERS);
enif_free(async_nif);
return NULL;
}
}
return async_nif;
}
@@ -597,7 +521,7 @@ async_nif_load(ErlNifEnv *env)
static void
async_nif_upgrade(ErlNifEnv *env)
{
UNUSED(env);
__UNUSED(env);
// TODO:
}

c_src/common.h

@@ -24,28 +24,24 @@
extern "C" {
#endif
#if !(__STDC_VERSION__ >= 199901L || defined(__GNUC__))
# undef DEBUG
# define DEBUG 0
# define DPRINTF (void) /* Vararg macros may be unsupported */
#elif DEBUG
#ifdef DEBUG
#include <stdio.h>
#include <stdarg.h>
#ifndef DPRINTF
#define DPRINTF(fmt, ...) \
do { \
fprintf(stderr, "%s:%d " fmt "\n", __FILE__, __LINE__, __VA_ARGS__); \
fprintf(stderr, "%s:%d " fmt "\n", __func__, __LINE__, __VA_ARGS__); \
fflush(stderr); \
} while(0)
#define DPUTS(arg) DPRINTF("%s", arg)
#endif
#ifndef DPUTS
#define DPUTS(arg) DPRINTF("%s", arg)
#endif
#else
#define DPRINTF(fmt, ...) ((void) 0)
#define DPUTS(arg) ((void) 0)
#endif
#ifndef __UNUSED
#define __UNUSED(v) ((void)(v))
#endif
#ifndef COMPQUIET
#define COMPQUIET(n, v) do { \
(n) = (v); \
@@ -53,12 +49,11 @@ extern "C" {
} while (0)
#endif
#ifdef __APPLE__
#define PRIuint64(x) (x)
#else
#define PRIuint64(x) (unsigned long long)(x)
#ifndef __UNUSED
#define __UNUSED(v) ((void)(v))
#endif
#if defined(__cplusplus)
}
#endif

c_src/duration.h (98 lines changed)

@@ -0,0 +1,98 @@
/*
* Copyright (C) 2013, all rights reserved by Gregory Burd <greg@burd.me>
*
* This Source Code Form is subject to the terms of the Mozilla Public License,
* version 2 (MPLv2). If a copy of the MPL was not distributed with this file,
* you can obtain one at: http://mozilla.org/MPL/2.0/
*
* NOTES:
* - on some platforms this will require -lrt
*/
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <sys/timeb.h>
typedef enum { ns = 0, mcs, ms, s } time_scale;
struct scale_time {
const char *abbreviation;
const char *name;
uint64_t mul, div, overhead, ticks_per;
};
static const struct scale_time scale[] = {
{ "ns", "nanosecond", 1000000000LL, 1LL, 10, 2300000000000LL },
{ "mcs", "microsecond", 1000000LL, 1000LL, 10, 2300000000LL },
{ "ms", "millisecond", 1000LL, 1000000LL, 10, 2300000LL },
{ "sec", "second", 1LL, 1000000000LL, 10, 2300LL } };
static uint64_t ts(time_scale unit)
{
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return (((uint64_t)ts.tv_sec * scale[unit].mul) +
((uint64_t)ts.tv_nsec / scale[unit].div));
}
#if 0
//if defined(__i386__) || defined(__x86_64__)
/**
* cpu_clock_ticks()
*
* A measure provided by Intel x86 CPUs which provides the number of cycles
* (aka "ticks") executed as a counter using the RDTSC instruction.
*/
static inline uint64_t cpu_clock_ticks()
{
uint32_t lo, hi;
__asm__ __volatile__ (
"xorl %%eax, %%eax\n"
"cpuid\n"
"rdtsc\n"
: "=a" (lo), "=d" (hi)
:
: "%ebx", "%ecx" );
return (uint64_t)hi << 32 | lo;
}
/**
* cpu_clock_ticks()
*
* An approximation of elapsed [ns, mcs, ms, s] from CPU clock ticks.
*/
static uint64_t elapsed_cpu_clock_ticks(uint64_t start, time_scale unit)
{
return (cpu_clock_ticks() - start - scale[unit].overhead) * scale[unit].ticks_per;
}
#endif
typedef struct {
uint64_t then;
time_scale unit;
} duration_t;
static inline uint64_t elapsed(duration_t *d)
{
uint64_t now = ts(d->unit);
uint64_t elapsed = now - d->then;
d->then = now;
return elapsed;
}
#define DURATION(name, resolution) duration_t name = \
{ts(resolution), resolution}
#define ELAPSED_DURING(result, resolution, block) \
do { \
DURATION(__x, resolution); \
do block while(0); \
*result = elapsed(&__x); \
} while(0);
#define CYCLES_DURING(result, block) \
do { \
uint64_t __begin = cpu_clock_ticks(); \
do block while(0); \
*result = cpu_clock_ticks() - __begin; \
} while(0);
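A minimal usage sketch of these timing macros (not from the repository; it assumes a POSIX `clock_gettime`, linking with `-lrt` where required per the NOTES above, and the busy loop is hypothetical filler):

```c
#include <inttypes.h>
#include "duration.h" /* the header shown above; pulls in stdio.h/stdint.h */

int main(void)
{
    uint64_t took_ns = 0;

    /* Time a block at nanosecond resolution; ELAPSED_DURING stores the
       elapsed time through the result pointer. */
    ELAPSED_DURING(&took_ns, ns, {
        volatile unsigned long acc = 0;
        unsigned long i;
        for (i = 0; i < 1000000; i++)
            acc += i;
    });
    printf("block took %" PRIu64 " ns\n", took_ns);

    /* Or keep a duration_t and sample it repeatedly; note that elapsed()
       also resets the start point on each call. */
    DURATION(d, ms);
    printf("since start: %" PRIu64 " ms\n", elapsed(&d));
    return 0;
}
```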

c_src/emdb.c

@@ -1,8 +1,7 @@
/* -------------------------------------------------------------------------
* This file is part of LMDB - Erlang Lightning MDB API
* This file is part of EMDB - Erlang MDB API
*
* Copyright (c) 2012 by Aleph Archives. All rights reserved.
* Copyright (c) 2013 by Basho Technologies, Inc. All rights reserved.
*
* -------------------------------------------------------------------------
* Redistribution and use in source and binary forms, with or without
@@ -28,7 +27,6 @@
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
#include <errno.h>
#include <sys/param.h>
#include <erl_nif.h>
@@ -36,22 +34,30 @@
#include "common.h"
#include "async_nif.h"
#include "stats.h"
#include "lmdb.h"
static ErlNifResourceType *lmdb_RESOURCE;
struct lmdb {
STAT_DECL(emdb_get, 1000);
STAT_DECL(emdb_put, 1000);
STAT_DECL(emdb_del, 1000);
STAT_DECL(emdb_upd, 1000);
static ErlNifResourceType *emdb_RESOURCE;
struct emdb {
MDB_env *env;
MDB_txn *txn;
MDB_cursor *cursor;
MDB_dbi dbi;
STAT_DEF(emdb_get);
STAT_DEF(emdb_put);
STAT_DEF(emdb_del);
STAT_DEF(emdb_upd);
};
struct lmdb_priv_data {
struct emdb_priv_data {
void *async_nif_priv; // Note: must be first element in struct
};
/* Global init for async_nif. */
ASYNC_NIF_INIT(lmdb);
ASYNC_NIF_INIT(emdb);
/* Atoms (initialized in on_load) */
static ERL_NIF_TERM ATOM_ERROR;
@@ -76,9 +82,6 @@ static ERL_NIF_TERM ATOM_MAP_RESIZED;
static ERL_NIF_TERM ATOM_INCOMPATIBLE;
static ERL_NIF_TERM ATOM_BAD_RSLOT;
static ERL_NIF_TERM ATOM_TXN_STARTED;
static ERL_NIF_TERM ATOM_TXN_NOT_STARTED;
#define CHECK(expr, label) \
if (MDB_SUCCESS != (ret = (expr))) { \
DPRINTF("CHECK(\"%s\") failed \"%s\" at %s:%d in %s()\n", \
@@ -102,7 +105,7 @@ static ERL_NIF_TERM ATOM_TXN_NOT_STARTED;
static ERL_NIF_TERM
__strerror_term(ErlNifEnv* env, int err)
{
ERL_NIF_TERM term = 0;
ERL_NIF_TERM term;
if (err < MDB_LAST_ERRCODE && err > MDB_KEYEXIST) {
switch (err) {
@@ -179,7 +182,7 @@ __strerror_term(ErlNifEnv* env, int err)
* argv[2] flags
*/
ASYNC_NIF_DECL(
lmdb_open,
emdb_open,
{ // struct
char dirname[MAXPATHLEN];
@@ -203,12 +206,17 @@ ASYNC_NIF_DECL(
ERL_NIF_TERM err;
MDB_txn *txn;
struct lmdb *handle;
struct emdb *handle;
int ret;
if ((handle = enif_alloc_resource(lmdb_RESOURCE, sizeof(struct lmdb))) == NULL)
if ((handle = enif_alloc_resource(emdb_RESOURCE, sizeof(struct emdb))) == NULL)
FAIL_ERR(ENOMEM, err3);
STAT_INIT(handle, emdb_get);
STAT_INIT(handle, emdb_put);
STAT_INIT(handle, emdb_upd);
STAT_INIT(handle, emdb_del);
CHECK(mdb_env_create(&(handle->env)), err2);
if (mdb_env_set_mapsize(handle->env, args->mapsize)) {
@@ -221,9 +229,6 @@ ASYNC_NIF_DECL(
CHECK(mdb_open(txn, NULL, 0, &(handle->dbi)), err1);
CHECK(mdb_txn_commit(txn), err1);
handle->txn = NULL;
handle->cursor = NULL;
ERL_NIF_TERM term = enif_make_resource(env, handle);
enif_release_resource(handle);
ASYNC_NIF_REPLY(enif_make_tuple(env, 2, ATOM_OK, term));
@@ -248,15 +253,15 @@ ASYNC_NIF_DECL(
* argv[0] reference to the MDB handle resource
*/
ASYNC_NIF_DECL(
lmdb_close,
emdb_close,
{ // struct
struct lmdb *handle;
struct emdb *handle;
},
{ // pre
if (!(argc == 1 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle))) {
enif_get_resource(env, argv[0], emdb_RESOURCE, (void**)&args->handle))) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
@@ -265,7 +270,15 @@ ASYNC_NIF_DECL(
},
{ // work
STAT_PRINT(args->handle, emdb_get, "emdb");
STAT_PRINT(args->handle, emdb_put, "emdb");
STAT_PRINT(args->handle, emdb_del, "emdb");
STAT_PRINT(args->handle, emdb_upd, "emdb");
mdb_env_close(args->handle->env);
STAT_RESET(args->handle, emdb_get);
STAT_RESET(args->handle, emdb_put);
STAT_RESET(args->handle, emdb_del);
STAT_RESET(args->handle, emdb_upd);
args->handle->env = NULL;
ASYNC_NIF_REPLY(ATOM_OK);
return;
@@ -284,23 +297,24 @@ ASYNC_NIF_DECL(
* argv[2] value as an Erlang binary
*/
ASYNC_NIF_DECL(
lmdb_put,
emdb_put,
{ // struct
struct lmdb *handle;
struct emdb *handle;
ERL_NIF_TERM key;
ERL_NIF_TERM val;
},
{ // pre
if (!(argc == 3 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle) &&
enif_get_resource(env, argv[0], emdb_RESOURCE, (void**)&args->handle) &&
enif_is_binary(env, argv[1]) &&
enif_is_binary(env, argv[2]) )) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
ASYNC_NIF_RETURN_BADARG();
STAT_TICK(args->handle, emdb_put);
enif_keep_resource((void*)args->handle);
args->key = enif_make_copy(ASYNC_NIF_WORK_ENV, argv[1]);
args->val = enif_make_copy(ASYNC_NIF_WORK_ENV, argv[2]);
@@ -328,11 +342,7 @@ ASYNC_NIF_DECL(
mkey.mv_data = key.data;
mdata.mv_size = val.size;
mdata.mv_data = val.data;
if(args->handle->txn == NULL) {
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, & txn), err2);
} else {
txn = args->handle->txn;
}
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, & txn), err2);
ret = mdb_put(txn, args->handle->dbi, &mkey, &mdata, MDB_NOOVERWRITE);
if (MDB_KEYEXIST == ret) {
@@ -342,8 +352,8 @@ ASYNC_NIF_DECL(
if (ret != 0)
FAIL_ERR(ret, err1);
if(args->handle->txn == NULL)
CHECK(mdb_txn_commit(txn), err1);
CHECK(mdb_txn_commit(txn), err1);
STAT_TOCK(args->handle, emdb_put);
ASYNC_NIF_REPLY(ATOM_OK);
return;
@@ -358,6 +368,7 @@ ASYNC_NIF_DECL(
enif_release_resource((void*)args->handle);
});
/**
* Update an existing value indexed by key.
*
@@ -366,23 +377,24 @@ ASYNC_NIF_DECL(
* argv[2] value as an Erlang binary
*/
ASYNC_NIF_DECL(
lmdb_update,
emdb_update,
{ // struct
struct lmdb *handle;
struct emdb *handle;
ERL_NIF_TERM key;
ERL_NIF_TERM val;
},
{ // pre
if (!(argc == 3 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle) &&
enif_get_resource(env, argv[0], emdb_RESOURCE, (void**)&args->handle) &&
enif_is_binary(env, argv[1]) &&
enif_is_binary(env, argv[2]) )) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
ASYNC_NIF_RETURN_BADARG();
STAT_TICK(args->handle, emdb_upd);
enif_keep_resource((void*)args->handle);
args->key = enif_make_copy(ASYNC_NIF_WORK_ENV, argv[1]);
args->val = enif_make_copy(ASYNC_NIF_WORK_ENV, argv[2]);
@@ -411,16 +423,10 @@ ASYNC_NIF_DECL(
mdata.mv_size = val.size;
mdata.mv_data = val.data;
if(args->handle->txn == NULL) {
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, & txn), err2);
} else {
txn = args->handle->txn;
}
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, & txn), err2);
CHECK(mdb_put(txn, args->handle->dbi, &mkey, &mdata, 0), err1);
if(args->handle->txn == NULL)
CHECK(mdb_txn_commit(txn), err1);
CHECK(mdb_txn_commit(txn), err1);
STAT_TOCK(args->handle, emdb_upd);
ASYNC_NIF_REPLY(ATOM_OK);
return;
@@ -443,21 +449,22 @@ ASYNC_NIF_DECL(
* argv[1] key as an Erlang binary
*/
ASYNC_NIF_DECL(
lmdb_get,
emdb_get,
{ // struct
struct lmdb *handle;
struct emdb *handle;
ERL_NIF_TERM key;
},
{ // pre
if (!(argc == 2 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle) &&
enif_get_resource(env, argv[0], emdb_RESOURCE, (void**)&args->handle) &&
enif_is_binary(env, argv[1]) )) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
ASYNC_NIF_RETURN_BADARG();
STAT_TICK(args->handle, emdb_get);
enif_keep_resource((void*)args->handle);
args->key = enif_make_copy(ASYNC_NIF_WORK_ENV, argv[1]);
},
@@ -480,15 +487,10 @@ ASYNC_NIF_DECL(
mkey.mv_size = key.size;
mkey.mv_data = key.data;
if(args->handle->txn == NULL) {
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, & txn), err);
} else {
txn = args->handle->txn;
}
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, &txn), err);
ret = mdb_get(txn, args->handle->dbi, &mkey, &mdata);
if(args->handle->txn == NULL)
mdb_txn_abort(txn);
mdb_txn_abort(txn);
if (MDB_NOTFOUND == ret) {
ASYNC_NIF_REPLY(ATOM_NOT_FOUND);
return;
@@ -502,6 +504,7 @@ ASYNC_NIF_DECL(
FAIL_ERR(ENOMEM, err);
memcpy(bin, mdata.mv_data, mdata.mv_size);
STAT_TOCK(args->handle, emdb_get);
ASYNC_NIF_REPLY(enif_make_tuple(env, 2, ATOM_OK, val));
return;
@@ -522,21 +525,22 @@ ASYNC_NIF_DECL(
* argv[1] key as an Erlang binary
*/
ASYNC_NIF_DECL(
lmdb_del,
emdb_del,
{ // struct
struct lmdb *handle;
struct emdb *handle;
ERL_NIF_TERM key;
},
{ // pre
if (!(argc == 2 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle) &&
enif_get_resource(env, argv[0], emdb_RESOURCE, (void**)&args->handle) &&
enif_is_binary(env, argv[1]) )) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
ASYNC_NIF_RETURN_BADARG();
STAT_TICK(args->handle, emdb_del);
enif_keep_resource((void*)args->handle);
args->key = enif_make_copy(ASYNC_NIF_WORK_ENV, argv[1]);
},
@@ -556,22 +560,17 @@ ASYNC_NIF_DECL(
mkey.mv_size = key.size;
mkey.mv_data = key.data;
if(args->handle->txn == NULL) {
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, & txn), err);
} else {
txn = args->handle->txn;
}
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, & txn), err);
ret = mdb_del(txn, args->handle->dbi, &mkey, NULL);
if(MDB_NOTFOUND == ret) {
if(args->handle->txn == NULL)
mdb_txn_abort(txn);
mdb_txn_abort(txn);
ASYNC_NIF_REPLY(ATOM_NOT_FOUND);
return;
}
if(args->handle->txn == NULL)
CHECK(mdb_txn_commit(txn), err);
CHECK(mdb_txn_commit(txn), err);
STAT_TOCK(args->handle, emdb_del);
ASYNC_NIF_REPLY(ATOM_OK);
return;
@@ -591,15 +590,15 @@ ASYNC_NIF_DECL(
* argv[0] reference to the MDB handle resource
*/
ASYNC_NIF_DECL(
lmdb_drop,
emdb_drop,
{ // struct
struct lmdb *handle;
struct emdb *handle;
},
{ // pre
if (!(argc == 1 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle))) {
enif_get_resource(env, argv[0], emdb_RESOURCE, (void**)&args->handle))) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
@@ -630,124 +629,22 @@ ASYNC_NIF_DECL(
enif_release_resource((void*)args->handle);
});
ASYNC_NIF_DECL(
lmdb_txn_begin,
{ // struct
struct lmdb *handle;
},
{ // pre
if (!(argc == 1 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle))) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
ASYNC_NIF_RETURN_BADARG();
enif_keep_resource((void*)args->handle);
},
{ // work
ERL_NIF_TERM err;
int ret;
if(args->handle->txn == NULL) {
CHECK(mdb_txn_begin(args->handle->env, NULL, 0, &(args->handle->txn)), err2);
ASYNC_NIF_REPLY(ATOM_OK);
} else
ASYNC_NIF_REPLY(enif_make_tuple(env, 2, ATOM_ERROR, ATOM_TXN_STARTED));
return;
err2:
ASYNC_NIF_REPLY(err);
return;
},
{ // post
enif_release_resource((void*)args->handle);
});
ASYNC_NIF_DECL(
lmdb_txn_commit,
{ // struct
struct lmdb *handle;
},
{ // pre
if (!(argc == 1 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle))) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
ASYNC_NIF_RETURN_BADARG();
enif_keep_resource((void*)args->handle);
},
{ // work
ERL_NIF_TERM err;
int ret;
if(args->handle->txn != NULL) {
CHECK(mdb_txn_commit(args->handle->txn), err2);
args->handle->txn = NULL;
ASYNC_NIF_REPLY(ATOM_OK);
} else
ASYNC_NIF_REPLY(enif_make_tuple(env, 2, ATOM_ERROR, ATOM_TXN_NOT_STARTED));
return;
err2:
ASYNC_NIF_REPLY(err);
return;
},
{ // post
enif_release_resource((void*)args->handle);
});
ASYNC_NIF_DECL(
lmdb_txn_abort,
{ // struct
struct lmdb *handle;
},
{ // pre
if (!(argc == 1 &&
enif_get_resource(env, argv[0], lmdb_RESOURCE, (void**)&args->handle))) {
ASYNC_NIF_RETURN_BADARG();
}
if (!args->handle->env)
ASYNC_NIF_RETURN_BADARG();
enif_keep_resource((void*)args->handle);
},
{ // work
if(args->handle->txn != NULL) {
mdb_txn_abort(args->handle->txn);
args->handle->txn = NULL;
ASYNC_NIF_REPLY(ATOM_OK);
} else
ASYNC_NIF_REPLY(enif_make_tuple(env, 2, ATOM_ERROR, ATOM_TXN_NOT_STARTED));
return;
},
{ // post
enif_release_resource((void*)args->handle);
});
static int lmdb_load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info)
static int emdb_load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info)
{
__UNUSED(load_info);
ErlNifResourceFlags flags = ERL_NIF_RT_CREATE | ERL_NIF_RT_TAKEOVER;
struct lmdb_priv_data *priv = enif_alloc(sizeof(struct lmdb_priv_data));
struct emdb_priv_data *priv = enif_alloc(sizeof(struct emdb_priv_data));
if (!priv)
return ENOMEM;
memset(priv, 0, sizeof(struct lmdb_priv_data));
memset(priv, 0, sizeof(struct emdb_priv_data));
/* Note: !!! the first element of our priv_data struct *must* be the
pointer to the async_nif's private data which we set here. */
ASYNC_NIF_LOAD(lmdb, env, priv->async_nif_priv);
ASYNC_NIF_LOAD(emdb, priv->async_nif_priv);
if (!priv)
return ENOMEM;
*priv_data = priv;
@@ -773,15 +670,12 @@ static int lmdb_load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info)
ATOM_INCOMPATIBLE = enif_make_atom(env, "incompatible");
ATOM_BAD_RSLOT = enif_make_atom(env, "bad_rslot");
ATOM_TXN_STARTED = enif_make_atom(env, "txn_started");
ATOM_TXN_NOT_STARTED = enif_make_atom(env, "txn_not_started");
lmdb_RESOURCE = enif_open_resource_type(env, NULL, "lmdb_resource",
emdb_RESOURCE = enif_open_resource_type(env, NULL, "emdb_resource",
NULL, flags, NULL);
return (0);
}
static int lmdb_reload(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM info)
static int emdb_reload(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM info)
{
__UNUSED(env);
__UNUSED(priv_data);
@@ -790,47 +684,39 @@ static int lmdb_reload(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM info)
}
static int lmdb_upgrade(ErlNifEnv* env, void** priv_data, void** old_priv, ERL_NIF_TERM load_info)
static int emdb_upgrade(ErlNifEnv* env, void** priv_data, void** old_priv, ERL_NIF_TERM load_info)
{
__UNUSED(env);
__UNUSED(priv_data);
__UNUSED(old_priv);
__UNUSED(load_info);
ASYNC_NIF_UPGRADE(lmdb, env);
ASYNC_NIF_UPGRADE(emdb, env);
return (0); // TODO:
}
static void lmdb_unload(ErlNifEnv* env, void* priv_data)
static void emdb_unload(ErlNifEnv* env, void* priv_data)
{
struct lmdb_priv_data *priv = (struct lmdb_priv_data *)priv_data;
ASYNC_NIF_UNLOAD(lmdb, env, priv->async_nif_priv);
struct emdb_priv_data *priv = (struct emdb_priv_data *)priv_data;
ASYNC_NIF_UNLOAD(emdb, env, priv->async_nif_priv);
enif_free(priv);
return;
}
static ErlNifFunc nif_funcs [] = {
{"open", 4, lmdb_open},
{"close", 2, lmdb_close},
{"put", 4, lmdb_put},
{"get", 3, lmdb_get},
{"del", 3, lmdb_del},
{"update", 4, lmdb_update},
{"drop", 2, lmdb_drop},
{"txn_begin", 2, lmdb_txn_begin},
{"txn_commit", 2, lmdb_txn_commit},
{"txn_abort", 2, lmdb_txn_abort}/*,
{"cursor_open", 2, lmdb_cursor_open},
{"cursor_close", 2, lmdb_cursor_close} */
{"open", 4, emdb_open},
{"close", 2, emdb_close},
{"put", 4, emdb_put},
{"get", 3, emdb_get},
{"del", 3, emdb_del},
{"update", 4, emdb_update},
{"drop", 2, emdb_drop}
};
/* driver entry point */
ERL_NIF_INIT(lmdb,
ERL_NIF_INIT(emdb,
nif_funcs,
& lmdb_load,
& lmdb_reload,
& lmdb_upgrade,
& lmdb_unload)
& emdb_load,
& emdb_reload,
& emdb_upgrade,
& emdb_unload)

c_src/fifo_q.h (93 lines changed)

@@ -0,0 +1,93 @@
/*
* fifo_q: a macro-based implementation of a FIFO Queue
*
* Copyright (c) 2012 Basho Technologies, Inc. All Rights Reserved.
* Author: Gregory Burd <greg@basho.com> <greg@burd.me>
*
* This file is provided to you under the Apache License,
* Version 2.0 (the "License"); you may not use this file
* except in compliance with the License. You may obtain
* a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
#ifndef __FIFO_Q_H__
#define __FIFO_Q_H__
#if defined(__cplusplus)
extern "C" {
#endif
#define FIFO_QUEUE_TYPE(name) \
struct fifo_q__ ## name *
#define DECL_FIFO_QUEUE(name, type) \
struct fifo_q__ ## name { \
unsigned int h, t, s; \
type *items[]; \
}; \
static struct fifo_q__ ## name *fifo_q_ ## name ## _new(unsigned int n) { \
int sz = sizeof(struct fifo_q__ ## name) + ((n+1) * sizeof(type *));\
struct fifo_q__ ## name *q = enif_alloc(sz); \
if (!q) \
return 0; \
memset(q, 0, sz); \
q->s = n + 1; \
return q; \
} \
static inline void fifo_q_ ## name ## _free(struct fifo_q__ ## name *q) { \
memset(q, 0, sizeof(struct fifo_q__ ## name) + (q->s * sizeof(type *))); \
enif_free(q); \
} \
static inline type *fifo_q_ ## name ## _put(struct fifo_q__ ## name *q, type *n) { \
q->items[q->h] = n; \
q->h = (q->h + 1) % q->s; \
return n; \
} \
static inline type *fifo_q_ ## name ## _get(struct fifo_q__ ## name *q) { \
type *n = q->items[q->t]; \
q->items[q->t] = 0; \
q->t = (q->t + 1) % q->s; \
return n; \
} \
static inline unsigned int fifo_q_ ## name ## _size(struct fifo_q__ ## name *q) { \
return (q->h - q->t + q->s) % q->s; \
} \
static inline unsigned int fifo_q_ ## name ## _capacity(struct fifo_q__ ## name *q) { \
return q->s - 1; \
} \
static inline int fifo_q_ ## name ## _empty(struct fifo_q__ ## name *q) { \
return (q->t == q->h); \
} \
static inline int fifo_q_ ## name ## _full(struct fifo_q__ ## name *q) { \
return ((q->h + 1) % q->s) == q->t; \
}
#define fifo_q_new(name, size) fifo_q_ ## name ## _new(size)
#define fifo_q_free(name, queue) fifo_q_ ## name ## _free(queue)
#define fifo_q_get(name, queue) fifo_q_ ## name ## _get(queue)
#define fifo_q_put(name, queue, item) fifo_q_ ## name ## _put(queue, item)
#define fifo_q_size(name, queue) fifo_q_ ## name ## _size(queue)
#define fifo_q_capacity(name, queue) fifo_q_ ## name ## _capacity(queue)
#define fifo_q_empty(name, queue) fifo_q_ ## name ## _empty(queue)
#define fifo_q_full(name, queue) fifo_q_ ## name ## _full(queue)
#define fifo_q_foreach(name, queue, item, task) do { \
while(!fifo_q_ ## name ## _empty(queue)) { \
item = fifo_q_ ## name ## _get(queue); \
do task while(0); \
} \
} while(0);
#if defined(__cplusplus)
}
#endif
#endif // __FIFO_Q_H__
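A minimal usage sketch of the queue macros (not from the repository): it assumes compilation inside a NIF so that `enif_alloc`/`enif_free` from `erl_nif.h` are available, and `struct job` is a hypothetical element type:

```c
#include <string.h>
#include "erl_nif.h" /* fifo_q.h allocates with enif_alloc/enif_free */
#include "fifo_q.h"

struct job { int id; }; /* hypothetical element type */

/* Instantiate the queue struct and its static helper functions. */
DECL_FIFO_QUEUE(jobs, struct job);

static void example(void)
{
    FIFO_QUEUE_TYPE(jobs) q = fifo_q_new(jobs, 4); /* capacity of 4 items */
    struct job a = { 1 }, b = { 2 }, *j;

    if (!q)
        return; /* enif_alloc failed */

    if (!fifo_q_full(jobs, q)) fifo_q_put(jobs, q, &a);
    if (!fifo_q_full(jobs, q)) fifo_q_put(jobs, q, &b);

    /* Drain in FIFO order: j receives &a, then &b. */
    fifo_q_foreach(jobs, q, j, {
        /* ... process j->id ... */
    });

    fifo_q_free(jobs, q);
}
```

Note the pattern: `DECL_FIFO_QUEUE` generates a concrete, type-safe queue per element type, which is how async_nif.h above uses it for `struct async_nif_req_entry`.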

File diff suppressed because it is too large.

c_src/lmdb.h

@@ -1,10 +1,10 @@
/** @file lmdb.h
* @brief Lightning memory-mapped database library
*
* @mainpage Lightning Memory-Mapped Database Manager (LMDB)
* @mainpage Lightning Memory-Mapped Database Manager (MDB)
*
* @section intro_sec Introduction
* LMDB is a Btree-based database management library modeled loosely on the
* MDB is a Btree-based database management library modeled loosely on the
* BerkeleyDB API, but much simplified. The entire database is exposed
* in a memory map, and all data fetches return data directly
* from the mapped memory, so no malloc's or memcpy's occur during
@@ -26,10 +26,10 @@
* readers, and readers don't block writers.
*
* Unlike other well-known database mechanisms which use either write-ahead
* transaction logs or append-only data writes, LMDB requires no maintenance
* transaction logs or append-only data writes, MDB requires no maintenance
* during operation. Both write-ahead loggers and append-only databases
* require periodic checkpointing and/or compaction of their log or database
* files otherwise they grow without bound. LMDB tracks free pages within
* files otherwise they grow without bound. MDB tracks free pages within
* the database and re-uses them for new write operations, so the database
* size does not grow without bound in normal use.
*
@@ -40,9 +40,6 @@
* corrupt the database. Of course if your application code is known to
* be bug-free (...) then this is not an issue.
*
* If this is your first time using a transactional embedded key/value
* store, you may find the \ref starting page to be helpful.
*
* @section caveats_sec Caveats
* Troubleshooting the lock file, plus semaphores on BSD systems:
*
@@ -51,17 +48,10 @@
* cause further writes to grow the database quickly, and
* stale locks can block further operation.
*
* Fix: Check for stale readers periodically, using the
* #mdb_reader_check function or the \ref mdb_stat_1 "mdb_stat" tool.
* Stale writers will be cleared automatically on most systems:
* - Windows - automatic
* - BSD, systems using SysV semaphores - automatic
* - Linux, systems using POSIX mutexes with Robust option - automatic
* Otherwise just make all programs using the database close it;
* the lockfile is always reset on first open of the environment.
* Fix: Terminate all programs using the database, or make
* them close it. Next database user will reset the lockfile.
*
* - On BSD systems or others configured with MDB_USE_SYSV_SEM or
* MDB_USE_POSIX_SEM,
* - On BSD systems or others configured with MDB_USE_POSIX_SEM,
* startup can fail due to semaphores owned by another userid.
*
* Fix: Open and close the database as the user which owns the
@ -74,32 +64,13 @@
* BSD systems or when otherwise configured with MDB_USE_POSIX_SEM.
* Multiple users can cause startup to fail later, as noted above.
*
* - There is normally no pure read-only mode, since readers need write
* access to locks and lock file. Exceptions: On read-only filesystems
* or with the #MDB_NOLOCK flag described under #mdb_env_open().
*
* - An LMDB configuration will often reserve considerable \b unused
* memory address space and maybe file size for future growth.
* This does not use actual memory or disk space, but users may need
* to understand the difference so they won't be scared off.
*
* - By default, in versions before 0.9.10, unused portions of the data
* file might receive garbage data from memory freed by other code.
* (This does not happen when using the #MDB_WRITEMAP flag.) As of
* 0.9.10 the default behavior is to initialize such memory before
* writing to the data file. Since there may be a slight performance
* cost due to this initialization, applications may disable it using
* the #MDB_NOMEMINIT flag. Applications handling sensitive data
* which must not be written should not use this flag. This flag is
* irrelevant when using #MDB_WRITEMAP.
*
* - A thread can only use one transaction at a time, plus any child
* transactions. Each transaction belongs to one thread. See below.
* The #MDB_NOTLS flag changes this for read-only transactions.
*
* - Use an MDB_env* in the process which opened it, without fork()ing.
*
* - Do not have open an LMDB database twice in the same process at
* - Do not have open an MDB database twice in the same process at
* the same time. Not even from a plain open() call - close()ing it
* breaks flock() advisory locking.
*
@ -115,17 +86,13 @@
* ...when several processes can use a database concurrently:
*
* - Avoid aborting a process with an active transaction.
* The transaction becomes "long-lived" as above until a check
* for stale readers is performed or the lockfile is reset,
* since the process may not remove it from the lockfile.
* The transaction becomes "long-lived" as above until the lockfile
* is reset, since the process may not remove it from the lockfile.
*
* This does not apply to write transactions if the system clears
* stale writers, see above.
* - If you do that anyway, close the environment once in a while,
* so the lockfile can get reset.
*
* - If you do that anyway, do a periodic check for stale readers. Or
* close the environment once in a while, so the lockfile can get reset.
*
* - Do not use LMDB databases on remote filesystems, even between
* - Do not use MDB databases on remote filesystems, even between
* processes on the same host. This breaks flock() on some OSes,
* possibly memory map sync, and certainly sync between programs
* on different hosts.
@ -135,7 +102,7 @@
*
* @author Howard Chu, Symas Corporation.
*
* @copyright Copyright 2011-2016 Howard Chu, Symas Corp. All rights reserved.
* @copyright Copyright 2011-2013 Howard Chu, Symas Corp. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted only as authorized by the OpenLDAP
@ -166,37 +133,18 @@
#define _LMDB_H_
#include <sys/types.h>
#include <inttypes.h>
#ifdef __cplusplus
extern "C" {
#endif
/** Unix permissions for creating files, or dummy definition for Windows */
#ifdef _MSC_VER
typedef int mdb_mode_t;
#else
typedef mode_t mdb_mode_t;
#endif
#ifdef MDB_VL32
typedef uint64_t mdb_size_t;
#define mdb_env_create mdb_env_create_vl32 /**< Prevent mixing with non-VL32 builds */
#else
typedef size_t mdb_size_t;
#endif
/** An abstraction for a file handle.
* On POSIX systems file handles are small integers. On Windows
* they're opaque pointers.
*/
#ifdef _WIN32
typedef void *mdb_filehandle_t;
#else
typedef int mdb_filehandle_t;
#endif
/** @defgroup mdb LMDB API
/** @defgroup mdb MDB API
* @{
* @brief OpenLDAP Lightning Memory-Mapped Database Manager
*/
@ -208,7 +156,7 @@ typedef int mdb_filehandle_t;
/** Library minor version */
#define MDB_VERSION_MINOR 9
/** Library patch version */
#define MDB_VERSION_PATCH 70
#define MDB_VERSION_PATCH 6
/** Combine args a,b,c into a single integer for easy version comparisons */
#define MDB_VERINT(a,b,c) (((a) << 24) | ((b) << 16) | (c))
@ -218,10 +166,10 @@ typedef int mdb_filehandle_t;
MDB_VERINT(MDB_VERSION_MAJOR,MDB_VERSION_MINOR,MDB_VERSION_PATCH)
/** The release date of this library version */
#define MDB_VERSION_DATE "December 19, 2015"
#define MDB_VERSION_DATE "January 10, 2013"
/** A stringifier for the version info */
#define MDB_VERSTR(a,b,c,d) "LMDB " #a "." #b "." #c ": (" d ")"
#define MDB_VERSTR(a,b,c,d) "MDB " #a "." #b "." #c ": (" d ")"
/** A helper for the stringifier macro */
#define MDB_VERFOO(a,b,c,d) MDB_VERSTR(a,b,c,d)
@ -254,13 +202,13 @@ typedef struct MDB_cursor MDB_cursor;
/** @brief Generic structure used for passing keys and data in and out
* of the database.
*
* Values returned from the database are valid only until a subsequent
* update operation, or the end of the transaction. Do not modify or
* free them, they commonly point into the database itself.
*
* Key sizes must be between 1 and #mdb_env_get_maxkeysize() inclusive.
* The same applies to data sizes in databases with the #MDB_DUPSORT flag.
* Key sizes must be between 1 and the liblmdb build-time constant
* #MDB_MAXKEYSIZE inclusive. This currently defaults to 511. The
* same applies to data sizes in databases with the #MDB_DUPSORT flag.
* Other data items can in theory be from 0 to 0xffffffff bytes long.
*
* Values returned from the database are valid only until a subsequent
* update operation, or the end of the transaction.
*/
typedef struct MDB_val {
size_t mv_size; /**< size of the data item */
@ -287,12 +235,14 @@ typedef int (MDB_cmp_func)(const MDB_val *a, const MDB_val *b);
typedef void (MDB_rel_func)(MDB_val *item, void *oldptr, void *newptr, void *relctx);
/** @defgroup mdb_env Environment Flags
*
* Values do not overlap Database Flags.
* @{
*/
/** mmap at a fixed address (experimental) */
#define MDB_FIXEDMAP 0x01
#define MDB_FIXEDMAP 0x01
/** no environment directory */
#define MDB_NOSUBDIR 0x4000
#define MDB_NOSUBDIR 0x4000
/** don't fsync after commit */
#define MDB_NOSYNC 0x10000
/** read only */
@ -301,31 +251,27 @@ typedef void (MDB_rel_func)(MDB_val *item, void *oldptr, void *newptr, void *rel
#define MDB_NOMETASYNC 0x40000
/** use writable mmap */
#define MDB_WRITEMAP 0x80000
/** use asynchronous msync when #MDB_WRITEMAP is used */
/** use asynchronous msync when MDB_WRITEMAP is used */
#define MDB_MAPASYNC 0x100000
/** tie reader locktable slots to #MDB_txn objects instead of to threads */
#define MDB_NOTLS 0x200000
/** don't do any locking, caller must manage their own locks */
#define MDB_NOLOCK 0x400000
/** don't do readahead (no effect on Windows) */
#define MDB_NORDAHEAD 0x800000
/** don't initialize malloc'd memory before writing to datafile */
#define MDB_NOMEMINIT 0x1000000
/** @} */
/** @defgroup mdb_dbi_open Database Flags
*
* Values do not overlap Environment Flags.
* @{
*/
/** use reverse string keys */
#define MDB_REVERSEKEY 0x02
/** use sorted duplicates */
#define MDB_DUPSORT 0x04
/** numeric keys in native byte order: either unsigned int or size_t.
/** numeric keys in native byte order.
* The keys must all be of the same size. */
#define MDB_INTEGERKEY 0x08
/** with #MDB_DUPSORT, sorted dup items have fixed size */
#define MDB_DUPFIXED 0x10
/** with #MDB_DUPSORT, dups are #MDB_INTEGERKEY-style integers */
/** with #MDB_DUPSORT, dups are numeric in native byte order */
#define MDB_INTEGERDUP 0x20
/** with #MDB_DUPSORT, use reverse string dups */
#define MDB_REVERSEDUP 0x40
@ -353,19 +299,10 @@ typedef void (MDB_rel_func)(MDB_val *item, void *oldptr, void *newptr, void *rel
#define MDB_APPEND 0x20000
/** Duplicate data is being appended, don't split full pages. */
#define MDB_APPENDDUP 0x40000
/** Store multiple data items in one call. Only for #MDB_DUPFIXED. */
/** Store multiple data items in one call. */
#define MDB_MULTIPLE 0x80000
/* @} */
/** @defgroup mdb_copy Copy Flags
* @{
*/
/** Compacting copy: Omit free space from copy, and renumber all
* pages sequentially.
*/
#define MDB_CP_COMPACT 0x01
/* @} */
/** @brief Cursor Get operations.
*
* This is the set of all operations for retrieving data
@ -378,28 +315,26 @@ typedef enum MDB_cursor_op {
MDB_GET_BOTH, /**< Position at key/data pair. Only for #MDB_DUPSORT */
MDB_GET_BOTH_RANGE, /**< position at key, nearest data. Only for #MDB_DUPSORT */
MDB_GET_CURRENT, /**< Return key/data at current cursor position */
MDB_GET_MULTIPLE, /**< Return key and up to a page of duplicate data items
from current cursor position. Move cursor to prepare
for #MDB_NEXT_MULTIPLE. Only for #MDB_DUPFIXED */
MDB_GET_MULTIPLE, /**< Return all the duplicate data items at the current
cursor position. Only for #MDB_DUPFIXED */
MDB_LAST, /**< Position at last key/data item */
MDB_LAST_DUP, /**< Position at last data item of current key.
Only for #MDB_DUPSORT */
MDB_NEXT, /**< Position at next data item */
MDB_NEXT_DUP, /**< Position at next data item of current key.
Only for #MDB_DUPSORT */
MDB_NEXT_MULTIPLE, /**< Return key and up to a page of duplicate data items
from next cursor position. Move cursor to prepare
for #MDB_NEXT_MULTIPLE. Only for #MDB_DUPFIXED */
MDB_NEXT_NODUP, /**< Position at first data item of next key */
MDB_NEXT_MULTIPLE, /**< Return all duplicate data items at the next
cursor position. Only for #MDB_DUPFIXED */
MDB_NEXT_NODUP, /**< Position at first data item of next key.
Only for #MDB_DUPSORT */
MDB_PREV, /**< Position at previous data item */
MDB_PREV_DUP, /**< Position at previous data item of current key.
Only for #MDB_DUPSORT */
MDB_PREV_NODUP, /**< Position at last data item of previous key */
MDB_PREV_NODUP, /**< Position at last data item of previous key.
Only for #MDB_DUPSORT */
MDB_SET, /**< Position at specified key */
MDB_SET_KEY, /**< Position at specified key, return key + data */
MDB_SET_RANGE, /**< Position at first key greater than or equal to specified key. */
MDB_PREV_MULTIPLE /**< Position at previous page and return key and up to
a page of duplicate data items. Only for #MDB_DUPFIXED */
MDB_SET_RANGE /**< Position at first key greater than or equal to specified key. */
} MDB_cursor_op;
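
The operations above drive mdb_cursor_get(); a minimal full-scan sketch follows (not part of the diff), valid against both sides of this diff, with txn and dbi setup assumed to exist:

/* Sketch only: forward scan of every key/data pair. */
MDB_cursor *cur;
MDB_val key, data;
int rc = mdb_cursor_open(txn, dbi, &cur);
if (rc == 0) {
    while ((rc = mdb_cursor_get(cur, &key, &data, MDB_NEXT)) == 0) {
        /* key/data point into the map: valid only until the next
         * update or the end of the transaction (see MDB_val). */
    }
    /* rc == MDB_NOTFOUND here simply means end of database */
    mdb_cursor_close(cur);
}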
/** @defgroup errors Return Codes
@ -417,11 +352,11 @@ typedef enum MDB_cursor_op {
#define MDB_PAGE_NOTFOUND (-30797)
/** Located page was wrong type */
#define MDB_CORRUPTED (-30796)
/** Update of meta page failed or environment had fatal error */
/** Update of meta page failed, probably I/O error */
#define MDB_PANIC (-30795)
/** Environment version mismatch */
#define MDB_VERSION_MISMATCH (-30794)
/** File is not a valid LMDB file */
/** File is not a valid MDB file */
#define MDB_INVALID (-30793)
/** Environment mapsize reached */
#define MDB_MAP_FULL (-30792)
@ -439,25 +374,11 @@ typedef enum MDB_cursor_op {
#define MDB_PAGE_FULL (-30786)
/** Database contents grew beyond environment mapsize */
#define MDB_MAP_RESIZED (-30785)
/** Operation and DB incompatible, or DB type changed. This can mean:
* <ul>
* <li>The operation expects an #MDB_DUPSORT / #MDB_DUPFIXED database.
* <li>Opening a named DB when the unnamed DB has #MDB_DUPSORT / #MDB_INTEGERKEY.
* <li>Accessing a data record as a database, or vice versa.
* <li>The database was dropped and recreated with different flags.
* </ul>
*/
/** Database flags changed or would change */
#define MDB_INCOMPATIBLE (-30784)
/** Invalid reuse of reader locktable slot */
#define MDB_BAD_RSLOT (-30783)
/** Transaction must abort, has a child, or is invalid */
#define MDB_BAD_TXN (-30782)
/** Unsupported size of key/DB name/data, or wrong DUPFIXED size */
#define MDB_BAD_VALSIZE (-30781)
/** The specified DBI was changed unexpectedly */
#define MDB_BAD_DBI (-30780)
/** The last defined error code */
#define MDB_LAST_ERRCODE MDB_BAD_DBI
#define MDB_LAST_ERRCODE MDB_BAD_RSLOT
/** @} */
/** @brief Statistics for a database in the environment */
@ -465,23 +386,23 @@ typedef struct MDB_stat {
unsigned int ms_psize; /**< Size of a database page.
This is currently the same for all databases. */
unsigned int ms_depth; /**< Depth (height) of the B-tree */
mdb_size_t ms_branch_pages; /**< Number of internal (non-leaf) pages */
mdb_size_t ms_leaf_pages; /**< Number of leaf pages */
mdb_size_t ms_overflow_pages; /**< Number of overflow pages */
mdb_size_t ms_entries; /**< Number of data items */
size_t ms_branch_pages; /**< Number of internal (non-leaf) pages */
size_t ms_leaf_pages; /**< Number of leaf pages */
size_t ms_overflow_pages; /**< Number of overflow pages */
size_t ms_entries; /**< Number of data items */
} MDB_stat;
/** @brief Information about the environment */
typedef struct MDB_envinfo {
void *me_mapaddr; /**< Address of map, if fixed */
mdb_size_t me_mapsize; /**< Size of the data memory map */
mdb_size_t me_last_pgno; /**< ID of the last used page */
mdb_size_t me_last_txnid; /**< ID of the last committed transaction */
size_t me_mapsize; /**< Size of the data memory map */
size_t me_last_pgno; /**< ID of the last used page */
size_t me_last_txnid; /**< ID of the last committed transaction */
unsigned int me_maxreaders; /**< max reader slots in the environment */
unsigned int me_numreaders; /**< max reader slots used in the environment */
} MDB_envinfo;
/** @brief Return the LMDB library version information.
/** @brief Return the mdb library version information.
*
* @param[out] major if non-NULL, the library major version number is copied here
* @param[out] minor if non-NULL, the library minor version number is copied here
@ -495,14 +416,14 @@ char *mdb_version(int *major, int *minor, int *patch);
* This function is a superset of the ANSI C X3.159-1989 (ANSI C) strerror(3)
* function. If the error code is greater than or equal to 0, then the string
* returned by the system function strerror(3) is returned. If the error code
* is less than 0, an error string corresponding to the LMDB library error is
* returned. See @ref errors for a list of LMDB-specific error codes.
* is less than 0, an error string corresponding to the MDB library error is
* returned. See @ref errors for a list of MDB-specific error codes.
* @param[in] err The error code
* @retval "error message" The description of the error
*/
char *mdb_strerror(int err);
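
A short sketch of the usual return-code idiom (not part of the diff): every mdb_* call returns 0 on success, and mdb_strerror() covers both system errno values and the MDB_* codes listed above. Assumes <stdio.h> is included.

/* Sketch only: translate any failure into a readable message. */
MDB_env *env = NULL;
int rc = mdb_env_create(&env);
if (rc != 0)
    fprintf(stderr, "mdb_env_create: %s (%d)\n", mdb_strerror(rc), rc);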
/** @brief Create an LMDB environment handle.
/** @brief Create an MDB environment handle.
*
* This function allocates memory for a #MDB_env structure. To release
* the allocated memory and discard the handle, call #mdb_env_close().
@ -535,24 +456,20 @@ int mdb_env_create(MDB_env **env);
* how the operating system has allocated memory to shared libraries and other uses.
* The feature is highly experimental.
* <li>#MDB_NOSUBDIR
* By default, LMDB creates its environment in a directory whose
* By default, MDB creates its environment in a directory whose
* pathname is given in \b path, and creates its data and lock files
* under that directory. With this option, \b path is used as-is for
* the database main data file. The database lock file is the \b path
* with "-lock" appended.
* <li>#MDB_RDONLY
* Open the environment in read-only mode. No write operations will be
* allowed. LMDB will still modify the lock file - except on read-only
* filesystems, where LMDB does not use locks.
* allowed. MDB will still modify the lock file - except on read-only
* filesystems, where MDB does not use locks.
* <li>#MDB_WRITEMAP
* Use a writeable memory map unless MDB_RDONLY is set. This uses
* fewer mallocs but loses protection from application bugs
* Use a writeable memory map unless MDB_RDONLY is set. This is faster
* and uses fewer mallocs, but loses protection from application bugs
* like wild pointer writes and other bad updates into the database.
* This may be slightly faster for DBs that fit entirely in RAM, but
* is slower for DBs larger than RAM.
* Incompatible with nested transactions.
* Do not mix processes with and without MDB_WRITEMAP on the same
* environment. This can defeat durability (#mdb_env_sync etc).
* <li>#MDB_NOMETASYNC
* Flush system buffers to disk only once per transaction, omit the
* metadata flush. Defer that until the system flushes files to disk,
@ -589,46 +506,14 @@ int mdb_env_create(MDB_env **env);
* the user synchronizes its use. Applications that multiplex many
* user threads over individual OS threads need this option. Such an
* application must also serialize the write transactions in an OS
* thread, since LMDB's write locking is unaware of the user threads.
* <li>#MDB_NOLOCK
* Don't do any locking. If concurrent access is anticipated, the
* caller must manage all concurrency itself. For proper operation
* the caller must enforce single-writer semantics, and must ensure
* that no readers are using old transactions while a writer is
* active. The simplest approach is to use an exclusive lock so that
* no readers may be active at all when a writer begins.
* <li>#MDB_NORDAHEAD
* Turn off readahead. Most operating systems perform readahead on
* read requests by default. This option turns it off if the OS
* supports it. Turning it off may help random read performance
* when the DB is larger than RAM and system RAM is full.
* The option is not implemented on Windows.
* <li>#MDB_NOMEMINIT
* Don't initialize malloc'd memory before writing to unused spaces
* in the data file. By default, memory for pages written to the data
* file is obtained using malloc. While these pages may be reused in
* subsequent transactions, freshly malloc'd pages will be initialized
* to zeroes before use. This avoids persisting leftover data from other
* code (that used the heap and subsequently freed the memory) into the
* data file. Note that many other system libraries may allocate
* and free memory from the heap for arbitrary uses. E.g., stdio may
* use the heap for file I/O buffers. This initialization step has a
* modest performance cost so some applications may want to disable
* it using this flag. This option can be a problem for applications
* which handle sensitive data like passwords, and it makes memory
* checkers like Valgrind noisy. This flag is not needed with #MDB_WRITEMAP,
* which writes directly to the mmap instead of using malloc for pages. The
* initialization is also skipped if #MDB_RESERVE is used; the
* caller is expected to overwrite all of the memory that was
* reserved in that case.
* This flag may be changed at any time using #mdb_env_set_flags().
* thread, since MDB's write locking is unaware of the user threads.
* </ul>
* @param[in] mode The UNIX permissions to set on created files and semaphores.
* This parameter is ignored on Windows.
* @param[in] mode The UNIX permissions to set on created files. This parameter
* is ignored on Windows.
* @return A non-zero error value on failure and 0 on success. Some possible
* errors are:
* <ul>
* <li>#MDB_VERSION_MISMATCH - the version of the LMDB library doesn't match the
* <li>#MDB_VERSION_MISMATCH - the version of the MDB library doesn't match the
* version that created the database environment.
* <li>#MDB_INVALID - the environment file headers are corrupted.
* <li>ENOENT - the directory specified by the path parameter doesn't exist.
@ -638,13 +523,9 @@ int mdb_env_create(MDB_env **env);
*/
int mdb_env_open(MDB_env *env, const char *path, unsigned int flags, mdb_mode_t mode);
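
A minimal bring-up sketch for mdb_env_open() (not part of the diff); the path, map size, and mode are placeholders, and every call shown exists on both sides of this diff:

/* Sketch only: create, size, then open an environment. */
MDB_env *env = NULL;
int rc = mdb_env_create(&env);
if (rc == 0)
    rc = mdb_env_set_mapsize(env, 10485760); /* the documented default */
if (rc == 0)
    rc = mdb_env_set_maxdbs(env, 4);         /* only if named DBs are used */
if (rc == 0)
    rc = mdb_env_open(env, "/tmp/testdb", 0, 0664);
if (rc != 0 && env != NULL)
    mdb_env_close(env);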
/** @brief Copy an LMDB environment to the specified path.
/** @brief Copy an MDB environment to the specified path.
*
* This function may be used to make a backup of an existing environment.
* No lockfile is created, since it gets recreated at need.
* @note This call can trigger significant file size growth if run in
* parallel with write transactions, because it employs a read-only
* transaction. See long-lived transactions under @ref caveats_sec.
* @param[in] env An environment handle returned by #mdb_env_create(). It
* must have already been opened successfully.
* @param[in] path The directory in which the copy will reside. This
@ -654,65 +535,7 @@ int mdb_env_open(MDB_env *env, const char *path, unsigned int flags, mdb_mode_t
*/
int mdb_env_copy(MDB_env *env, const char *path);
/** @brief Copy an LMDB environment to the specified file descriptor.
*
* This function may be used to make a backup of an existing environment.
* No lockfile is created, since it gets recreated at need.
* @note This call can trigger significant file size growth if run in
* parallel with write transactions, because it employs a read-only
* transaction. See long-lived transactions under @ref caveats_sec.
* @param[in] env An environment handle returned by #mdb_env_create(). It
* must have already been opened successfully.
* @param[in] fd The filedescriptor to write the copy to. It must
* have already been opened for Write access.
* @return A non-zero error value on failure and 0 on success.
*/
int mdb_env_copyfd(MDB_env *env, mdb_filehandle_t fd);
/** @brief Copy an LMDB environment to the specified path, with options.
*
* This function may be used to make a backup of an existing environment.
* No lockfile is created, since it gets recreated at need.
* @note This call can trigger significant file size growth if run in
* parallel with write transactions, because it employs a read-only
* transaction. See long-lived transactions under @ref caveats_sec.
* @param[in] env An environment handle returned by #mdb_env_create(). It
* must have already been opened successfully.
* @param[in] path The directory in which the copy will reside. This
* directory must already exist and be writable but must otherwise be
* empty.
* @param[in] flags Special options for this operation. This parameter
* must be set to 0 or by bitwise OR'ing together one or more of the
* values described here.
* <ul>
* <li>#MDB_CP_COMPACT - Perform compaction while copying: omit free
* pages and sequentially renumber all pages in output. This option
* consumes more CPU and runs more slowly than the default.
* </ul>
* @return A non-zero error value on failure and 0 on success.
*/
int mdb_env_copy2(MDB_env *env, const char *path, unsigned int flags);
/** @brief Copy an LMDB environment to the specified file descriptor,
* with options.
*
* This function may be used to make a backup of an existing environment.
* No lockfile is created, since it gets recreated at need. See
* #mdb_env_copy2() for further details.
* @note This call can trigger significant file size growth if run in
* parallel with write transactions, because it employs a read-only
* transaction. See long-lived transactions under @ref caveats_sec.
* @param[in] env An environment handle returned by #mdb_env_create(). It
* must have already been opened successfully.
* @param[in] fd The filedescriptor to write the copy to. It must
* have already been opened for Write access.
* @param[in] flags Special options for this operation.
* See #mdb_env_copy2() for options.
* @return A non-zero error value on failure and 0 on success.
*/
int mdb_env_copyfd2(MDB_env *env, mdb_filehandle_t fd, unsigned int flags);
/** @brief Return statistics about the LMDB environment.
/** @brief Return statistics about the MDB environment.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[out] stat The address of an #MDB_stat structure
@ -720,7 +543,7 @@ int mdb_env_copyfd2(MDB_env *env, mdb_filehandle_t fd, unsigned int flags);
*/
int mdb_env_stat(MDB_env *env, MDB_stat *stat);
/** @brief Return information about the LMDB environment.
/** @brief Return information about the MDB environment.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[out] stat The address of an #MDB_envinfo structure
@ -731,10 +554,9 @@ int mdb_env_info(MDB_env *env, MDB_envinfo *stat);
/** @brief Flush the data buffers to disk.
*
* Data is always written to disk when #mdb_txn_commit() is called,
* but the operating system may keep it buffered. LMDB always flushes
* but the operating system may keep it buffered. MDB always flushes
* the OS buffers upon commit as well, unless the environment was
* opened with #MDB_NOSYNC or in part #MDB_NOMETASYNC. This call is
* not valid if the environment was opened with #MDB_RDONLY.
* opened with #MDB_NOSYNC or in part #MDB_NOMETASYNC.
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[in] force If non-zero, force a synchronous flush. Otherwise
* if the environment has the #MDB_NOSYNC flag set the flushes
@ -742,7 +564,6 @@ int mdb_env_info(MDB_env *env, MDB_envinfo *stat);
* @return A non-zero error value on failure and 0 on success. Some possible
* errors are:
* <ul>
* <li>EACCES - the environment is read-only.
* <li>EINVAL - an invalid parameter was specified.
* <li>EIO - an error occurred during synchronization.
* </ul>
@ -762,8 +583,7 @@ void mdb_env_close(MDB_env *env);
/** @brief Set environment flags.
*
* This may be used to set some flags in addition to those from
* #mdb_env_open(), or to unset these flags. If several threads
* change the flags at the same time, the result is undefined.
* #mdb_env_open(), or to unset these flags.
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[in] flags The flags to change, bitwise OR'ed together
* @param[in] onoff A non-zero value sets the flags, zero clears them.
@ -801,39 +621,14 @@ int mdb_env_get_flags(MDB_env *env, unsigned int *flags);
*/
int mdb_env_get_path(MDB_env *env, const char **path);
/** @brief Return the filedescriptor for the given environment.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[out] fd Address of a mdb_filehandle_t to contain the descriptor.
* @return A non-zero error value on failure and 0 on success. Some possible
* errors are:
* <ul>
* <li>EINVAL - an invalid parameter was specified.
* </ul>
*/
int mdb_env_get_fd(MDB_env *env, mdb_filehandle_t *fd);
/** @brief Set the size of the memory map to use for this environment.
*
* The size should be a multiple of the OS page size. The default is
* 10485760 bytes. The size of the memory map is also the maximum size
* of the database. The value should be chosen as large as possible,
* to accommodate future growth of the database.
* This function should be called after #mdb_env_create() and before #mdb_env_open().
* It may be called at later times if no transactions are active in
* this process. Note that the library does not check for this condition,
* the caller must ensure it explicitly.
*
* The new size takes effect immediately for the current process but
* will not be persisted to any others until a write transaction has been
* committed by the current process. Also, only mapsize increases are
* persisted into the environment.
*
* If the mapsize is increased by another process, and data has grown
* beyond the range of the current mapsize, #mdb_txn_begin() will
* return #MDB_MAP_RESIZED. This function may be called with a size
* of zero to adopt the new size.
*
* This function may only be called after #mdb_env_create() and before #mdb_env_open().
* The size may be changed by closing and reopening the environment.
* Any attempt to set a size smaller than the space already consumed
* by the environment will be silently changed to the current size of the used space.
* @param[in] env An environment handle returned by #mdb_env_create()
@ -841,11 +636,10 @@ int mdb_env_get_fd(MDB_env *env, mdb_filehandle_t *fd);
* @return A non-zero error value on failure and 0 on success. Some possible
* errors are:
* <ul>
* <li>EINVAL - an invalid parameter was specified, or the environment has
* an active write transaction.
* <li>EINVAL - an invalid parameter was specified, or the environment is already open.
* </ul>
*/
int mdb_env_set_mapsize(MDB_env *env, mdb_size_t size);
int mdb_env_set_mapsize(MDB_env *env, size_t size);
/** @brief Set the maximum number of threads/reader slots for the environment.
*
@ -884,10 +678,6 @@ int mdb_env_get_maxreaders(MDB_env *env, unsigned int *readers);
* environment. Simpler applications that use the environment as a single
* unnamed database can ignore this option.
* This function may only be called after #mdb_env_create() and before #mdb_env_open().
*
* Currently a moderate number of slots are cheap but a huge number gets
* expensive: 7-120 words per transaction, and every #mdb_dbi_open()
* does a linear search of the opened slots.
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[in] dbs The maximum number of databases
* @return A non-zero error value on failure and 0 on success. Some possible
@ -898,47 +688,6 @@ int mdb_env_get_maxreaders(MDB_env *env, unsigned int *readers);
*/
int mdb_env_set_maxdbs(MDB_env *env, MDB_dbi dbs);
/** @brief Get the maximum size of keys and #MDB_DUPSORT data we can write.
*
* Depends on the compile-time constant #MDB_MAXKEYSIZE. Default 511.
* See @ref MDB_val.
* @param[in] env An environment handle returned by #mdb_env_create()
* @return The maximum size of a key we can write
*/
int mdb_env_get_maxkeysize(MDB_env *env);
/** @brief Set application information associated with the #MDB_env.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[in] ctx An arbitrary pointer for whatever the application needs.
* @return A non-zero error value on failure and 0 on success.
*/
int mdb_env_set_userctx(MDB_env *env, void *ctx);
/** @brief Get the application information associated with the #MDB_env.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @return The pointer set by #mdb_env_set_userctx().
*/
void *mdb_env_get_userctx(MDB_env *env);
/** @brief A callback function for most LMDB assert() failures,
* called before printing the message and aborting.
*
* @param[in] env An environment handle returned by #mdb_env_create().
* @param[in] msg The assertion message, not including newline.
*/
typedef void MDB_assert_func(MDB_env *env, const char *msg);
/** Set or reset the assert() callback of the environment.
 * Disabled if liblmdb is built with NDEBUG.
* @note This hack should become obsolete as lmdb's error handling matures.
* @param[in] env An environment handle returned by #mdb_env_create().
* @param[in] func An #MDB_assert_func function, or 0.
* @return A non-zero error value on failure and 0 on success.
*/
int mdb_env_set_assert(MDB_env *env, MDB_assert_func *func);
/** @brief Create a transaction for use with the environment.
*
* The transaction handle may be discarded using #mdb_txn_abort() or #mdb_txn_commit().
@ -950,18 +699,14 @@ int mdb_env_set_assert(MDB_env *env, MDB_assert_func *func);
* @param[in] parent If this parameter is non-NULL, the new transaction
* will be a nested transaction, with the transaction indicated by \b parent
* as its parent. Transactions may be nested to any level. A parent
* transaction and its cursors may not issue any other operations than
* mdb_txn_commit and mdb_txn_abort while it has active child transactions.
* transaction may not issue any other operations besides mdb_txn_begin,
* mdb_txn_abort, or mdb_txn_commit while it has active child transactions.
* @param[in] flags Special options for this transaction. This parameter
* must be set to 0 or by bitwise OR'ing together one or more of the
* values described here.
* <ul>
* <li>#MDB_RDONLY
* This transaction will not perform any write operations.
* <li>#MDB_NOSYNC
* Don't flush system buffers to disk when committing this transaction.
* <li>#MDB_NOMETASYNC
* Flush system buffers but omit metadata flush when committing this transaction.
* </ul>
* @param[out] txn Address where the new #MDB_txn handle will be stored
* @return A non-zero error value on failure and 0 on success. Some possible
@ -970,8 +715,7 @@ int mdb_env_set_assert(MDB_env *env, MDB_assert_func *func);
* <li>#MDB_PANIC - a fatal error occurred earlier and the environment
* must be shut down.
* <li>#MDB_MAP_RESIZED - another process wrote data beyond this MDB_env's
* mapsize and this environment's map must be resized as well.
* See #mdb_env_set_mapsize().
* mapsize and the environment must be shut down.
* <li>#MDB_READERS_FULL - a read-only transaction was requested and
* the reader lock table is full. See #mdb_env_set_maxreaders().
* <li>ENOMEM - out of memory.
@ -979,23 +723,6 @@ int mdb_env_set_assert(MDB_env *env, MDB_assert_func *func);
*/
int mdb_txn_begin(MDB_env *env, MDB_txn *parent, unsigned int flags, MDB_txn **txn);
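
A minimal transaction sketch for mdb_txn_begin() (not part of the diff); env and dbi are assumed, key/value contents are placeholders, and mv_size precedes mv_data as shown in the MDB_val definition above:

/* Sketch only: one write transaction, then one read-only lookup. */
MDB_txn *txn = NULL;
MDB_val key = { 5, "hello" }, val = { 5, "world" }, out;
int rc = mdb_txn_begin(env, NULL, 0, &txn);
if (rc == 0)
    rc = mdb_put(txn, dbi, &key, &val, 0);
if (rc == 0)
    rc = mdb_txn_commit(txn);       /* frees txn whether it succeeds or not */
else if (txn != NULL)
    mdb_txn_abort(txn);             /* begin or put failed: abort */

rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
if (rc == 0) {
    rc = mdb_get(txn, dbi, &key, &out);  /* out points into the map */
    mdb_txn_abort(txn);                  /* fine for read-only txns */
}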
/** @brief Returns the transaction's #MDB_env
*
* @param[in] txn A transaction handle returned by #mdb_txn_begin()
*/
MDB_env *mdb_txn_env(MDB_txn *txn);
/** @brief Return the transaction's ID.
*
* This returns the identifier associated with this transaction. For a
* read-only transaction, this corresponds to the snapshot being read;
* concurrent readers will frequently have the same transaction ID.
*
* @param[in] txn A transaction handle returned by #mdb_txn_begin()
* @return A transaction ID, valid if input is an active transaction.
*/
mdb_size_t mdb_txn_id(MDB_txn *txn);
/** @brief Commit all the operations of a transaction into the database.
*
* The transaction handle is freed. It and its cursors must not be used
@ -1070,23 +797,19 @@ int mdb_txn_renew(MDB_txn *txn);
* independently of whether such a database exists.
* The database handle may be discarded by calling #mdb_dbi_close().
* The old database handle is returned if the database was already open.
* The handle may only be closed once.
*
* The handle must only be closed once.
* The database handle will be private to the current transaction until
* the transaction is successfully committed. If the transaction is
* aborted the handle will be closed automatically.
* After a successful commit the handle will reside in the shared
* environment, and may be used by other transactions.
*
* This function must not be called from multiple concurrent
* transactions in the same process. A transaction that uses
* this function must finish (either commit or abort) before
* any other transaction in the process may use this function.
* After a successful commit the
* handle will reside in the shared environment, and may be used
* by other transactions. This function must not be called from
* multiple concurrent transactions. A transaction that uses this function
* must finish (either commit or abort) before any other transaction may
* use this function.
*
* To use named databases (with name != NULL), #mdb_env_set_maxdbs()
* must be called before opening the environment. Database names are
* keys in the unnamed database, and may be read but not written.
*
* must be called before opening the environment.
* @param[in] txn A transaction handle returned by #mdb_txn_begin()
* @param[in] name The name of the database to open. If only a single
* database is needed in the environment, this value may be NULL.
@ -1103,9 +826,9 @@ int mdb_txn_renew(MDB_txn *txn);
* keys may have multiple data items, stored in sorted order.) By default
* keys must be unique and may have only a single data item.
* <li>#MDB_INTEGERKEY
* Keys are binary integers in native byte order, either unsigned int
* or size_t, and will be sorted as such.
* The keys must all be of the same size.
* Keys are binary integers in native byte order. Setting this option
* requires all keys to be the same size, typically sizeof(int)
* or sizeof(size_t).
* <li>#MDB_DUPFIXED
* This flag may only be used in combination with #MDB_DUPSORT. This option
* tells the library that the data items for this database are all the same
@ -1113,8 +836,8 @@ int mdb_txn_renew(MDB_txn *txn);
* all data items are the same size, the #MDB_GET_MULTIPLE and #MDB_NEXT_MULTIPLE
* cursor operations may be used to retrieve multiple items at once.
* <li>#MDB_INTEGERDUP
* This option specifies that duplicate data items are binary integers,
* similar to #MDB_INTEGERKEY keys.
* This option specifies that duplicate data items are also integers, and
* should be sorted as such.
* <li>#MDB_REVERSEDUP
* This option specifies that duplicate data items should be compared as
* strings in reverse order.
@ -1147,40 +870,25 @@ int mdb_dbi_open(MDB_txn *txn, const char *name, unsigned int flags, MDB_dbi *d
*/
int mdb_stat(MDB_txn *txn, MDB_dbi dbi, MDB_stat *stat);
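
A short sketch combining mdb_dbi_open() and mdb_stat() (not part of the diff); it assumes a transaction txn, an environment opened after mdb_env_set_maxdbs(), the standard MDB_CREATE dbi flag, and a hypothetical database name "widgets":

/* Sketch only: open (or create) a named DB and read its stats. */
MDB_dbi dbi;
MDB_stat st;
rc = mdb_dbi_open(txn, "widgets", MDB_CREATE, &dbi);
if (rc == 0 && (rc = mdb_stat(txn, dbi, &st)) == 0) {
    /* st.ms_entries is the number of data items in "widgets" */
}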
/** @brief Retrieve the DB flags for a database handle.
*
* @param[in] txn A transaction handle returned by #mdb_txn_begin()
* @param[in] dbi A database handle returned by #mdb_dbi_open()
* @param[out] flags Address where the flags will be returned.
* @return A non-zero error value on failure and 0 on success.
*/
int mdb_dbi_flags(MDB_txn *txn, MDB_dbi dbi, unsigned int *flags);
/** @brief Close a database handle. Normally unnecessary. Use with care:
/** @brief Close a database handle.
*
* This call is not mutex protected. Handles should only be closed by
* a single thread, and only if no other threads are going to reference
* the database handle or one of its cursors any further. Do not close
* a handle if an existing transaction has modified its database.
* Doing so can cause misbehavior from database corruption to errors
* like MDB_BAD_VALSIZE (since the DB name is gone).
*
* Closing a database handle is not necessary, but lets #mdb_dbi_open()
* reuse the handle value. Usually it's better to set a bigger
* #mdb_env_set_maxdbs(), unless that value would be large.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[in] dbi A database handle returned by #mdb_dbi_open()
*/
void mdb_dbi_close(MDB_env *env, MDB_dbi dbi);
/** @brief Empty or delete+close a database.
/** @brief Delete a database and/or free all its pages.
*
* See #mdb_dbi_close() for restrictions about closing the DB handle.
* If the \b del parameter is 1, the DB handle will be closed
* and the DB will be deleted.
* @param[in] txn A transaction handle returned by #mdb_txn_begin()
* @param[in] dbi A database handle returned by #mdb_dbi_open()
* @param[in] del 0 to empty the DB, 1 to delete it from the
* environment and close the DB handle.
* @param[in] del 1 to delete the DB from the environment,
* 0 to just free its pages.
* @return A non-zero error value on failure and 0 on success.
*/
int mdb_drop(MDB_txn *txn, MDB_dbi dbi, int del);
@ -1322,13 +1030,11 @@ int mdb_get(MDB_txn *txn, MDB_dbi dbi, MDB_val *key, MDB_val *data);
* reserved space, which the caller can fill in later - before
* the next update operation or the transaction ends. This saves
* an extra memcpy if the data is being generated later.
* LMDB does nothing else with this memory, the caller is expected
* to modify all of the space requested. This flag must not be
* specified if the database was opened with #MDB_DUPSORT.
* <li>#MDB_APPEND - append the given key/data pair to the end of the
* database. This option allows fast bulk loading when keys are
* already known to be in the correct order. Loading unsorted keys
* with this flag will cause a #MDB_KEYEXIST error.
* database. No key comparisons are performed. This option allows
* fast bulk loading when keys are already known to be in the
* correct order. Loading unsorted keys with this flag will cause
* data corruption.
* <li>#MDB_APPENDDUP - as above, but for sorted dup data.
* </ul>
* @return A non-zero error value on failure and 0 on success. Some possible
@ -1454,21 +1160,18 @@ int mdb_cursor_get(MDB_cursor *cursor, MDB_val *key, MDB_val *data,
/** @brief Store by cursor.
*
* This function stores key/data pairs into the database.
* The cursor is positioned at the new item, or on failure usually near it.
* @note Earlier documentation incorrectly said errors would leave the
* state of the cursor unchanged.
* If the function fails for any reason, the state of the cursor will be
* unchanged. If the function succeeds and an item is inserted into the
* database, the cursor is always positioned to refer to the newly inserted item.
* @param[in] cursor A cursor handle returned by #mdb_cursor_open()
* @param[in] key The key operated on.
* @param[in] data The data operated on.
* @param[in] flags Options for this operation. This parameter
* must be set to 0 or one of the values described here.
* <ul>
* <li>#MDB_CURRENT - replace the item at the current cursor position.
* The \b key parameter must still be provided, and must match it.
* If using sorted duplicates (#MDB_DUPSORT) the data item must still
* sort into the same place. This is intended to be used when the
* new data is the same size as the old. Otherwise it will simply
* perform a delete of the old record followed by an insert.
* <li>#MDB_CURRENT - overwrite the data of the key/data pair to which
* the cursor refers with the specified data item. The \b key
* parameter is ignored.
* <li>#MDB_NODUPDATA - enter the new key/data pair only if it does not
* already appear in the database. This flag may only be specified
* if the database was opened with #MDB_DUPSORT. The function will
@ -1480,33 +1183,21 @@ int mdb_cursor_get(MDB_cursor *cursor, MDB_val *key, MDB_val *data,
* the database supports duplicates (#MDB_DUPSORT).
* <li>#MDB_RESERVE - reserve space for data of the given size, but
* don't copy the given data. Instead, return a pointer to the
* reserved space, which the caller can fill in later - before
* the next update operation or the transaction ends. This saves
* an extra memcpy if the data is being generated later. This flag
* must not be specified if the database was opened with #MDB_DUPSORT.
* reserved space, which the caller can fill in later. This saves
* an extra memcpy if the data is being generated later.
* <li>#MDB_APPEND - append the given key/data pair to the end of the
* database. No key comparisons are performed. This option allows
* fast bulk loading when keys are already known to be in the
* correct order. Loading unsorted keys with this flag will cause
* a #MDB_KEYEXIST error.
* data corruption.
* <li>#MDB_APPENDDUP - as above, but for sorted dup data.
* <li>#MDB_MULTIPLE - store multiple contiguous data elements in a
* single request. This flag may only be specified if the database
* was opened with #MDB_DUPFIXED. The \b data argument must be an
* array of two MDB_vals. The mv_size of the first MDB_val must be
* the size of a single data element. The mv_data of the first MDB_val
* must point to the beginning of the array of contiguous data elements.
* The mv_size of the second MDB_val must be the count of the number
* of data elements to store. On return this field will be set to
* the count of the number of elements actually written. The mv_data
* of the second MDB_val is unused.
* </ul>
* @return A non-zero error value on failure and 0 on success. Some possible
* errors are:
* <ul>
* <li>#MDB_MAP_FULL - the database is full, see #mdb_env_set_mapsize().
* <li>#MDB_TXN_FULL - the transaction has too many dirty pages.
* <li>EACCES - an attempt was made to write in a read-only transaction.
* <li>EACCES - an attempt was made to modify a read-only database.
* <li>EINVAL - an invalid parameter was specified.
* </ul>
*/
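
A sketch of the two-element MDB_val contract that MDB_MULTIPLE uses, per the master-side text above (not part of the diff); cur, key, and rc are assumed, the DB must have been opened with MDB_DUPFIXED, and uint32_t comes from <inttypes.h>, which this header already includes:

/* Sketch only: store four fixed-size duplicates in one call. */
uint32_t items[4] = { 1, 2, 3, 4 };
MDB_val vals[2];
vals[0].mv_size = sizeof(items[0]);  /* size of one element */
vals[0].mv_data = items;             /* start of the contiguous array */
vals[1].mv_size = 4;                 /* element count (in) */
vals[1].mv_data = NULL;              /* unused */
rc = mdb_cursor_put(cur, &key, vals, MDB_MULTIPLE);
/* on return, vals[1].mv_size holds the count actually written */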
@ -1526,7 +1217,7 @@ int mdb_cursor_put(MDB_cursor *cursor, MDB_val *key, MDB_val *data,
* @return A non-zero error value on failure and 0 on success. Some possible
* errors are:
* <ul>
* <li>EACCES - an attempt was made to write in a read-only transaction.
* <li>EACCES - an attempt was made to modify a read-only database.
* <li>EINVAL - an invalid parameter was specified.
* </ul>
*/
@ -1544,7 +1235,7 @@ int mdb_cursor_del(MDB_cursor *cursor, unsigned int flags);
* <li>EINVAL - cursor is not initialized, or an invalid parameter was specified.
* </ul>
*/
int mdb_cursor_count(MDB_cursor *cursor, mdb_size_t *countp);
int mdb_cursor_count(MDB_cursor *cursor, size_t *countp);
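
A duplicate-count sketch for mdb_cursor_count() (not part of the diff); cur is an open cursor on an MDB_DUPSORT database, key/data/rc are assumed, and the countp type is the one point where the two sides of this diff differ (mdb_size_t vs size_t; size_t shown):

/* Sketch only: how many data items live under one key? */
size_t ndups = 0;
key.mv_size = 5;
key.mv_data = "hello";
rc = mdb_cursor_get(cur, &key, &data, MDB_SET);  /* position at key */
if (rc == 0)
    rc = mdb_cursor_count(cur, &ndups);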
/** @brief Compare two data items according to a particular database.
*
@ -1569,42 +1260,11 @@ int mdb_cmp(MDB_txn *txn, MDB_dbi dbi, const MDB_val *a, const MDB_val *b);
* @return < 0 if a < b, 0 if a == b, > 0 if a > b
*/
int mdb_dcmp(MDB_txn *txn, MDB_dbi dbi, const MDB_val *a, const MDB_val *b);
/** @brief A callback function used to print a message from the library.
*
* @param[in] msg The string to be printed.
* @param[in] ctx An arbitrary context pointer for the callback.
* @return < 0 on failure, >= 0 on success.
*/
typedef int (MDB_msg_func)(const char *msg, void *ctx);
/** @brief Dump the entries in the reader lock table.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[in] func A #MDB_msg_func function
* @param[in] ctx Anything the message function needs
* @return < 0 on failure, >= 0 on success.
*/
int mdb_reader_list(MDB_env *env, MDB_msg_func *func, void *ctx);
/** @brief Check for stale entries in the reader lock table.
*
* @param[in] env An environment handle returned by #mdb_env_create()
* @param[out] dead Number of stale slots that were cleared
* @return 0 on success, non-zero on failure.
*/
int mdb_reader_check(MDB_env *env, int *dead);
/** @} */
#ifdef __cplusplus
}
#endif
/** @page tools LMDB Command Line Tools
The following describes the command line tools that are available for LMDB.
\li \ref mdb_copy_1
\li \ref mdb_dump_1
\li \ref mdb_load_1
\li \ref mdb_stat_1
*/
#endif /* _LMDB_H_ */
/* http://gitorious.org/mdb/mdb/blobs/raw/b389341b4b2413804726276d01676a6a9d05346f/libraries/liblmdb/lmdb.h */

View file

@ -3,7 +3,7 @@
/* $OpenLDAP$ */
/* This work is part of OpenLDAP Software <http://www.openldap.org/>.
*
* Copyright 2000-2016 The OpenLDAP Foundation.
* Copyright 2000-2013 The OpenLDAP Foundation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -20,9 +20,10 @@
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <assert.h>
#include "midl.h"
/** @defgroup internal LMDB Internals
/** @defgroup internal MDB Internals
* @{
*/
/** @defgroup idls ID List Management
@ -30,7 +31,8 @@
*/
#define CMP(x,y) ( (x) < (y) ? -1 : (x) > (y) )
unsigned mdb_midl_search( MDB_IDL ids, MDB_ID id )
#if 0 /* superseded by append/sort */
static unsigned mdb_midl_search( MDB_IDL ids, MDB_ID id )
{
/*
* binary search of id in ids
@ -65,11 +67,21 @@ unsigned mdb_midl_search( MDB_IDL ids, MDB_ID id )
return cursor;
}
#if 0 /* superseded by append/sort */
int mdb_midl_insert( MDB_IDL ids, MDB_ID id )
{
unsigned x, i;
if (MDB_IDL_IS_RANGE( ids )) {
/* if already in range, treat as a dup */
if (id >= MDB_IDL_RANGE_FIRST(ids) && id <= MDB_IDL_RANGE_LAST(ids))
return -1;
if (id < MDB_IDL_RANGE_FIRST(ids))
ids[1] = id;
else if (id > MDB_IDL_RANGE_LAST(ids))
ids[2] = id;
return 0;
}
x = mdb_midl_search( ids, id );
assert( x > 0 );
@ -85,9 +97,15 @@ int mdb_midl_insert( MDB_IDL ids, MDB_ID id )
}
if ( ++ids[0] >= MDB_IDL_DB_MAX ) {
/* no room */
--ids[0];
return -2;
if( id < ids[1] ) {
ids[1] = id;
ids[2] = ids[ids[0]-1];
} else if ( ids[ids[0]-1] < id ) {
ids[2] = id;
} else {
ids[2] = ids[ids[0]-1];
}
ids[0] = MDB_NOID;
} else {
/* insert id */
@ -103,10 +121,8 @@ int mdb_midl_insert( MDB_IDL ids, MDB_ID id )
MDB_IDL mdb_midl_alloc(int num)
{
MDB_IDL ids = malloc((num+2) * sizeof(MDB_ID));
if (ids) {
if (ids)
*ids++ = num;
*ids = 0;
}
return ids;
}
@ -116,18 +132,19 @@ void mdb_midl_free(MDB_IDL ids)
free(ids-1);
}
void mdb_midl_shrink( MDB_IDL *idp )
int mdb_midl_shrink( MDB_IDL *idp )
{
MDB_IDL ids = *idp;
if (*(--ids) > MDB_IDL_UM_MAX &&
(ids = realloc(ids, (MDB_IDL_UM_MAX+2) * sizeof(MDB_ID))))
{
if (*(--ids) > MDB_IDL_UM_MAX) {
ids = realloc(ids, (MDB_IDL_UM_MAX+1) * sizeof(MDB_ID));
*ids++ = MDB_IDL_UM_MAX;
*idp = ids;
return 1;
}
return 0;
}
static int mdb_midl_grow( MDB_IDL *idp, int num )
int mdb_midl_grow( MDB_IDL *idp, int num )
{
MDB_IDL idn = *idp-1;
/* grow it */
@ -139,20 +156,6 @@ static int mdb_midl_grow( MDB_IDL *idp, int num )
return 0;
}
int mdb_midl_need( MDB_IDL *idp, unsigned num )
{
MDB_IDL ids = *idp;
num += ids[0];
if (num > ids[-1]) {
num = (num + num/4 + (256 + 2)) & -256;
if (!(ids = realloc(ids-1, num * sizeof(MDB_ID))))
return ENOMEM;
*ids++ = num - 2;
*idp = ids;
}
return 0;
}
int mdb_midl_append( MDB_IDL *idp, MDB_ID id )
{
MDB_IDL ids = *idp;
@ -181,40 +184,10 @@ int mdb_midl_append_list( MDB_IDL *idp, MDB_IDL app )
return 0;
}
int mdb_midl_append_range( MDB_IDL *idp, MDB_ID id, unsigned n )
{
MDB_ID *ids = *idp, len = ids[0];
/* Too big? */
if (len + n > ids[-1]) {
if (mdb_midl_grow(idp, n | MDB_IDL_UM_MAX))
return ENOMEM;
ids = *idp;
}
ids[0] = len + n;
ids += len;
while (n)
ids[n--] = id++;
return 0;
}
void mdb_midl_xmerge( MDB_IDL idl, MDB_IDL merge )
{
MDB_ID old_id, merge_id, i = merge[0], j = idl[0], k = i+j, total = k;
idl[0] = (MDB_ID)-1; /* delimiter for idl scan below */
old_id = idl[j];
while (i) {
merge_id = merge[i--];
for (; old_id < merge_id; old_id = idl[--j])
idl[k--] = old_id;
idl[k--] = merge_id;
}
idl[0] = total;
}
/* Quicksort + Insertion sort for small arrays */
#define SMALL 8
#define MIDL_SWAP(a,b) { itmp=(a); (a)=(b); (b)=itmp; }
#define SWAP(a,b) { itmp=(a); (a)=(b); (b)=itmp; }
void
mdb_midl_sort( MDB_IDL ids )
@ -242,15 +215,15 @@ mdb_midl_sort( MDB_IDL ids )
l = istack[jstack--];
} else {
k = (l + ir) >> 1; /* Choose median of left, center, right */
MIDL_SWAP(ids[k], ids[l+1]);
SWAP(ids[k], ids[l+1]);
if (ids[l] < ids[ir]) {
MIDL_SWAP(ids[l], ids[ir]);
SWAP(ids[l], ids[ir]);
}
if (ids[l+1] < ids[ir]) {
MIDL_SWAP(ids[l+1], ids[ir]);
SWAP(ids[l+1], ids[ir]);
}
if (ids[l] < ids[l+1]) {
MIDL_SWAP(ids[l], ids[l+1]);
SWAP(ids[l], ids[l+1]);
}
i = l+1;
j = ir;
@ -259,7 +232,7 @@ mdb_midl_sort( MDB_IDL ids )
do i++; while(ids[i] > a);
do j--; while(ids[j] < a);
if (j < i) break;
MIDL_SWAP(ids[i],ids[j]);
SWAP(ids[i],ids[j]);
}
ids[l+1] = ids[j];
ids[j] = a;
@ -317,6 +290,7 @@ int mdb_mid2l_insert( MDB_ID2L ids, MDB_ID2 *id )
unsigned x, i;
x = mdb_mid2l_search( ids, id->mid );
assert( x > 0 );
if( x < 1 ) {
/* internal error */
@ -354,67 +328,7 @@ int mdb_mid2l_append( MDB_ID2L ids, MDB_ID2 *id )
return 0;
}
#ifdef MDB_VL32
unsigned mdb_mid3l_search( MDB_ID3L ids, MDB_ID id )
{
/*
* binary search of id in ids
* if found, returns position of id
* if not found, returns first position greater than id
*/
unsigned base = 0;
unsigned cursor = 1;
int val = 0;
unsigned n = (unsigned)ids[0].mid;
while( 0 < n ) {
unsigned pivot = n >> 1;
cursor = base + pivot + 1;
val = CMP( id, ids[cursor].mid );
if( val < 0 ) {
n = pivot;
} else if ( val > 0 ) {
base = cursor;
n -= pivot + 1;
} else {
return cursor;
}
}
if( val > 0 ) {
++cursor;
}
return cursor;
}
int mdb_mid3l_insert( MDB_ID3L ids, MDB_ID3 *id )
{
unsigned x, i;
x = mdb_mid3l_search( ids, id->mid );
if( x < 1 ) {
/* internal error */
return -2;
}
if ( x <= ids[0].mid && ids[x].mid == id->mid ) {
/* duplicate */
return -1;
}
/* insert id */
ids[0].mid++;
for (i=(unsigned)ids[0].mid; i>x; i--)
ids[i] = ids[i-1];
ids[x] = *id;
return 0;
}
#endif /* MDB_VL32 */
/** @} */
/** @} */
/* http://gitorious.org/mdb/mdb/blobs/raw/mdb.master/libraries/liblmdb/midl.c */

View file

@ -1,5 +1,5 @@
/** @file midl.h
* @brief LMDB ID List header file.
* @brief mdb ID List header file.
*
* This file was originally part of back-bdb but has been
* modified for use in libmdb. Most of the macros defined
@ -11,7 +11,7 @@
/* $OpenLDAP$ */
/* This work is part of OpenLDAP Software <http://www.openldap.org/>.
*
* Copyright 2000-2016 The OpenLDAP Foundation.
* Copyright 2000-2013 The OpenLDAP Foundation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -27,27 +27,22 @@
#define _MDB_MIDL_H_
#include <stddef.h>
#include <inttypes.h>
#ifdef __cplusplus
extern "C" {
#endif
/** @defgroup internal LMDB Internals
/** @defgroup internal MDB Internals
* @{
*/
/** @defgroup idls ID List Management
* @{
*/
/** A generic unsigned ID number. These were entryIDs in back-bdb.
/** A generic ID number. These were entryIDs in back-bdb.
* Preferably it should have the same size as a pointer.
*/
#ifdef MDB_VL32
typedef uint64_t MDB_ID;
#else
typedef size_t MDB_ID;
#endif
/** An IDL is an ID List, a sorted array of IDs. The first
* element of the array is a counter for how many actual
@ -57,41 +52,67 @@ typedef size_t MDB_ID;
*/
typedef MDB_ID *MDB_IDL;
#define MDB_NOID (~(MDB_ID)0)
/* IDL sizes - likely should be even bigger
* limiting factors: sizeof(ID), thread stack size
*/
#ifdef MDB_VL32
#define MDB_IDL_LOGN 10 /* DB_SIZE is 2^10, UM_SIZE is 2^11 */
#else
#define MDB_IDL_LOGN 16 /* DB_SIZE is 2^16, UM_SIZE is 2^17 */
#endif
#define MDB_IDL_DB_SIZE (1<<MDB_IDL_LOGN)
#define MDB_IDL_UM_SIZE (1<<(MDB_IDL_LOGN+1))
#define MDB_IDL_UM_SIZEOF (MDB_IDL_UM_SIZE * sizeof(MDB_ID))
#define MDB_IDL_DB_MAX (MDB_IDL_DB_SIZE-1)
#define MDB_IDL_UM_MAX (MDB_IDL_UM_SIZE-1)
#define MDB_IDL_SIZEOF(ids) (((ids)[0]+1) * sizeof(MDB_ID))
#define MDB_IDL_IS_RANGE(ids) ((ids)[0] == MDB_NOID)
#define MDB_IDL_RANGE_SIZE (3)
#define MDB_IDL_RANGE_SIZEOF (MDB_IDL_RANGE_SIZE * sizeof(MDB_ID))
#define MDB_IDL_SIZEOF(ids) ((MDB_IDL_IS_RANGE(ids) \
? MDB_IDL_RANGE_SIZE : ((ids)[0]+1)) * sizeof(MDB_ID))
#define MDB_IDL_RANGE_FIRST(ids) ((ids)[1])
#define MDB_IDL_RANGE_LAST(ids) ((ids)[2])
#define MDB_IDL_RANGE( ids, f, l ) \
do { \
(ids)[0] = MDB_NOID; \
(ids)[1] = (f); \
(ids)[2] = (l); \
} while(0)
#define MDB_IDL_ZERO(ids) \
do { \
(ids)[0] = 0; \
(ids)[1] = 0; \
(ids)[2] = 0; \
} while(0)
#define MDB_IDL_IS_ZERO(ids) ( (ids)[0] == 0 )
#define MDB_IDL_IS_ALL( range, ids ) ( (ids)[0] == MDB_NOID \
&& (ids)[1] <= (range)[1] && (range)[2] <= (ids)[2] )
#define MDB_IDL_CPY( dst, src ) (memcpy( dst, src, MDB_IDL_SIZEOF( src ) ))
#define MDB_IDL_ID( bdb, ids, id ) MDB_IDL_RANGE( ids, id, ((bdb)->bi_lastid) )
#define MDB_IDL_ALL( bdb, ids ) MDB_IDL_RANGE( ids, 1, ((bdb)->bi_lastid) )
#define MDB_IDL_FIRST( ids ) ( (ids)[1] )
#define MDB_IDL_LAST( ids ) ( (ids)[(ids)[0]] )
#define MDB_IDL_LAST( ids ) ( MDB_IDL_IS_RANGE(ids) \
? (ids)[2] : (ids)[(ids)[0]] )
/** Current max length of an #mdb_midl_alloc()ed IDL */
#define MDB_IDL_ALLOCLEN( ids ) ( (ids)[-1] )
#define MDB_IDL_N( ids ) ( MDB_IDL_IS_RANGE(ids) \
? ((ids)[2]-(ids)[1])+1 : (ids)[0] )
/** Append ID to IDL. The IDL must be big enough. */
#define mdb_midl_xappend(idl, id) do { \
MDB_ID *xidl = (idl), xlen = ++(xidl[0]); \
xidl[xlen] = (id); \
} while (0)
/** Search for an ID in an IDL.
* @param[in] ids The IDL to search.
* @param[in] id The ID to search for.
* @return The index of the first ID greater than or equal to \b id.
#if 0 /* superseded by append/sort */
/** Insert an ID into an IDL.
* @param[in,out] ids The IDL to insert into.
* @param[in] id The ID to insert.
* @return 0 on success, -1 if the ID was already present in the IDL.
*/
unsigned mdb_midl_search( MDB_IDL ids, MDB_ID id );
int mdb_midl_insert( MDB_IDL ids, MDB_ID id );
#endif
/** Allocate an IDL.
* Allocates memory for an IDL of the given size.
@ -107,44 +128,32 @@ void mdb_midl_free(MDB_IDL ids);
/** Shrink an IDL.
* Return the IDL to the default size if it has grown larger.
* @param[in,out] idp Address of the IDL to shrink.
* @return 0 on no change, non-zero if shrunk.
*/
void mdb_midl_shrink(MDB_IDL *idp);
int mdb_midl_shrink(MDB_IDL *idp);
/** Make room for num additional elements in an IDL.
* @param[in,out] idp Address of the IDL.
* @param[in] num Number of elements to make room for.
* @return 0 on success, ENOMEM on failure.
/** Grow an IDL.
* Add room for num additional elements.
* @param[in,out] idp Address of the IDL to grow.
* @param[in] num Number of elements to add.
* @return 0 on success, -1 on failure.
*/
int mdb_midl_need(MDB_IDL *idp, unsigned num);
int mdb_midl_grow(MDB_IDL *idp, int num);
/** Append an ID onto an IDL.
* @param[in,out] idp Address of the IDL to append to.
* @param[in] id The ID to append.
* @return 0 on success, ENOMEM if the IDL is too large.
* @return 0 on success, -1 if the IDL is too large.
*/
int mdb_midl_append( MDB_IDL *idp, MDB_ID id );
/** Append an IDL onto an IDL.
* @param[in,out] idp Address of the IDL to append to.
* @param[in] app The IDL to append.
* @return 0 on success, ENOMEM if the IDL is too large.
* @return 0 on success, -1 if the IDL is too large.
*/
int mdb_midl_append_list( MDB_IDL *idp, MDB_IDL app );
/** Append an ID range onto an IDL.
* @param[in,out] idp Address of the IDL to append to.
* @param[in] id The lowest ID to append.
* @param[in] n Number of IDs to append.
* @return 0 on success, ENOMEM if the IDL is too large.
*/
int mdb_midl_append_range( MDB_IDL *idp, MDB_ID id, unsigned n );
/** Merge an IDL onto an IDL. The destination IDL must be big enough.
* @param[in] idl The IDL to merge into.
* @param[in] merge The IDL to merge.
*/
void mdb_midl_xmerge( MDB_IDL idl, MDB_IDL merge );
/** Sort an IDL.
* @param[in,out] ids The IDL to sort.
*/
@ -186,23 +195,11 @@ int mdb_mid2l_insert( MDB_ID2L ids, MDB_ID2 *id );
*/
int mdb_mid2l_append( MDB_ID2L ids, MDB_ID2 *id );
#ifdef MDB_VL32
typedef struct MDB_ID3 {
MDB_ID mid; /**< The ID */
void *mptr; /**< The pointer */
unsigned int mcnt; /**< Number of pages */
unsigned int mref; /**< Refcounter */
} MDB_ID3;
typedef MDB_ID3 *MDB_ID3L;
unsigned mdb_mid3l_search( MDB_ID3L ids, MDB_ID id );
int mdb_mid3l_insert( MDB_ID3L ids, MDB_ID3 *id );
#endif /* MDB_VL32 */
/** @} */
/** @} */
#ifdef __cplusplus
}
#endif
#endif /* _MDB_MIDL_H_ */
/* http://gitorious.org/mdb/mdb/blobs/raw/mdb.master/libraries/liblmdb/midl.h */
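/* Editor's illustration, not part of midl.h: the two IDL encodings the
 * macros above imply. An IDL is an MDB_ID array where slot 0 is either
 * a count (list form: slots 1..count hold sorted IDs) or the MDB_NOID
 * sentinel (range form: slots 1 and 2 hold first/last). MDB_ID and
 * MDB_NOID are defined earlier in the full header; the range-aware
 * MDB_IDL_LAST/MDB_IDL_N variants are assumed here. */
#include <assert.h>
#include "midl.h"

static void idl_demo(void)
{
	MDB_ID list[8] = { 3, 7, 9, 12 };	/* list form: [0] = count */
	MDB_ID range[3];
	MDB_IDL_RANGE(range, 100, 200);		/* range form: [0] = MDB_NOID */

	assert(MDB_IDL_FIRST(list) == 7);
	assert(MDB_IDL_LAST(range) == 200);	/* range branch of MDB_IDL_LAST */
	assert(MDB_IDL_N(range) == 101);	/* (200 - 100) + 1, inclusive */

	mdb_midl_xappend(list, 15);		/* caller guarantees capacity */
	assert(MDB_IDL_LAST(list) == 15);	/* list is now { 4, 7, 9, 12, 15 } */
}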

View file

@ -1,678 +0,0 @@
/*
* Copyright (c) 1991, 1993
* The Regents of the University of California. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* @(#)queue.h 8.5 (Berkeley) 8/20/94
* $FreeBSD: src/sys/sys/queue.h,v 1.54 2002/08/05 05:18:43 alfred Exp $
*/
#ifndef _DB_QUEUE_H_
#define _DB_QUEUE_H_
#ifndef __offsetof
#define __offsetof(st, m) \
((size_t) ( (char *)&((st *)0)->m - (char *)0 ))
#endif
#ifndef __containerof
#define __containerof(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - __offsetof(type,member) );})
#endif
#if defined(__cplusplus)
extern "C" {
#endif
/*
* This file defines four types of data structures: singly-linked lists,
* singly-linked tail queues, lists and tail queues.
*
* A singly-linked list is headed by a single forward pointer. The elements
* are singly linked for minimum space and pointer manipulation overhead at
* the expense of O(n) removal for arbitrary elements. New elements can be
* added to the list after an existing element or at the head of the list.
* Elements being removed from the head of the list should use the explicit
* macro for this purpose for optimum efficiency. A singly-linked list may
* only be traversed in the forward direction. Singly-linked lists are ideal
* for applications with large datasets and few or no removals or for
* implementing a LIFO queue.
*
* A singly-linked tail queue is headed by a pair of pointers, one to the
* head of the list and the other to the tail of the list. The elements are
* singly linked for minimum space and pointer manipulation overhead at the
* expense of O(n) removal for arbitrary elements. New elements can be added
* to the list after an existing element, at the head of the list, or at the
* end of the list. Elements being removed from the head of the tail queue
* should use the explicit macro for this purpose for optimum efficiency.
* A singly-linked tail queue may only be traversed in the forward direction.
* Singly-linked tail queues are ideal for applications with large datasets
* and few or no removals or for implementing a FIFO queue.
*
* A list is headed by a single forward pointer (or an array of forward
* pointers for a hash table header). The elements are doubly linked
* so that an arbitrary element can be removed without a need to
* traverse the list. New elements can be added to the list before
* or after an existing element or at the head of the list. A list
* may only be traversed in the forward direction.
*
* A tail queue is headed by a pair of pointers, one to the head of the
* list and the other to the tail of the list. The elements are doubly
* linked so that an arbitrary element can be removed without a need to
* traverse the list. New elements can be added to the list before or
* after an existing element, at the head of the list, or at the end of
* the list. A tail queue may be traversed in either direction.
*
* For details on the use of these macros, see the queue(3) manual page.
*
*
* SLIST LIST STAILQ TAILQ
* _HEAD + + + +
* _HEAD_INITIALIZER + + + +
* _ENTRY + + + +
* _INIT + + + +
* _EMPTY + + + +
* _FIRST + + + +
* _NEXT + + + +
* _PREV - - - +
* _LAST - - + +
* _FOREACH + + + +
* _FOREACH_REVERSE - - - +
* _INSERT_HEAD + + + +
* _INSERT_BEFORE - + - +
* _INSERT_AFTER + + + +
* _INSERT_TAIL - - + +
* _CONCAT - - + +
* _REMOVE_HEAD + - + -
* _REMOVE + + + +
*
*/
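/* Editor's aside, not part of queue.h: a minimal sketch of the tail
 * queue family described above, chosen because it supports O(1) append
 * and reverse traversal. Struct and field names are illustrative. */
#include <stdio.h>
#include <stdlib.h>

struct job {
	int id;
	TAILQ_ENTRY(job) entries;	/* embedded next/prev linkage */
};
TAILQ_HEAD(jobq, job);			/* struct jobq: first + addr of last next */

static void jobq_demo(void)
{
	struct jobq q;
	struct job *j;
	int i;

	TAILQ_INIT(&q);
	for (i = 0; i < 3; i++) {
		j = malloc(sizeof(*j));
		j->id = i;
		TAILQ_INSERT_TAIL(&q, j, entries);	/* O(1) via tqh_last */
	}
	TAILQ_FOREACH(j, &q, entries)			/* prints 0 1 2 */
		printf("%d\n", j->id);
	while ((j = TAILQ_FIRST(&q)) != NULL) {		/* drain from the head */
		TAILQ_REMOVE(&q, j, entries);
		free(j);
	}
}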
/*
* XXX
* We #undef all of the macros because there are incompatible versions of this
* file and these macros on various systems. What makes the problem worse is
* they are included and/or defined by system include files which we may have
* already loaded into Berkeley DB before getting here. For example, FreeBSD's
* <rpc/rpc.h> includes its system <sys/queue.h>, and VxWorks UnixLib.h defines
* several of the LIST_XXX macros. Visual C.NET 7.0 also defines some of these
* same macros in Vc7\PlatformSDK\Include\WinNT.h. Make sure we use ours.
*/
#undef LIST_EMPTY
#undef LIST_ENTRY
#undef LIST_FIRST
#undef LIST_FOREACH
#undef LIST_HEAD
#undef LIST_HEAD_INITIALIZER
#undef LIST_INIT
#undef LIST_INSERT_AFTER
#undef LIST_INSERT_BEFORE
#undef LIST_INSERT_HEAD
#undef LIST_NEXT
#undef LIST_REMOVE
#undef QMD_TRACE_ELEM
#undef QMD_TRACE_HEAD
#undef QUEUE_MACRO_DEBUG
#undef SLIST_EMPTY
#undef SLIST_ENTRY
#undef SLIST_FIRST
#undef SLIST_FOREACH
#undef SLIST_FOREACH_PREVPTR
#undef SLIST_HEAD
#undef SLIST_HEAD_INITIALIZER
#undef SLIST_INIT
#undef SLIST_INSERT_AFTER
#undef SLIST_INSERT_HEAD
#undef SLIST_NEXT
#undef SLIST_REMOVE
#undef SLIST_REMOVE_HEAD
#undef STAILQ_CONCAT
#undef STAILQ_EMPTY
#undef STAILQ_ENTRY
#undef STAILQ_FIRST
#undef STAILQ_FOREACH
#undef STAILQ_HEAD
#undef STAILQ_HEAD_INITIALIZER
#undef STAILQ_INIT
#undef STAILQ_INSERT_AFTER
#undef STAILQ_INSERT_HEAD
#undef STAILQ_INSERT_TAIL
#undef STAILQ_LAST
#undef STAILQ_NEXT
#undef STAILQ_REMOVE
#undef STAILQ_REMOVE_HEAD
#undef STAILQ_REMOVE_HEAD_UNTIL
#undef TAILQ_CONCAT
#undef TAILQ_EMPTY
#undef TAILQ_ENTRY
#undef TAILQ_FIRST
#undef TAILQ_FOREACH
#undef TAILQ_FOREACH_REVERSE
#undef TAILQ_HEAD
#undef TAILQ_HEAD_INITIALIZER
#undef TAILQ_INIT
#undef TAILQ_INSERT_AFTER
#undef TAILQ_INSERT_BEFORE
#undef TAILQ_INSERT_HEAD
#undef TAILQ_INSERT_TAIL
#undef TAILQ_LAST
#undef TAILQ_NEXT
#undef TAILQ_PREV
#undef TAILQ_REMOVE
#undef TRACEBUF
#undef TRASHIT
#define QUEUE_MACRO_DEBUG 0
#if QUEUE_MACRO_DEBUG
/* Store the last 2 places the queue element or head was altered */
struct qm_trace {
char * lastfile;
int lastline;
char * prevfile;
int prevline;
};
#define TRACEBUF struct qm_trace trace;
#define TRASHIT(x) do {(x) = (void *)-1;} while (0)
#define QMD_TRACE_HEAD(head) do { \
(head)->trace.prevline = (head)->trace.lastline; \
(head)->trace.prevfile = (head)->trace.lastfile; \
(head)->trace.lastline = __LINE__; \
(head)->trace.lastfile = __FILE__; \
} while (0)
#define QMD_TRACE_ELEM(elem) do { \
(elem)->trace.prevline = (elem)->trace.lastline; \
(elem)->trace.prevfile = (elem)->trace.lastfile; \
(elem)->trace.lastline = __LINE__; \
(elem)->trace.lastfile = __FILE__; \
} while (0)
#else
#define QMD_TRACE_ELEM(elem)
#define QMD_TRACE_HEAD(head)
#define TRACEBUF
#define TRASHIT(x)
#endif /* QUEUE_MACRO_DEBUG */
/*
* Singly-linked List declarations.
*/
#define SLIST_HEAD(name, type) \
struct name { \
struct type *slh_first; /* first element */ \
}
#define SLIST_HEAD_INITIALIZER(head) \
{ NULL }
#define SLIST_ENTRY(type) \
struct { \
struct type *sle_next; /* next element */ \
}
/*
* Singly-linked List functions.
*/
#define SLIST_EMPTY(head) ((head)->slh_first == NULL)
#define SLIST_FIRST(head) ((head)->slh_first)
#define SLIST_FOREACH(var, head, field) \
for ((var) = SLIST_FIRST((head)); \
(var); \
(var) = SLIST_NEXT((var), field))
#define SLIST_FOREACH_PREVPTR(var, varp, head, field) \
for ((varp) = &SLIST_FIRST((head)); \
((var) = *(varp)) != NULL; \
(varp) = &SLIST_NEXT((var), field))
#define SLIST_INIT(head) do { \
SLIST_FIRST((head)) = NULL; \
} while (0)
#define SLIST_INSERT_AFTER(slistelm, elm, field) do { \
SLIST_NEXT((elm), field) = SLIST_NEXT((slistelm), field); \
SLIST_NEXT((slistelm), field) = (elm); \
} while (0)
#define SLIST_INSERT_HEAD(head, elm, field) do { \
SLIST_NEXT((elm), field) = SLIST_FIRST((head)); \
SLIST_FIRST((head)) = (elm); \
} while (0)
#define SLIST_NEXT(elm, field) ((elm)->field.sle_next)
#define SLIST_REMOVE(head, elm, type, field) do { \
if (SLIST_FIRST((head)) == (elm)) { \
SLIST_REMOVE_HEAD((head), field); \
} \
else { \
struct type *curelm = SLIST_FIRST((head)); \
while (SLIST_NEXT(curelm, field) != (elm)) \
curelm = SLIST_NEXT(curelm, field); \
SLIST_NEXT(curelm, field) = \
SLIST_NEXT(SLIST_NEXT(curelm, field), field); \
} \
} while (0)
#define SLIST_REMOVE_HEAD(head, field) do { \
SLIST_FIRST((head)) = SLIST_NEXT(SLIST_FIRST((head)), field); \
} while (0)
/*
* Singly-linked Tail queue declarations.
*/
#define STAILQ_HEAD(name, type) \
struct name { \
struct type *stqh_first;/* first element */ \
struct type **stqh_last;/* addr of last next element */ \
}
#define STAILQ_HEAD_INITIALIZER(head) \
{ NULL, &(head).stqh_first }
#define STAILQ_ENTRY(type) \
struct { \
struct type *stqe_next; /* next element */ \
}
/*
* Singly-linked Tail queue functions.
*/
#define STAILQ_CONCAT(head1, head2) do { \
if (!STAILQ_EMPTY((head2))) { \
*(head1)->stqh_last = (head2)->stqh_first; \
(head1)->stqh_last = (head2)->stqh_last; \
STAILQ_INIT((head2)); \
} \
} while (0)
#define STAILQ_EMPTY(head) ((head)->stqh_first == NULL)
#define STAILQ_FIRST(head) ((head)->stqh_first)
#define STAILQ_FOREACH(var, head, field) \
for ((var) = STAILQ_FIRST((head)); \
(var); \
(var) = STAILQ_NEXT((var), field))
#define STAILQ_INIT(head) do { \
STAILQ_FIRST((head)) = NULL; \
(head)->stqh_last = &STAILQ_FIRST((head)); \
} while (0)
#define STAILQ_INSERT_AFTER(head, tqelm, elm, field) do { \
if ((STAILQ_NEXT((elm), field) = STAILQ_NEXT((tqelm), field)) == NULL)\
(head)->stqh_last = &STAILQ_NEXT((elm), field); \
STAILQ_NEXT((tqelm), field) = (elm); \
} while (0)
#define STAILQ_INSERT_HEAD(head, elm, field) do { \
if ((STAILQ_NEXT((elm), field) = STAILQ_FIRST((head))) == NULL) \
(head)->stqh_last = &STAILQ_NEXT((elm), field); \
STAILQ_FIRST((head)) = (elm); \
} while (0)
#define STAILQ_INSERT_TAIL(head, elm, field) do { \
STAILQ_NEXT((elm), field) = NULL; \
*(head)->stqh_last = (elm); \
(head)->stqh_last = &STAILQ_NEXT((elm), field); \
} while (0)
#define STAILQ_LAST(head, type, field) \
(STAILQ_EMPTY((head)) ? \
NULL : \
((struct type *) \
((char *)((head)->stqh_last) - __offsetof(struct type, field))))
#define STAILQ_NEXT(elm, field) ((elm)->field.stqe_next)
#define STAILQ_REMOVE(head, elm, type, field) do { \
if (STAILQ_FIRST((head)) == (elm)) { \
STAILQ_REMOVE_HEAD((head), field); \
} \
else { \
struct type *curelm = STAILQ_FIRST((head)); \
while (STAILQ_NEXT(curelm, field) != (elm)) \
curelm = STAILQ_NEXT(curelm, field); \
if ((STAILQ_NEXT(curelm, field) = \
STAILQ_NEXT(STAILQ_NEXT(curelm, field), field)) == NULL)\
(head)->stqh_last = &STAILQ_NEXT((curelm), field);\
} \
} while (0)
#define STAILQ_REMOVE_HEAD(head, field) do { \
if ((STAILQ_FIRST((head)) = \
STAILQ_NEXT(STAILQ_FIRST((head)), field)) == NULL) \
(head)->stqh_last = &STAILQ_FIRST((head)); \
} while (0)
#define STAILQ_REMOVE_HEAD_UNTIL(head, elm, field) do { \
if ((STAILQ_FIRST((head)) = STAILQ_NEXT((elm), field)) == NULL) \
(head)->stqh_last = &STAILQ_FIRST((head)); \
} while (0)
/*
* List declarations.
*/
#define LIST_HEAD(name, type) \
struct name { \
struct type *lh_first; /* first element */ \
}
#define LIST_HEAD_INITIALIZER(head) \
{ NULL }
#define LIST_ENTRY(type) \
struct { \
struct type *le_next; /* next element */ \
struct type **le_prev; /* address of previous next element */ \
}
/*
* List functions.
*/
#define LIST_EMPTY(head) ((head)->lh_first == NULL)
#define LIST_FIRST(head) ((head)->lh_first)
#define LIST_FOREACH(var, head, field) \
for ((var) = LIST_FIRST((head)); \
(var); \
(var) = LIST_NEXT((var), field))
#define LIST_INIT(head) do { \
LIST_FIRST((head)) = NULL; \
} while (0)
#define LIST_INSERT_AFTER(listelm, elm, field) do { \
if ((LIST_NEXT((elm), field) = LIST_NEXT((listelm), field)) != NULL)\
LIST_NEXT((listelm), field)->field.le_prev = \
&LIST_NEXT((elm), field); \
LIST_NEXT((listelm), field) = (elm); \
(elm)->field.le_prev = &LIST_NEXT((listelm), field); \
} while (0)
#define LIST_INSERT_BEFORE(listelm, elm, field) do { \
(elm)->field.le_prev = (listelm)->field.le_prev; \
LIST_NEXT((elm), field) = (listelm); \
*(listelm)->field.le_prev = (elm); \
(listelm)->field.le_prev = &LIST_NEXT((elm), field); \
} while (0)
#define LIST_INSERT_HEAD(head, elm, field) do { \
if ((LIST_NEXT((elm), field) = LIST_FIRST((head))) != NULL) \
LIST_FIRST((head))->field.le_prev = &LIST_NEXT((elm), field);\
LIST_FIRST((head)) = (elm); \
(elm)->field.le_prev = &LIST_FIRST((head)); \
} while (0)
#define LIST_NEXT(elm, field) ((elm)->field.le_next)
#define LIST_REMOVE(elm, field) do { \
if (LIST_NEXT((elm), field) != NULL) \
LIST_NEXT((elm), field)->field.le_prev = \
(elm)->field.le_prev; \
*(elm)->field.le_prev = LIST_NEXT((elm), field); \
} while (0)
/*
* Tail queue declarations.
*/
#define TAILQ_HEAD(name, type) \
struct name { \
struct type *tqh_first; /* first element */ \
struct type **tqh_last; /* addr of last next element */ \
TRACEBUF \
}
#define TAILQ_HEAD_INITIALIZER(head) \
{ NULL, &(head).tqh_first }
#define TAILQ_ENTRY(type) \
struct { \
struct type *tqe_next; /* next element */ \
struct type **tqe_prev; /* address of previous next element */ \
TRACEBUF \
}
/*
* Tail queue functions.
*/
#define TAILQ_CONCAT(head1, head2, field) do { \
if (!TAILQ_EMPTY(head2)) { \
*(head1)->tqh_last = (head2)->tqh_first; \
(head2)->tqh_first->field.tqe_prev = (head1)->tqh_last; \
(head1)->tqh_last = (head2)->tqh_last; \
TAILQ_INIT((head2)); \
QMD_TRACE_HEAD(head1); \
QMD_TRACE_HEAD(head2); \
} \
} while (0)
#define TAILQ_EMPTY(head) ((head)->tqh_first == NULL)
#define TAILQ_FIRST(head) ((head)->tqh_first)
#define TAILQ_FOREACH(var, head, field) \
for ((var) = TAILQ_FIRST((head)); \
(var); \
(var) = TAILQ_NEXT((var), field))
#define TAILQ_FOREACH_REVERSE(var, head, headname, field) \
for ((var) = TAILQ_LAST((head), headname); \
(var); \
(var) = TAILQ_PREV((var), headname, field))
#define TAILQ_INIT(head) do { \
TAILQ_FIRST((head)) = NULL; \
(head)->tqh_last = &TAILQ_FIRST((head)); \
QMD_TRACE_HEAD(head); \
} while (0)
#define TAILQ_INSERT_AFTER(head, listelm, elm, field) do { \
if ((TAILQ_NEXT((elm), field) = TAILQ_NEXT((listelm), field)) != NULL)\
TAILQ_NEXT((elm), field)->field.tqe_prev = \
&TAILQ_NEXT((elm), field); \
else { \
(head)->tqh_last = &TAILQ_NEXT((elm), field); \
QMD_TRACE_HEAD(head); \
} \
TAILQ_NEXT((listelm), field) = (elm); \
(elm)->field.tqe_prev = &TAILQ_NEXT((listelm), field); \
QMD_TRACE_ELEM(&(elm)->field); \
QMD_TRACE_ELEM(&listelm->field); \
} while (0)
#define TAILQ_INSERT_BEFORE(listelm, elm, field) do { \
(elm)->field.tqe_prev = (listelm)->field.tqe_prev; \
TAILQ_NEXT((elm), field) = (listelm); \
*(listelm)->field.tqe_prev = (elm); \
(listelm)->field.tqe_prev = &TAILQ_NEXT((elm), field); \
QMD_TRACE_ELEM(&(elm)->field); \
QMD_TRACE_ELEM(&listelm->field); \
} while (0)
#define TAILQ_INSERT_HEAD(head, elm, field) do { \
if ((TAILQ_NEXT((elm), field) = TAILQ_FIRST((head))) != NULL) \
TAILQ_FIRST((head))->field.tqe_prev = \
&TAILQ_NEXT((elm), field); \
else \
(head)->tqh_last = &TAILQ_NEXT((elm), field); \
TAILQ_FIRST((head)) = (elm); \
(elm)->field.tqe_prev = &TAILQ_FIRST((head)); \
QMD_TRACE_HEAD(head); \
QMD_TRACE_ELEM(&(elm)->field); \
} while (0)
#define TAILQ_INSERT_TAIL(head, elm, field) do { \
TAILQ_NEXT((elm), field) = NULL; \
(elm)->field.tqe_prev = (head)->tqh_last; \
*(head)->tqh_last = (elm); \
(head)->tqh_last = &TAILQ_NEXT((elm), field); \
QMD_TRACE_HEAD(head); \
QMD_TRACE_ELEM(&(elm)->field); \
} while (0)
#define TAILQ_LAST(head, headname) \
(*(((struct headname *)((head)->tqh_last))->tqh_last))
#define TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
#define TAILQ_PREV(elm, headname, field) \
(*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))
#define TAILQ_REMOVE(head, elm, field) do { \
if ((TAILQ_NEXT((elm), field)) != NULL) \
TAILQ_NEXT((elm), field)->field.tqe_prev = \
(elm)->field.tqe_prev; \
else { \
(head)->tqh_last = (elm)->field.tqe_prev; \
QMD_TRACE_HEAD(head); \
} \
*(elm)->field.tqe_prev = TAILQ_NEXT((elm), field); \
TRASHIT((elm)->field.tqe_next); \
TRASHIT((elm)->field.tqe_prev); \
QMD_TRACE_ELEM(&(elm)->field); \
} while (0)
/*
* Circular queue definitions.
*/
#define CIRCLEQ_HEAD(name, type) \
struct name { \
struct type *cqh_first; /* first element */ \
struct type *cqh_last; /* last element */ \
}
#define CIRCLEQ_HEAD_INITIALIZER(head) \
{ (void *)&head, (void *)&head }
#define CIRCLEQ_ENTRY(type) \
struct { \
struct type *cqe_next; /* next element */ \
struct type *cqe_prev; /* previous element */ \
}
/*
* Circular queue functions.
*/
#define CIRCLEQ_INIT(head) do { \
(head)->cqh_first = (void *)(head); \
(head)->cqh_last = (void *)(head); \
} while (/*CONSTCOND*/0)
#define CIRCLEQ_INSERT_AFTER(head, listelm, elm, field) do { \
(elm)->field.cqe_next = (listelm)->field.cqe_next; \
(elm)->field.cqe_prev = (listelm); \
if ((listelm)->field.cqe_next == (void *)(head)) \
(head)->cqh_last = (elm); \
else \
(listelm)->field.cqe_next->field.cqe_prev = (elm); \
(listelm)->field.cqe_next = (elm); \
} while (/*CONSTCOND*/0)
#define CIRCLEQ_INSERT_BEFORE(head, listelm, elm, field) do { \
(elm)->field.cqe_next = (listelm); \
(elm)->field.cqe_prev = (listelm)->field.cqe_prev; \
if ((listelm)->field.cqe_prev == (void *)(head)) \
(head)->cqh_first = (elm); \
else \
(listelm)->field.cqe_prev->field.cqe_next = (elm); \
(listelm)->field.cqe_prev = (elm); \
} while (/*CONSTCOND*/0)
#define CIRCLEQ_INSERT_HEAD(head, elm, field) do { \
(elm)->field.cqe_next = (head)->cqh_first; \
(elm)->field.cqe_prev = (void *)(head); \
if ((head)->cqh_last == (void *)(head)) \
(head)->cqh_last = (elm); \
else \
(head)->cqh_first->field.cqe_prev = (elm); \
(head)->cqh_first = (elm); \
} while (/*CONSTCOND*/0)
#define CIRCLEQ_INSERT_TAIL(head, elm, field) do { \
(elm)->field.cqe_next = (void *)(head); \
(elm)->field.cqe_prev = (head)->cqh_last; \
if ((head)->cqh_first == (void *)(head)) \
(head)->cqh_first = (elm); \
else \
(head)->cqh_last->field.cqe_next = (elm); \
(head)->cqh_last = (elm); \
} while (/*CONSTCOND*/0)
#define CIRCLEQ_REMOVE(head, elm, field) do { \
if ((elm)->field.cqe_next == (void *)(head)) \
(head)->cqh_last = (elm)->field.cqe_prev; \
else \
(elm)->field.cqe_next->field.cqe_prev = \
(elm)->field.cqe_prev; \
if ((elm)->field.cqe_prev == (void *)(head)) \
(head)->cqh_first = (elm)->field.cqe_next; \
else \
(elm)->field.cqe_prev->field.cqe_next = \
(elm)->field.cqe_next; \
} while (/*CONSTCOND*/0)
#define CIRCLEQ_FOREACH(var, head, field) \
for ((var) = ((head)->cqh_first); \
(var) != (const void *)(head); \
(var) = ((var)->field.cqe_next))
#define CIRCLEQ_FOREACH_REVERSE(var, head, field) \
for ((var) = ((head)->cqh_last); \
(var) != (const void *)(head); \
(var) = ((var)->field.cqe_prev))
/*
* Circular queue access methods.
*/
#define CIRCLEQ_EMPTY(head) ((head)->cqh_first == (void *)(head))
#define CIRCLEQ_FIRST(head) ((head)->cqh_first)
#define CIRCLEQ_LAST(head) ((head)->cqh_last)
#define CIRCLEQ_NEXT(elm, field) ((elm)->field.cqe_next)
#define CIRCLEQ_PREV(elm, field) ((elm)->field.cqe_prev)
#define CIRCLEQ_LOOP_NEXT(head, elm, field) \
(((elm)->field.cqe_next == (void *)(head)) \
? ((head)->cqh_first) \
: (elm->field.cqe_next))
#define CIRCLEQ_LOOP_PREV(head, elm, field) \
(((elm)->field.cqe_prev == (void *)(head)) \
? ((head)->cqh_last) \
: (elm->field.cqe_prev))
#if defined(__cplusplus)
}
#endif
#endif /* !_DB_QUEUE_H_ */

213
c_src/stats.h Normal file
View file

@ -0,0 +1,213 @@
/*
* stats: measure all the things
*
* Copyright (c) 2012 Basho Technologies, Inc. All Rights Reserved.
* Author: Gregory Burd <greg@basho.com> <greg@burd.me>
*
* This file is provided to you under the Apache License,
* Version 2.0 (the "License"); you may not use this file
* except in compliance with the License. You may obtain
* a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
#ifndef __STATS_H__
#define __STATS_H__
#if defined(__cplusplus)
extern "C" {
#endif
#include "duration.h"
/**
* Calculate the log2 of 64bit unsigned integers.
*/
#ifdef __GNUC__
/* X must be nonzero: __builtin_clzll(0) is undefined. */
#define LOG2(X) ((unsigned) ((8 * sizeof(uint64_t)) - __builtin_clzll((X)) - 1))
#else
static unsigned int __log2_64(uint64_t x) {
static const int tab64[64] = {
63, 0, 58, 1, 59, 47, 53, 2,
60, 39, 48, 27, 54, 33, 42, 3,
61, 51, 37, 40, 49, 18, 28, 20,
55, 30, 34, 11, 43, 14, 22, 4,
62, 57, 46, 52, 38, 26, 32, 41,
50, 36, 17, 19, 29, 10, 13, 21,
56, 45, 25, 31, 35, 16, 9, 12,
44, 24, 15, 8, 23, 7, 6, 5};
if (x == 0) return 0;
uint64_t v = x;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v |= v >> 32;
return tab64[((uint64_t)((v - (v >> 1)) * 0x07EDD5E59A4E28C2)) >> 58];
}
#define LOG2(X) __log2_64(X)
#endif
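/* Editor's note on the fallback above: the shift-or cascade smears the
 * highest set bit of x downward, leaving v = 2^(k+1) - 1 when bit k is
 * the top set bit; (v - (v >> 1)) then isolates 2^k, and multiplying by
 * the de Bruijn-style constant 0x07EDD5E59A4E28C2 deposits a distinct
 * 6-bit pattern in the top bits, which the >> 58 turns into an index
 * into the 64-entry table mapping back to k. */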
#define STAT_DEF(name) struct name ## _stat name ## _stat;
#define STAT_DECL(name, nsamples) \
struct name ## _stat { \
duration_t d; \
uint64_t histogram[64]; \
uint32_t h, n; \
uint64_t samples[nsamples]; \
uint64_t min, max; \
double mean; \
}; \
static inline double name ## _stat_mean(struct name ## _stat *s) { \
uint32_t t = s->h; \
uint32_t h = (s->h + 1) % nsamples; \
double mean = 0; \
while (h != t) { \
mean += s->samples[h]; \
h = (h + 1) % nsamples; \
} \
if (mean > 0) \
mean /= (double)(s->n < nsamples ? s->n : nsamples); \
return mean; \
} \
static inline double name ## _stat_mean_lg2(struct name ## _stat *s) { \
uint32_t i; \
double mean = 0; \
for (i = 0; i < 64; i++) \
mean += (s->histogram[i] * i); \
if (mean > 0) \
mean /= (double)s->n; \
return mean; \
} \
static inline uint64_t name ## _stat_tick(struct name ## _stat *s) \
{ \
uint64_t t = ts(s->d.unit); \
s->d.then = t; \
return t; \
} \
static inline void name ## _stat_reset(struct name ## _stat *s) \
{ \
s->min = ~0; \
s->max = 0; \
s->h = 0; \
memset(&s->histogram, 0, sizeof(uint64_t) * 64); \
memset(&s->samples, 0, sizeof(uint64_t) * nsamples); \
} \
static inline uint64_t name ## _stat_tock(struct name ## _stat *s) \
{ \
uint64_t now = ts(s->d.unit); \
uint64_t elapsed = now - s->d.then; \
uint32_t i = s->h; \
if (s->n == nsamples) { \
s->mean = (s->mean + name ## _stat_mean(s)) / 2.0; \
if (s->n >= 4294967295) \
name ## _stat_reset(s); \
} \
s->h = (s->h + 1) % nsamples; \
s->samples[i] = elapsed; \
if (elapsed < s->min) \
s->min = elapsed; \
if (elapsed > s->max) \
s->max = elapsed; \
s->histogram[LOG2(elapsed)]++; \
s->n++; \
s->d.then = ts(s->d.unit); \
return elapsed; \
} \
static void name ## _stat_print_histogram(struct name ## _stat *s, const char *mod) \
{ \
uint8_t logs[64]; \
uint8_t i, j, max_log = 0; \
double m = (s->mean + name ## _stat_mean(s)) / 2.0; \
\
fprintf(stderr, "%s:async_nif request latency histogram:\n", mod); \
for (i = 0; i < 64; i++) { \
logs[i] = LOG2(s->histogram[i]); \
if (logs[i] > max_log) \
max_log = logs[i]; \
} \
for (i = max_log; i > 0; i--) { \
if (!(i % 10)) \
fprintf(stderr, "2^%2d ", i); \
else \
fprintf(stderr, " "); \
for(j = 0; j < 64; j++) \
fprintf(stderr, logs[j] >= i ? "█" : " "); \
fprintf(stderr, "\n"); \
} \
if (max_log == 0) { \
fprintf(stderr, "[empty]\n"); \
} else { \
fprintf(stderr, " ns μs ms s ks\n"); \
fprintf(stderr, "min: "); \
if (s->min < 1000) \
fprintf(stderr, "%lu (ns)", s->min); \
else if (s->min < 1000000) \
fprintf(stderr, "%.2f (μs)", s->min / 1000.0); \
else if (s->min < 1000000000) \
fprintf(stderr, "%.2f (ms)", s->min / 1000000.0); \
else if (s->min < 1000000000000) \
fprintf(stderr, "%.2f (s)", s->min / 1000000000.0); \
fprintf(stderr, " max: "); \
if (s->max < 1000) \
fprintf(stderr, "%lu (ns)", s->max); \
else if (s->max < 1000000) \
fprintf(stderr, "%.2f (μs)", s->max / 1000.0); \
else if (s->max < 1000000000) \
fprintf(stderr, "%.2f (ms)", s->max / 1000000.0); \
else if (s->max < 1000000000000) \
fprintf(stderr, "%.2f (s)", s->max / 1000000000.0); \
fprintf(stderr, " mean: "); \
if (m < 1000) \
fprintf(stderr, "%.2f (ns)", m); \
else if (m < 1000000) \
fprintf(stderr, "%.2f (μs)", m / 1000.0); \
else if (m < 1000000000) \
fprintf(stderr, "%.2f (ms)", m / 1000000.0); \
else if (m < 1000000000000) \
fprintf(stderr, "%.2f (s)", m / 1000000000.0); \
fprintf(stderr, "\n"); \
} \
fflush(stderr); \
}
#define STAT_INIT(var, name) \
var->name ## _stat.min = ~0; \
var->name ## _stat.max = 0; \
var->name ## _stat.mean = 0.0; \
var->name ## _stat.h = 0; \
var->name ## _stat.d.then = 0; \
var->name ## _stat.d.unit = ns;
#define STAT_TICK(var, name) name ## _stat_tick(&var->name ## _stat)
#define STAT_TOCK(var, name) name ## _stat_tock(&var->name ## _stat)
#define STAT_RESET(var, name) name ## _stat_reset(&var->name ## _stat)
#define STAT_MEAN_LOG2_SAMPLE(var, name) \
name ## _stat_mean_lg2(&var->name ## _stat)
#define STAT_MEAN_SAMPLE(var, name) \
name ## _stat_mean(&var->name ## _stat)
#define STAT_PRINT(var, name, mod) \
name ## _stat_print_histogram(&var->name ## _stat, mod)
#if defined(__cplusplus)
}
#endif
#endif // __STATS_H__
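/* Editor's sketch, not part of stats.h: how the macros compose. Assumes
 * <stdio.h>/<string.h> are already included (stats.h itself relies on
 * fprintf and memset) and that duration.h supplies ts() and the `ns`
 * unit referenced by STAT_INIT. Names (`get`, `handle`) are made up. */
STAT_DECL(get, 128);			/* struct get_stat + its helper fns */

struct handle {
	STAT_DEF(get);			/* embeds a struct get_stat member */
};

static void handle_init(struct handle *h)
{
	memset(h, 0, sizeof(*h));
	STAT_INIT(h, get);		/* min/max/mean and the ns time unit */
}

static void timed_get(struct handle *h)
{
	STAT_TICK(h, get);		/* stamp the start time */
	/* ... the operation being measured ... */
	STAT_TOCK(h, get);		/* record elapsed; update histogram */
}

static void report(struct handle *h)
{
	STAT_PRINT(h, get, "emdb");	/* log2 latency histogram to stderr */
}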

915
c_src/uthash.h Normal file
View file

@ -0,0 +1,915 @@
/*
Copyright (c) 2003-2012, Troy D. Hanson http://uthash.sourceforge.net
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef UTHASH_H
#define UTHASH_H
#include <string.h> /* memcmp,strlen */
#include <stddef.h> /* ptrdiff_t */
#include <stdlib.h> /* exit() */
/* These macros use decltype or the earlier __typeof GNU extension.
As decltype is only available in newer compilers (VS2010 or gcc 4.3+
when compiling c++ source) this code uses whatever method is needed
or, for VS2008 where neither is available, uses casting workarounds. */
#ifdef _MSC_VER /* MS compiler */
#if _MSC_VER >= 1600 && defined(__cplusplus) /* VS2010 or newer in C++ mode */
#define DECLTYPE(x) (decltype(x))
#else /* VS2008 or older (or VS2010 in C mode) */
#define NO_DECLTYPE
#define DECLTYPE(x)
#endif
#else /* GNU, Sun and other compilers */
#define DECLTYPE(x) (__typeof(x))
#endif
#ifdef NO_DECLTYPE
#define DECLTYPE_ASSIGN(dst,src) \
do { \
char **_da_dst = (char**)(&(dst)); \
*_da_dst = (char*)(src); \
} while(0)
#else
#define DECLTYPE_ASSIGN(dst,src) \
do { \
(dst) = DECLTYPE(dst)(src); \
} while(0)
#endif
/* a number of the hash functions use uint32_t, which isn't defined on win32 */
#ifdef _MSC_VER
typedef unsigned int uint32_t;
typedef unsigned char uint8_t;
#else
#include <inttypes.h> /* uint32_t */
#endif
#define UTHASH_VERSION 1.9.6
#ifndef uthash_fatal
#define uthash_fatal(msg) exit(-1) /* fatal error (out of memory,etc) */
#endif
#ifndef uthash_malloc
#define uthash_malloc(sz) malloc(sz) /* malloc fcn */
#endif
#ifndef uthash_free
#define uthash_free(ptr,sz) free(ptr) /* free fcn */
#endif
#ifndef uthash_noexpand_fyi
#define uthash_noexpand_fyi(tbl) /* can be defined to log noexpand */
#endif
#ifndef uthash_expand_fyi
#define uthash_expand_fyi(tbl) /* can be defined to log expands */
#endif
/* initial number of buckets */
#define HASH_INITIAL_NUM_BUCKETS 32 /* initial number of buckets */
#define HASH_INITIAL_NUM_BUCKETS_LOG2 5 /* lg2 of initial number of buckets */
#define HASH_BKT_CAPACITY_THRESH 10 /* expand when bucket count reaches */
/* calculate the element whose hash handle address is hhe */
#define ELMT_FROM_HH(tbl,hhp) ((void*)(((char*)(hhp)) - ((tbl)->hho)))
#define HASH_FIND(hh,head,keyptr,keylen,out) \
do { \
unsigned _hf_bkt,_hf_hashv; \
out=NULL; \
if (head) { \
HASH_FCN(keyptr,keylen, (head)->hh.tbl->num_buckets, _hf_hashv, _hf_bkt); \
if (HASH_BLOOM_TEST((head)->hh.tbl, _hf_hashv)) { \
HASH_FIND_IN_BKT((head)->hh.tbl, hh, (head)->hh.tbl->buckets[ _hf_bkt ], \
keyptr,keylen,out); \
} \
} \
} while (0)
#ifdef HASH_BLOOM
#define HASH_BLOOM_BITLEN (1ULL << HASH_BLOOM)
#define HASH_BLOOM_BYTELEN ((HASH_BLOOM_BITLEN/8) + ((HASH_BLOOM_BITLEN%8) ? 1:0))
#define HASH_BLOOM_MAKE(tbl) \
do { \
(tbl)->bloom_nbits = HASH_BLOOM; \
(tbl)->bloom_bv = (uint8_t*)uthash_malloc(HASH_BLOOM_BYTELEN); \
if (!((tbl)->bloom_bv)) { uthash_fatal( "out of memory"); } \
memset((tbl)->bloom_bv, 0, HASH_BLOOM_BYTELEN); \
(tbl)->bloom_sig = HASH_BLOOM_SIGNATURE; \
} while (0)
#define HASH_BLOOM_FREE(tbl) \
do { \
uthash_free((tbl)->bloom_bv, HASH_BLOOM_BYTELEN); \
} while (0)
#define HASH_BLOOM_BITSET(bv,idx) (bv[(idx)/8] |= (1U << ((idx)%8)))
#define HASH_BLOOM_BITTEST(bv,idx) (bv[(idx)/8] & (1U << ((idx)%8)))
#define HASH_BLOOM_ADD(tbl,hashv) \
HASH_BLOOM_BITSET((tbl)->bloom_bv, (hashv & (uint32_t)((1ULL << (tbl)->bloom_nbits) - 1)))
#define HASH_BLOOM_TEST(tbl,hashv) \
HASH_BLOOM_BITTEST((tbl)->bloom_bv, (hashv & (uint32_t)((1ULL << (tbl)->bloom_nbits) - 1)))
#else
#define HASH_BLOOM_MAKE(tbl)
#define HASH_BLOOM_FREE(tbl)
#define HASH_BLOOM_ADD(tbl,hashv)
#define HASH_BLOOM_TEST(tbl,hashv) (1)
#endif
#define HASH_MAKE_TABLE(hh,head) \
do { \
(head)->hh.tbl = (UT_hash_table*)uthash_malloc( \
sizeof(UT_hash_table)); \
if (!((head)->hh.tbl)) { uthash_fatal( "out of memory"); } \
memset((head)->hh.tbl, 0, sizeof(UT_hash_table)); \
(head)->hh.tbl->tail = &((head)->hh); \
(head)->hh.tbl->num_buckets = HASH_INITIAL_NUM_BUCKETS; \
(head)->hh.tbl->log2_num_buckets = HASH_INITIAL_NUM_BUCKETS_LOG2; \
(head)->hh.tbl->hho = (char*)(&(head)->hh) - (char*)(head); \
(head)->hh.tbl->buckets = (UT_hash_bucket*)uthash_malloc( \
HASH_INITIAL_NUM_BUCKETS*sizeof(struct UT_hash_bucket)); \
if (! (head)->hh.tbl->buckets) { uthash_fatal( "out of memory"); } \
memset((head)->hh.tbl->buckets, 0, \
HASH_INITIAL_NUM_BUCKETS*sizeof(struct UT_hash_bucket)); \
HASH_BLOOM_MAKE((head)->hh.tbl); \
(head)->hh.tbl->signature = HASH_SIGNATURE; \
} while(0)
#define HASH_ADD(hh,head,fieldname,keylen_in,add) \
HASH_ADD_KEYPTR(hh,head,&((add)->fieldname),keylen_in,add)
#define HASH_ADD_KEYPTR(hh,head,keyptr,keylen_in,add) \
do { \
unsigned _ha_bkt; \
(add)->hh.next = NULL; \
(add)->hh.key = (char*)keyptr; \
(add)->hh.keylen = (unsigned)keylen_in; \
if (!(head)) { \
head = (add); \
(head)->hh.prev = NULL; \
HASH_MAKE_TABLE(hh,head); \
} else { \
(head)->hh.tbl->tail->next = (add); \
(add)->hh.prev = ELMT_FROM_HH((head)->hh.tbl, (head)->hh.tbl->tail); \
(head)->hh.tbl->tail = &((add)->hh); \
} \
(head)->hh.tbl->num_items++; \
(add)->hh.tbl = (head)->hh.tbl; \
HASH_FCN(keyptr,keylen_in, (head)->hh.tbl->num_buckets, \
(add)->hh.hashv, _ha_bkt); \
HASH_ADD_TO_BKT((head)->hh.tbl->buckets[_ha_bkt],&(add)->hh); \
HASH_BLOOM_ADD((head)->hh.tbl,(add)->hh.hashv); \
HASH_EMIT_KEY(hh,head,keyptr,keylen_in); \
HASH_FSCK(hh,head); \
} while(0)
#define HASH_TO_BKT( hashv, num_bkts, bkt ) \
do { \
bkt = ((hashv) & ((num_bkts) - 1)); \
} while(0)
/* delete "delptr" from the hash table.
* "the usual" patch-up process for the app-order doubly-linked-list.
* The use of _hd_hh_del below deserves special explanation.
* These used to be expressed using (delptr) but that led to a bug
* if someone used the same symbol for the head and deletee, like
* HASH_DELETE(hh,users,users);
* We want that to work, but by changing the head (users) below
* we were forfeiting our ability to further refer to the deletee (users)
* in the patch-up process. Solution: use scratch space to
* copy the deletee pointer, then the latter references are via that
* scratch pointer rather than through the repointed (users) symbol.
*/
#define HASH_DELETE(hh,head,delptr) \
do { \
unsigned _hd_bkt; \
struct UT_hash_handle *_hd_hh_del; \
if ( ((delptr)->hh.prev == NULL) && ((delptr)->hh.next == NULL) ) { \
uthash_free((head)->hh.tbl->buckets, \
(head)->hh.tbl->num_buckets*sizeof(struct UT_hash_bucket) ); \
HASH_BLOOM_FREE((head)->hh.tbl); \
uthash_free((head)->hh.tbl, sizeof(UT_hash_table)); \
head = NULL; \
} else { \
_hd_hh_del = &((delptr)->hh); \
if ((delptr) == ELMT_FROM_HH((head)->hh.tbl,(head)->hh.tbl->tail)) { \
(head)->hh.tbl->tail = \
(UT_hash_handle*)((char*)((delptr)->hh.prev) + \
(head)->hh.tbl->hho); \
} \
if ((delptr)->hh.prev) { \
((UT_hash_handle*)((char*)((delptr)->hh.prev) + \
(head)->hh.tbl->hho))->next = (delptr)->hh.next; \
} else { \
DECLTYPE_ASSIGN(head,(delptr)->hh.next); \
} \
if (_hd_hh_del->next) { \
((UT_hash_handle*)((char*)_hd_hh_del->next + \
(head)->hh.tbl->hho))->prev = \
_hd_hh_del->prev; \
} \
HASH_TO_BKT( _hd_hh_del->hashv, (head)->hh.tbl->num_buckets, _hd_bkt); \
HASH_DEL_IN_BKT(hh,(head)->hh.tbl->buckets[_hd_bkt], _hd_hh_del); \
(head)->hh.tbl->num_items--; \
} \
HASH_FSCK(hh,head); \
} while (0)
/* convenience forms of HASH_FIND/HASH_ADD/HASH_DEL */
#define HASH_FIND_STR(head,findstr,out) \
HASH_FIND(hh,head,findstr,strlen(findstr),out)
#define HASH_ADD_STR(head,strfield,add) \
HASH_ADD(hh,head,strfield,strlen(add->strfield),add)
#define HASH_FIND_INT(head,findint,out) \
HASH_FIND(hh,head,findint,sizeof(int),out)
#define HASH_ADD_INT(head,intfield,add) \
HASH_ADD(hh,head,intfield,sizeof(int),add)
#define HASH_FIND_PTR(head,findptr,out) \
HASH_FIND(hh,head,findptr,sizeof(void *),out)
#define HASH_ADD_PTR(head,ptrfield,add) \
HASH_ADD(hh,head,ptrfield,sizeof(void *),add)
#define HASH_DEL(head,delptr) \
HASH_DELETE(hh,head,delptr)
/* HASH_FSCK checks hash integrity on every add/delete when HASH_DEBUG is defined.
* This is for uthash developer only; it compiles away if HASH_DEBUG isn't defined.
*/
#ifdef HASH_DEBUG
#define HASH_OOPS(...) do { fprintf(stderr,__VA_ARGS__); exit(-1); } while (0)
#define HASH_FSCK(hh,head) \
do { \
unsigned _bkt_i; \
unsigned _count, _bkt_count; \
char *_prev; \
struct UT_hash_handle *_thh; \
if (head) { \
_count = 0; \
for( _bkt_i = 0; _bkt_i < (head)->hh.tbl->num_buckets; _bkt_i++) { \
_bkt_count = 0; \
_thh = (head)->hh.tbl->buckets[_bkt_i].hh_head; \
_prev = NULL; \
while (_thh) { \
if (_prev != (char*)(_thh->hh_prev)) { \
HASH_OOPS("invalid hh_prev %p, actual %p\n", \
_thh->hh_prev, _prev ); \
} \
_bkt_count++; \
_prev = (char*)(_thh); \
_thh = _thh->hh_next; \
} \
_count += _bkt_count; \
if ((head)->hh.tbl->buckets[_bkt_i].count != _bkt_count) { \
HASH_OOPS("invalid bucket count %d, actual %d\n", \
(head)->hh.tbl->buckets[_bkt_i].count, _bkt_count); \
} \
} \
if (_count != (head)->hh.tbl->num_items) { \
HASH_OOPS("invalid hh item count %d, actual %d\n", \
(head)->hh.tbl->num_items, _count ); \
} \
/* traverse hh in app order; check next/prev integrity, count */ \
_count = 0; \
_prev = NULL; \
_thh = &(head)->hh; \
while (_thh) { \
_count++; \
if (_prev !=(char*)(_thh->prev)) { \
HASH_OOPS("invalid prev %p, actual %p\n", \
_thh->prev, _prev ); \
} \
_prev = (char*)ELMT_FROM_HH((head)->hh.tbl, _thh); \
_thh = ( _thh->next ? (UT_hash_handle*)((char*)(_thh->next) + \
(head)->hh.tbl->hho) : NULL ); \
} \
if (_count != (head)->hh.tbl->num_items) { \
HASH_OOPS("invalid app item count %d, actual %d\n", \
(head)->hh.tbl->num_items, _count ); \
} \
} \
} while (0)
#else
#define HASH_FSCK(hh,head)
#endif
/* When compiled with -DHASH_EMIT_KEYS, length-prefixed keys are emitted to
* the descriptor to which this macro is defined for tuning the hash function.
* The app can #include <unistd.h> to get the prototype for write(2). */
#ifdef HASH_EMIT_KEYS
#define HASH_EMIT_KEY(hh,head,keyptr,fieldlen) \
do { \
unsigned _klen = fieldlen; \
write(HASH_EMIT_KEYS, &_klen, sizeof(_klen)); \
write(HASH_EMIT_KEYS, keyptr, fieldlen); \
} while (0)
#else
#define HASH_EMIT_KEY(hh,head,keyptr,fieldlen)
#endif
/* default to Jenkins's hash unless overridden, e.g. -DHASH_FUNCTION=HASH_SAX */
#ifdef HASH_FUNCTION
#define HASH_FCN HASH_FUNCTION
#else
#define HASH_FCN HASH_JEN
#endif
/* The Bernstein hash function, used in Perl prior to v5.6 */
#define HASH_BER(key,keylen,num_bkts,hashv,bkt) \
do { \
unsigned _hb_keylen=keylen; \
char *_hb_key=(char*)(key); \
(hashv) = 0; \
while (_hb_keylen--) { (hashv) = ((hashv) * 33) + *_hb_key++; } \
bkt = (hashv) & (num_bkts-1); \
} while (0)
/* SAX/FNV/OAT/JEN hash functions are macro variants of those listed at
* http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx */
#define HASH_SAX(key,keylen,num_bkts,hashv,bkt) \
do { \
unsigned _sx_i; \
char *_hs_key=(char*)(key); \
hashv = 0; \
for(_sx_i=0; _sx_i < keylen; _sx_i++) \
hashv ^= (hashv << 5) + (hashv >> 2) + _hs_key[_sx_i]; \
bkt = hashv & (num_bkts-1); \
} while (0)
#define HASH_FNV(key,keylen,num_bkts,hashv,bkt) \
do { \
unsigned _fn_i; \
char *_hf_key=(char*)(key); \
hashv = 2166136261UL; \
for(_fn_i=0; _fn_i < keylen; _fn_i++) \
hashv = (hashv * 16777619) ^ _hf_key[_fn_i]; \
bkt = hashv & (num_bkts-1); \
} while(0)
#define HASH_OAT(key,keylen,num_bkts,hashv,bkt) \
do { \
unsigned _ho_i; \
char *_ho_key=(char*)(key); \
hashv = 0; \
for(_ho_i=0; _ho_i < keylen; _ho_i++) { \
hashv += _ho_key[_ho_i]; \
hashv += (hashv << 10); \
hashv ^= (hashv >> 6); \
} \
hashv += (hashv << 3); \
hashv ^= (hashv >> 11); \
hashv += (hashv << 15); \
bkt = hashv & (num_bkts-1); \
} while(0)
#define HASH_JEN_MIX(a,b,c) \
do { \
a -= b; a -= c; a ^= ( c >> 13 ); \
b -= c; b -= a; b ^= ( a << 8 ); \
c -= a; c -= b; c ^= ( b >> 13 ); \
a -= b; a -= c; a ^= ( c >> 12 ); \
b -= c; b -= a; b ^= ( a << 16 ); \
c -= a; c -= b; c ^= ( b >> 5 ); \
a -= b; a -= c; a ^= ( c >> 3 ); \
b -= c; b -= a; b ^= ( a << 10 ); \
c -= a; c -= b; c ^= ( b >> 15 ); \
} while (0)
#define HASH_JEN(key,keylen,num_bkts,hashv,bkt) \
do { \
unsigned _hj_i,_hj_j,_hj_k; \
char *_hj_key=(char*)(key); \
hashv = 0xfeedbeef; \
_hj_i = _hj_j = 0x9e3779b9; \
_hj_k = (unsigned)keylen; \
while (_hj_k >= 12) { \
_hj_i += (_hj_key[0] + ( (unsigned)_hj_key[1] << 8 ) \
+ ( (unsigned)_hj_key[2] << 16 ) \
+ ( (unsigned)_hj_key[3] << 24 ) ); \
_hj_j += (_hj_key[4] + ( (unsigned)_hj_key[5] << 8 ) \
+ ( (unsigned)_hj_key[6] << 16 ) \
+ ( (unsigned)_hj_key[7] << 24 ) ); \
hashv += (_hj_key[8] + ( (unsigned)_hj_key[9] << 8 ) \
+ ( (unsigned)_hj_key[10] << 16 ) \
+ ( (unsigned)_hj_key[11] << 24 ) ); \
\
HASH_JEN_MIX(_hj_i, _hj_j, hashv); \
\
_hj_key += 12; \
_hj_k -= 12; \
} \
hashv += keylen; \
switch ( _hj_k ) { \
case 11: hashv += ( (unsigned)_hj_key[10] << 24 ); \
case 10: hashv += ( (unsigned)_hj_key[9] << 16 ); \
case 9: hashv += ( (unsigned)_hj_key[8] << 8 ); \
case 8: _hj_j += ( (unsigned)_hj_key[7] << 24 ); \
case 7: _hj_j += ( (unsigned)_hj_key[6] << 16 ); \
case 6: _hj_j += ( (unsigned)_hj_key[5] << 8 ); \
case 5: _hj_j += _hj_key[4]; \
case 4: _hj_i += ( (unsigned)_hj_key[3] << 24 ); \
case 3: _hj_i += ( (unsigned)_hj_key[2] << 16 ); \
case 2: _hj_i += ( (unsigned)_hj_key[1] << 8 ); \
case 1: _hj_i += _hj_key[0]; \
} \
HASH_JEN_MIX(_hj_i, _hj_j, hashv); \
bkt = hashv & (num_bkts-1); \
} while(0)
/* The Paul Hsieh hash function */
#undef get16bits
#if (defined(__GNUC__) && defined(__i386__)) || defined(__WATCOMC__) \
|| defined(_MSC_VER) || defined (__BORLANDC__) || defined (__TURBOC__)
#define get16bits(d) (*((const uint16_t *) (d)))
#endif
#if !defined (get16bits)
#define get16bits(d) ((((uint32_t)(((const uint8_t *)(d))[1])) << 8) \
+(uint32_t)(((const uint8_t *)(d))[0]) )
#endif
#define HASH_SFH(key,keylen,num_bkts,hashv,bkt) \
do { \
char *_sfh_key=(char*)(key); \
uint32_t _sfh_tmp, _sfh_len = keylen; \
\
int _sfh_rem = _sfh_len & 3; \
_sfh_len >>= 2; \
hashv = 0xcafebabe; \
\
/* Main loop */ \
for (;_sfh_len > 0; _sfh_len--) { \
hashv += get16bits (_sfh_key); \
_sfh_tmp = (get16bits (_sfh_key+2) << 11) ^ hashv; \
hashv = (hashv << 16) ^ _sfh_tmp; \
_sfh_key += 2*sizeof (uint16_t); \
hashv += hashv >> 11; \
} \
\
/* Handle end cases */ \
switch (_sfh_rem) { \
case 3: hashv += get16bits (_sfh_key); \
hashv ^= hashv << 16; \
hashv ^= _sfh_key[sizeof (uint16_t)] << 18; \
hashv += hashv >> 11; \
break; \
case 2: hashv += get16bits (_sfh_key); \
hashv ^= hashv << 11; \
hashv += hashv >> 17; \
break; \
case 1: hashv += *_sfh_key; \
hashv ^= hashv << 10; \
hashv += hashv >> 1; \
} \
\
/* Force "avalanching" of final 127 bits */ \
hashv ^= hashv << 3; \
hashv += hashv >> 5; \
hashv ^= hashv << 4; \
hashv += hashv >> 17; \
hashv ^= hashv << 25; \
hashv += hashv >> 6; \
bkt = hashv & (num_bkts-1); \
} while(0)
#ifdef HASH_USING_NO_STRICT_ALIASING
/* The MurmurHash exploits some CPU's (x86,x86_64) tolerance for unaligned reads.
* For other types of CPU's (e.g. Sparc) an unaligned read causes a bus error.
* MurmurHash uses the faster approach only on CPU's where we know it's safe.
*
* Note the preprocessor built-in defines can be emitted using:
*
* gcc -m64 -dM -E - < /dev/null (on gcc)
* cc -## a.c (where a.c is a simple test file) (Sun Studio)
*/
#if (defined(__i386__) || defined(__x86_64__))
#define MUR_GETBLOCK(p,i) p[i]
#else /* non intel */
#define MUR_PLUS0_ALIGNED(p) (((unsigned long)p & 0x3) == 0)
#define MUR_PLUS1_ALIGNED(p) (((unsigned long)p & 0x3) == 1)
#define MUR_PLUS2_ALIGNED(p) (((unsigned long)p & 0x3) == 2)
#define MUR_PLUS3_ALIGNED(p) (((unsigned long)p & 0x3) == 3)
#define WP(p) ((uint32_t*)((unsigned long)(p) & ~3UL))
#if (defined(__BIG_ENDIAN__) || defined(SPARC) || defined(__ppc__) || defined(__ppc64__))
#define MUR_THREE_ONE(p) ((((*WP(p))&0x00ffffff) << 8) | (((*(WP(p)+1))&0xff000000) >> 24))
#define MUR_TWO_TWO(p) ((((*WP(p))&0x0000ffff) <<16) | (((*(WP(p)+1))&0xffff0000) >> 16))
#define MUR_ONE_THREE(p) ((((*WP(p))&0x000000ff) <<24) | (((*(WP(p)+1))&0xffffff00) >> 8))
#else /* assume little endian non-intel */
#define MUR_THREE_ONE(p) ((((*WP(p))&0xffffff00) >> 8) | (((*(WP(p)+1))&0x000000ff) << 24))
#define MUR_TWO_TWO(p) ((((*WP(p))&0xffff0000) >>16) | (((*(WP(p)+1))&0x0000ffff) << 16))
#define MUR_ONE_THREE(p) ((((*WP(p))&0xff000000) >>24) | (((*(WP(p)+1))&0x00ffffff) << 8))
#endif
#define MUR_GETBLOCK(p,i) (MUR_PLUS0_ALIGNED(p) ? ((p)[i]) : \
(MUR_PLUS1_ALIGNED(p) ? MUR_THREE_ONE(p) : \
(MUR_PLUS2_ALIGNED(p) ? MUR_TWO_TWO(p) : \
MUR_ONE_THREE(p))))
#endif
#define MUR_ROTL32(x,r) (((x) << (r)) | ((x) >> (32 - (r))))
#define MUR_FMIX(_h) \
do { \
_h ^= _h >> 16; \
_h *= 0x85ebca6b; \
_h ^= _h >> 13; \
_h *= 0xc2b2ae35l; \
_h ^= _h >> 16; \
} while(0)
#define HASH_MUR(key,keylen,num_bkts,hashv,bkt) \
do { \
const uint8_t *_mur_data = (const uint8_t*)(key); \
const int _mur_nblocks = (keylen) / 4; \
uint32_t _mur_h1 = 0xf88D5353; \
uint32_t _mur_c1 = 0xcc9e2d51; \
uint32_t _mur_c2 = 0x1b873593; \
const uint32_t *_mur_blocks = (const uint32_t*)(_mur_data+_mur_nblocks*4); \
int _mur_i; \
for(_mur_i = -_mur_nblocks; _mur_i; _mur_i++) { \
uint32_t _mur_k1 = MUR_GETBLOCK(_mur_blocks,_mur_i); \
_mur_k1 *= _mur_c1; \
_mur_k1 = MUR_ROTL32(_mur_k1,15); \
_mur_k1 *= _mur_c2; \
\
_mur_h1 ^= _mur_k1; \
_mur_h1 = MUR_ROTL32(_mur_h1,13); \
_mur_h1 = _mur_h1*5+0xe6546b64; \
} \
const uint8_t *_mur_tail = (const uint8_t*)(_mur_data + _mur_nblocks*4); \
uint32_t _mur_k1=0; \
switch((keylen) & 3) { \
case 3: _mur_k1 ^= _mur_tail[2] << 16; \
case 2: _mur_k1 ^= _mur_tail[1] << 8; \
case 1: _mur_k1 ^= _mur_tail[0]; \
_mur_k1 *= _mur_c1; \
_mur_k1 = MUR_ROTL32(_mur_k1,15); \
_mur_k1 *= _mur_c2; \
_mur_h1 ^= _mur_k1; \
} \
_mur_h1 ^= (keylen); \
MUR_FMIX(_mur_h1); \
hashv = _mur_h1; \
bkt = hashv & (num_bkts-1); \
} while(0)
#endif /* HASH_USING_NO_STRICT_ALIASING */
/* key comparison function; return 0 if keys equal */
#define HASH_KEYCMP(a,b,len) memcmp(a,b,len)
/* iterate over items in a known bucket to find desired item */
#define HASH_FIND_IN_BKT(tbl,hh,head,keyptr,keylen_in,out) \
do { \
if (head.hh_head) DECLTYPE_ASSIGN(out,ELMT_FROM_HH(tbl,head.hh_head)); \
else out=NULL; \
while (out) { \
if ((out)->hh.keylen == keylen_in) { \
if ((HASH_KEYCMP((out)->hh.key,keyptr,keylen_in)) == 0) break; \
} \
if ((out)->hh.hh_next) DECLTYPE_ASSIGN(out,ELMT_FROM_HH(tbl,(out)->hh.hh_next)); \
else out = NULL; \
} \
} while(0)
/* add an item to a bucket */
#define HASH_ADD_TO_BKT(head,addhh) \
do { \
head.count++; \
(addhh)->hh_next = head.hh_head; \
(addhh)->hh_prev = NULL; \
if (head.hh_head) { (head).hh_head->hh_prev = (addhh); } \
(head).hh_head=addhh; \
if (head.count >= ((head.expand_mult+1) * HASH_BKT_CAPACITY_THRESH) \
&& (addhh)->tbl->noexpand != 1) { \
HASH_EXPAND_BUCKETS((addhh)->tbl); \
} \
} while(0)
/* remove an item from a given bucket */
#define HASH_DEL_IN_BKT(hh,head,hh_del) \
(head).count--; \
if ((head).hh_head == hh_del) { \
(head).hh_head = hh_del->hh_next; \
} \
if (hh_del->hh_prev) { \
hh_del->hh_prev->hh_next = hh_del->hh_next; \
} \
if (hh_del->hh_next) { \
hh_del->hh_next->hh_prev = hh_del->hh_prev; \
}
/* Bucket expansion has the effect of doubling the number of buckets
* and redistributing the items into the new buckets. Ideally the
* items will distribute more or less evenly into the new buckets
* (the extent to which this is true is a measure of the quality of
* the hash function as it applies to the key domain).
*
* With the items distributed into more buckets, the chain length
* (item count) in each bucket is reduced. Thus by expanding buckets
* the hash keeps a bound on the chain length. This bounded chain
* length is the essence of how a hash provides constant time lookup.
*
* The calculation of tbl->ideal_chain_maxlen below deserves some
* explanation. First, keep in mind that we're calculating the ideal
* maximum chain length based on the *new* (doubled) bucket count.
* In fractions this is just n/b (n=number of items,b=new num buckets).
* Since the ideal chain length is an integer, we want to calculate
* ceil(n/b). We don't depend on floating point arithmetic in this
* hash, so to calculate ceil(n/b) with integers we could write
*
* ceil(n/b) = (n/b) + ((n%b)?1:0)
*
* and in fact a previous version of this hash did just that.
* But now we have improved things a bit by recognizing that b is
* always a power of two. We keep its base 2 log handy (call it lb),
* so now we can write this with a bit shift and logical AND:
*
* ceil(n/b) = (n>>lb) + ( (n & (b-1)) ? 1:0)
*
*/
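/* Editor's worked instance of the formula above: with num_items = 100
 * and buckets doubling from 32 to 64 (so lb = log2_num_buckets + 1 = 6):
 * 100 >> 6 = 1, and 100 & 63 = 36 is nonzero, so ideal_chain_maxlen
 * = 1 + 1 = 2, which is exactly ceil(100/64). */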
#define HASH_EXPAND_BUCKETS(tbl) \
do { \
unsigned _he_bkt; \
unsigned _he_bkt_i; \
struct UT_hash_handle *_he_thh, *_he_hh_nxt; \
UT_hash_bucket *_he_new_buckets, *_he_newbkt; \
_he_new_buckets = (UT_hash_bucket*)uthash_malloc( \
2 * tbl->num_buckets * sizeof(struct UT_hash_bucket)); \
if (!_he_new_buckets) { uthash_fatal( "out of memory"); } \
memset(_he_new_buckets, 0, \
2 * tbl->num_buckets * sizeof(struct UT_hash_bucket)); \
tbl->ideal_chain_maxlen = \
(tbl->num_items >> (tbl->log2_num_buckets+1)) + \
((tbl->num_items & ((tbl->num_buckets*2)-1)) ? 1 : 0); \
tbl->nonideal_items = 0; \
for(_he_bkt_i = 0; _he_bkt_i < tbl->num_buckets; _he_bkt_i++) \
{ \
_he_thh = tbl->buckets[ _he_bkt_i ].hh_head; \
while (_he_thh) { \
_he_hh_nxt = _he_thh->hh_next; \
HASH_TO_BKT( _he_thh->hashv, tbl->num_buckets*2, _he_bkt); \
_he_newbkt = &(_he_new_buckets[ _he_bkt ]); \
if (++(_he_newbkt->count) > tbl->ideal_chain_maxlen) { \
tbl->nonideal_items++; \
_he_newbkt->expand_mult = _he_newbkt->count / \
tbl->ideal_chain_maxlen; \
} \
_he_thh->hh_prev = NULL; \
_he_thh->hh_next = _he_newbkt->hh_head; \
if (_he_newbkt->hh_head) _he_newbkt->hh_head->hh_prev = \
_he_thh; \
_he_newbkt->hh_head = _he_thh; \
_he_thh = _he_hh_nxt; \
} \
} \
uthash_free( tbl->buckets, tbl->num_buckets*sizeof(struct UT_hash_bucket) ); \
tbl->num_buckets *= 2; \
tbl->log2_num_buckets++; \
tbl->buckets = _he_new_buckets; \
tbl->ineff_expands = (tbl->nonideal_items > (tbl->num_items >> 1)) ? \
(tbl->ineff_expands+1) : 0; \
if (tbl->ineff_expands > 1) { \
tbl->noexpand=1; \
uthash_noexpand_fyi(tbl); \
} \
uthash_expand_fyi(tbl); \
} while(0)
/* This is an adaptation of Simon Tatham's O(n log(n)) mergesort */
/* Note that HASH_SORT assumes the hash handle name to be hh.
* HASH_SRT was added to allow the hash handle name to be passed in. */
#define HASH_SORT(head,cmpfcn) HASH_SRT(hh,head,cmpfcn)
#define HASH_SRT(hh,head,cmpfcn) \
do { \
unsigned _hs_i; \
unsigned _hs_looping,_hs_nmerges,_hs_insize,_hs_psize,_hs_qsize; \
struct UT_hash_handle *_hs_p, *_hs_q, *_hs_e, *_hs_list, *_hs_tail; \
if (head) { \
_hs_insize = 1; \
_hs_looping = 1; \
_hs_list = &((head)->hh); \
while (_hs_looping) { \
_hs_p = _hs_list; \
_hs_list = NULL; \
_hs_tail = NULL; \
_hs_nmerges = 0; \
while (_hs_p) { \
_hs_nmerges++; \
_hs_q = _hs_p; \
_hs_psize = 0; \
for ( _hs_i = 0; _hs_i < _hs_insize; _hs_i++ ) { \
_hs_psize++; \
_hs_q = (UT_hash_handle*)((_hs_q->next) ? \
((void*)((char*)(_hs_q->next) + \
(head)->hh.tbl->hho)) : NULL); \
if (! (_hs_q) ) break; \
} \
_hs_qsize = _hs_insize; \
while ((_hs_psize > 0) || ((_hs_qsize > 0) && _hs_q )) { \
if (_hs_psize == 0) { \
_hs_e = _hs_q; \
_hs_q = (UT_hash_handle*)((_hs_q->next) ? \
((void*)((char*)(_hs_q->next) + \
(head)->hh.tbl->hho)) : NULL); \
_hs_qsize--; \
} else if ( (_hs_qsize == 0) || !(_hs_q) ) { \
_hs_e = _hs_p; \
_hs_p = (UT_hash_handle*)((_hs_p->next) ? \
((void*)((char*)(_hs_p->next) + \
(head)->hh.tbl->hho)) : NULL); \
_hs_psize--; \
} else if (( \
cmpfcn(DECLTYPE(head)(ELMT_FROM_HH((head)->hh.tbl,_hs_p)), \
DECLTYPE(head)(ELMT_FROM_HH((head)->hh.tbl,_hs_q))) \
) <= 0) { \
_hs_e = _hs_p; \
_hs_p = (UT_hash_handle*)((_hs_p->next) ? \
((void*)((char*)(_hs_p->next) + \
(head)->hh.tbl->hho)) : NULL); \
_hs_psize--; \
} else { \
_hs_e = _hs_q; \
_hs_q = (UT_hash_handle*)((_hs_q->next) ? \
((void*)((char*)(_hs_q->next) + \
(head)->hh.tbl->hho)) : NULL); \
_hs_qsize--; \
} \
if ( _hs_tail ) { \
_hs_tail->next = ((_hs_e) ? \
ELMT_FROM_HH((head)->hh.tbl,_hs_e) : NULL); \
} else { \
_hs_list = _hs_e; \
} \
_hs_e->prev = ((_hs_tail) ? \
ELMT_FROM_HH((head)->hh.tbl,_hs_tail) : NULL); \
_hs_tail = _hs_e; \
} \
_hs_p = _hs_q; \
} \
_hs_tail->next = NULL; \
if ( _hs_nmerges <= 1 ) { \
_hs_looping=0; \
(head)->hh.tbl->tail = _hs_tail; \
DECLTYPE_ASSIGN(head,ELMT_FROM_HH((head)->hh.tbl, _hs_list)); \
} \
_hs_insize *= 2; \
} \
HASH_FSCK(hh,head); \
} \
} while (0)
/* This function selects items from one hash into another hash.
* The end result is that the selected items have dual presence
* in both hashes. There is no copy of the items made; rather
 * they are added into the new hash through a secondary
 * hash handle that must be present in the structure. */
#define HASH_SELECT(hh_dst, dst, hh_src, src, cond) \
do { \
unsigned _src_bkt, _dst_bkt; \
void *_last_elt=NULL, *_elt; \
UT_hash_handle *_src_hh, *_dst_hh, *_last_elt_hh=NULL; \
ptrdiff_t _dst_hho = ((char*)(&(dst)->hh_dst) - (char*)(dst)); \
if (src) { \
for(_src_bkt=0; _src_bkt < (src)->hh_src.tbl->num_buckets; _src_bkt++) { \
for(_src_hh = (src)->hh_src.tbl->buckets[_src_bkt].hh_head; \
_src_hh; \
_src_hh = _src_hh->hh_next) { \
_elt = ELMT_FROM_HH((src)->hh_src.tbl, _src_hh); \
if (cond(_elt)) { \
_dst_hh = (UT_hash_handle*)(((char*)_elt) + _dst_hho); \
_dst_hh->key = _src_hh->key; \
_dst_hh->keylen = _src_hh->keylen; \
_dst_hh->hashv = _src_hh->hashv; \
_dst_hh->prev = _last_elt; \
_dst_hh->next = NULL; \
if (_last_elt_hh) { _last_elt_hh->next = _elt; } \
if (!dst) { \
DECLTYPE_ASSIGN(dst,_elt); \
HASH_MAKE_TABLE(hh_dst,dst); \
} else { \
_dst_hh->tbl = (dst)->hh_dst.tbl; \
} \
HASH_TO_BKT(_dst_hh->hashv, _dst_hh->tbl->num_buckets, _dst_bkt); \
HASH_ADD_TO_BKT(_dst_hh->tbl->buckets[_dst_bkt],_dst_hh); \
(dst)->hh_dst.tbl->num_items++; \
_last_elt = _elt; \
_last_elt_hh = _dst_hh; \
} \
} \
} \
} \
HASH_FSCK(hh_dst,dst); \
} while (0)
#define HASH_CLEAR(hh,head) \
do { \
if (head) { \
uthash_free((head)->hh.tbl->buckets, \
(head)->hh.tbl->num_buckets*sizeof(struct UT_hash_bucket)); \
HASH_BLOOM_FREE((head)->hh.tbl); \
uthash_free((head)->hh.tbl, sizeof(UT_hash_table)); \
(head)=NULL; \
} \
} while(0)
#ifdef NO_DECLTYPE
#define HASH_ITER(hh,head,el,tmp) \
for((el)=(head), (*(char**)(&(tmp)))=(char*)((head)?(head)->hh.next:NULL); \
el; (el)=(tmp),(*(char**)(&(tmp)))=(char*)((tmp)?(tmp)->hh.next:NULL))
#else
#define HASH_ITER(hh,head,el,tmp) \
for((el)=(head),(tmp)=DECLTYPE(el)((head)?(head)->hh.next:NULL); \
el; (el)=(tmp),(tmp)=DECLTYPE(el)((tmp)?(tmp)->hh.next:NULL))
#endif
/* obtain a count of items in the hash */
#define HASH_COUNT(head) HASH_CNT(hh,head)
#define HASH_CNT(hh,head) ((head)?((head)->hh.tbl->num_items):0)
typedef struct UT_hash_bucket {
struct UT_hash_handle *hh_head;
unsigned count;
/* expand_mult is normally set to 0. In this situation, the max chain length
* threshold is enforced at its default value, HASH_BKT_CAPACITY_THRESH. (If
* the bucket's chain exceeds this length, bucket expansion is triggered).
* However, setting expand_mult to a non-zero value delays bucket expansion
* (that would be triggered by additions to this particular bucket)
* until its chain length reaches a *multiple* of HASH_BKT_CAPACITY_THRESH.
* (The multiplier is simply expand_mult+1). The whole idea of this
* multiplier is to reduce bucket expansions, since they are expensive, in
* situations where we know that a particular bucket tends to be overused.
* It is better to let its chain length grow to a longer yet-still-bounded
* value, than to do an O(n) bucket expansion too often.
*/
unsigned expand_mult;
} UT_hash_bucket;
/* random signature used only to find hash tables in external analysis */
#define HASH_SIGNATURE 0xa0111fe1
#define HASH_BLOOM_SIGNATURE 0xb12220f2
typedef struct UT_hash_table {
UT_hash_bucket *buckets;
unsigned num_buckets, log2_num_buckets;
unsigned num_items;
struct UT_hash_handle *tail; /* tail hh in app order, for fast append */
ptrdiff_t hho; /* hash handle offset (byte pos of hash handle in element */
/* in an ideal situation (all buckets used equally), no bucket would have
* more than ceil(#items/#buckets) items. that's the ideal chain length. */
unsigned ideal_chain_maxlen;
/* nonideal_items is the number of items in the hash whose chain position
* exceeds the ideal chain maxlen. these items pay the penalty for an uneven
* hash distribution; reaching them in a chain traversal takes >ideal steps */
unsigned nonideal_items;
/* ineffective expands occur when a bucket doubling was performed, but
* afterward, more than half the items in the hash had nonideal chain
* positions. If this happens on two consecutive expansions we inhibit any
* further expansion, as it's not helping; this happens when the hash
* function isn't a good fit for the key domain. When expansion is inhibited
* the hash will still work, albeit no longer in constant time. */
unsigned ineff_expands, noexpand;
uint32_t signature; /* used only to find hash tables in external analysis */
#ifdef HASH_BLOOM
uint32_t bloom_sig; /* used only to test bloom exists in external analysis */
uint8_t *bloom_bv;
char bloom_nbits;
#endif
} UT_hash_table;
typedef struct UT_hash_handle {
struct UT_hash_table *tbl;
void *prev; /* prev element in app order */
void *next; /* next element in app order */
struct UT_hash_handle *hh_prev; /* previous hh in bucket order */
struct UT_hash_handle *hh_next; /* next hh in bucket order */
void *key; /* ptr to enclosing struct's key */
unsigned keylen; /* enclosing struct's key len */
unsigned hashv; /* result of hash-fcn(key) */
} UT_hash_handle;
#endif /* UTHASH_H */

View file

@ -1,8 +1,7 @@
%%-------------------------------------------------------------------
%% This file is part of LMDB - Erlang Lightning MDB API
%% This file is part of EMDB - Erlang MDB API
%%
%% Copyright (c) 2012 by Aleph Archives. All rights reserved.
%% Copyright (c) 2013 by Basho Technologies, Inc. All rights reserved.
%%
%%-------------------------------------------------------------------
%% Redistribution and use in source and binary forms, with or without

BIN
rebar vendored Executable file

Binary file not shown.

View file

@ -1,12 +1,13 @@
%% -*- erlang -*-
%% ex: ft=erlang ts=4 sw=4 et
{require_otp_vsn, "R1[56]|1[78]"}.
{require_otp_vsn, "R1[567]"}.
{cover_enabled, true}.
{erl_opts, [%{d,'DEBUG',true},
{erl_opts, [{d,'DEBUG',true},
debug_info,
fail_on_warning,
warn_unused_vars,
warn_export_all,
warn_shadow_vars,
@ -20,8 +21,7 @@
warn_exported_vars,
warn_untyped_record
%warn_missing_spec,
%strict_validation,
%fail_on_warning
%strict_validation
]}.
{xref_checks, [undefined_function_calls, deprecated_function_calls]}.
@ -29,11 +29,14 @@
{eunit_opts, [verbose, {report, {eunit_surefire, [{dir, "."}]}}]}.
{port_specs, [
{"priv/lmdb.so", ["c_src/*.c"]}
{"unix", "priv/emdb.so", ["c_src/*.c"]},
{"linux", "priv/emdb.so", ["c_src/*.c"]},
{"darwin", "priv/emdb.so", ["c_src/*.c"]},
{"win32", "priv/emdb.dll", ["c_src/*.c"]}
]}.
{port_env, [
{"DRV_CFLAGS", "$DRV_CFLAGS -O3 -fPIC -march=native -mtune=native -Wall -Wextra"}
{"DRV_CFLAGS", "$DRV_CFLAGS -O3 -fPIC -march=native -mtune=native -Wall -Wextra -Werror"}
]}.
% for debugging use

View file

@ -21,34 +21,25 @@
%%
%% -------------------------------------------------------------------
-define(ASYNC_NIF_CALL(Fun, Args),
F = fun(F, T) ->
R = erlang:make_ref(),
case erlang:apply(Fun, [R|Args]) of
{ok, {enqueued, PctBusy}} ->
if
PctBusy > 0.25 andalso PctBusy =< 1.0 ->
erlang:bump_reductions(erlang:trunc(2000 * PctBusy));
true ->
ok
end,
receive
{R, {error, shutdown}=Error} ->
%% Work unit was queued, but not executed.
Error;
{R, {error, _Reason}=Error} ->
%% Work unit returned an error.
Error;
{R, Reply} ->
Reply
end;
{error, eagain} ->
case T of
3 -> not_found;
_ -> F(F, T + 1)
end;
Other ->
Other
end
end,
F(F, 1)).
-spec async_nif_enqueue(reference(), function(), [term()]) -> term() | {error, term()}.
async_nif_enqueue(R, F, A) ->
case erlang:apply(F, [R|A]) of
{ok, enqueued} ->
receive
{R, {error, eagain}} ->
%% Work unit was not queued, try again.
async_nif_enqueue(R, F, A);
{R, {error, shutdown}=Error} ->
%% Work unit was queued, but not executed.
Error;
{R, {error, _Reason}=Error} ->
%% Work unit returned an error.
Error;
{R, Reply} ->
Reply
end;
Other ->
Other
end.
-define(ASYNC_NIF_CALL(Fun, Args), async_nif_enqueue(erlang:make_ref(), Fun, Args)).
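For illustration, a minimal sketch of how a caller would use this macro; the module and function names are hypothetical, not part of this change. Each public function pairs with a companion NIF stub that takes the async reference as its first argument and is replaced when the shared library loads:

-module(example_nif).
-export([fetch/2]).

-include("async_nif.hrl").

fetch(Handle, Key) when is_binary(Key) ->
    %% Enqueue the call; async_nif_enqueue/3 waits for the {Ref, Reply}
    %% message sent back by the NIF's worker thread.
    ?ASYNC_NIF_CALL(fun fetch/3, [Handle, Key]).

%% NIF stub, overridden by erlang:load_nif/2 at module load time.
fetch(_AsyncRef, _Handle, _Key) ->
    erlang:nif_error(not_loaded).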

View file

@ -1,7 +1,7 @@
{application, lmdb,
{application, emdb,
[
{description, "OpenLDAP's Lightning Memory-Mapped Database BTREE"},
{vsn, "1.0.0"},
{description, "Memory-Mapped Database"},
{vsn, "0.9.0"},
{registered, []},
{applications, [
kernel,

View file

@ -1,8 +1,7 @@
%%-------------------------------------------------------------------
%% This file is part of LMDB - Erlang Lightning MDB API
%% This file is part of EMDB - Erlang MDB API
%%
%% Copyright (c) 2012 by Aleph Archives. All rights reserved.
%% Copyright (c) 2013 by Basho Technologies, Inc. All rights reserved.
%%
%%-------------------------------------------------------------------
%% Redistribution and use in source and binary forms, with or without
@ -26,7 +25,7 @@
%% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%%-------------------------------------------------------------------
-module(lmdb).
-module(emdb).
%%====================================================================
%% EXPORTS
@ -40,11 +39,8 @@
put/3,
get/2,
txn_begin/1,
txn_commit/1,
txn_abort/1,
del/2,
update/3, upd/3,
drop/1
]).
@ -56,7 +52,7 @@
%% config for testing
-ifdef(TEST).
-ifdef(EQC).
-include_lib("eqc/include/eqc.hrl").
%include_lib("eqc/include/eqc.hrl").
-define(QC_OUT(P), eqc:on_output(fun(Str, Args) -> io:format(user, Str, Args) end, P)).
-endif.
-include_lib("eunit/include/eunit.hrl").
@ -65,15 +61,15 @@
%%====================================================================
%% Includes
%%====================================================================
-include("lmdb.hrl").
-include("emdb.hrl").
-include("async_nif.hrl").
%%====================================================================
%% MACROS
%%====================================================================
-define(LMDB_DRIVER_NAME, "lmdb").
-define(EMDB_DRIVER_NAME, "emdb").
-define(NOT_LOADED, not_loaded(?LINE)).
-define(MDB_MAP_SIZE, 2147483648). %% 2GB in bytes
-define(MDB_MAP_SIZE, 10485760). %% 10MB
%%====================================================================
%% PUBLIC API
@ -119,24 +115,6 @@ get(Handle, Key)
get(_AsyncRef, _Handle, _Key) ->
?NOT_LOADED.
txn_begin(Handle) ->
?ASYNC_NIF_CALL(fun txn_begin/2, [Handle]).
txn_begin(_AsyncRef, _Handle) ->
?NOT_LOADED.
txn_commit(Handle) ->
?ASYNC_NIF_CALL(fun txn_commit/2, [Handle]).
txn_commit(_AsyncRef, _Handle) ->
?NOT_LOADED.
txn_abort(Handle) ->
?ASYNC_NIF_CALL(fun txn_abort/2, [Handle]).
txn_abort(_AsyncRef, _Handle) ->
?NOT_LOADED.
del(Handle, Key)
when is_binary(Key) ->
?ASYNC_NIF_CALL(fun del/3, [Handle, Key]).
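Taken together, these calls make for a short round trip. A hypothetical session, for illustration only (error handling elided; the exact overwrite semantics of put/3 versus upd/3 are an assumption here, not something this diff states):

{ok, Handle} = emdb:open("/tmp/emdb.data"),
ok = emdb:put(Handle, <<"a">>, <<"apple">>),    %% assumed: insert only
{ok, <<"apple">>} = emdb:get(Handle, <<"a">>),
ok = emdb:upd(Handle, <<"a">>, <<"avocado">>),  %% assumed: insert-or-replace
ok = emdb:del(Handle, <<"a">>),
not_found = emdb:get(Handle, <<"a">>),
ok = emdb:drop(Handle),
ok = emdb:close(Handle).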
@ -177,7 +155,7 @@ init() ->
Path ->
Path
end,
erlang:load_nif(filename:join(PrivDir, ?LMDB_DRIVER_NAME), 0).
erlang:load_nif(filename:join(PrivDir, ?EMDB_DRIVER_NAME), 0).
%%--------------------------------------------------------------------
@ -192,21 +170,20 @@ not_loaded(Line) ->
%% ===================================================================
-ifdef(TEST).
open_test_db() ->
-define(TEST_DATA_DIR, "test/basics").
open_test_db(DataDir) ->
{ok, CWD} = file:get_cwd(),
DataDir = filename:join([CWD, "test", "eunit"]),
?cmd("rm -rf " ++ DataDir),
?assertMatch(ok, filelib:ensure_dir(filename:join([DataDir, "x"]))),
{ok, Handle} = ?MODULE:open(DataDir, 2147483648),
[?MODULE:upd(Handle, crypto:hash(sha, <<X>>),
crypto:rand_bytes(crypto:rand_uniform(128, 4096))) ||
X <- lists:seq(1, 10)],
Path = filename:join([CWD, DataDir]),
?cmd("rm -rf " ++ Path),
?assertMatch(ok, filelib:ensure_dir(filename:join([Path, "x"]))),
{ok, Handle} = ?MODULE:open(Path),
Handle.
basics_test_() ->
{setup,
fun() ->
open_test_db()
open_test_db(?TEST_DATA_DIR)
end,
fun(Handle) ->
ok = ?MODULE:close(Handle)
@ -215,16 +192,16 @@ basics_test_() ->
{inorder,
[{"open and close a database",
fun() ->
Handle = open_test_db()
Handle = open_test_db(?TEST_DATA_DIR)
end},
{"create, then drop an empty database",
fun() ->
Handle = open_test_db(),
Handle = open_test_db(?TEST_DATA_DIR),
?assertMatch(ok, ?MODULE:drop(Handle))
end},
{"create, put an item, get it, then drop the database",
fun() ->
Handle = open_test_db(),
Handle = open_test_db(?TEST_DATA_DIR),
?assertMatch(ok, ?MODULE:put(Handle, <<"a">>, <<"apple">>)),
?assertMatch(ok, ?MODULE:put(Handle, <<"b">>, <<"boy">>)),
?assertMatch(ok, ?MODULE:put(Handle, <<"c">>, <<"cat">>)),
@ -238,63 +215,4 @@ basics_test_() ->
]}
end}.
-ifdef(EQC).
qc(P) ->
?assert(eqc:quickcheck(?QC_OUT(P))).
keys() ->
eqc_gen:non_empty(list(eqc_gen:non_empty(binary()))).
values() ->
eqc_gen:non_empty(list(binary())).
ops(Keys, Values) ->
{oneof([put, delete]), oneof(Keys), oneof(Values)}.
apply_kv_ops([], _Handle, Acc0) ->
Acc0;
apply_kv_ops([{put, K, V} | Rest], Handle, Acc0) ->
ok = ?MODULE:put(Handle, K, V),
apply_kv_ops(Rest, Handle, orddict:store(K, V, Acc0));
apply_kv_ops([{delete, K, _} | Rest], Handle, Acc0) ->
ok = case ?MODULE:del(Handle, K) of
ok ->
ok;
not_found ->
ok;
Else ->
Else
end,
apply_kv_ops(Rest, Handle, orddict:store(K, deleted, Acc0)).
prop_put_delete() ->
?LET({Keys, Values}, {keys(), values()},
?FORALL(Ops, eqc_gen:non_empty(list(ops(Keys, Values))),
begin
{ok, CWD} = file:get_cwd(),
DataDir = filename:join([CWD, "test", "eqc"]),
?cmd("rm -rf " ++ DataDir),
ok = filelib:ensure_dir(filename:join([DataDir, "x"])),
{ok, Handle} = ?MODULE:open(DataDir, 2147483648),
try
Model = apply_kv_ops(Ops, Handle, []),
%% Validate that all deleted values return not_found
F = fun({K, deleted}) ->
?assertEqual(not_found, ?MODULE:get(Handle, K));
({K, V}) ->
?assertEqual({ok, V}, ?MODULE:get(Handle, K))
end,
lists:map(F, Model),
true
after
?MODULE:close(Handle)
end
end)).
prop_put_delete_test_() ->
{timeout, 3*60, fun() -> qc(prop_put_delete()) end}.
-endif.
-endif.

View file

@ -1,7 +1,7 @@
%%-------------------------------------------------------------------
%% This file is part of LMDB - Erlang Lightning MDB API
%% This file is part of EMDB - Erlang MDB API
%%
%% Copyright (c) 2013 by Basho Technologies, Inc. All rights reserved.
%% Copyright (c) 2013 by Basho Technologies. All rights reserved.
%%
%%-------------------------------------------------------------------
%% Redistribution and use in source and binary forms, with or without
@ -25,7 +25,7 @@
%% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%%-------------------------------------------------------------------
-module(lmdb_sup).
-module(emdb_sup).
-behaviour(supervisor).
@ -53,4 +53,4 @@ start_link() ->
%% Supervisor callbacks
%% ===================================================================
init([]) ->
{ok, {{one_for_one, 5, 10}, [?CHILD(lmdb, worker)]}}.
{ok, {{one_for_one, 5, 10}, [?CHILD(emdb, worker)]}}.

View file

@ -1,4 +1,4 @@
-module(basho_bench_driver_lmdb).
-module(basho_bench_driver_emdb).
-record(state, {
handle
@ -15,35 +15,35 @@
%% ====================================================================
new(1) ->
%% Make sure lmdb is available
case code:which(lmdb) of
%% Make sure emdb is available
case code:which(emdb) of
non_existing ->
?FAIL_MSG("~s requires lmdb to be available on code path.\n",
?FAIL_MSG("~s requires emdb to be available on code path.\n",
[?MODULE]);
_ ->
ok
end,
%{ok, _} = lmdb_sup:start_link(),
%{ok, _} = emdb_sup:start_link(),
setup(1);
new(Id) ->
setup(Id).
setup(_Id) ->
%% Get the target directory
Dir = basho_bench_config:get(lmdb_dir, "/tmp"),
%Config = basho_bench_config:get(lmdb, []),
Dir = basho_bench_config:get(emdb_dir, "/tmp"),
%Config = basho_bench_config:get(emdb, []),
%% Start Lightning MDB
case lmdb:open(Dir, 32212254720, 16#10000 bor 16#40000 bor 16#80000) of
case emdb:open(Dir, 32212254720, 16#10000 bor 16#40000 bor 16#80000) of
{ok, H} ->
{ok, #state{handle=H}};
{error, Reason} ->
?FAIL_MSG("Failed to establish a Lightning MDB connection, lmdb backend unable to start: ~p\n", [Reason]),
?FAIL_MSG("Failed to establish a Lightning MDB connection, emdb backend unable to start: ~p\n", [Reason]),
{error, Reason}
end.
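For reference, a reading of the flag bits OR'd together in the open/3 call above, matched against the environment-flag constants published in LMDB's lmdb.h (this mapping is my annotation, not something the driver declares):

%% Assumed lmdb.h equivalents for 16#10000 bor 16#40000 bor 16#80000:
-define(MDB_NOSYNC,     16#10000).  %% don't fsync after commit
-define(MDB_NOMETASYNC, 16#40000).  %% don't fsync the meta page after commit
-define(MDB_WRITEMAP,   16#80000).  %% use a writable memory map
%% i.e. emdb:open(Dir, MapSize, ?MDB_NOSYNC bor ?MDB_NOMETASYNC bor ?MDB_WRITEMAP)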
run(get, KeyGen, _ValueGen, #state{handle=Handle}=State) ->
case lmdb:get(Handle, KeyGen()) of
case emdb:get(Handle, KeyGen()) of
{ok, _Value} ->
{ok, State};
not_found ->
@ -54,14 +54,14 @@ run(get, KeyGen, _ValueGen, #state{handle=Handle}=State) ->
run(put, KeyGen, ValueGen, #state{handle=Handle}=State) ->
Key = KeyGen(),
Val = ValueGen(),
case lmdb:upd(Handle, Key, Val) of
case emdb:upd(Handle, Key, Val) of
ok ->
{ok, State};
{error, Reason} ->
{error, Reason}
end;
run(delete, KeyGen, _ValueGen, #state{handle=Handle}=State) ->
case lmdb:del(Handle, KeyGen()) of
case emdb:del(Handle, KeyGen()) of
ok ->
{ok, State};
not_found ->

37
tools/emdb.config Normal file
View file

@ -0,0 +1,37 @@
%%-*- mode: erlang -*-
%% ex: ft=erlang ts=4 sw=4 et
%% How to:
%% * put the emdb.config file into basho_bench/examples
%% * put the basho_bench_driver_emdb.erl into basho_bench/src
%% * make clean in basho_bench, then make
%% * edit examples/emdb.config
%% - change {code_paths, ["../emdb"]}. to be a relative path to your
%% emdb directory
%% - change {emdb_dir, "/home/gburd/ws/basho_bench/data"}. to a fully
%% qualified location for your test data files (mkdir that directory
%% yourself; if it doesn't exist, the test will fail with 'enoent')
%% * to run, replace this path with the proper path on your system:
%% LD_LIBRARY_PATH=/home/you/emdb/priv ./basho_bench examples/emdb.config
%% * the test should run for 10 minutes (as it is configured right now)
%% with 8 concurrent workers accessing the same table
%%
%% Note:
%% There are two config sections in emdb.config: {emdb, [ ... ]}. and
%% {emdb_, [ ... ]}. The one named "emdb" is used; the other config
%% is ignored. I set up an LSM and a BTREE config; to choose which
%% one runs, rename the two sections (turn one off by adding a "_"
%% to its name and take the "_" out of the other's name).
{mode, max}.
{duration, 10}.
{concurrent, 8}.
{driver, basho_bench_driver_emdb}.
{key_generator, {int_to_bin_littleendian,{uniform_int, 5000000}}}.
{value_generator, {fixed_bin, 1024}}.
%{operations, [{get, 9}, {put, 9}, {delete, 2}]}.
{operations, [{put, 1}]}.
{code_paths, ["../emdb"]}.
{emdb_dir, "/home/gburd/ws/basho_bench/data"}.
{emdb, [ ]}.

View file

@ -1,38 +0,0 @@
%%-*- mode: erlang -*-
%% ex: ft=erlang ts=4 sw=4 et
%% How to:
%% * put the lmdb.config file into basho_bench/examples
%% * put the basho_bench_driver_lmdb.erl into basho_bench/src
%% * make clean in basho_bench, then make
%% * edit examples/lmdb.config
%% - change {code_paths, ["../lmdb"]}. to be a relative path to your
%% lmdb directory
%% - change {lmdb_dir, "/home/gburd/ws/basho_bench/data"}. to a fully
%% qualified location for your test data files (mkdir that directory
%% yourself; if it doesn't exist, the test will fail with 'enoent')
%% * to run, replace this path with the proper path on your system:
%% LD_LIBRARY_PATH=/home/you/lmdb/priv ./basho_bench examples/lmdb.config
%% * the test should run for 480 minutes (as it is configured right now)
%% with 32 concurrent workers accessing the same table
%%
%% Note:
%% There are two config sections in lmdb.config: {lmdb, [ ... ]}. and
%% {lmdb_, [ ... ]}. The one named "lmdb" is used; the other config
%% is ignored. I set up an LSM and a BTREE config; to choose which
%% one runs, rename the two sections (turn one off by adding a "_"
%% to its name and take the "_" out of the other's name).
{mode, max}.
{duration, 480}.
{concurrent, 32}.
{driver, basho_bench_driver_lmdb}.
{key_generator, {int_to_bin_littleendian,{uniform_int, 5000000000}}}.
{value_generator, {highly_compressible_bin, 2048}}.
%{value_generator, {fixed_bin, 1024}}.
{operations, [{get, 25}, {put, 70}, {delete, 5}]}.
%{operations, [{put, 1}]}.
{code_paths, ["../lmdb"]}.
{lmdb_dir, "/home/gburd/ws/basho_bench/data"}.
{lmdb, [ ]}.

View file

@ -1,6 +0,0 @@
#!/bin/bash
for file in lmdb.h mdb.c midl.h midl.c; do
curl -O https://raw.githubusercontent.com/LMDB/lmdb/mdb.master/libraries/liblmdb/$file
done