Enormous node memory usage for tiny values #1

Closed
opened 2013-02-21 23:33:09 +00:00 by zv · 7 comments
zv commented 2013-02-21 23:33:09 +00:00 (Migrated from github.com)

Hi gburd, I'm ecstatic that someone built this for and would love to use this in place of an existing gb_tree storing and searching thousands of keys but I noticed that I start getting pagefaults at storing 12k dwords on a linux machine with 16GB of RAM running R15B, making each hamt_node clock in at something like 1250000/8 bytes. I haven't yet run it through erlang:debugger(), any idea what might give?

Hi gburd, I'm ecstatic that someone built this for and would love to use this in place of an existing gb_tree storing and searching thousands of keys but I noticed that I start getting pagefaults at storing 12k dwords on a linux machine with 16GB of RAM running R15B, making each hamt_node clock in at something like 1250000/8 bytes. I haven't yet run it through erlang:debugger(), any idea what might give?
gburd commented 2013-02-21 23:49:03 +00:00 (Migrated from github.com)

I just got it working today, it's only a week old! ;-P There are a ton of places to improve performance, this really is just my learning exercise at this point. I'll continue to make this better as I work toward building a real ctrie for Erlang. We (the Erlang community) need HAMT/Ctrie, no more being jealous of Haskell and Clojure. I did manage in tests to store 1M {int(), int()} pairs, but that was just a quick test.

I just got it working today, it's only a week old! ;-P There are a ton of places to improve performance, this really is just my learning exercise at this point. I'll continue to make this better as I work toward building a real ctrie for Erlang. We (the Erlang community) need HAMT/Ctrie, no more being jealous of Haskell and Clojure. I did manage in tests to store 1M {int(), int()} pairs, but that was just a quick test.
zv commented 2013-02-21 23:53:27 +00:00 (Migrated from github.com)

Ok, I am going to look into this tonight and see if I can figure out the origin of this and replicate it on R16.

Ok, I am going to look into this tonight and see if I can figure out the origin of this and replicate it on R16.
gburd commented 2013-02-22 00:54:51 +00:00 (Migrated from github.com)

You are the man, thanks for the help. BTW, there is at least one issue I know in the delete code where an cnode isn't collapsed when it only has one Node left. Here's how to trigger it:

2> hamt:delete(<<5>>, hamt:put([{<>, <>} || X <- lists:seq(1,6)], hamt:new())).
{hamt,{cnode,17629440,
[{snode,<<3>>,<<3>>},
{snode,<<6>>,<<6>>},
{snode,<<1>>,<<1>>},
{cnode,131072,[{snode,<<2>>,<<2>>}]},
{snode,<<4>>,<<4>>}]}}

the {cnode,131072,[{snode,<<2>>,<<2>>}]} should have collapsed to an {snode, <<2>>, <<2>>}.

You are the man, thanks for the help. BTW, there is at least one issue I know in the delete code where an cnode isn't collapsed when it only has one Node left. Here's how to trigger it: 2> hamt:delete(<<5>>, hamt:put([{<<X>>, <<X>>} || X <- lists:seq(1,6)], hamt:new())). {hamt,{cnode,17629440, [{snode,<<3>>,<<3>>}, {snode,<<6>>,<<6>>}, {snode,<<1>>,<<1>>}, {cnode,131072,[{snode,<<2>>,<<2>>}]}, {snode,<<4>>,<<4>>}]}} the {cnode,131072,[{snode,<<2>>,<<2>>}]} should have collapsed to an {snode, <<2>>, <<2>>}.
gburd commented 2013-02-22 19:46:18 +00:00 (Migrated from github.com)

BTW, FWIW I've fixed the delete issue but not yet looked closely at memory consumption yet.

BTW, FWIW I've fixed the delete issue but not yet looked closely at memory consumption yet.
zv commented 2013-02-22 19:55:36 +00:00 (Migrated from github.com)

Awesome!I couldn't replicate the issue on either R16 or R15 at home (very strange), I'm working on a NIF now to try to implement CTPOP to see if that helps me here.

Awesome!I couldn't replicate the issue on either R16 or R15 at home (very strange), I'm working on a NIF now to try to implement CTPOP to see if that helps me here.
gburd commented 2013-02-22 20:16:15 +00:00 (Migrated from github.com)

Cool, you should consider this C implementation (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel) or, if you're game for optimizing down to the ASM instruction to do the popcnt on those architectures that support it.
https://github.com/coapp-packages/nasm/blob/master/test/popcnt.asm
https://patchwork.kernel.org/patch/81479/
https://subversion.assembla.com/svn/min-loss-hashing/trunk/utils/bitops.h using __builtin_popcount*
or on some architectures there exists a POPCNT instruction in the CPU. This is a bit like the wonderful world of crc... it's deeper than expected.

The thing to do is to lobby for a popcnt BIF. :)

Cool, you should consider this C implementation (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel) or, if you're game for optimizing down to the ASM instruction to do the popcnt on those architectures that support it. https://github.com/coapp-packages/nasm/blob/master/test/popcnt.asm https://patchwork.kernel.org/patch/81479/ https://subversion.assembla.com/svn/min-loss-hashing/trunk/utils/bitops.h using __builtin_popcount* or on some architectures there exists a POPCNT instruction in the CPU. This is a bit like the wonderful world of crc... it's deeper than expected. The thing to do is to lobby for a popcnt BIF. :)
gburd commented 2013-02-22 21:15:16 +00:00 (Migrated from github.com)

Last thing regarding bitpop, I never did fully debug why this wasn't working, I could spend a bit more time looking into that... to avoid the need for a NIF.

bitpop(0) -> 0;
bitpop(X) when is_integer(X), X > 0, X =< 16#FFFFFFFF ->
X1 = X - ((X bsr 1) band 16#55555555),
X2 = (X1 band 16#33333333) + ((X1 bsr 2) band 16#33333333),
((X2 + (X2 bsr 4) band 16#0F0F0F0F) * 16#01010101) bsr 24;
bitpop(X) when is_integer(X) -> %, X > 16#FFFFFFFF, X =< 16#FFFFFFFFFFFFFFFF ->
M1 = 16#5555555555555555, %% 0101...
M2 = 16#3333333333333333, %% 00110011..
M4 = 16#0f0f0f0f0f0f0f0f, %% 4 zeros, 4 ones ...
H01 = 16#0101010101010101, %% the sum of 256 to the power of 0,1,2,3...

%% 1. put count of each 2 bits into those 2 bits
X1 = X - (X bsr 1) band M1,
%% 2. put count of each 4 bits into those 4 bits
X2 = (X1 band M2) + ((X1 bsr 2) band M2),
%% 3. put count of each 8 bits into those 8 bits
X3 = (X2 + (X2 bsr 4)) band M4,
%% 4. returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ...
(X3 * H01) bsr 56.
Last thing regarding bitpop, I never did fully debug why this wasn't working, I could spend a bit more time looking into that... to avoid the need for a NIF. bitpop(0) -> 0; bitpop(X) when is_integer(X), X > 0, X =< 16#FFFFFFFF -> X1 = X - ((X bsr 1) band 16#55555555), X2 = (X1 band 16#33333333) + ((X1 bsr 2) band 16#33333333), ((X2 + (X2 bsr 4) band 16#0F0F0F0F) \* 16#01010101) bsr 24; bitpop(X) when is_integer(X) -> %, X > 16#FFFFFFFF, X =< 16#FFFFFFFFFFFFFFFF -> M1 = 16#5555555555555555, %% 0101... M2 = 16#3333333333333333, %% 00110011.. M4 = 16#0f0f0f0f0f0f0f0f, %% 4 zeros, 4 ones ... H01 = 16#0101010101010101, %% the sum of 256 to the power of 0,1,2,3... ``` %% 1. put count of each 2 bits into those 2 bits X1 = X - (X bsr 1) band M1, %% 2. put count of each 4 bits into those 4 bits X2 = (X1 band M2) + ((X1 bsr 2) band M2), %% 3. put count of each 8 bits into those 8 bits X3 = (X2 + (X2 bsr 4)) band M4, %% 4. returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ... (X3 * H01) bsr 56. ```
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: greg/hamt#1
No description provided.