From a07852007e732ab0662b01f158f8f6195e3608f4 Mon Sep 17 00:00:00 2001
From: Sears Russell
Date: Tue, 17 Jun 2008 04:09:14 +0000
Subject: [PATCH] Improved graph printability, fixed remaining todo's in rose.tex.

---
 doc/rosePaper/query-innodb.pdf | Bin 14442 -> 14450 bytes
 doc/rosePaper/rose.tex         | 45 ++++++++++++++++++++++++---------------------
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/doc/rosePaper/query-innodb.pdf b/doc/rosePaper/query-innodb.pdf
index cbacb1ec87163f494d8dc176fc2b6b699d4ef1b1..aaf27ef76d3818148bb50ecea00a476c1ab10ea2 100644
GIT binary patch
delta 5011
[binary delta payload omitted]

diff --git a/doc/rosePaper/rose.tex b/doc/rosePaper/rose.tex
index 9bba4a7..21910ab 100644
--- a/doc/rosePaper/rose.tex
+++ b/doc/rosePaper/rose.tex
@@ -1729,7 +1729,7 @@ dataset.
 \rows merged $C0$ and $C1$ 59 times and merged $C1$ and $C2$ 15 times.
 At the end of the run (132 million tuple insertions) $C2$ took up
 2.8GB and $C1$ was 250MB. The actual page
-file was 8.7GB, and the minimum possible size was 6GB.\xxx{rerun to confirm pagefile size!} InnoDB used
+file was 8.0GB, and the minimum possible size was 6GB. InnoDB used
 5.3GB after 53 million tuple insertions.
 
 
@@ -1755,10 +1755,9 @@ throughput.
 Figure~\ref{fig:avg-tup} shows tuple insertion times for \rows and
 InnoDB. The ``\rows (instantaneous)'' line reports insertion times
 averaged over 100,000 insertions, while the other lines are averaged
-over the entire run. The large spikes in instantaneous tuple
-insertion times occur periodically throughput the run, though the
-figure is truncated to show the first 75 million insertions.\xxx{show
- the whole run???} The spikes occur when an insertion blocks waiting
+over the entire run.
+The periodic spikes in instantaneous tuple
+insertion times occur when an insertion blocks waiting
 for a tree merge to complete. This happens when one copy of $C0$ is
 full and the other one is being merged with $C1$. Admission control
 would provide consistent insertion times.
@@ -1876,13 +1875,16 @@ join and projection of the TPC-H dataset. We use the schema described
 in Table~\ref{tab:tpc-schema}, and populate the table by using a scale
 factor of 30 and following the random distributions dictated by the
 TPC-H specification. The schema for this experiment is designed to
-have poor locality for updates.
+have poor update locality.
 
 Updates from customers are grouped by
-order id.
-This schema forces the database to permute these updates
-into an order more interesting to suppliers; the index is sorted by
-product and date, providing inexpensive access to lists of orders to
+order id, but the index is sorted by product and date.
+This forces the database to permute these updates
+into an order that would provide suppliers with
+% more interesting to suppliers
+%the index is sorted by
+%product and date,
+inexpensive access to lists of orders to
 be filled and historical sales information for each product.
 
 We generate a dataset containing a list of product orders, and insert
@@ -1925,8 +1927,8 @@ of PFOR useless. These fields change frequently enough to limit the
 effectiveness of run length encoding. Both of these issues would be
 addressed by bit packing. Also, occasionally re-evaluating and modifying
 compression strategies is known to improve compression of TPC-H data.
-which is clustered in the last few weeks of years during the
-20th century.\xxx{check}
+TPC-H dates are clustered during weekdays, from 1995-2005, and around
+Mother's Day and the last few weeks of each year.
 
 \begin{table}
 \caption{TPC-C/H schema}
@@ -1980,7 +1982,9 @@ of experiments, which we call ``Lookup C0,'' the order status query only
 examines $C0$. In the other, which we call ``Lookup all components,''
 we force each order status query to examine every tree component.
 This keeps \rows from exploiting the fact that most order
-status queries can be serviced from $C0$.
+status queries can be serviced from $C0$. Finally, \rows provides
+versioning for this test; though its garbage collection code is
+executed, it never collects overwritten or deleted tuples.
 
 %% The other type of query we process is a table scan that could be used
 %% to track the popularity of each part over time. We know that \rowss
@@ -2143,7 +2147,7 @@ are long enough to guarantee good sequential scan performance.
 \rows always allocates regions of the same length, guaranteeing that
 Stasis can reuse all freed regions before extending the page file.
 This can waste nearly an entire region per component, which does not
-matter in \rows, but could be a significant overhead for a system with
+matter in \rows, but could be significant to systems with
 many small partitions.
 
 Some LSM-tree implementations do not support concurrent insertions,
@@ -2187,12 +2191,12 @@ memory. LSM-trees can service delayed
 LSM-tree index scans without performing additional I/O. Queries that
 request table scans wait for the merge processes to make a pass over
 the index.
-By combining this idea with lazy merging an LSM-tree could service
+By combining this idea with lazy merging an LSM-tree implementation
+could service
 range scans immediately without significantly increasing the amount of
 I/O performed by the system.
 
 \subsection{Row-based database compression}
-\xxx{shorten?}
 Row-oriented database compression techniques compress each tuple
 individually and sometimes ignore similarities between adjacent
 tuples. One such approach compresses low cardinality data by building
@@ -2202,12 +2206,11 @@ compression and decompression. Other approaches include NULL
 suppression, which stores runs of NULL values as a single count and
 leading zero suppression which stores integers in a variable length
 format that does not store zeros before the first non-zero digit of each
Row-based schemes typically allow for easy decompression of -individual tuples. Therefore, they generally store the offset of each -tuple explicitly at the head of each page. +number. Row oriented compression schemes typically provide efficient random access to +tuples, often by explicitly storing tuple offsets at the head of each page. Another approach is to compress page data using a generic compression -algorithm, such as gzip. The primary drawback to this approach is +algorithm, such as gzip. The primary drawback of this approach is that the size of the compressed page is not known until after compression. Also, general purpose compression techniques typically do not provide random access within pages and are often more processor @@ -2225,7 +2228,7 @@ effectiveness of simple, special purpose, compression schemes. PFOR was introduced as an extension to MonetDB~\cite{pfor}, a column-oriented database, along with two other formats. PFOR-DELTA is similar to PFOR, but stores differences between values as -deltas.\xxx{check} PDICT encodes columns as keys and a dictionary that +deltas. PDICT encodes columns as keys and a dictionary that maps to the original values. We plan to add both these formats to \rows in the future. We chose to implement RLE and PFOR because they provide high compression and decompression bandwidth. Like MonetDB,