Improved graph printability, fixed remaining TODOs in rose.tex.
parent d9b2ee7c32
commit a07852007e
2 changed files with 24 additions and 21 deletions
Binary file not shown.
@@ -1729,7 +1729,7 @@ dataset.
\rows merged $C0$ and $C1$ 59 times and merged $C1$ and $C2$ 15 times.
At the end of the run (132 million tuple insertions) $C2$ took up
2.8GB and $C1$ was 250MB. The actual page
-file was 8.7GB, and the minimum possible size was 6GB.\xxx{rerun to confirm pagefile size!} InnoDB used
+file was 8.0GB, and the minimum possible size was 6GB. InnoDB used
5.3GB after 53 million tuple insertions.

@@ -1755,10 +1755,9 @@ throughput.
Figure~\ref{fig:avg-tup} shows tuple insertion times for \rows and InnoDB.
The ``\rows (instantaneous)'' line reports insertion times
averaged over 100,000 insertions, while the other lines are averaged
-over the entire run. The large spikes in instantaneous tuple
-insertion times occur periodically throughput the run, though the
-figure is truncated to show the first 75 million insertions.\xxx{show
-the whole run???} The spikes occur when an insertion blocks waiting
+over the entire run.
+The periodic spikes in instantaneous tuple
+insertion times occur when an insertion blocks waiting
for a tree merge to complete. This happens when one copy of $C0$ is
full and the other one is being merged with $C1$. Admission control
would provide consistent insertion times.

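This hunk describes stalls that occur when one copy of $C0$ fills while the other is still being merged with $C1$. Below is a minimal, hypothetical sketch (the class and method names are invented; this is not \rowss actual code) of a double-buffered in-memory component guarded by a mutex and condition variable. The blocking wait corresponds to the latency spikes in the figure, and the comments mark where admission control would throttle insertions instead.

```cpp
// Illustrative sketch only, not the paper's implementation.
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Tuple { std::string key, value; };

class DoubleBufferedC0 {
  std::vector<Tuple> active_;      // receives new insertions
  std::vector<Tuple> merging_;     // being merged into C1 by a background thread
  bool merge_in_progress_ = false;
  const std::size_t capacity_;
  std::mutex m_;
  std::condition_variable merge_done_;

 public:
  explicit DoubleBufferedC0(std::size_t capacity) : capacity_(capacity) {}

  void insert(Tuple t) {
    std::unique_lock<std::mutex> lock(m_);
    if (active_.size() >= capacity_) {
      // This wait is the latency spike: the other copy of C0 is still being
      // merged with C1, so the insertion blocks until that merge completes.
      merge_done_.wait(lock, [&] { return !merge_in_progress_; });
      std::swap(active_, merging_);   // hand the full buffer to the merger
      merge_in_progress_ = true;
      // Admission control would instead slow individual insertions down as
      // active_ fills, trading peak throughput for consistent latency.
    }
    active_.push_back(std::move(t));
  }

  // Called by the (not shown) merge thread once merging_ has reached C1.
  void merge_finished() {
    std::lock_guard<std::mutex> lock(m_);
    merging_.clear();
    merge_in_progress_ = false;
    merge_done_.notify_all();
  }
};
```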
@@ -1876,13 +1875,16 @@ join and projection of the TPC-H dataset. We use the schema described
in Table~\ref{tab:tpc-schema}, and populate the table by using a scale
factor of 30 and following the random distributions dictated by the
TPC-H specification. The schema for this experiment is designed to
-have poor locality for updates.
+have poor update locality.

Updates from customers are grouped by
-order id.
-This schema forces the database to permute these updates
-into an order more interesting to suppliers; the index is sorted by
-product and date, providing inexpensive access to lists of orders to
+order id, but the index is sorted by product and date.
+This forces the database to permute these updates
+into an order that would provide suppliers with
+% more interesting to suppliers
+%the index is sorted by
+%product and date,
+inexpensive access to lists of orders to
be filled and historical sales information for each product.

We generate a dataset containing a list of product orders, and insert
@@ -1925,8 +1927,8 @@ of PFOR useless. These fields change frequently enough to limit the
effectiveness of run length encoding. Both of these issues would be
addressed by bit packing. Also, occasionally re-evaluating and modifying
compression strategies is known to improve compression of TPC-H data.
-which is clustered in the last few weeks of years during the
-20th century.\xxx{check}
+TPC-H dates are clustered during weekdays, from 1995-2005, and around
+Mother's Day and the last few weeks of each year.

\begin{table}
\caption{TPC-C/H schema}
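For context on the run-length-encoding limitation mentioned in this hunk: RLE stores each maximal run of identical values as a (value, count) pair, so a field that changes on nearly every tuple produces one run per tuple and saves little. The encoder below is an illustrative sketch only, not \rowss column format.

```cpp
// Minimal run-length encoder; illustrative only.
#include <cstdint>
#include <utility>
#include <vector>

std::vector<std::pair<uint32_t, uint32_t>> rle_encode(const std::vector<uint32_t>& col) {
  std::vector<std::pair<uint32_t, uint32_t>> runs;  // (value, repeat count)
  for (uint32_t v : col) {
    if (!runs.empty() && runs.back().first == v)
      ++runs.back().second;                         // extend the current run
    else
      runs.emplace_back(v, 1);                      // start a new run
  }
  return runs;
}
// {5,5,5,9,9} compresses to {(5,3),(9,2)}, but a column whose value changes
// on almost every row stores one run per tuple and saves nothing; bit packing
// would still shrink those short runs.
```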
@@ -1980,7 +1982,9 @@ of experiments, which we call ``Lookup C0,'' the order status query
only examines $C0$. In the other, which we call ``Lookup all
components,'' we force each order status query to examine every tree
component. This keeps \rows from exploiting the fact that most order
-status queries can be serviced from $C0$.
+status queries can be serviced from $C0$. Finally, \rows provides
+versioning for this test; though its garbage collection code is
+executed, it never collects overwritten or deleted tuples.

%% The other type of query we process is a table scan that could be used
%% to track the popularity of each part over time. We know that \rowss
@@ -2143,7 +2147,7 @@ are long enough to guarantee good sequential scan performance.
\rows always allocates regions of the same length, guaranteeing that
Stasis can reuse all freed regions before extending the page file.
This can waste nearly an entire region per component, which does not
-matter in \rows, but could be a significant overhead for a system with
+matter in \rows, but could be significant to systems with
many small partitions.

Some LSM-tree implementations do not support concurrent insertions,
@@ -2187,12 +2191,12 @@ memory.
LSM-trees can service delayed
LSM-tree index scans without performing additional I/O. Queries that request table scans wait for
the merge processes to make a pass over the index.
-By combining this idea with lazy merging an LSM-tree could service
+By combining this idea with lazy merging an LSM-tree implementation
+could service
range scans immediately without significantly increasing the amount of
I/O performed by the system.

\subsection{Row-based database compression}
-\xxx{shorten?}
Row-oriented database compression techniques compress each tuple
individually and sometimes ignore similarities between adjacent
tuples. One such approach compresses low cardinality data by building
@@ -2202,12 +2206,11 @@ compression and decompression. Other approaches include NULL
suppression, which stores runs of NULL values as a single count and
leading zero suppression which stores integers in a variable length
format that does not store zeros before the first non-zero digit of each
-number. Row-based schemes typically allow for easy decompression of
-individual tuples. Therefore, they generally store the offset of each
-tuple explicitly at the head of each page.
+number. Row oriented compression schemes typically provide efficient random access to
+tuples, often by explicitly storing tuple offsets at the head of each page.

Another approach is to compress page data using a generic compression
-algorithm, such as gzip. The primary drawback to this approach is
+algorithm, such as gzip. The primary drawback of this approach is
that the size of the compressed page is not known until after
compression. Also, general purpose compression techniques typically
do not provide random access within pages and are often more processor
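As an illustration of the leading zero suppression described in this hunk, the sketch below stores each 32-bit integer as a one-byte length followed by only its significant low-order bytes. The byte-level layout and the encode/decode names are assumptions made for clarity; published formats typically pack the length into two bits per value and combine this with NULL suppression.

```cpp
// Byte-level leading zero suppression; hypothetical layout, not \rowss format.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> encode(const std::vector<uint32_t>& column) {
  std::vector<uint8_t> out;
  for (uint32_t v : column) {
    uint8_t len = 0;                       // number of significant bytes
    for (uint32_t tmp = v; tmp != 0; tmp >>= 8) ++len;
    out.push_back(len);                    // real formats pack this into 2 bits
    for (uint8_t i = 0; i < len; ++i)      // little-endian significant bytes
      out.push_back(static_cast<uint8_t>(v >> (8 * i)));
  }
  return out;
}

std::vector<uint32_t> decode(const std::vector<uint8_t>& bytes) {
  std::vector<uint32_t> out;
  for (std::size_t pos = 0; pos < bytes.size();) {
    uint8_t len = bytes[pos++];
    uint32_t v = 0;
    for (uint8_t i = 0; i < len; ++i)
      v |= uint32_t(bytes[pos++]) << (8 * i);
    out.push_back(v);
  }
  return out;
}
// A column of small values such as {3, 117, 42} shrinks from 12 bytes to 6;
// a production format would also pack the length fields and handle NULLs.
```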
@@ -2225,7 +2228,7 @@ effectiveness of simple, special purpose, compression schemes.
PFOR was introduced as an extension to
MonetDB~\cite{pfor}, a column-oriented database, along with two other
formats. PFOR-DELTA is similar to PFOR, but stores differences between values as
-deltas.\xxx{check} PDICT encodes columns as keys and a dictionary that
+deltas. PDICT encodes columns as keys and a dictionary that
maps to the original values. We plan to add both these formats to
\rows in the future. We chose to implement RLE and PFOR because they
provide high compression and decompression bandwidth. Like MonetDB,
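The sketch below illustrates the frame-of-reference idea behind PFOR: most values are stored as narrow offsets from a per-page base, and outliers are recorded as exceptions that are patched back in during decompression. It is a simplified, hypothetical example (the PforPage struct and the 32-bit slots are inventions for readability), not the MonetDB or \rows implementation, which bit-packs the slots.

```cpp
// Simplified frame-of-reference coding with patched exceptions; sketch only.
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct PforPage {
  uint32_t base;                                        // frame of reference
  uint8_t  bits;                                        // slot width in bits
  std::vector<uint32_t> slots;                          // offset from base, or marker
  std::vector<std::pair<std::size_t, uint32_t>> exceptions;  // (index, true value)
};

PforPage encode(const std::vector<uint32_t>& column, uint32_t base, uint8_t bits) {
  PforPage p{base, bits, {}, {}};
  const uint32_t max_offset = (1u << bits) - 1;         // reserved as marker
  for (std::size_t i = 0; i < column.size(); ++i) {
    uint32_t v = column[i];
    if (v >= base && v - base < max_offset) {
      p.slots.push_back(v - base);                      // fits in the narrow slot
    } else {
      p.slots.push_back(max_offset);                    // exception marker
      p.exceptions.emplace_back(i, v);                  // patched on decompression
    }
  }
  return p;
}

std::vector<uint32_t> decode(const PforPage& p) {
  std::vector<uint32_t> out;
  for (uint32_t slot : p.slots) out.push_back(p.base + slot);
  for (const auto& [i, v] : p.exceptions) out[i] = v;   // apply the patches
  return out;
}
// Real PFOR bit-packs the slots; this sketch keeps 32-bit slots for clarity.
```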