merged suggestions from eric
This commit is contained in:
parent 58e0466339
commit c5993556ad

1 changed file with 27 additions and 26 deletions
@@ -83,11 +83,11 @@ transactions. Here, we apply it to archival of weather data.
 A \rows replica serves two purposes. First, by avoiding seeks, \rows
 reduces the load on the replicas' disks, leaving surplus I/O capacity
 for read-only queries and allowing inexpensive hardware to handle
-workloads produced by specialized database machines. This allows
-decision support and OLAP queries to scale linearly with the number of
-machines, regardless of lock contention and other bottlenecks
-associated with distributed transactions. Second, \rows replica
-groups provide highly available copies of the database. In
+workloads produced by database machines with tens of disks. This
+allows decision support and OLAP queries to scale linearly with the
+number of machines, regardless of lock contention and other
+bottlenecks associated with distributed transactions. Second, \rows
+replica groups provide highly available copies of the database. In
 Internet-scale environments, decision support queries may be more
 important than update availability.

@@ -275,7 +275,6 @@ Conceptually, when the merge is complete, $C1$ is atomically replaced
 with the new tree, and $C0$ is atomically replaced with an empty tree.
 The process is then eventually repeated when $C1$ and $C2$ are merged.
 At that point, the insertion will not cause any more I/O operations.
-Therefore, each index insertion causes $2 R$ tuple comparisons.

 Although our prototype replaces entire trees at once, this approach
 introduces a number of performance problems. The original LSM work

@@ -413,14 +412,16 @@ LSM-tree outperforms the B-tree when:
 on a machine that can store 1 GB in an in-memory tree, this yields a
 maximum ``interesting'' tree size of $R^2*1GB = $ 100 petabytes, well
 above the actual drive capacity of $750~GB$. A $750~GB$ tree would
-have an R of $\sqrt{750}\approx27$; we would expect such a tree to
-have a sustained insertion throughput of approximately 8000 tuples /
-second, or 800 kbyte/sec\footnote{It would take 11 days to overwrite
-every tuple on the drive in random order.}; two orders of magnitude
-above the 83 I/O operations that the drive can deliver per second, and
-well above the 41.5 tuples / sec we would expect from a B-tree with a
-$18.5~GB$ buffer pool. Increasing \rowss system memory to cache 10 GB of
-tuples would increase write performance by a factor of $\sqrt{10}$.
+have a $C2$ component 750 times larger than the 1GB $C0$ component.
+Therefore, it would have an R of $\sqrt{750}\approx27$; we would
+expect such a tree to have a sustained insertion throughput of
+approximately 8000 tuples / second, or 800 kbyte/sec\footnote{It would
+take 11 days to overwrite every tuple on the drive in random
+order.}; two orders of magnitude above the 83 I/O operations that
+the drive can deliver per second, and well above the 41.5 tuples / sec
+we would expect from a B-tree with a $18.5~GB$ buffer pool.
+Increasing \rowss system memory to cache 10 GB of tuples would
+increase write performance by a factor of $\sqrt{10}$.

 % 41.5/(1-80/750) = 46.4552239

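The figures in this hunk follow from back-of-the-envelope arithmetic in the same spirit as the "% 41.5/(1-80/750)" sanity-check comment above. The LaTeX comment block below spells it out; the roughly 100-byte tuple size (implied by the 8000 tuples/sec vs. 800 kbyte/sec pair) and the assumption that an uncached B-tree insertion costs one leaf read plus one leaf write are inferences, not figures stated in this hunk.

% Sanity checks for the figures above.  Assumptions: ~100-byte tuples,
% and one leaf read + one leaf write per uncached B-tree insertion.
%   R                  = sqrt(750 GB / 1 GB)         ~= 27
%   insert bandwidth   = 8000 tuples/s * 100 bytes   ~= 800 kbyte/s
%   drive overwrite    = 750 GB / (800 kbyte/s)      ~= 9.4e5 s ~= 11 days
%   B-tree insertions  = 83 I/Os per second / 2       = 41.5 tuples/s
%   10 GB C0           : R' = sqrt(750/10) = R/sqrt(10); insertion throughput
%                        scales as 1/R, so it improves by a factor of sqrt(10)
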
@@ -773,7 +774,7 @@ code-generation utilities. We found that this set of optimizations
 improved compression and decompression performance by roughly an order
 of magnitude. To illustrate this, Table~\ref{table:optimization}
 compares compressor throughput with and without compiler optimizations
-enabled. While compressor throughput varies with data distributions
+enabled. Although compressor throughput varies with data distributions
 and type, optimizations yield a similar performance improvement across
 varied datasets and random data distributions.

@@ -804,24 +805,24 @@ generally useful) callbacks. The first, {\tt pageLoaded()}
 instantiates a new multicolumn page implementation when the page is
 first read into memory. The second, {\tt pageFlushed()} informs our
 multicolumn implementation that the page is about to be written to
-disk, while the third {\tt pageEvicted()} invokes the multicolumn
+disk, and the third {\tt pageEvicted()} invokes the multicolumn
 destructor. (XXX are these really non-standard?)

 As we mentioned above, pages are split into a number of temporary
 buffers while they are being written, and are then packed into a
-contiguous buffer before being flushed. While this operation is
+contiguous buffer before being flushed. Although this operation is
 expensive, it does present an opportunity for parallelism. \rows
 provides a per-page operation, {\tt pack()} that performs the
 translation. We can register {\tt pack()} as a {\tt pageFlushed()}
 callback or we can explicitly call it during (or shortly after)
 compression.

-While {\tt pageFlushed()} could be safely executed in a background
-thread with minimal impact on system performance, the buffer manager
-was written under the assumption that the cost of in-memory operations
-is negligible. Therefore, it blocks all buffer management requests
-while {\tt pageFlushed()} is being executed. In practice, this causes
-multiple \rows threads to block on each {\tt pack()}.
+{\tt pageFlushed()} could be safely executed in a background thread
+with minimal impact on system performance. However, the buffer
+manager was written under the assumption that the cost of in-memory
+operations is negligible. Therefore, it blocks all buffer management
+requests while {\tt pageFlushed()} is being executed. In practice,
+this causes multiple \rows threads to block on each {\tt pack()}.

 Also, {\tt pack()} reduces \rowss memory utilization by freeing up
 temporary compression buffers. Delaying its execution for too long

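To make the callback wiring in this hunk concrete, here is a minimal C sketch. It is illustrative only: apart from the names pageLoaded(), pageFlushed(), pageEvicted() and pack(), which come from the text, the struct, handler names and registration style are hypothetical rather than the buffer manager's real API. The two variants simply contrast registering pack() as the pageFlushed() callback with calling it eagerly after compression.

/* Sketch of the callback wiring described above.  Everything here except the
 * callback names pageLoaded()/pageFlushed()/pageEvicted() and pack() is
 * hypothetical; the real buffer manager's types and registration differ. */
#include <stdio.h>

typedef struct {
    int   id;
    void *impl;                    /* per-page multicolumn implementation */
} Page;

typedef struct {
    void (*pageLoaded)(Page *);    /* page first read into memory         */
    void (*pageFlushed)(Page *);   /* page about to be written to disk    */
    void (*pageEvicted)(Page *);   /* page leaving the buffer pool        */
} PageCallbacks;

/* Hypothetical multicolumn handlers. */
static void multicolumnLoaded(Page *p)  { printf("instantiate page %d\n", p->id); }
static void multicolumnEvicted(Page *p) { printf("destroy page %d\n", p->id); }

/* pack(): copy the temporary per-column buffers into one contiguous
 * buffer inside the page, freeing the temporaries. */
static void pack(Page *p) { printf("pack page %d\n", p->id); }

/* Option 1: register pack() as the pageFlushed() callback, so packing
 * happens lazily at flush time (but blocks other buffer requests then). */
static const PageCallbacks flushTimePack = {
    .pageLoaded  = multicolumnLoaded,
    .pageFlushed = pack,
    .pageEvicted = multicolumnEvicted,
};

/* Option 2: call pack() explicitly right after compression and make
 * pageFlushed() a no-op, shortening the stall inside the buffer manager. */
static void noOp(Page *p) { (void)p; }
static const PageCallbacks eagerPack = {
    .pageLoaded  = multicolumnLoaded,
    .pageFlushed = noOp,
    .pageEvicted = multicolumnEvicted,
};

int main(void) {
    Page p = { .id = 42, .impl = NULL };
    flushTimePack.pageLoaded(&p);
    flushTimePack.pageFlushed(&p);   /* packs at flush time              */
    eagerPack.pageFlushed(&p);       /* no-op: page was packed eagerly   */
    flushTimePack.pageEvicted(&p);
    return 0;
}
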
@@ -865,13 +866,13 @@ bytes. A frame of reference column header consists of 2 bytes to
 record the number of encoded rows and a single uncompressed
 value. Run length encoding headers consist of a 2 byte count of
 compressed blocks. Therefore, in the worst case (frame of reference
-encoding 64bit integers, and \rowss 4KB pages) our prototype's
+encoding 64-bit integers, and \rowss 4KB pages) our prototype's
 multicolumn format uses $14/4096\approx0.35\%$ of the page to store
 each column header. If the data does not compress well, and tuples
 are large, additional storage may be wasted because \rows does not
 split tuples across pages. Tables~\ref{table:treeCreation}
-and~\ref{table:treeCreationTwo} (which draw column values from
-independent, identical distributions) show that \rowss compression
+and~\ref{table:treeCreationTwo}, which draw column values from
+independent, identical distributions, show that \rowss compression
 ratio can be significantly impacted by large tuples.

 % XXX graph of some sort to show this?

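For reference, the 0.35% figure in this hunk is simply 14/4096; the short comment below works it out and notes how the overhead scales with the number of columns on a page. The per-page total is an inference from "each column header", not a number stated here.

% Worst-case column-header overhead, as computed above:
%   14 bytes / 4096-byte page  ~= 0.342%  (rounded to 0.35% in the text)
%   a page storing N columns therefore spends roughly 14*N bytes
%   (N * 0.35% of the page) on column headers before any tuple data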