Rows submitted paper.

This commit is contained in:
Sears Russell 2007-11-16 18:27:36 +00:00
parent 3afe34ece8
commit 588b9a8b25
3 changed files with 84 additions and 81 deletions


@@ -78,16 +78,16 @@ processing queries. \rows uses {\em log structured merge} (LSM) trees
to create full database replicas using purely sequential I/O. It
provides access to inconsistent data in real-time and consistent data
with a few seconds delay. \rows was written to support micropayment
transactions. Here, we apply it to archival of weather data.
transactions.
A \rows replica serves two purposes. First, by avoiding seeks, \rows
reduces the load on the replicas' disks, leaving surplus I/O capacity
for read-only queries and allowing inexpensive hardware to replicate
workloads produced by machines that are equipped with many disks.
Read-only replication allows decision support and OLAP queries to
reduces the load on the replicas' disks. This leaves surplus I/O capacity
for read-only queries and allows inexpensive hardware to replicate
workloads produced by expensive machines that are equipped with many disks.
Affordable, read-only replication allows decision support and OLAP queries to
scale linearly with the number of machines, regardless of lock
contention and other bottlenecks associated with distributed
transactions. Second, a group of \rows replicas provides a highly
transactional updates. Second, a group of \rows replicas provides a highly
available copy of the database. In many Internet-scale environments,
decision support queries are more important than update availability.
@@ -159,25 +159,24 @@ reconcile inconsistencies in their results. If the queries are
too expensive to run on master database instances they are delegated
to data warehousing systems and produce stale results.
In order to address these issues, \rows gives up the ability to
In order to address the needs of such queries, \rows gives up the ability to
directly process SQL updates. In exchange, it is able to replicate
conventional database instances at a small fraction of the cost of a
general-purpose database server.
Like a data warehousing solution, this decreases the cost of large,
read-only analytical processing and decision support queries, and scales to extremely
large database instances with high-throughput updates. Unlike a data
warehousing solution, \rows does this without introducing significant
large database instances with high-throughput updates. Unlike data
warehousing solutions, \rows does this without introducing significant
replication latency.
Conventional database replicas provide low-latency replication at a
cost comparable to the database instances being replicated. This
cost comparable to that of the master database. The expense associated with such systems
prevents conventional database replicas from scaling. The additional
read throughput they provide is nearly as expensive as read throughput
on the master. Because their performance is comparable to that of the
master database, they are unable to consolidate multiple database
instances for centralized processing. \rows suffers from neither of
these limitations.
instances for centralized processing.
Unlike existing systems, \rows provides inexpensive, low-latency, and
scalable replication of write-intensive relational databases,
@@ -185,19 +184,20 @@ regardless of workload contention, database size, or update patterns.
\subsection{Fictional \rows deployment}
Imagine a classic, disk bound TPC-C installation. On modern hardware,
Imagine a classic, disk-bound TPC-C installation. On modern hardware,
such a system would have tens of disks, and would be seek limited.
Consider the problem of producing a read-only low-latency replica of
Consider the problem of producing a read-only, low-latency replica of
the system for analytical processing, decision support, or some other
expensive read-only workload. If the replica uses the same storage
engine as the master, its hardware resources would be comparable to
(certainly within an order of magnitude) those of the master database
instances. As this paper shows, the I/O cost of maintaining a \rows
replica can be less than 1\% of the cost of maintaining the master
database.
instances. Worse, a significant fraction of these resources would be
devoted to replaying updates from the master. As we show below,
the I/O cost of maintaining a \rows replica can be less than 1\% of
the cost of maintaining the master database.
Therefore, unless the replica's read-only query workload is seek
limited, a \rows replica can make due with many fewer disks than the
limited, a \rows replica requires many fewer disks than the
master database instance. If the replica must service seek-limited
queries, it will likely need to run on a machine similar to the master
database, but will use almost none of its (expensive) I/O capacity for
@@ -209,8 +209,8 @@ pages, increasing the effective size of system memory.
The primary drawback of this approach is that it roughly doubles the
cost of each random index lookup. Therefore, the attractiveness of
\rows hinges on two factors: the fraction of the workload devoted to
random tuple lookups, and the premium one must pay for a specialized
storage hardware.
random tuple lookups, and the premium one would have paid for a piece
of specialized storage hardware that \rows replaces.
\subsection{Paper structure}
@@ -256,7 +256,7 @@ computational power for scarce storage resources.
The replication log should record each transaction {\tt begin}, {\tt commit}, and
{\tt abort} performed by the master database, along with the pre- and
post-images associated with each tuple update. The ordering of these
entries should match the order in which they are applied at the
entries must match the order in which they are applied at the
database master.
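For concreteness, here is a minimal sketch of what such a replication log entry might look like and how it could be folded into the in-memory component; every name below (LogEntry, apply, the std::map standing in for $C0$) is an illustrative assumption rather than \rowss actual interface.

  // Hypothetical replication log entry and its application to the in-memory
  // component. Entries must be applied in the order the master produced them.
  #include <cstdint>
  #include <map>
  #include <optional>
  #include <string>
  #include <vector>

  enum class LogOp { Begin, Update, Commit, Abort };

  struct Tuple {
      std::string primary_key;
      std::vector<uint8_t> columns;     // packed column values
  };

  struct LogEntry {
      LogOp op;
      uint64_t xid;                     // master transaction id
      std::optional<Tuple> pre_image;   // present for updates and deletes
      std::optional<Tuple> post_image;  // present for inserts and updates
  };

  // A balanced in-memory map stands in for C0.
  void apply(std::map<std::string, Tuple>& c0, const LogEntry& e) {
      if (e.op == LogOp::Update && e.post_image)
          c0[e.post_image->primary_key] = *e.post_image;
      // Begin, Commit and Abort only affect snapshot bookkeeping, which this
      // sketch omits.
  }
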
Upon receiving a log entry, \rows applies it to an in-memory tree, and
@@ -280,10 +280,10 @@ Section~\ref{sec:isolation}.
%larger tree.
In order to look up a tuple stored in \rows, a query must examine all
three trees, typically starting with the in-memory (fastest, and most
three tree components, typically starting with the in-memory (fastest, and most
up-to-date) component, and then moving on to progressively larger and
out-of-date trees. In order to perform a range scan, the query can
either iterate over the trees manually, or wait until the next round
iterate over the trees manually. Alternatively, it can wait until the next round
of merging occurs, and apply the scan to tuples as the mergers examine
them. By waiting until the tuples are due to be merged, the
range-scan can occur with zero I/O cost, at the expense of significant
@@ -293,18 +293,18 @@ delay.
it to continuously process updates and service index lookup requests.
In order to minimize the overhead of thread synchronization, index
lookups lock entire tree components at a time. Because on-disk tree
components are read-only, these latches only affect tree deletion,
components are read-only, these latches only block tree deletion,
allowing merges and lookups to occur concurrently. $C0$ is
updated in place, preventing inserts from occurring concurrently with
merges and lookups. However, operations on $C0$ are comparatively
fast, reducing contention for $C0$'s latch.
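A sketch of this coarse, per-component latching using standard library primitives follows; the prototype's actual latch implementation may differ, and the function names are ours.

  #include <shared_mutex>

  struct Component { std::shared_mutex latch; /* ... tree data ... */ };

  // On-disk components are read-only: lookups and merge scans take shared
  // latches and never block one another; only deleting a component after a
  // merge needs the latch exclusively.
  void lookup(Component& c /*, key */) {
      std::shared_lock<std::shared_mutex> guard(c.latch);
      // ... probe the read-only tree ...
  }

  void drop_component(Component& c) {
      std::unique_lock<std::shared_mutex> guard(c.latch);
      // ... safe to deallocate the tree ...
  }

  // C0 is updated in place, so inserts hold its latch exclusively; these
  // operations are short, which keeps contention low.
  void insert_c0(Component& c0 /*, tuple */) {
      std::unique_lock<std::shared_mutex> guard(c0.latch);
      // ... update the in-memory tree ...
  }
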
Recovery, space management and atomic updates to \rowss metadata is
Recovery, space management and atomic updates to \rowss metadata are
handled by an existing transactional storage system. \rows is
implemented as an extension to the transaction system, and stores its
implemented as an extension to the transaction system and stores its
data in a conventional database page file. \rows does not use the
underlying transaction system to log changes to its tree components.
Instead, it force writes them to disk before commit, ensuring
Instead, it force writes tree components to disk after each merge completes, ensuring
durability without significant logging overhead.
As far as we know, \rows is the first LSM-Tree implementation. This
@@ -314,8 +314,10 @@ analysis of LSM-Tree performance on current hardware (we refer the
reader to the original LSM work for a thorough analytical discussion
of LSM performance). Finally, we explain how our implementation
provides transactional isolation, exploits hardware parallelism, and
supports crash recovery. We defer discussion of \rowss compression
techniques to the next section.
supports crash recovery. These implementation-specific details are an
important contribution of this work; they explain how to adapt
LSM-Trees to provide high performance database replication. We defer
discussion of \rowss compression techniques to the next section.
\subsection{Tree merging}
@@ -394,7 +396,7 @@ practice. This section describes the impact of compression on B-Tree
and LSM-Tree performance using (intentionally simplistic) models of
their performance characteristics.
Starting with the (more familiar) B-Tree case, in the steady state, we
Starting with the (more familiar) B-Tree case, in the steady state we
can expect each index update to perform two random disk accesses (one
evicts a page, the other reads a page). Tuple compression does not
reduce the number of disk seeks:
@@ -429,7 +431,7 @@ The $compression~ratio$ is
$\frac{uncompressed~size}{compressed~size}$, so improved compression
leads to less expensive LSM-Tree updates. For simplicity, we assume
that the compression ratio is the same throughout each component of
the LSM-Tree; \rows addresses this at run time by reasoning in terms
the LSM-Tree; \rows addresses this at run-time by reasoning in terms
of the number of pages used by each component.
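One way to summarize the contrast so far (a rough restatement, not the paper's full analytical model): a B-Tree update pays a fixed number of seeks regardless of how well its tuples compress, while the sequential I/O paid by an LSM-Tree update shrinks as the compression ratio grows.

  \[
    cost_{B\textrm{-}Tree~update} \approx 2 \cdot cost_{random~I/O},
    \qquad
    cost_{LSM~update} \propto \frac{cost_{sequential~I/O}}{compression~ratio}
  \]
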
Our test hardware's hard drive is a 7200RPM, 750 GB Seagate Barracuda
@@ -514,7 +516,7 @@ in memory, and write approximately $\frac{41.5}{(1-(80/750))} = 46.5$
tuples/sec. Increasing memory further yields a system that
is no longer disk bound.
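The quoted figure checks out arithmetically; the interpretation of the constants (a base rate of 41.5 with 80 of 750 GB of the data held in memory) is our reading of the truncated context, not something the excerpt states.

  #include <cstdio>

  int main() {
      double base_rate = 41.5;                // throughput with nothing cached
      double cached_fraction = 80.0 / 750.0;  // assumed: fraction held in RAM
      std::printf("%.1f\n", base_rate / (1.0 - cached_fraction));  // prints 46.5
      return 0;
  }
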
Assuming that the system CPUs are fast enough to allow \rows
Assuming that the CPUs are fast enough to allow \rows
compression and merge routines to keep up with the bandwidth supplied
by the disks, we conclude that \rows will provide significantly higher
replication throughput for disk bound applications.
@@ -528,7 +530,7 @@ inserts an entry into the leftmost entry in the tree, allocating
additional internal nodes if necessary. Our prototype does not compress
internal tree nodes\footnote{This is a limitation of our prototype,
not our approach. Internal tree nodes are append-only and, at the
very least, the page id data is amenable to compression. Like B-Tree
very least, the page ID data is amenable to compression. Like B-Tree
compression, this would decrease the memory used by lookups.},
so it writes one tuple into the tree's internal nodes per compressed
page. \rows inherits a default page size of 4KB from the transaction
@@ -541,7 +543,7 @@ to the next highest level, and so on. See
Table~\ref{table:treeCreationTwo} for a comparison of compression
performance with and without tree creation enabled\footnote{Our
analysis ignores page headers, per-column, and per-tuple overheads;
these factors account for the additional indexing overhead}. The
these factors account for the additional indexing overhead.}. The
data was generated by applying \rowss compressors to randomly
generated five column, 1,000,000 row tables. Across five runs, in
Table~\ref{table:treeCreation} RLE's page count had a standard
@@ -668,31 +670,31 @@ old versions of frequently updated data.
\subsubsection{Merging and Garbage collection}
\rows merges components by iterating over them in order, removing
\rows merges components by iterating over them in order, garbage collecting
obsolete and duplicate tuples and writing the rest into a new version
of the largest component. Because \rows provides snapshot consistency
to queries, it must be careful not to delete a version of a tuple that
to queries, it must be careful not to collect a version of a tuple that
is visible to any outstanding (or future) queries. Because \rows
never performs disk seeks to service writes, it handles deletions by
inserting special tombstone tuples into the tree. The tombstone's
purpose is to record the deletion event until all older versions of
the tuple have been garbage collected. At that point, the tombstone
is deleted.
the tuple have been garbage collected. Sometime after that point, the tombstone
is collected as well.
In order to determine whether or not a tuple can be deleted, \rows
In order to determine whether or not a tuple can be collected, \rows
compares the tuple's timestamp with any matching tombstones (or record
creations, if the tuple is a tombstone), and with any tuples that
match on primary key. Upon encountering such candidates for deletion,
match on primary key. Upon encountering such candidates for garbage collection,
\rows compares their timestamps with the set of locked snapshots. If
there are no snapshots between the tuple being examined and the
updated version, then the tuple can be deleted. Tombstone tuples can
also be deleted once they reach $C2$ and any older matching tuples
updated version, then the tuple can be collected. Tombstone tuples can
also be collected once they reach $C2$ and any older matching tuples
have been removed.
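The visibility test this paragraph describes reduces to checking whether any locked snapshot falls between an old tuple version and the version that supersedes it; a sketch, with timestamps and the snapshot set simplified to plain integers:

  #include <cstdint>
  #include <set>

  // An old version may be garbage collected only if no locked snapshot can
  // still observe it, i.e. no snapshot lies in [old_version_ts, newer_version_ts).
  bool collectible(uint64_t old_version_ts,
                   uint64_t newer_version_ts,
                   const std::set<uint64_t>& locked_snapshots) {
      auto it = locked_snapshots.lower_bound(old_version_ts);
      return it == locked_snapshots.end() || *it >= newer_version_ts;
  }
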
Actual reclamation of pages is handled by the underlying transaction
system; once \rows completes its scan over existing components (and
registers new ones in their places), it tells the transaction system
to reclaim the regions of the page file that stored the components.
to reclaim the regions of the page file that stored the old components.
\subsection{Parallelism}
@@ -704,7 +706,7 @@ probes into $C1$ and $C2$ are against read-only trees; beyond locating
and pinning tree components against deallocation, probes of these
components do not interact with the merge processes.
Our prototype exploits replication's piplined parallelism by running
Our prototype exploits replication's pipelined parallelism by running
each component's merge process in a separate thread. In practice,
this allows our prototype to exploit two to three processor cores
during replication. Remaining cores could be used by queries, or (as
@@ -736,7 +738,7 @@ with Moore's law for the foreseeable future.
Like other log structured storage systems, \rowss recovery process is
inexpensive and straightforward. However, \rows does not attempt to
ensure that transactions are atomically committed to disk, and is not
meant to supplement or replace the master database's recovery log.
meant to replace the master database's recovery log.
Instead, recovery occurs in two steps. Whenever \rows writes a tree
component to disk, it does so by beginning a new transaction in the
@@ -744,7 +746,7 @@ underlying transaction system. Next, it allocates
contiguous regions of disk pages (generating one log entry per
region), and performs a B-Tree style bulk load of the new tree into
these regions (this bulk load does not produce any log entries).
Finally, \rows forces the tree's regions to disk, and writes the list
Then, \rows forces the tree's regions to disk, and writes the list
of regions used by the tree and the location of the tree's root to
normal (write ahead logged) records. Finally, it commits the
underlying transaction.
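In outline, the sequence reads as follows; every type and function below is a placeholder, not the underlying transaction system's real API.

  #include <cstddef>
  #include <vector>

  struct Txn {};  struct Region {};  struct Tree {};

  Txn begin_transaction();
  Region allocate_region(Txn&);                  // one log entry per region
  size_t regions_needed(const Tree&);
  void bulk_load(const Tree&, const std::vector<Region>&);   // not logged
  void force_to_disk(const std::vector<Region>&);
  void log_region_list_and_root(Txn&, const Tree&,
                                const std::vector<Region>&); // write-ahead logged
  void commit(Txn&);

  void write_component(const Tree& new_tree) {
      Txn t = begin_transaction();
      std::vector<Region> regions;
      for (size_t i = 0; i < regions_needed(new_tree); ++i)
          regions.push_back(allocate_region(t));
      bulk_load(new_tree, regions);    // sequential writes, no log entries
      force_to_disk(regions);          // data is durable before the metadata...
      log_region_list_and_root(t, new_tree, regions);  // ...that points at it
      commit(t);
  }
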
@@ -803,12 +805,7 @@ multicolumn compression approach.
\rows builds upon compression algorithms that are amenable to
superscalar optimization, and can achieve throughputs in excess of
1GB/s on current hardware. Multicolumn support introduces significant
overhead; \rowss variants of these approaches typically run within an
order of magnitude of published speeds. Table~\ref{table:perf}
compares single column page layouts with \rowss dynamically dispatched
(not hardcoded to table format) multicolumn format.
1GB/s on current hardware.
%sears@davros:~/stasis/benchmarks$ ./rose -n 10 -p 0.01
%Compression scheme: Time trial (multiple engines)
@@ -826,15 +823,15 @@ compares single column page layouts with \rowss dynamically dispatched
%Multicolumn (Rle) 416 46.95x 0.377 0.692
\begin{table}
\caption{Compressor throughput - Random data, 10 columns (80 bytes/tuple)}
\caption{Compressor throughput - Random data. Mean of 5 runs, $\sigma<5\%$, except where noted}
\centering
\label{table:perf}
\begin{tabular}{|l|c|c|c|} \hline
Format & Ratio & Comp. & Decomp. \\ \hline %& Throughput\\ \hline
PFOR (one column) & 3.96x & 544 mb/s & 2.982 gb/s \\ \hline %& 133.4 MB/s \\ \hline
PFOR (multicolumn) & 3.81x & 253mb/s & 712 mb/s \\ \hline %& 129.8 MB/s \\ \hline
RLE (one column) & 48.83x & 961 mb/s & 1.593 gb/s \\ \hline %& 150.6 MB/s \\ \hline
RLE (multicolumn) & 46.95x & 377 mb/s & 692 mb/s \\ %& 148.4 MB/s \\
Format (\#col) & Ratio & Comp. mb/s & Decomp. mb/s\\ \hline %& Throughput\\ \hline
PFOR (1) & 3.96x & 547 & 2959 \\ \hline %& 133.4 MB/s \\ \hline
PFOR (10) & 3.86x & 256 & 719 \\ \hline %& 129.8 MB/s \\ \hline
RLE (1) & 48.83x & 960 & 1493 $(12\%)$ \\ \hline %& 150.6 MB/s \\ \hline
RLE (10) & 47.60x & 358 $(9\%)$ & 659 $(7\%)$ \\ %& 148.4 MB/s \\
\hline\end{tabular}
\end{table}
@@ -853,6 +850,13 @@ page formats (and table schemas), this invokes an extra {\tt for} loop
order to choose between column compressors) to each tuple compression
request.
This form of multicolumn support introduces significant overhead;
these variants of our compression algorithms run significantly slower
than versions hard-coded to work with single column data.
Table~\ref{table:perf} compares a fixed-format single column page
layout with \rowss dynamically dispatched (not custom generated code)
multicolumn format.
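The cost being measured is easy to picture: instead of code hard-wired to one column format, every tuple passes through a loop that dispatches each column to its own compressor. The class names below are ours, not the prototype's.

  #include <cstdint>
  #include <memory>
  #include <vector>

  struct ColumnCompressor {
      virtual ~ColumnCompressor() = default;
      // Returns false once this column's region of the page is full.
      virtual bool append(uint64_t value) = 0;
  };

  struct RleCompressor  : ColumnCompressor { bool append(uint64_t) override; };
  struct PforCompressor : ColumnCompressor { bool append(uint64_t) override; };

  struct MulticolumnPage {
      std::vector<std::unique_ptr<ColumnCompressor>> cols;  // chosen per schema

      // The extra per-tuple loop: one indirect call per column.
      bool append_tuple(const std::vector<uint64_t>& tuple) {
          for (size_t i = 0; i < cols.size(); ++i)
              if (!cols[i]->append(tuple[i])) return false;  // page is full
          return true;
      }
  };
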
% explain how append works
\subsection{The {\tt \large append()} operation}
@@ -894,7 +898,7 @@ a buffer of uncompressed data and that it is able to make multiple
passes over the data during compression. This allows it to remove
branches from loop bodies, improving compression throughput. We opted
to avoid this approach in \rows, as it would increase the complexity
of the {\tt append()} interface, and add a buffer to \rowss merge processes.
of the {\tt append()} interface, and add a buffer to \rowss merge threads.
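A toy single-pass run-length {\tt append()} illustrates the style: tuples are compressed as they arrive, so no staging buffer of uncompressed data is needed. This is an illustration only, not the prototype's actual signature or page layout.

  #include <cstdint>
  #include <vector>

  struct RlePage {
      struct Run { uint64_t value; uint32_t count; };
      std::vector<Run> runs;
      size_t capacity_bytes = 4096;   // \rows inherits 4KB pages

      // Returns false when the page is full; the caller then packs this page
      // and starts a new one.
      bool append(uint64_t value) {
          if (!runs.empty() && runs.back().value == value) {
              runs.back().count++;
              return true;
          }
          if ((runs.size() + 1) * sizeof(Run) > capacity_bytes) return false;
          runs.push_back({value, 1});
          return true;
      }
  };
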
\subsection{Static code generation}
% discuss templatization of code
@@ -953,7 +957,7 @@ implementation that the page is about to be written to disk, and the
third {\tt pageEvicted()} invokes the multicolumn destructor.
We need to register implementations for these functions because the
transaction system maintains background threads that may evict \rowss
transaction system maintains background threads that control eviction of \rowss
pages from memory. Registering these callbacks provides an extra
benefit; we parse the page headers, calculate offsets,
and choose optimized compression routines when a page is read from
@@ -978,7 +982,7 @@ this causes multiple \rows threads to block on each {\tt pack()}.
Also, {\tt pack()} reduces \rowss memory utilization by freeing up
temporary compression buffers. Delaying its execution for too long
might allow this memory to be evicted from processor cache before the
{\tt memcpy()} can occur. For all these reasons, our merge processes
{\tt memcpy()} can occur. For these reasons, the merge threads
explicitly invoke {\tt pack()} as soon as possible.
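The resulting merge loop is straightforward; a sketch, with all names hypothetical:

  struct Tuple;
  struct Page {
      bool append(const Tuple&);   // returns false once the page is full
      void pack();                 // memcpy() compressed column data into place
  };
  struct TupleIterator {
      bool has_next() const;
      const Tuple& peek() const;
      void advance();
  };
  struct PageAllocator { Page* next_page(); };

  // Pack each page the moment it fills, while the compressors' temporary
  // buffers are still warm in cache.
  void merge_into_pages(TupleIterator& merged, PageAllocator& alloc) {
      Page* page = alloc.next_page();
      while (merged.has_next()) {
          if (!page->append(merged.peek())) {
              page->pack();                 // free compression buffers now
              page = alloc.next_page();
              continue;                     // retry this tuple on the new page
          }
          merged.advance();
      }
      page->pack();                         // pack the final, partial page
  }
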
\subsection{Storage overhead}
@@ -1061,7 +1065,7 @@ We have not examined the tradeoffs between different implementations
of tuple lookups. Currently, rather than using binary search to find
the boundaries of each range, our compressors simply iterate over the
compressed representation of the data in order to progressively narrow
the range of tuples to be considered. It is possible that (given
the range of tuples to be considered. It is possible that (because of
expensive branch mispredictions and \rowss small pages) our
linear search implementation will outperform approaches based upon
binary search.
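The two strategies for a run-length coded column might look as follows, assuming the runs are sorted on the indexed value; the linear scan's branches are highly predictable, while the binary search's are not.

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  struct Run { uint64_t value; uint32_t count; };

  // Linear: walk the runs, accumulating slot offsets until the key is reached.
  long find_linear(const std::vector<Run>& runs, uint64_t key) {
      long slot = 0;
      for (const Run& r : runs) {
          if (r.value >= key) return r.value == key ? slot : -1;
          slot += r.count;
      }
      return -1;
  }

  // Binary: search the run array, then look up the run's first slot in a
  // precomputed prefix-sum table.
  long find_binary(const std::vector<Run>& runs,
                   const std::vector<long>& first_slot, uint64_t key) {
      auto it = std::lower_bound(runs.begin(), runs.end(), key,
                                 [](const Run& r, uint64_t k) { return r.value < k; });
      if (it == runs.end() || it->value != key) return -1;
      return first_slot[it - runs.begin()];
  }
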
@@ -1123,15 +1127,15 @@ Wind Gust Speed & RLE & \\
order to the tuples, and insert them in this order. We compare \rowss
performance with the MySQL InnoDB storage engine's bulk
loader\footnote{We also evaluated MySQL's MyISAM table format.
Predictably, performance degraded as the tree grew; ISAM indices do not
Predictably, performance degraded quickly as the tree grew; ISAM indices do not
support node splits.}. This avoids the overhead of SQL insert
statements. To force InnoDB to update its B-Tree index in place, we
break the dataset into 100,000 tuple chunks, and bulk load each one in
succession.
If we did not do this, MySQL would simply sort the tuples, and then
bulk load the index. This behavior is unacceptable in a low-latency
replication environment. Breaking the bulk load into multiple chunks
bulk load the index. This behavior is unacceptable in low-latency
environments. Breaking the bulk load into multiple chunks
forces MySQL to make intermediate results available as the bulk load
proceeds\footnote{MySQL's {\tt concurrent} keyword allows access to
{\em existing} data during a bulk load; new data is still exposed
@@ -1144,7 +1148,7 @@ to a sequential log. The double buffer increases the amount of I/O
performed by InnoDB, but allows it to decrease the frequency with
which it needs to fsync() the buffer pool to disk. Once the system
reaches steady state, this would not save InnoDB from performing
random I/O, but it increases I/O overhead.
random I/O, but it would increase I/O overhead.
We compiled \rowss C components with ``-O2'', and the C++ components
with ``-O3''. The latter compiler flag is crucial, as compiler
@@ -1155,7 +1159,7 @@ given the buffer pool's LRU page replacement policy, and \rowss
sequential I/O patterns.
Our test hardware has two dual core 64-bit 3GHz Xeon processors with
2MB of cache (Linux reports CPUs) and 8GB of RAM. All software used during our tests
2MB of cache (Linux reports 4 CPUs) and 8GB of RAM. All software used during our tests
was compiled for 64 bit architectures. We used a 64-bit Ubuntu Gutsy
(Linux ``2.6.22-14-generic'') installation, and the
``5.0.45-Debian\_1ubuntu3'' build of MySQL.
@@ -1219,16 +1223,16 @@ occur for two reasons. First, $C0$ accepts insertions at a much
greater rate than $C1$ or $C2$ can accept them. Over 100,000 tuples
fit in memory, so multiple samples are taken before each new $C0$
component blocks on the disk bound mergers. Second, \rows merges
entire trees at once, causing smaller components to occasionally block
entire trees at once, occasionally blocking smaller components
for long periods of time while larger components complete a merge
step. Both of these problems could be masked by rate limiting the
updates presented to \rows. A better solution would perform
incremental tree merges instead of merging entire components at once.
This paper has mentioned a number of limitations in our prototype
implementation. Figure~\ref{fig:4R} seeks to quantify the effects of
implementation. Figure~\ref{fig:4R} seeks to quantify the performance impact of
these limitations. This figure uses our simplistic analytical model
to calculate \rowss effective disk throughput utilization from \rows
to calculate \rowss effective disk throughput utilization from \rowss
reported value of $R$ and instantaneous throughput. According to our
model, we should expect an ideal, uncompressed version of \rows to
perform about twice as fast as our prototype performed during our experiments. During our tests, \rows
@@ -1249,7 +1253,7 @@ Finally, our analytical model neglects some minor sources of storage
overhead.
One other factor significantly limits our prototype's performance.
Replacing $C0$ atomically doubles \rows peak memory utilization,
Atomically replacing $C0$ doubles \rowss peak memory utilization,
halving the effective size of $C0$. The balanced tree implementation
that we use roughly doubles memory utilization again. Therefore, in
our tests, the prototype was wasting approximately $750MB$ of the
@@ -1280,9 +1284,9 @@ update workloads.
\subsection{LSM-Trees}
The original LSM-Tree work\cite{lsm} provides a more detailed
analytical model than the one presented below. It focuses on update
analytical model than the one presented above. It focuses on update
intensive OLTP (TPC-A) workloads, and hardware provisioning for steady
state conditions.
state workloads.
Later work proposes the reuse of existing B-Tree implementations as
the underlying storage mechanism for LSM-Trees\cite{cidrPartitionedBTree}. Many
@@ -1303,7 +1307,7 @@ perfectly laid out B-Trees.
The problem of {\em Online B-Tree merging} is closely related to
LSM-Trees' merge process. B-Tree merging addresses situations where
the contents of a single table index has been split across two
the contents of a single table index have been split across two
physical B-Trees that now need to be reconciled. This situation
arises, for example, during rebalancing of partitions within a cluster
of database machines.
@@ -1360,8 +1364,7 @@ the MonetDB\cite{pfor} column-oriented database, along with two other
formats (PFOR-delta, which is similar to PFOR, but stores values as
deltas, and PDICT, which encodes columns as keys and a dictionary that
maps to the original values). We plan to add both these formats to
\rows in the future. \rowss novel {\em multicolumn} page format is a
generalization of these formats. We chose these formats as a starting
\rows in the future. We chose these formats as a starting
point because they are amenable to superscalar optimization, and
compression is \rowss primary CPU bottleneck. Like MonetDB, each
\rows table is supported by custom-generated code.
@@ -1439,7 +1442,7 @@ will become practical. Improved compression ratios improve \rowss
throughput by decreasing its sequential I/O requirements. In addition
to applying compression to LSM-Trees, we presented a new approach to
database replication that leverages the strengths of LSM-Tree indices
by avoiding index probing. We also introduced the idea of using
by avoiding index probing during updates. We also introduced the idea of using
snapshot consistency to provide concurrency control for LSM-Trees.
Our prototype's LSM-Tree recovery mechanism is extremely
straightforward, and makes use of a simple latching mechanism to
@@ -1450,16 +1453,16 @@ tree merging.
Our implementation is a first cut at a working version of \rows; we
have mentioned a number of potential improvements throughout this
paper. We have characterized the performance of our prototype, and
bounded the performance gain we can expect to achieve by continuing to
optimize our prototype. Without compression, LSM-Trees outperform
B-Tree based indices by many orders of magnitude. With real-world
bounded the performance gain we can expect to achieve via continued
optimization of our prototype. Without compression, LSM-Trees can outperform
B-Tree based indices by at least 2 orders of magnitude. With real-world
database compression ratios ranging from 5-20x, we expect \rows
database replicas to outperform B-Tree based database replicas by an
additional factor of ten.
We implemented \rows to address scalability issues faced by large
scale database installations. \rows addresses seek-limited
applications that require real time analytical and decision support
applications that require near-realtime analytical and decision support
queries over extremely large, frequently updated data sets. We know
of no other database technology capable of addressing this class of
application. As automated financial transactions, and other real-time

Binary file not shown.

Binary file not shown.