Initial import of outline and clone of Freenix paper.

2005-03-07 07:42:57 +00:00 · 2005-03-07 07:42:57 +00:00 · 2c12560e7a
commit 2c12560e7a
parent a3112ee81c
2 changed files with 1431 additions and 0 deletions
--- a/doc/paper2/LLADD-Freenix-leftover-text.tex
+++ b/doc/paper2/LLADD-Freenix-leftover-text.tex
--- a/doc/paper2/LLADD.tex
+++ b/doc/paper2/LLADD.tex
@@ -0,0 +1,195 @@
+
+\documentclass[letterpaper,english]{article}
+
+%\documentclass[letterpaper,twocolumn,english]{article}
+\usepackage[T1]{fontenc}
+\usepackage[latin1]{inputenc}
+\usepackage{graphicx}
+
+\usepackage{geometry}
+\geometry{verbose,letterpaper,tmargin=1in,bmargin=1in,lmargin=1in,rmargin=1in}
+
+\makeatletter
+
+\usepackage{babel}
+
+\begin{document}
+
+\title{LLADD Outline }
+
+
+\author{Russell Sears \and ... \and Eric Brewer}
+
+\maketitle
+
+
+
+\begin{enumerate}
+
+\item Abstract
+
+\item Introduction 
+
+\begin{enumerate}
+
+  \item Current transactional systems handle conventional workloads
+  well, but object persistence mechanisms are a mess, as are
+  {}``version oriented'' data stores requiring large, efficient atomic
+  updates.
+
+  \item {}``Impedance mismatch'' is a term that refers to a mismatch
+  between the data model provided by the data store and the data model
+  required by the application. A significant percentage of software
+  development effort is related to dealing with this problem. Related
+  problems that have had less treatment in the literature involve
+  mismatches between other performance-critical and labor-intensive
+  programming primitives such as concurrency models, error handling
+  techniques and application development patterns.
+
+  \item Past trends in the Database community have been driven by
+  demand for tools that allow extremely specialized (but commercially
+  important!)  types of software to be developed quickly and
+  inexpensively. {[}System R, OODBMS, benchmarks, streaming databases,
+  etc.{]} This has led to the development of large, monolithic database
+  servers that perform well under many circumstances, but that are not
+  nearly as flexible as modern programming languages or typical
+  in-memory data structure libraries {[}Java Collections,
+  STL{]}. Historically, programming language and software library
+  development has focused upon the production of a wide array of
+  composable general-purpose tools, allowing the application developer
+  to pick algorithms and data structures that are most appropriate for
+  the problem at hand.
+
+  \item In the past, modular database and transactional storage
+  implementations have hidden the complexities of page layout,
+  synchronization, locking, and data structure design under relatively
+  narrow interfaces, since transactional storage algorithms'
+  interdependencies and requirements are notoriously complicated.
+
+  \item With these trends in mind, we have implemented a modular
+  version of ARIES that makes as few assumptions as possible about
+  application data structures or workload. Where such assumptions are
+  inevitable, we have produced narrow APIs that allow the application
+  developer to plug in alternative implementations of the modules that
+  comprise our ARIES implementation. Rather than hiding the underlying
+  complexity of the library from developers, we have produced narrow,
+  simple APIs and a set of invariants that must be maintained in
+  order to ensure transactional consistency, allowing application
+  developers to produce high-performance extensions with only a little
+  effort.
+
+\end{enumerate}
+\item Prior work
+
+\begin{enumerate}
+
+  \item The relational model used by databases leads to performance and
+  representation problems.
+
+  \item OODBMS and XML database systems provide data models tied
+  closely to programming languages or hierarchical formats but, like
+  the relational model, these models are extremely general. They may
+  be inappropriate for applications with stringent performance
+  demands, or that use the models in ways the database system's
+  underlying data structures cannot support well.
+
+  \item Berkeley DB provides a lower-level interface, increasing
+  performance, and providing efficient tree- and hash-based data
+  structures, but hides the details of storage management and the
+  primitives provided by its transactional layer from
+  developers. Again, only a handful of data formats are made available
+  to the developer.
+
+  \item Implementations of ARIES and other transactional storage
+  mechanisms include many of the useful primitives described below,
+  but prior implementations either deny application developers access
+  to these primitives {[}??{]}, or make many high-level assumptions
+  about data representation and workload {[}DB Toolkit from
+  Wisconsin??-need to make sure this statement is true!{]}
+
+\end{enumerate}
+
+\item Architecture
+
+\begin{enumerate}
+
+  \item {}``Core LLADD'' vs. {}``Operations''
+
+  \item ARIES provides {}``transactional pages'' 
+
+\begin{enumerate}
+
+  \item Diversion on ARIES semantics
+  
+  \item Non-interleaved transactions vs. Nested top actions
+  vs. Well-ordered writes.
+
+\end{enumerate}
+
+  \item Log entries as a programming primitive 
+
+  \item Error handling with compensations as {}``abort() for C''
+
+  \item Concurrency models are fundamentally application specific, but
+  record/page level locking and index locks are often a nice trade-off
+
+  \item {}``latching'' vs. {}``locking'': data structures internal to
+  LLADD are protected by LLADD, allowing applications to reason in
+  terms of logical data addresses, not physical representation. Since
+  the application may define a custom representation, this seems to be
+  a reasonable trade-off between application complexity and
+  performance.
+
+\end{enumerate}
+
+\item Applications (i.e., {}``tricks with ARIES'')
+
+\begin{enumerate}
+
+  \item Atomic file-based transactions. Prototype blob implementation
+  using force, shadow copies (trivial to implement given transactional
+  pages).  File systems that implement atomic operations may allow
+  data to be stored durably without calling flush() on the data
+  file. The current implementation suits blobs that are typically
+  changed entirely from update to update, but smarter implementations
+  are certainly possible. The blob implementation primarily consists
+  of special log operations that cause file system calls to be made at
+  appropriate times, and is simple, so it could easily be replaced,
+  for example by an application that frequently updates small ranges
+  within blobs.
+
+  \item Index implementation: modular hash table. Relies on separate
+  linked list and expandable array implementations.
+
+  \item Asynchronous log implementation/Fast writes. Prioritization of
+  log writes (one {}``log'' per page) implies that worst-case performance
+  (write, then immediate read) will be on par with a normal
+  implementation, but writes to portions of the database that are not
+  actively read should only increase system load (and not directly
+  increase latency).
+
+  \item Custom locking. Hash table can support all of the SQL degrees
+  of transactional consistency, but can also make use of
+  application-specific invariants and synchronization to accommodate
+  deadlock avoidance, which is the model most naturally supported by C
+  and other programming languages.
+
+\end{enumerate}
+
+\item Validation 
+
+\begin{enumerate}
+
+  \item Serialization Benchmarks (Abstract log) 
+
+  \item Hierarchical Locking 
+
+  \item TPC-C (Flexibility) 
+
+  \item Sample Application. (Don't know what yet?) 
+
+\end{enumerate}
+
+\item Conclusion\end{enumerate}
+
+\end{document}