skiplist/README.md
2024-04-08 09:24:00 -04:00

6.2 KiB

Skiplist

This project defines a skiplist data structure written in C. Implemented as using macros this code provides a way to essentially "template" (as in C++) and emit code with types and functions specific to your use case. You can apply these macros multiple times safely in your code, once for each application.

While there are lock-free implementations of a skiplist, this implementation is not (yet) lock-free or designed to manage concurrent access in any way. Use a mutex or some other method to serialize access to the API (until I finish the lock-free variant).

Study the example code to see how this works in practice.

Overview

A skiplist is a sorted list with O(log(n)) on average for most operations. It is a probabilistic datastructure, meaning that it does not guarantee O(log(n)) it approximates it over time. This implementation includes improves the probability by integrating the splay list algorithm for re-balancing trading off a bit of computational overhead and code complexity for a nearly always optimal, or "perfect" skiplist.

Conceptually, the arrangement of a skiplist appears as follows:

<head> ----------> [2] --------------------------------------------------> [9] ---------->
<head> ----------> [2] ------------------------------------[7] ----------> [9] ---------->
<head> ----------> [2] ----------> [4] ------------------> [7] ----------> [9] --> [10] ->
<head> --> [1] --> [2] --> [3] --> [4] --> [5] --> [6] --> [7] --> [8] --> [9] --> [10] ->

Each node contains at the very least a link to the next element in the list (corresponding to the lowest level in the above diagram), but it can randomly contain more links which skip further down the list (the towers in the above diagram). This allows for the algorithm to move down the list faster than having to visit every element.

Conceptually, the skiplist can be thought of as a stack of linked lists. At the very bottom is the full linked list with every element, and each layer above corresponds to a linked list containing a random subset of the elements from the layer immediately below it. The probability distribution that determines this random subset can be customized, but typically a layer will contain half the nodes from the layer below.

This implementation maintains a doubly-linked list at the bottom layer to support efficient iteration in either direction. There is also a guard node at the tail rather than simply pointing to NULL.

<head> <-> [1] <-> [2] <-> [3] <-> [4] <-> [5] <-> [6] <-> [7] <-> <tail>

Safety:

The ordered skiplist relies on a well-behaved comparison function. Specifically, given some ordering function f(a, b), it must satisfy the following properties:

  1. Be well-defined: f(a, b) should always return the same value
  2. Be antisymmetric: f(a, b) == Greater if and only if f(b, a) == Less, and f(a, b) == Equal == f(b, a).
  3. Be transitive: If f(a, b) == Greater and f(b, c) == Greater than f(a, c) == Greater.

Failure to satisfy these properties can result in unexpected behavior at best, and at worst will cause a segfault, null deref, or some other bad behavior.

References:

Sources of information most helpful for this implementation include, but are not limited to:

  • Skip lists: a probabilistic alternative to balanced trees

      author = {Pugh, William},
      title = {Skip lists: a probabilistic alternative to balanced trees},
      year = {1990}, issue_date = {June 1990},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      volume = {33}, number = {6}, issn = {0001-0782},
      url = {https://doi.org/10.1145/78973.78977},
      doi = {10.1145/78973.78977},
      journal = {Commun. ACM}, month = {jun}, pages = {668-676}, numpages = {9},
      keywords = {trees, searching, data structures},
      download = {https://www.cl.cam.ac.uk/teaching/2005/Algorithms/skiplists.pdf}
    }```
    
    
  • Tutorial: The Ubiquitous Skiplist, its Variants, and Applications in Modern Big Data Systems

      title={Tutorial: The Ubiquitous Skiplist, its Variants, and Applications in Modern Big Data Systems},
      author={Venkata Sai Pavan Kumar Vadrevu and Lu Xing and Walid G. Aref},
      journal={ArXiv},
      year={2023},
      volume={abs/2304.09983},
      url={https://api.semanticscholar.org/CorpusID:258236678},
      download={https://arxiv.org/pdf/2304.09983.pdf}
    }```
    
    
  • The Splay-List: A Distribution-Adaptive Concurrent Skip-List

      title={The Splay-List: A Distribution-Adaptive Concurrent Skip-List},
      author={Vitaly Aksenov and Dan Alistarh and Alexandra Drozdova and Amirkeivan Mohtashami},
      year={2020},
      eprint={2008.01009},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      download={https://arxiv.org/pdf/2008.01009.pdf}
    }```
    
    
  • JellyFish: A Fast Skip List with MVCC},

      title={JellyFish: A Fast Skip List with MVCC},
      author={Jeseong Yeon and Leeju Kim and Youil Han and Hyeon Gyu Lee and Eunji Lee and Bryan Suk Joon Kim},
      journal={Proceedings of the 21st International Middleware Conference},
      year={2020},
      url={https://api.semanticscholar.org/CorpusID:228086012}
    }```
    
    

Open Source

I'd like to thank others for thoughtfully licensing their work, the community of software engineers succeeds when we work together.

Portions of this code are derived from other copyrighted works:

TODO:

  • The concurrent, lock-free version of this (see gburd/lock-free branch for WIP).