2. Summary
❖ PalDB is an embeddable write-once key-value store
❖ Written in Java, with no dependencies and a JAR of only 110K
❖ Very fast read performance, 2M+ reads/second
❖ Simple, works like an immutable un-typed HashMap
❖ Compact, stored in a single binary file
❖ Open-sourced by LinkedIn in 2015
3. Why PalDB?
❖ Need for an efficient solution to package side-data
❖ Existing solutions are not a good fit
‣ Raw data files (CSV, JSON, Avro, Thrift) require complex
parsing code and in-memory data structures
‣ Embeddable key-value stores (LevelDB, RocksDB) have large
overhead due to read/write capabilities
‣ Traditional in-memory data structures (List, HashSet, HashMap)
take too much memory and require load time
4. Features
✓ All primitives and arrays, no schema needed
✓ Random read & iteration (unsorted)
✓ No load time, and uses off-heap memory
✓ Custom serializers can be defined
✓ Read from store file, stream or resources within JAR
✓ Stored in a single binary file
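To make the "no load time, off-heap" point concrete, here is a minimal sketch of reading a value straight out of a memory-mapped file using only the JDK. It is not PalDB's file format or reader: the class `MappedReadSketch`, its methods, and the fixed-width slot layout are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: a hypothetical layout, not PalDB's actual store format.
public class MappedReadSketch {

    // Writes `count` fixed-width slots to a temp file; slot i holds the long i * 10.
    static Path writeSlots(int count) {
        try {
            Path file = Files.createTempFile("slots", ".bin");
            file.toFile().deleteOnExit();
            ByteBuffer buf = ByteBuffer.allocate(count * Long.BYTES);
            for (int i = 0; i < count; i++) {
                buf.putLong(i * 10L);
            }
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the file read-only and reads one slot by offset.
    // Nothing is parsed or loaded into heap data structures beforehand.
    static long readSlot(Path file, int index) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return map.getLong(index * Long.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSlots(1000);
        System.out.println(readSlot(file, 42)); // prints 420
    }
}
```

The mapped region lives outside the Java heap and pages are brought in by the operating system on first access, which is one common way a reader can be usable immediately after opening the file.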
5. Write-once
❖ Write-once, read many
❖ Once a store has been written and closed, it can’t be
modified
❖ Typical use-case is to transport pre-created datasets
❖ Principal benefit is a more compact store size
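The write-once contract described above can be sketched with plain JDK collections. The `WriteOnceSketch` class and its nested `Writer` are hypothetical and do not reproduce PalDB's API or on-disk layout; they only show the rule that writes are accepted until the store is closed and rejected afterwards.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical classes illustrating write-once semantics; not PalDB's API.
public class WriteOnceSketch {

    static final class Writer {
        private final Map<Object, Object> pending = new HashMap<>();
        private boolean closed = false;

        // Accepts entries only while the store is still open.
        void put(Object key, Object value) {
            if (closed) {
                throw new IllegalStateException("store is closed and cannot be modified");
            }
            pending.put(key, value);
        }

        // Freezes the contents and returns a read-only view of a private copy.
        Map<Object, Object> close() {
            closed = true;
            return Collections.unmodifiableMap(new HashMap<>(pending));
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        writer.put("foo", "bar");
        writer.put(1213, new int[] {1, 2, 3});
        Map<Object, Object> store = writer.close();

        System.out.println(store.get("foo")); // prints bar
        try {
            writer.put("late", "value");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the full set of entries is known at close time, a write-once format can in principle be laid out without reserving room for later inserts or updates, which is plausibly where the size benefit comes from.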
7. Code: Read store
Java
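import java.io.File;
import com.linkedin.paldb.api.PalDB;
import com.linkedin.paldb.api.StoreReader;

// The value type is inferred from the declared variable type
StoreReader reader = PalDB.createReader(new File("store.paldb"));
String val1 = reader.get("foo");
int[] val2 = reader.get(1213);
reader.close();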
Scala
val reader: StoreReader = PalDB.createReader(new File("store.paldb"))
val val1: String = reader.get("foo")
val val2: Array[Int] = reader.get(1213)
reader.close()
8. Benchmark summary
❖ When compared to embeddable key-value stores
(LevelDB, RocksDB)
‣ PalDB has 5X to 15X higher throughput on datasets
fitting in memory*
❖ When compared to in-memory Java HashSet/HashMap
‣ PalDB has 2X to 5X lower throughput
‣ Uses 6X less memory
* PalDB is not intended to scale to very large on-disk indices the way RocksDB or LevelDB do