Introduction to PalDB

Mathieu Bastian

PalDB is an open-source embeddable and immutable key-value store written in Java.

PalDB
Introduction to PalDB
Mathieu Bastian - October 2015

Summary
❖ PalDB is an embeddable write-once key-value store
❖ Written in Java, with no dependencies and a JAR of only 110K
❖ Very fast read performance, 2M+ reads/second
❖ Simple, works like an immutable un-typed HashMap
❖ Compact, stored in a single binary file
❖ Open-sourced by LinkedIn in 2015

Why PalDB?
❖ Need for an efficient solution to package side-data
❖ Inappropriate existing solutions
‣ Raw data files (CSV, JSON, Avro, Thrift) require complex parsing code and in-memory data structures
‣ Embeddable key-value stores (LevelDB, RocksDB) have large overhead due to read/write capabilities
‣ Traditional in-memory data structures (List, HashSet, HashMap) take too much memory and require load time

Features
✓ Supports all primitives and arrays, no schema needed
✓ Random read & iteration (unsorted)
✓ No load time, and uses off-heap memory
✓ Custom serializers can be defined
✓ Read from store file, stream or resources within JAR
✓ Stored in a single binary file
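
One common way to get both "no load time" and "off-heap memory" on the JVM is to memory-map the store file, so that bytes are paged in on demand and live outside the Java heap. The slides do not say how PalDB achieves this, so the sketch below is only an illustration under that assumption; the class and method names (MappedReadSketch, writeTempFile, byteAt) are hypothetical and use only the standard java.nio API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: reads from a file through a read-only memory mapping. */
final class MappedReadSketch {

    /** Writes bytes to a temporary file, standing in for a pre-built store file. */
    static Path writeTempFile(byte[] content) {
        try {
            Path file = Files.createTempFile("sketch", ".bin");
            Files.write(file, content);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the file read-only and reads one byte at the given offset. */
    static byte byteAt(Path file, int offset) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapped buffer is backed by the OS page cache, not the Java heap,
            // and nothing is parsed or copied up front.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return buffer.get(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println((char) byteAt(file, 1));
    }
}
```

Opening the mapping costs roughly the same regardless of file size, which is the property the "no load time" bullet describes.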

Write-once
❖ Write-once, read many
❖ Once a store has been written and closed, it can’t be modified
❖ Typical use-case is to transport pre-created datasets
❖ Principal benefit is a more compact store size
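
The write-once lifecycle described above can be sketched with a toy in-memory class. This is not PalDB's implementation, and the class name WriteOnceSketch is hypothetical; it only shows the contract: entries are accepted until close(), and after that the store is read-only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of write-once semantics; not how PalDB is implemented. */
final class WriteOnceSketch {
    private final Map<Object, Object> pending = new HashMap<>();
    private Map<Object, Object> sealed; // null until close() is called

    /** Adds an entry; only allowed before the store is closed. */
    void put(Object key, Object value) {
        if (sealed != null) {
            throw new IllegalStateException("store is closed and cannot be modified");
        }
        pending.put(key, value);
    }

    /** Seals the store: from now on it is read-only. */
    void close() {
        if (sealed == null) {
            sealed = Collections.unmodifiableMap(new HashMap<>(pending));
        }
    }

    /** Reads a value; only allowed after the store is closed. */
    Object get(Object key) {
        if (sealed == null) {
            throw new IllegalStateException("store must be closed before reading");
        }
        return sealed.get(key);
    }

    public static void main(String[] args) {
        WriteOnceSketch store = new WriteOnceSketch();
        store.put("foo", "bar");
        store.close();
        System.out.println(store.get("foo"));
        try {
            store.put("baz", "qux"); // rejected: the store is already closed
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the full set of entries is known at close time, a real write-once format can lay data out without reserving room for later updates, which is consistent with the compactness benefit named in the last bullet.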

Code: Write store
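Java
import java.io.File;
import com.linkedin.paldb.api.PalDB;
import com.linkedin.paldb.api.StoreWriter;
// Keys and values are un-typed: any supported primitive or array works
StoreWriter writer = PalDB.createWriter(new File("store.paldb"));
writer.put("foo", "bar");
writer.put(1213, new int[] {1, 2, 3});
writer.close(); // the store can no longer be modified after this
Scala
import java.io.File
import com.linkedin.paldb.api.{PalDB, StoreWriter}
val writer: StoreWriter = PalDB.createWriter(new File("store.paldb"))
writer.put("foo", "bar")
writer.put(1213, Array(1, 2, 3))
writer.close()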

Code: Read store
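Java
import java.io.File;
import com.linkedin.paldb.api.PalDB;
import com.linkedin.paldb.api.StoreReader;
StoreReader reader = PalDB.createReader(new File("store.paldb"));
String val1 = reader.get("foo"); // the caller chooses the expected type
int[] val2 = reader.get(1213);
reader.close();
Scala
import java.io.File
import com.linkedin.paldb.api.{PalDB, StoreReader}
val reader: StoreReader = PalDB.createReader(new File("store.paldb"))
val val1: String = reader.get("foo")
val val2: Array[Int] = reader.get(1213)
reader.close()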
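
The read code assigns the result of get directly to a String and to an int[], which is what "un-typed" means in practice: the caller states the expected type and nothing checks it until run time. The following is a minimal sketch of that pattern in plain Java, assuming a generic return type on get. The class UntypedGetSketch is hypothetical and stands in for the real reader.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy un-typed store; class and method names are hypothetical. */
final class UntypedGetSketch {
    private final Map<Object, Object> data = new HashMap<>();

    void put(Object key, Object value) {
        data.put(key, value);
    }

    /** The caller picks V through the assignment target; the cast is unchecked. */
    @SuppressWarnings("unchecked")
    <V> V get(Object key) {
        return (V) data.get(key);
    }

    public static void main(String[] args) {
        UntypedGetSketch store = new UntypedGetSketch();
        store.put("foo", "bar");
        store.put(1213, new int[] {1, 2, 3});

        String val1 = store.get("foo"); // V inferred as String
        int[] val2 = store.get(1213);   // V inferred as int[]
        System.out.println(val1 + " " + val2.length);

        try {
            Integer wrong = store.get("foo"); // compiles, but fails at run time
            System.out.println(wrong);
        } catch (ClassCastException e) {
            System.out.println("wrong type requested for key \"foo\"");
        }
    }
}
```

The convenience comes at a cost: asking for the wrong type is not a compile error, so the writer and the reader of a store have to agree on value types out of band.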

Benchmark summary
❖ When compared to embeddable key-value stores (LevelDB, RocksDB)
‣ PalDB has 5X to 15X higher throughput on datasets fitting in memory*
❖ When compared to in-memory Java HashSet/HashMap
‣ PalDB has 2X to 5X lower throughput
‣ Uses 6X less memory
* PalDB is not intended to scale to very large disk indices the way RocksDB or LevelDB do

Throughput
❖ Throughput benchmark between PalDB, LevelDB and RocksDB (higher is better)

Memory
❖ Memory usage benchmark between PalDB and a Java HashSet (lower is better)

PalDB © 2015 LinkedIn Corp. Licensed under the terms of the Apache License, Version 2.0.
Code & documentation available on GitHub 
https://github.com/linkedin/PalDB