PalDB
Introduction to PalDB
Mathieu Bastian - October 2015
Summary
❖ PalDB is an embeddable write-once key-value store
❖ Written in Java, with no dependencies and a JAR of only 110 KB
❖ Very fast reads: 2M+ reads/second
❖ Simple: works like an immutable, untyped HashMap
❖ Compact: the whole store fits in a single binary file
❖ Open-sourced by LinkedIn in 2015
Why PalDB?
❖ Need for an efficient way to package side data
❖ Existing solutions are ill-suited
‣ Raw data files (CSV, JSON, Avro, Thrift) require complex parsing code and in-memory data structures
‣ Embeddable key-value stores (LevelDB, RocksDB) carry a large overhead from their read/write capabilities
‣ Traditional in-memory data structures (List, HashSet, HashMap) use too much memory and take time to load
Features
✓ All primitives and arrays supported, no schema needed
✓ Random reads and (unsorted) iteration
✓ No load time; uses off-heap memory
✓ Custom serializers can be defined
✓ Reads from a store file, a stream, or a resource inside a JAR
✓ The whole store fits in a single binary file
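Storing "all primitives and arrays" without a schema means each stored value must describe its own type. One common way to do that, shown here as a hypothetical sketch rather than PalDB's actual serialization format, is to prefix every value with a one-byte type tag. `TaggedCodec`, its tag values and the three supported types are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical schema-less encoding: a one-byte type tag tells the reader how to decode the payload.
final class TaggedCodec {
    private static final byte TAG_INT = 1;
    private static final byte TAG_STRING = 2;
    private static final byte TAG_INT_ARRAY = 3;

    // Encodes a supported value as [tag][payload]; integers are written big-endian.
    static byte[] encode(Object value) {
        if (value instanceof Integer) {
            return ByteBuffer.allocate(1 + 4).put(TAG_INT).putInt((Integer) value).array();
        }
        if (value instanceof String) {
            byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + 4 + utf8.length)
                    .put(TAG_STRING).putInt(utf8.length).put(utf8).array();
        }
        if (value instanceof int[]) {
            int[] ints = (int[]) value;
            ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 4 * ints.length)
                    .put(TAG_INT_ARRAY).putInt(ints.length);
            for (int i : ints) buf.putInt(i);
            return buf.array();
        }
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    // Decodes by dispatching on the leading tag byte; no external schema is consulted.
    static Object decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte tag = buf.get();
        switch (tag) {
            case TAG_INT:
                return buf.getInt();
            case TAG_STRING: {
                byte[] utf8 = new byte[buf.getInt()];
                buf.get(utf8);
                return new String(utf8, StandardCharsets.UTF_8);
            }
            case TAG_INT_ARRAY: {
                int[] ints = new int[buf.getInt()];
                for (int i = 0; i < ints.length; i++) ints[i] = buf.getInt();
                return ints;
            }
            default:
                throw new IllegalArgumentException("unknown tag: " + tag);
        }
    }
}
```

A real store would support more types (all primitives, their arrays, and user-registered serializers), but the principle is the same: the tag travels with the value, so the reader never needs a schema.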
Write-once
❖ Write once, read many
❖ Once a store has been written and closed, it cannot be modified
❖ Typical use case: shipping pre-built datasets
❖ Main benefit: a more compact store
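The write-once model can be illustrated with a toy store. This is a hypothetical sketch, not PalDB's file format: `WriteOnceWriter` buffers String entries and, on `close()`, freezes them into a single file holding an open-addressing hash index followed by the records; `WriteOnceReader` memory-maps that file read-only, so lookups read the off-heap mapping directly instead of first loading the store into Java objects. The class names, the file layout and the 0.5 load factor are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical writer: collects entries, then freezes them into one immutable file on close().
final class WriteOnceWriter {
    private final Path path;
    private final Map<String, String> pending = new LinkedHashMap<>();
    private boolean closed;

    WriteOnceWriter(Path path) { this.path = path; }

    void put(String key, String value) {
        if (closed) throw new IllegalStateException("store already written; it cannot be modified");
        pending.put(key, value); // a repeated key keeps its last value
    }

    // Layout: [slotCount][one int offset per slot, 0 = empty][records: keyLen, key, valueLen, value]
    void close() throws IOException {
        if (closed) return;
        closed = true;
        int slots = Math.max(1, 2 * pending.size()); // load factor <= 0.5 keeps probe chains short
        int headerSize = 4 + 4 * slots;
        int[] offsets = new int[slots];
        ByteArrayOutputStream recordBytes = new ByteArrayOutputStream();
        DataOutputStream records = new DataOutputStream(recordBytes);
        for (Map.Entry<String, String> e : pending.entrySet()) {
            int slot = WriteOnceReader.slotOf(e.getKey(), slots);
            while (offsets[slot] != 0) slot = (slot + 1) % slots; // linear probing
            offsets[slot] = headerSize + records.size(); // absolute file offset of this record
            for (String s : new String[] {e.getKey(), e.getValue()}) {
                byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                records.writeInt(utf8.length);
                records.write(utf8);
            }
        }
        ByteBuffer header = ByteBuffer.allocate(headerSize).putInt(slots);
        for (int offset : offsets) header.putInt(offset);
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(header.array());
            recordBytes.writeTo(out);
        }
    }
}

// Hypothetical reader: memory-maps the file, so nothing is parsed or copied onto the heap up front.
final class WriteOnceReader implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer data; // off-heap, read-only view of the whole file
    private final int slots;

    WriteOnceReader(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.READ);
        data = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        slots = data.getInt(0);
    }

    static int slotOf(String key, int slots) { return Math.floorMod(key.hashCode(), slots); }

    // Returns the value for key, or null if absent, following the writer's probe sequence.
    String get(String key) {
        int slot = slotOf(key, slots);
        for (int probe = 0; probe < slots; probe++) {
            int offset = data.getInt(4 + 4 * slot);
            if (offset == 0) return null; // an empty slot ends the probe chain
            int keyLen = data.getInt(offset);
            if (readString(offset + 4, keyLen).equals(key)) {
                int valueLenPos = offset + 4 + keyLen;
                return readString(valueLenPos + 4, data.getInt(valueLenPos));
            }
            slot = (slot + 1) % slots;
        }
        return null;
    }

    private String readString(int position, int length) {
        byte[] utf8 = new byte[length];
        for (int i = 0; i < length; i++) utf8[i] = data.get(position + i);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException { channel.close(); } // the mapping is released later by the GC
}
```

Because the file is never updated after `close()`, this layout needs no free space, update log or deletion markers, which is one plausible reason a write-once format can stay more compact than a general read/write store.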
Code: Write store
Java
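import com.linkedin.paldb.api.PalDB;
import com.linkedin.paldb.api.StoreWriter;
import java.io.File;

StoreWriter writer = PalDB.createWriter(new File("store.paldb"));
writer.put("foo", "bar");                 // String key, String value
writer.put(1213, new int[] {1, 2, 3});    // int key, int[] value
writer.close();                           // the store cannot be modified after this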
Scala
import com.linkedin.paldb.api.{PalDB, StoreWriter}
import java.io.File

val writer: StoreWriter = PalDB.createWriter(new File("store.paldb"))
writer.put("foo", "bar")
writer.put(1213, Array(1, 2, 3))
writer.close()
Code: Read store
Java
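import com.linkedin.paldb.api.PalDB;
import com.linkedin.paldb.api.StoreReader;
import java.io.File;

StoreReader reader = PalDB.createReader(new File("store.paldb"));
String val1 = reader.get("foo");   // type taken from the assignment target
int[] val2 = reader.get(1213);
reader.close();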
Scala
import com.linkedin.paldb.api.{PalDB, StoreReader}
import java.io.File

val reader: StoreReader = PalDB.createReader(new File("store.paldb"))
val val1: String = reader.get("foo")
val val2: Array[Int] = reader.get(1213)
reader.close()
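In the read examples, `reader.get(...)` yields a `String` or an `int[]` depending only on the variable it is assigned to, which suggests a generic return type inferred at the call site. The stdlib-only sketch below mimics that lookup style with a hypothetical `UntypedMap` (the class and its `<V> V get(Object key)` signature are assumptions, not PalDB code). It also shows the trade-off of an untyped API: the cast is unchecked, so asking for the wrong type fails with a `ClassCastException` at the assignment rather than inside `get`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable, untyped map mimicking a `<V> V get(Object key)` lookup style.
final class UntypedMap {
    private final Map<Object, Object> entries;

    UntypedMap(Map<?, ?> source) {
        // Defensive copy wrapped read-only: the contents cannot change after construction.
        this.entries = Collections.unmodifiableMap(new HashMap<Object, Object>(source));
    }

    // The cast is unchecked (erased to Object), so a wrong target type surfaces as a
    // ClassCastException at the caller's assignment, not inside get().
    @SuppressWarnings("unchecked")
    <V> V get(Object key) {
        return (V) entries.get(key);
    }
}
```

With this design, `String val1 = map.get("foo");` and `int[] val2 = map.get(1213);` both compile without explicit casts, while `String wrong = map.get(1213);` compiles too but throws at runtime.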
Benchmark summary
❖ Compared with embeddable key-value stores (LevelDB, RocksDB)
‣ PalDB has 5X to 15X higher throughput on datasets that fit in memory*
❖ Compared with an in-memory Java HashSet/HashMap
‣ PalDB has 2X to 5X lower throughput
‣ but uses 6X less memory
* PalDB is not designed to scale to very large on-disk indices, as RocksDB and LevelDB are
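Throughput figures such as "2M+ reads/second" are simply the number of lookups divided by the elapsed time. The rough harness below computes that for any lookup function; it is only an illustration (the slides do not describe their benchmark setup), and for trustworthy numbers a dedicated tool such as JMH is preferable. `ReadThroughput`, `readsPerSecond` and `sink` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Rough reads/second harness (illustrative only; JMH is the usual tool for real measurements).
final class ReadThroughput {
    // Written after each run so the JIT is less likely to eliminate the lookups as dead code.
    static volatile long sink;

    // Performs `reads` lookups over keys 0..keyCount-1 in round-robin order; returns reads per second.
    static double readsPerSecond(IntFunction<Object> lookup, int keyCount, int reads) {
        long hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            if (lookup.apply(i % keyCount) != null) hits++;
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        sink = hits;
        return reads * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) map.put(i, "value-" + i);
        readsPerSecond(map::get, map.size(), 1_000_000); // warm-up run for the JIT
        double rate = readsPerSecond(map::get, map.size(), 5_000_000);
        System.out.printf("HashMap: %.0f reads/second%n", rate);
    }
}
```

The same function can wrap any store's lookup (for example a lambda calling a reader's `get`), which makes comparisons like "2X to 5X lower throughput" reproducible on one's own data and hardware, within the limits of such a simple loop.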
Throughput
❖ Throughput benchmark comparing PalDB, LevelDB and RocksDB (higher is better)
Memory
❖ Memory usage benchmark comparing PalDB with a Java HashSet (lower is better)
PalDB © 2015 LinkedIn Corp. Licensed under the terms of the Apache License, Version 2.0.
Code & documentation available on GitHub: https://github.com/linkedin/PalDB
