SlideShare une entreprise Scribd logo
1  sur  17
Télécharger pour lire hors ligne
Distributed Data Systems
How Do They Even?
About Me - Jared L Kerim
- Software Developer (Python)
- Mozilla Geolocation Cloud Services Team
- CTO at PressureNET
PressureNET (Shameless Plug)
- Gathers sensor data from
smartphones
- Constant stream of data to
servers
- API to retrieve data
- Visualization
- Analysis
The First Architecture
Sensors Web Servers MySQL API
The Problem: MySQL
- Slow lookups
- Takes a lot of disk space
- Cost (Large Relational DBs are expensive)
- Schema changes (become slow or impossible)
How Big is “Big”
- PressureNET 100 req/s, 1.5 billion records
- Analytics Systems 5000 req/s, 100s of
billions of records
- Ad Buying Service 500k req/s, trillions of
records
The Question
What is ????
Sensors ???? APIWeb Servers
What do we want to accomplish?
- Receive and store large amounts of data
- Access it quickly
- Small fast lookups (visualization)
- Large batch computations (mapreduce)
Considerations
- Durability (we don’t want to lose data)
- Redundancy (expect failures!)
- Scalability (simple growth, no upper limit)
Durability
- Data in a durable store should be ‘safe’
- Don’t remove data from one durable data
store until it is confirmed to be in another
durable data store
- Durable data stores should have redundant
backups (hot standbys)
Redundancy
- Each stage of your system should have
multiple copies
- If one copy goes down, another should take
over
- Redundancy ensures availability
Scalability
- The rate of data intake can grow or spike
- Your system should be able to add more
resources to handle that growth
- Require that your workload is partitionable
Proposed Architecture
Sensors Ingestors Queue Aggregator
S3
DynamoDB
We Are Not Alone
- This architecture is widely adopted
- Analytics
- Ad Serving/Views
- Log Analysis
- Sensor Data
- Game Events
- Video Events
Ingestors
- A redundant, scalable set of nodes which
receive data over http
- Can apply early validation and
authentication
- Stateless, low latency
Queue
- A scalable, durable storage mechanism for
data ‘in flight’
- Only holds data temporarily
- Typically preserves the order data was
received in
Aggregator
- A scalable, stateless set of workers which
consume data from the queue
- Can process data in small batches
- Write raw or transformed data to persistent
storage such as S3, Databases, etc.

Contenu connexe

Tendances

End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusDatabricks
 
Five ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeFive ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeSingleStore
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemSingleStore
 
Reporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & CassandraReporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & CassandraDataStax
 
RealTime AdTech reporting & targeting with Apache Apex
RealTime AdTech reporting & targeting with Apache ApexRealTime AdTech reporting & targeting with Apache Apex
RealTime AdTech reporting & targeting with Apache ApexAshish Tadose
 
Data Structure and Types
Data Structure and TypesData Structure and Types
Data Structure and TypesAnjani Phuyal
 
Using Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architectureUsing Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architectureOliver Buckley-Salmon
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationAlluxio, Inc.
 
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...HostedbyConfluent
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLSingleStore
 
Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Fwdays
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Real time architecture big data
Real time architecture big dataReal time architecture big data
Real time architecture big dataSanjeev Solanki
 
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...DataStax
 
The New Basics of Business Intelligence Lesson 3: Multi Source Analysis
The New Basics of Business Intelligence Lesson 3: Multi Source AnalysisThe New Basics of Business Intelligence Lesson 3: Multi Source Analysis
The New Basics of Business Intelligence Lesson 3: Multi Source AnalysisZoomdata
 

Tendances (20)

End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Data streaming at VRT
Data streaming at VRTData streaming at VRT
Data streaming at VRT
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
 
Five ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeFive ways database modernization simplifies your data life
Five ways database modernization simplifies your data life
 
Traitement d'événements
Traitement d'événementsTraitement d'événements
Traitement d'événements
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS Ecosystem
 
Reporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & CassandraReporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & Cassandra
 
RealTime AdTech reporting & targeting with Apache Apex
RealTime AdTech reporting & targeting with Apache ApexRealTime AdTech reporting & targeting with Apache Apex
RealTime AdTech reporting & targeting with Apache Apex
 
Data Structure and Types
Data Structure and TypesData Structure and Types
Data Structure and Types
 
Using Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architectureUsing Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architecture
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data Orchestration
 
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Real time architecture big data
Real time architecture big dataReal time architecture big data
Real time architecture big data
 
Cap server log file analytics
Cap server log file analyticsCap server log file analytics
Cap server log file analytics
 
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
 
The New Basics of Business Intelligence Lesson 3: Multi Source Analysis
The New Basics of Business Intelligence Lesson 3: Multi Source AnalysisThe New Basics of Business Intelligence Lesson 3: Multi Source Analysis
The New Basics of Business Intelligence Lesson 3: Multi Source Analysis
 

En vedette

Evaluation Q2 dorcas
Evaluation Q2  dorcasEvaluation Q2  dorcas
Evaluation Q2 dorcasdorcasfaida
 
CURRICULUM VITAE Ronel Hirsch 19 Maart 2015
CURRICULUM VITAE Ronel Hirsch 19 Maart 2015CURRICULUM VITAE Ronel Hirsch 19 Maart 2015
CURRICULUM VITAE Ronel Hirsch 19 Maart 2015Ronel Hirsch
 
Aan de slag met dropbox
Aan de slag met dropboxAan de slag met dropbox
Aan de slag met dropboxTrudy Winkels
 
Commercial Real Estate Services BldgV2
Commercial Real Estate Services BldgV2Commercial Real Estate Services BldgV2
Commercial Real Estate Services BldgV2Luke Krison
 
Búsquedas por palabras claves
Búsquedas por palabras clavesBúsquedas por palabras claves
Búsquedas por palabras clavesFernando Saravia
 
Magazine analysis
Magazine analysisMagazine analysis
Magazine analysismillerjess
 
Compos first period✧
Compos first period✧Compos first period✧
Compos first period✧Kira Flores Ü
 
Biografi Elvira Devinamira
Biografi Elvira DevinamiraBiografi Elvira Devinamira
Biografi Elvira Devinamiraanggiagustiani
 

En vedette (14)

Evaluation Q2 dorcas
Evaluation Q2  dorcasEvaluation Q2  dorcas
Evaluation Q2 dorcas
 
CURRICULUM VITAE Ronel Hirsch 19 Maart 2015
CURRICULUM VITAE Ronel Hirsch 19 Maart 2015CURRICULUM VITAE Ronel Hirsch 19 Maart 2015
CURRICULUM VITAE Ronel Hirsch 19 Maart 2015
 
Aan de slag met dropbox
Aan de slag met dropboxAan de slag met dropbox
Aan de slag met dropbox
 
Commercial Real Estate Services BldgV2
Commercial Real Estate Services BldgV2Commercial Real Estate Services BldgV2
Commercial Real Estate Services BldgV2
 
Búsquedas por palabras claves
Búsquedas por palabras clavesBúsquedas por palabras claves
Búsquedas por palabras claves
 
Magazine analysis
Magazine analysisMagazine analysis
Magazine analysis
 
Il nucleo
Il nucleoIl nucleo
Il nucleo
 
Rbi
RbiRbi
Rbi
 
Презентация Give5 Club
Презентация Give5 ClubПрезентация Give5 Club
Презентация Give5 Club
 
Seconday day
Seconday day Seconday day
Seconday day
 
Compos first period✧
Compos first period✧Compos first period✧
Compos first period✧
 
My last vacation
My last vacationMy last vacation
My last vacation
 
Biografi Elvira Devinamira
Biografi Elvira DevinamiraBiografi Elvira Devinamira
Biografi Elvira Devinamira
 
Landress garage build
Landress garage buildLandress garage build
Landress garage build
 

Similaire à Distributed Data Systems

Realtime Data Analytics
Realtime Data AnalyticsRealtime Data Analytics
Realtime Data AnalyticsBo Yang
 
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWSAmazon Web Services
 
Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...
Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...
Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...Amazon Web Services
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
 
Snowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingSnowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingAmazon Web Services
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
 
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...Kevin Mao
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Amazon Web Services Korea
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
Chip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochureChip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochureMarco van der Hart
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹Amazon Web Services
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewAmazon Web Services
 

Similaire à Distributed Data Systems (20)

Realtime Data Analytics
Realtime Data AnalyticsRealtime Data Analytics
Realtime Data Analytics
 
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
 
Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...
Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...
Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...
 
L21 scalability
L21 scalabilityL21 scalability
L21 scalability
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Snowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingSnowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data Warehousing
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
 
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Chip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochureChip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochure
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
 
Lecture1
Lecture1Lecture1
Lecture1
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 

Dernier

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Dernier (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Distributed Data Systems

  • 2. About Me - Jared L Kerim - Software Developer (Python) - Mozilla Geolocation Cloud Services Team - CTO at PressureNET
  • 3. PressureNET (Shameless Plug) - Gathers sensor data from smartphones - Constant stream of data to servers - API to retrieve data - Visualization - Analysis
  • 4. The First Architecture Sensors Web Servers MySQL API
  • 5. The Problem: MySQL - Slow lookups - Takes a lot of disk space - Cost (Large Relational DBs are expensive) - Schema changes (become slow or impossible)
  • 6. How Big is “Big” - PressureNET 100 req/s, 1.5 billion records - Analytics Systems 5000 req/s, 100s of billions of records - Ad Buying Service 500k req/s, trillions of records
  • 7. The Question What is ???? Sensors ???? APIWeb Servers
  • 8. What do we want to accomplish? - Receive and store large amounts of data - Access it quickly - Small fast lookups (visualization) - Large batch computations (mapreduce)
  • 9. Considerations - Durability (we don’t want to lose data) - Redundancy (expect failures!) - Scalability (simple growth, no upper limit)
  • 10. Durability - Data in a durable store should be ‘safe’ - Don’t remove data from one durable data store until it is confirmed to be in another durable data store - Durable data stores should have redundant backups (hot standbys)
  • 11. Redundancy - Each stage of your system should have multiple copies - If one copy goes down, another should take over - Redundancy ensures availability
  • 12. Scalability - The rate of data intake can grow or spike - Your system should be able to add more resources to handle that growth - Require that your workload is partitionable
  • 13. Proposed Architecture Sensors Ingestors Queue Aggregator S3 DynamoDB
  • 14. We Are Not Alone - This architecture is widely adopted - Analytics - Ad Serving/Views - Log Analysis - Sensor Data - Game Events - Video Events
  • 15. Ingestors - A redundant, scalable set of nodes which receive data over http - Can apply early validation and authentication - Stateless, low latency
  • 16. Queue - A scalable, durable storage mechanism for data ‘in flight’ - Only holds data temporarily - Typically preserves the order data was received in
  • 17. Aggregator - A scalable, stateless set of workers which consume data from the queue - Can process data in small batches - Write raw or transformed data to persistent storage such as S3, Databases, etc.