SlideShare une entreprise Scribd logo
1  sur  63
OPEN SOURCE LAMBDA ARCHITECTURE
KAFKA · HADOOP · SAMZA · DRUID
FANGJIN YANG · GIAN MERLINO · DRUID COMMITTERS
PROBLEM DEALING WITH EVENT DATA
MOTIVATION EVOLUTION OF A “REAL-TIME” STACK
ARCHITECTURE THE “RAD”-STACK
NEXT STEPS TRY IT OUT FOR YOURSELF
OVERVIEW
THE PROBLEM
2013
THE PROBLEM
‣ Arbitrary and interactive exploration of time series data
• Ad-tech, system/app metrics, network/website traffic analysis
‣ Multi-tenancy: lots of concurrent users
‣ Scalability: 10+ TB/day, ad-hoc queries on trillions of events
‣ Recency matters! Real-time analysis
2013
FINDING A SOLUTION
‣ Load all your data into Hadoop. Query it. Done!
‣ Good job guys, let’s go home
2013
FINDING A SOLUTION
Hadoop
EventStreams
Insight
2013
PROBLEMS WITH THE NAIVE SOLUTION
‣ MapReduce can handle almost every distributed computing
problem
‣ MapReduce over your raw data is flexible but slow
‣ Hadoop is not optimized for query latency
‣ To optimize queries, we need a query layer
2013
FINDING A SOLUTION
Hadoop (pre-processing and storage) Query Layer
Hadoop
EventStreams
Insight
A FASTER QUERY LAYER
2013
MAKE QUERIES FASTER
‣ What types of queries to optimize for?
• Revenue over time broken down by demographic
• Top publishers by clicks over the last month
• Number of unique visitors broken down by any dimension
• Not dumping the entire dataset
• Not examining individual events
2013
FINDING A SOLUTION
Hadoop (pre-processing and storage) RDBMS
Hadoop
EventStreams
Insight
2013
FINDING A SOLUTION
Hadoop (pre-processing and storage)
NoSQL K/V
Stores
Hadoop
EventStreams
Insight
2013
FINDING A SOLUTION
Hadoop (pre-processing and storage)
Commercial
Databases
Hadoop
EventStreams
Insight
DRUID AS A QUERY LAYER
2013
DRUID
‣ Druid project started in 2011, went open source in 2012
‣ Designed for low latency ingestion and ad-hoc aggregations
‣ Designed for keeping around a lot of history (years are ok)
‣ Growing Community
• ~100 contributors
• Used in production at numerous large and small organizations
2014
REALTIME INGESTION
>500K EVENTS / SECOND AVERAGE
>1M EVENTS / SECOND PEAK
10 – 100K EVENTS / SECOND / CORE
DRUID IN PRODUCTION
2014
0.0
0.5
1.0
1.5
0
1
2
3
4
0
5
10
15
20
90%ile95%ile99%ile
Feb 03 Feb 10 Feb 17 Feb 24
time
querytime(seconds)
datasource
a
b
c
d
e
f
g
h
Query latency percentiles
QUERY LATENCY (500MS AVERAGE)
90% < 1S 95% < 5S 99% < 10S
DRUID IN PRODUCTION
2013
RAW DATA
timestamp publisher advertiser gender country click price
2011-01-01T01:01:35Z bieberfever.com google.com Male USA 0 0.65
2011-01-01T01:03:63Z bieberfever.com google.com Male USA 0 0.62
2011-01-01T01:04:51Z bieberfever.com google.com Male USA 1 0.45
...
2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53
2013
ROLLUP DATA
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
2011-01-01T02:00:00Z ultratrimfast.com google.com Male UK 1953 17 17.31
2011-01-01T02:00:00Z bieberfever.com google.com Male UK 3194 170 34.01
‣ Truncate timestamps
‣ GroupBy over string columns (dimensions)
‣ Aggregate numeric columns (metrics)
2013
PARTITION DATA
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
2011-01-01T02:00:00Z ultratrimfast.com google.com Male UK 1953 17 17.31
2011-01-01T02:00:00Z bieberfever.com google.com Male UK 3194 170 34.01
‣ Shard data by time
‣ Immutable chunks of data called “segments”
Segment 2011-01-01T02/2011-01-01T03
Segment 2011-01-01T01/2011-01-01T02
2013
IMMUTABLE SEGMENTS
‣ Fundamental storage unit in Druid
‣ Read consistency
‣ One thread scans one segment
‣ Multiple threads can access same underlying data
‣ Segment sizes -> computation completes in ms
‣ Simplifies distribution & replication
2013
COLUMN ORIENTATION
timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
‣ Scan/load only what you need
‣ Compression!
‣ Indexes!
DRUID INGESTION
‣ Must have denormalized, flat data
‣ Druid cannot do stateful processing at ingestion time
‣ …like stream-stream joins
‣ …or user session reconstruction
‣ …or a bunch of other useful things!
‣ Many Druid users need an ETL pipeline
2013
DRUID REAL-TIME INGESTION
Druid
Realtime
Workers
Immediate Druid
Historical
Nodes
Periodic
Druid
Broker
Nodes
Data
Source
User queries
2013
DRUID REAL-TIME INGESTION
Druid
Realtime
Workers
Druid
Historical
Nodes
Periodic
Druid
Broker
Nodes
Data
Source
User queries
2013
DRUID REAL-TIME INGESTION
Druid
Realtime
Workers
Immediate Druid
Historical
Nodes
Periodic
Druid
Broker
Nodes
Data
Source
Stream
Processor
User queries
2013
DRUID REAL-TIME INGESTION
Druid
Realtime
Workers
Immediate Druid
Historical
Nodes
Periodic
Druid
Broker
Nodes
User queries
STREAMING DATA PIPELINES
AN EXAMPLE: ONLINE ADS
‣ Input data: impressions, clicks, ID-to-name mappings
‣ Output: enhanced impressions
‣ Steps
‣ Join impressions with clicks ->“clicks”
‣ Look up IDs to names -> “advertiser”, “publisher”, …
‣ Geocode -> “country”, …
‣ Lots of other additions
PIPELINE
Impressions
Clicks
Druid
?
PIPELINE
Impressions
Partition 0
{key: 186bd591-9442-48f0, publisher: foo, …}
{key: 9b5e2cd2-a8ac-4232, publisher: qux, …}
…
Partition 1
{key: 1079026c-7151-4871, publisher: baz, …}
…
Clicks
Partition 0
…
Partition 1
{key: 186bd591-9442-48f0}
…
PIPELINE
Impressions
Clicks
Druid
PIPELINE
Impressions
Clicks
Shuffled
Shuffle
Druid
PIPELINE
Shuffled
Partition 0
{type: impression, key: 186bd591-9442-48f0, publisher: foo, …}
{type: impression, key: 1079026c-7151-4871, publisher: baz, …}
{type: click, key: 186bd591-9442-48f0}
…
Partition 1
{type: impression, key: 9b5e2cd2-a8ac-4232, publisher: qux, …}
…
PIPELINE
Impressions
Clicks
Shuffled
Shuffle
Druid
PIPELINE
Impressions
Clicks
Shuffled
Joined
Shuffle
Join
Druid
PIPELINE
Joined
Partition 0
{key: 186bd591-9442-48f0, is_clicked: true, publisher: foo, …}
{key: 1079026c-7151-4871, is_clicked: false, publisher: baz, …}
…
Partition 1
{key: 9b5e2cd2-a8ac-4232, is_clicked: false, publisher: qux, …}
…
PIPELINE
Impressions
Clicks
Shuffled
Joined
Shuffle
Join
Druid
PIPELINE
Impressions
Clicks
Shuffled
Joined
Shuffle
Join
Enhance & Output
Druid
ALTERNATIVE PIPELINE
Impressions
Clicks
Shuffled
Joined
Shuffle
Join
Enhance Druid
Enhanced
REPROCESSING
WHY REPROCESS DATA?
‣ Bugs in processing code
‣ Imprecise streaming operations
‣ …like using short join windows
‣ Limitations of current software
‣ …Kafka 0.8.x, Samza 0.9.x can generate duplicate messages
‣ …Druid 0.7.x streaming ingestion is best-effort
LAMBDA ARCHITECTURES
‣ Hybrid batch/streaming data pipeline
‣ Batch technologies
• Hadoop MapReduce
• Spark
‣ Streaming technologies
• Samza
• Storm
• Spark Streaming
LAMBDA ARCHITECTURES
‣ Advantages?
• Works as advertised
• Works with a huge variety of open software
• Druid supports batch-replace-by-time-range through Hadoop
LAMBDA ARCHITECTURES
‣ Disadvantages?
‣ Need code to run on two very different systems
‣ Maintaining two codebases is perilous
‣ …productivity loss
‣ …code drift
‣ …difficulty training new developers
LAMBDA ARCHITECTURES
Data
streaming
LAMBDA ARCHITECTURES
Data batch
LAMBDA ARCHITECTURES
Data
streaming
batch
KAPPA ARCHITECTURE
‣ Pure streaming
‣ Reprocess data by replaying the input stream
‣ Doesn’t require operating two systems
‣ Doesn’t overcome software limitations
‣ I don’t have much experience with this
‣ http://radar.oreilly.com/2014/07/questioning-the-lambda-
architecture.html
OPERATIONS
NICE THINGS ABOUT KAFKA
‣ Scalable, replicated pub/sub
‣ Replayable message logs
‣ New consumers can read all old messages
‣ Existing consumers can reprocess all old messages
NICE THINGS ABOUT SAMZA
‣ Multi-tenancy: one main thread per container
‣ Robustness: isolated containers limit slowness and failure
‣ Visibility
‣ Multistage jobs, lots of metrics per stage
‣ Can inspect the message queue in Kafka
‣ State is simple
‣ Logging and restoring handled for you
‣ Single-threaded programming
NICE THINGS ABOUT DRUID
‣ Fast ingestion, fast queries
‣ Seamlessly merge stream-ingested and batch-ingested data
‣ Batch loads can “replace” stream loads for the same time range
NICE THINGS ABOUT HADOOP
‣ Solid batch processing system
‣ Easy to partition and reprocess data by time range
‣ Jobs can process all data, or a pre-partitioned slice
MONITORING
‣ Kafka partition availability
‣ Kafka log cleaner
‣ Samza consumer offsets
‣ Druid ingestion process rate
‣ Druid ingestion drop rate
‣ Druid query latency
‣ System metrics: CPU, network, disk
‣ Event counts at various stages
STREAM METRICS
STREAM METRICS
DO TRY THIS AT HOME
2013
CORNERSTONES
‣ Druid - druid.io - @druidio
‣ Samza - samza.apache.org - @samzastream
‣ Kafka - kafka.apache.org - @apachekafka
‣ Hadoop - hadoop.apache.org
GLUE
Tranquility
Camus / Secor Druid Hadoop indexer
GLUE
Camus / Secor Druid Hadoop indexer
druid-kaka-eight
TAKE AWAYS
‣ Consider Kafka for making your streams available
‣ Consider Samza for streaming data integration
‣ Consider Druid for interactive exploration of streams
‣ Metrics, metrics, metrics
‣ Have a reprocessing strategy if you’re interested in historical data
THANK YOU

Contenu connexe

Tendances

[2019] 점진적으로 프런트엔드 프레임워크 교체하기
[2019] 점진적으로 프런트엔드 프레임워크 교체하기[2019] 점진적으로 프런트엔드 프레임워크 교체하기
[2019] 점진적으로 프런트엔드 프레임워크 교체하기
NHN FORWARD
 

Tendances (20)

Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Deep Dive into Building a Secure & Multi-tenant SaaS Solution with NATS
Deep Dive into Building a Secure & Multi-tenant SaaS Solution with NATSDeep Dive into Building a Secure & Multi-tenant SaaS Solution with NATS
Deep Dive into Building a Secure & Multi-tenant SaaS Solution with NATS
 
차정민 (소프트웨어 엔지니어) 이력서 + 경력기술서
차정민 (소프트웨어 엔지니어) 이력서 + 경력기술서차정민 (소프트웨어 엔지니어) 이력서 + 경력기술서
차정민 (소프트웨어 엔지니어) 이력서 + 경력기술서
 
[DEVIEW 2021] 1000만 글로벌 유저를 지탱하는 기술과 사람들
[DEVIEW 2021] 1000만 글로벌 유저를 지탱하는 기술과 사람들[DEVIEW 2021] 1000만 글로벌 유저를 지탱하는 기술과 사람들
[DEVIEW 2021] 1000만 글로벌 유저를 지탱하는 기술과 사람들
 
Getting Started with FIDO2
Getting Started with FIDO2Getting Started with FIDO2
Getting Started with FIDO2
 
Redis Streams for Event-Driven Microservices
Redis Streams for Event-Driven MicroservicesRedis Streams for Event-Driven Microservices
Redis Streams for Event-Driven Microservices
 
How to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyHow to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor Netty
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
[NDC18] 만들고 붓고 부수고 - 〈야생의 땅: 듀랑고〉 서버 관리 배포 이야기
[NDC18] 만들고 붓고 부수고 - 〈야생의 땅: 듀랑고〉 서버 관리 배포 이야기[NDC18] 만들고 붓고 부수고 - 〈야생의 땅: 듀랑고〉 서버 관리 배포 이야기
[NDC18] 만들고 붓고 부수고 - 〈야생의 땅: 듀랑고〉 서버 관리 배포 이야기
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
 
[2019] 점진적으로 프런트엔드 프레임워크 교체하기
[2019] 점진적으로 프런트엔드 프레임워크 교체하기[2019] 점진적으로 프런트엔드 프레임워크 교체하기
[2019] 점진적으로 프런트엔드 프레임워크 교체하기
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016
 
Event Hub & Azure Stream Analytics
Event Hub & Azure Stream AnalyticsEvent Hub & Azure Stream Analytics
Event Hub & Azure Stream Analytics
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 

Similaire à Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid

Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
DataWorks Summit
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics
 
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Publicis Sapient Engineering
 
Wed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedWed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmed
Salman Ahmed
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
DataWorks Summit
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Data Con LA
 

Similaire à Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid (20)

Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in Practice
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Understanding apache-druid
Understanding apache-druidUnderstanding apache-druid
Understanding apache-druid
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks Presentation
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
 
Acting on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won TransactionsActing on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won Transactions
 
Wed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedWed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmed
 
Web Performance Internals explained for Developers and other stake holders.
Web Performance Internals explained for Developers and other stake holders.Web Performance Internals explained for Developers and other stake holders.
Web Performance Internals explained for Developers and other stake holders.
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
 
Unlocking Self-Service Big Data Analytics on AWS
Unlocking Self-Service Big Data Analytics on AWSUnlocking Self-Service Big Data Analytics on AWS
Unlocking Self-Service Big Data Analytics on AWS
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFiFrom Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
 
The Future of Distributed Databases
The Future of Distributed DatabasesThe Future of Distributed Databases
The Future of Distributed Databases
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 
Database Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory WebcastDatabase Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory Webcast
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 

Plus de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid