Rocket Fuel
Big Data and Artificial Intelligence for Digital Advertising
Abhijit Pol
Marilson Campos
Designing Data Pipelines
July, 2013
What We Do
[Diagram: ad-serving flow between the user's web browser, publishers, the ad exchange, and the Rocket Fuel Platform (real-time bidder, automated decisions, prediction model). Labeled steps: page request, ad request, bid request, bid & ad response, winning ad, ad served to user, user engages with ad, user engagement recorded, refresh learning, qualify audience, optimize. Other elements: data partners*, campaign & user data warehouse, ads & budget.]
Some Exchange Partners
How Big Is This Problem Each Day?
Trades on NASDAQ
Facebook Page Views
Searches on Google
Bid Requests Considered by Rocket Fuel
~5 billion
10 million
30 billion
~20 billion
BIG DATA + AI
Advertising That Learns
Outline
•Architecture Evolution
•Hurdles and Challenges Faced
•Data Pipelines Best Practices
Architecture for Growth
•20 GB/month to 2 PB/month in 3 years
•New and complex requirements
•More consumers
•Rapid growth
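As a rough sanity check on the growth figure above, the sketch below computes the compound rate it implies. It assumes decimal units (1 PB = 1,000,000 GB) and smooth month-over-month growth, neither of which the slide states.

```python
# Growth implied by "20 GB/month to 2 PB/month in 3 years".
# Assumptions: decimal units (1 PB = 1,000,000 GB) and smooth compound growth.
START_GB_PER_MONTH = 20
END_GB_PER_MONTH = 2 * 1_000_000  # 2 PB expressed in GB
MONTHS = 36

growth_factor = END_GB_PER_MONTH / START_GB_PER_MONTH
monthly_rate = growth_factor ** (1 / MONTHS) - 1  # compound rate per month
yearly_factor = growth_factor ** (1 / 3)          # multiplier per year

print(f"overall growth: {growth_factor:,.0f}x")
print(f"compound monthly growth: {monthly_rate:.1%}")
print(f"growth per year: {yearly_factor:.1f}x")
```

Under those assumptions the jump is 100,000x overall, roughly 38% per month or about 46x per year.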
How We Started
Architecture 2.0
Current Architecture
Hurdles and Challenges Faced
•Exponential data growth and user queries
•Network issues
•Bots
•Bad user queries
Data Pipeline Design Best Practices
•Job Design / Consistency
•Job Features / Avoid Re-work
•Golden Input / Shadow Cluster
•Data Collection
•Dashboard
Job Design / Consistency
•Idempotent
•Execution by different users
•Account for Execution Time
Job Execution Timeline
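A minimal sketch of an idempotent job that also accounts for execution time, assuming file-based output partitioned by day. The function name `run_daily_job`, the `dt=YYYY-MM-DD` layout and the row-count "work" are hypothetical illustrations, not Rocket Fuel's implementation. The output path depends only on the logical date passed in, never on the wall clock or on who runs the job, and the result is swapped in with an atomic rename, so reruns overwrite rather than duplicate.

```python
import os
import tempfile
from datetime import date
from pathlib import Path


def run_daily_job(logical_date: date, input_rows: list[str], output_root: Path) -> Path:
    """Hypothetical daily job: writes the row count for one logical date.

    The output location depends only on logical_date and the partition file is
    replaced atomically, so reruns overwrite the result instead of duplicating it.
    """
    # Derive the partition from the logical run date, never from the wall clock,
    # so a late start or a rerun days later still targets the intended day.
    partition = output_root / f"dt={logical_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "part-00000"

    result = f"{logical_date.isoformat()}\t{len(input_rows)}\n"

    # Write to a temp file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=partition, prefix=".tmp-")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(result)
    # mkstemp creates owner-only (0600) files; widen read access so other
    # users' jobs can consume the output.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, target)
    return target


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    first = run_daily_job(date(2013, 7, 1), ["a", "b", "c"], root)
    again = run_daily_job(date(2013, 7, 1), ["a", "b", "c"], root)  # rerun
    print(first == again, first.read_text().strip())
```

Passing the logical date explicitly (from a scheduler, for example) is what lets a job that starts late, or is rerun days later, still write the partition for the intended day.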
Job Features / Re-Work
•Smaller Jobs
•Record completion of steps
Recording completion times (flow for each step of a workflow, job or script)
•Start: is the mark already there?
•Yes: skip to End.
•No: execute the work for the step, create the mark, and optionally collect other data.
•End
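The completion-mark flow above could look like the following sketch, assuming marks are small JSON files in a local directory (on a cluster they might live on HDFS or in a database instead). `run_step`, the `.done` suffix and the payload fields are illustrative assumptions. Because each step checks its own mark, a rerun after a failure skips finished steps and redoes only the unfinished ones, which is also why smaller jobs mean less re-work.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional


def run_step(name: str, work: Callable[[], Optional[dict]], marks_dir: Path) -> bool:
    """Run one workflow/job/script step unless its completion mark exists.

    Returns True if the work ran, False if the step was skipped.
    """
    mark = marks_dir / f"{name}.done"
    # "Is mark already there?" -> yes: go straight to End.
    if mark.exists():
        return False

    started = time.time()
    # "Execute work for the step"; the work may return optional extra data.
    extra = work() or {}

    # "Create the mark": record completion time plus any collected data.
    marks_dir.mkdir(parents=True, exist_ok=True)
    payload = {"step": name, "started": started, "completed": time.time(), **extra}
    tmp = marks_dir / f"{name}.tmp"
    tmp.write_text(json.dumps(payload))
    tmp.replace(mark)  # rename, so a half-written mark is never observed
    return True


if __name__ == "__main__":
    marks = Path(tempfile.mkdtemp()) / "marks"

    def extract_fields() -> dict:  # hypothetical step body
        return {"rows_written": 42}

    print(run_step("extract_fields", extract_fields, marks))  # True: work runs
    print(run_step("extract_fields", extract_fields, marks))  # False: skipped on rerun
```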
Golden Input / Shadow Cluster
•Integration tests on realistic data sets.
•Safe environment to innovate.
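One way to use a golden input as an integration test, sketched under assumptions: `build_profile` stands in for a real pipeline transformation, and the tiny in-line data set stands in for a frozen, realistic sample and its reviewed expected output. The same comparison could be run on a shadow cluster to exercise a change at realistic scale without touching production outputs.

```python
def build_profile(events: list[dict]) -> dict:
    """Hypothetical transformation under test: per-user event counts."""
    profile: dict = {}
    for event in events:
        profile[event["user"]] = profile.get(event["user"], 0) + 1
    return profile


def diff_against_golden(actual: dict, expected: dict) -> dict:
    """Return {key: (expected, actual)} for every key whose value differs."""
    keys = set(actual) | set(expected)
    return {k: (expected.get(k), actual.get(k))
            for k in keys if expected.get(k) != actual.get(k)}


# Stand-ins for a frozen, realistic sample and its reviewed expected output,
# which would normally be versioned files rather than in-line literals.
GOLDEN_INPUT = [
    {"user": "u1", "type": "impression"},
    {"user": "u2", "type": "impression"},
    {"user": "u1", "type": "click"},
]
GOLDEN_OUTPUT = {"u1": 2, "u2": 1}

if __name__ == "__main__":
    mismatches = diff_against_golden(build_profile(GOLDEN_INPUT), GOLDEN_OUTPUT)
    assert not mismatches, f"output drifted from golden: {mismatches}"
    print("golden input check passed")
```

Reporting a per-key diff rather than a bare pass/fail makes it easier to review whether a behavior change is intended.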
Data Collection: Delivery Time View
[Diagram: a data product is produced by workflows made up of jobs; jobs run Hive, Pig, or SSH scripts, which in turn run further jobs (J).]
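A sketch of the delivery time view as a data structure, assuming only leaf steps record start and end timestamps and every higher level (job, workflow, data product) derives its delivery time from the earliest start and latest end beneath it. The `Node` class, `report` helper and example hierarchy are hypothetical; some names echo ones used elsewhere in these slides, but their assignment to levels here is only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Node:
    """One level of the hierarchy: data product, workflow, job, or step."""
    name: str
    level: str
    start: Optional[datetime] = None  # set only on leaves (steps)
    end: Optional[datetime] = None
    children: list["Node"] = field(default_factory=list)

    def span(self) -> tuple[datetime, datetime]:
        """Earliest start and latest end over this node's leaves."""
        if not self.children:
            if self.start is None or self.end is None:
                raise ValueError(f"leaf {self.name!r} has no recorded times")
            return self.start, self.end
        spans = [child.span() for child in self.children]
        return min(s for s, _ in spans), max(e for _, e in spans)

    def delivery_time(self) -> timedelta:
        start, end = self.span()
        return end - start


def report(node: Node, depth: int = 0) -> list[str]:
    """Indented 'level name: delivery time' lines, one per node."""
    lines = [f"{'  ' * depth}{node.level} {node.name}: {node.delivery_time()}"]
    for child in node.children:
        lines.extend(report(child, depth + 1))
    return lines


if __name__ == "__main__":
    t0 = datetime(2013, 7, 1, 2, 0)
    steps = [
        Node("extract_fields", "step", t0, t0 + timedelta(minutes=20)),
        Node("consolidate_metrics", "step", t0 + timedelta(minutes=20),
             t0 + timedelta(minutes=50)),
    ]
    job = Node("build_profile_job", "job", children=steps)  # hypothetical name
    workflow = Node("wk_build_profile", "workflow", children=[job])
    product = Node("user_profile", "data product", children=[workflow])
    print("\n".join(report(product)))
```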
Data Collection: Data Profiles View
[Diagram: data sets feed transformations (jobs) that produce a data product. Profile metrics tracked: record size & type, counts, join success ratios, data set consistency.]
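The profile metrics named above are cheap to compute per data set. This sketch assumes tab-separated text records whose first field is the record type; `profile_dataset` and `join_success_ratio` are illustrative helpers, not an existing API. The join success ratio is the share of left-side records that find a match, so a sudden drop can signal missing or malformed upstream data.

```python
from collections import Counter


def profile_dataset(lines: list[str]) -> dict:
    """Count, average size and type mix for one data set.

    Assumes tab-separated text records whose first field is the record type.
    """
    sizes = [len(line.encode("utf-8")) for line in lines]
    types = Counter(line.split("\t", 1)[0] for line in lines)
    return {
        "count": len(lines),
        "avg_record_bytes": sum(sizes) / len(sizes) if sizes else 0.0,
        "types": dict(types),
    }


def join_success_ratio(left_keys: list[str], right_keys: set[str]) -> float:
    """Fraction of left-side records whose join key is found on the right side."""
    if not left_keys:
        return 0.0
    matched = sum(1 for key in left_keys if key in right_keys)
    return matched / len(left_keys)


if __name__ == "__main__":
    events = ["imp\tu1", "imp\tu2", "clk\tu1", "imp\tu9"]  # illustrative records
    known_users = {"u1", "u2"}
    print(profile_dataset(events))
    print(join_success_ratio([e.split("\t")[1] for e in events], known_users))
```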
Data Collection Hierarchy
[Diagram legend: workflow/job/script step, data product. Nodes: wk_external_events, wk_build_profile, user_profile, extract_fields, consolidate_metrics, load_into_data_centers, extract_features, compact_user_profile.]
Dashboard
• Delivery Time
• Data Profile Ratios
• Counters
• Alarms
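A minimal alarm rule for the dashboard, assuming each tracked signal (a delivery time, a profile ratio, a counter) is compared against its own recent history. `check_alarm` is a hypothetical helper, and the 3-sigma and 20% thresholds are arbitrary placeholders to tune, not values from the talk.

```python
from statistics import mean, pstdev


def check_alarm(history: list[float], today: float, max_sigma: float = 3.0,
                min_rel_change: float = 0.2) -> bool:
    """Return True if today's value looks anomalous relative to recent history.

    Flags values more than max_sigma standard deviations from the trailing mean;
    when history is perfectly flat, falls back to a relative-change threshold.
    """
    if not history:
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma > 0:
        return abs(today - mu) > max_sigma * sigma
    if mu == 0:
        return today != 0
    return abs(today - mu) / abs(mu) > min_rel_change


if __name__ == "__main__":
    join_ratio_history = [0.95, 0.96, 0.94, 0.95, 0.96]  # illustrative values
    print(check_alarm(join_ratio_history, 0.95))
    print(check_alarm(join_ratio_history, 0.60))
```

The flat-history fallback matters for counters that rarely move, where a standard deviation of zero would otherwise make every change look infinitely anomalous.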
Thank you
www.rocketfuel.com

Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Designing Data Pipelines Using Hadoop