Analyzing Large-Scale User Data with Hadoop and HBase

Aaron Kimball – CTO
WibiData, Inc.
We can now collect more
data than at any time in
history.
Yesterday’s engineering challenge:
Fitting the problem into the
hardware.
Today’s constrained
resource is understanding.
How do we best apply data
to better serve our users?
The best products are user-centric
• Intuitive UI
• Continuously learning
  – Guided search
  – Smarter recommendations
• More effective service
What are we building toward?
Requirements
1. Understand the user population
2. Respond to users in real time
3. Support graceful data evolution
Large-scale data science is hard
• What does a user look like?
  – What data is available about the user?
  – Which features are important?
  – Which features are correlated?
• How do I model this in MapReduce?
• How do I serve results in a timely fashion?
Tools of the trade
• Store all data about a user
  in one place
• Support real-time get/put,
  as well as MapReduce
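The two bullets above can be sketched as a table keyed by user ID that serves both single-row reads and full scans. This is a minimal in-memory stand-in, not the HBase API: the class name `UserTable`, its methods, and the column names are all hypothetical, chosen only to illustrate the access pattern.

```python
from collections import defaultdict


class UserTable:
    """In-memory stand-in for a table keyed by user ID (hypothetical, not the HBase API)."""

    def __init__(self):
        # row key (user ID) -> column name -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user_id, column, timestamp, value):
        """Record a new timestamped value in one user's row."""
        cells = self._rows[user_id][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, user_id, column):
        """Return the most recent value for one user's column, or None."""
        row = self._rows.get(user_id)
        if not row or not row.get(column):
            return None
        return row[column][0][1]

    def scan(self):
        """Yield (user_id, row) pairs in key order, as a batch job would see them."""
        for user_id in sorted(self._rows):
            yield user_id, self._rows[user_id]


table = UserTable()
table.put("user-1", "info:email", 1, "a@example.com")
table.put("user-1", "history:viewed", 1, "show-a")
table.put("user-1", "history:viewed", 2, "show-b")
table.put("user-2", "history:viewed", 1, "show-a")

# Real-time style access: one user, one column.
latest = table.get("user-1", "history:viewed")

# Batch style access: one pass over every user's row.
views_per_user = {uid: len(row["history:viewed"]) for uid, row in table.scan()}
```

Because everything about a user sits in one row, the single-user read and the all-users scan work against the same data without a join.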
Tools of the trade
• Use complex data types to
  model complex data
• Support extended data
  models over time
• Retain support for legacy
  systems using older models
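One way to picture graceful data evolution is a reader that projects stored records onto its own version of the schema, filling in defaults for fields it expects but the record lacks. The slides do not name a serialization format, so the sketch below is an assumption: the schemas, field names, and the `read_record` helper are hypothetical, though the idea resembles reader/writer schema resolution in formats such as Avro.

```python
# Hypothetical reader schemas: field name -> default used when a record lacks the field.
USER_SCHEMA_V1 = {"user_id": None, "email": ""}
USER_SCHEMA_V2 = {"user_id": None, "email": "", "favorite_genres": []}


def read_record(stored, reader_schema):
    """Project a stored record onto the reader's schema.

    Fields missing from the stored record take the reader's default;
    fields the reader does not know about are dropped.
    """
    resolved = {}
    for field, default in reader_schema.items():
        value = stored.get(field, default)
        # Copy lists so resolved records never share mutable state with the schema.
        resolved[field] = list(value) if isinstance(value, list) else value
    return resolved


# A record written under v1 and a record written under v2.
old_record = {"user_id": "user-1", "email": "a@example.com"}
new_record = {
    "user_id": "user-2",
    "email": "b@example.com",
    "favorite_genres": ["drama"],
}

upgraded = read_record(old_record, USER_SCHEMA_V2)     # new code reading old data
legacy_view = read_record(new_record, USER_SCHEMA_V1)  # legacy code reading new data
```

New code sees old records with an empty `favorite_genres` list, and legacy code keeps working because it never sees the field it does not understand.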
Tools of the trade
• Abstract computational
  model away from MapReduce
• Support computation over all
  users… or one user at a time
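A rough sketch of what this abstraction could look like: the analysis is written once as a function of a single user's row, and separate runners apply it either to every user in batch or to one user on demand. The function and runner names here are hypothetical, and a plain dictionary stands in for both the table and the MapReduce job.

```python
def top_genre(user_row):
    """Per-user computation: most frequently viewed genre, ties broken alphabetically."""
    counts = {}
    for genre in user_row.get("viewed_genres", []):
        counts[genre] = counts.get(genre, 0) + 1
    if not counts:
        return None
    return min(counts, key=lambda g: (-counts[g], g))


def run_for_all_users(table, fn):
    """Batch mode: apply fn to every row (stand-in for a MapReduce job)."""
    return {user_id: fn(row) for user_id, row in table.items()}


def run_for_one_user(table, user_id, fn):
    """On-demand mode: apply the same fn to a single row at request time."""
    return fn(table.get(user_id, {}))


users = {
    "user-1": {"viewed_genres": ["drama", "comedy", "drama"]},
    "user-2": {"viewed_genres": ["sports"]},
    "user-3": {},
}

batch_result = run_for_all_users(users, top_genre)
single_result = run_for_one_user(users, "user-1", top_genre)
```

The analyst only writes `top_genre`; whether it runs across the whole population or for one user at request time is decided by the runner, not by the analysis code.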
For set-top boxes
• Viewing/recording history
• Libraries
• Device and User Analysis
• Personalized offers and recommendations
• Analysis for product roadmap
• Tech support portal
• Improved reports for advertisers
The future
• More personalization
• Adaptive UIs (self-arranging dashboards)
• Targeted content, ads
• More effective customer service
Conclusions
• Applications are becoming increasingly
  user-centric
• Data drives this capability, but harnessing it
  requires a new distributed architecture
• The biggest challenge is allowing data
  scientists to effectively leverage the data
www.wibidata.com / @wibidata
   Aaron Kimball – aaron@wibidata.com
