Analyzing Large-Scale User Data with Hadoop and HBase


WibiData

WibiData's presentation on personalization and large-scale user data at Structure:Data 2012



Aaron Kimball – CTO

WibiData, Inc.

We can now collect more data than at any time in history.

Yesterday’s engineering challenge: fitting the problem into the hardware.

Today’s constrained resource is understanding.

How do we best apply data to better serve our users?

The best products are user-centric
• Intuitive UI
• Continuously learning
– Guided search
– Smarter recommendations
• More effective service

What are we building toward?

Requirements
1. Understand the user population
2. Respond to users in real time
3. Support graceful data evolution

Large-scale data science is hard
• What does a user look like?
– What data is available about the user?
– Which features are important?
– Which features are correlated?
• How do I model this in MapReduce?
• How do I serve results in a timely fashion?
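One of the questions above, which features are correlated, becomes a simple computation once each user's features live in a single record. The following is a minimal sketch, assuming each user is a plain dict of numeric features (the feature names `hours_watched` and `recordings` are hypothetical). It computes the Pearson correlation coefficient between two features across the user population using only the standard library, skipping users who lack either feature.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / math.sqrt(var_x * var_y)


def feature_correlation(users, a, b):
    """Correlate feature `a` with feature `b`, skipping users missing either one."""
    pairs = [(u[a], u[b]) for u in users if a in u and b in u]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return pearson(xs, ys)


# Hypothetical per-user feature records.
users = [
    {"hours_watched": 10.0, "recordings": 2},
    {"hours_watched": 20.0, "recordings": 4},
    {"hours_watched": 30.0, "recordings": 6},
    {"hours_watched": 5.0},  # no "recordings" feature, so this user is skipped
]
print(feature_correlation(users, "hours_watched", "recordings"))  # 1.0
```

At real scale the same sums would be accumulated in a distributed pass over all users rather than in a Python list.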

Tools of the trade
• Store all data about a user in one place
• Support real-time get/put, as well as MapReduce
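Keeping every piece of data about a user in one place, reachable both by point lookups and by bulk scans, can be illustrated with a toy in-memory table. This is a sketch under stated assumptions, not HBase's client API: the `UserTable` class, its `family:qualifier` column naming, and its timestamped cells are hypothetical stand-ins for an HBase-style row keyed by user ID. Here `get`/`put` play the real-time role and `scan` stands in for the full-table pass a MapReduce job would make.

```python
from collections import defaultdict


class UserTable:
    """Toy entity-centric table: one row per user, versioned cells per column.

    Hypothetical model of an HBase-style layout; not a real client API.
    """

    def __init__(self):
        # row key -> column ("family:qualifier") -> list of (timestamp, value)
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user_id, column, value, timestamp):
        """Write one versioned cell into the user's row (real-time write path)."""
        cells = self._rows[user_id][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest first

    def get(self, user_id, column):
        """Return the newest value of a column for one user, or None."""
        cells = self._rows.get(user_id, {}).get(column)
        return cells[0][1] if cells else None

    def history(self, user_id, column):
        """Return all versions of a column for one user, newest first."""
        return list(self._rows.get(user_id, {}).get(column, []))

    def scan(self):
        """Yield (user_id, row) for every user in row-key order (batch read path)."""
        for user_id in sorted(self._rows):
            yield user_id, self._rows[user_id]


table = UserTable()
table.put("user-42", "info:zip", "94107", timestamp=1)
table.put("user-42", "views:channel", "news", timestamp=2)
table.put("user-42", "views:channel", "sports", timestamp=3)
table.put("user-7", "info:zip", "10001", timestamp=1)
print(table.get("user-42", "views:channel"))  # sports
print([uid for uid, _ in table.scan()])  # ['user-42', 'user-7']
```

Because a user's whole history sits under one row key, a single lookup can answer a real-time request, while a scan visits each user exactly once for batch analysis.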

Tools of the trade
• Use complex data types to model complex data
• Support extended data models over time
• Retain support for legacy systems using older models
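Extending a data model while legacy code keeps reading it is essentially a schema-resolution problem. The following minimal sketch is loosely modeled on Avro-style resolution rules. Fields the reader does not know are ignored. Fields the reader expects but the writer omitted take the reader's default. The schemas, field names, and the `resolve` helper are all hypothetical, and records are plain dicts rather than encoded bytes.

```python
# A schema here is a hypothetical mapping: field name -> default value,
# with the REQUIRED sentinel marking fields that have no default.
REQUIRED = object()

USER_V1 = {"user_id": REQUIRED, "zip": REQUIRED}
# Version 2 extends the model; new fields carry defaults so old data stays readable.
USER_V2 = {"user_id": REQUIRED, "zip": REQUIRED, "favorite_genres": (), "opted_in": False}


def resolve(record, reader_schema):
    """Project a record written under any schema version onto the reader's schema.

    - fields unknown to the reader are dropped (old readers ignore new data)
    - fields missing from the record take the reader's default (new readers
      can read old data)
    - a missing field with no default is an error
    """
    result = {}
    for field, default in reader_schema.items():
        if field in record:
            result[field] = record[field]
        elif default is REQUIRED:
            raise ValueError(f"record lacks required field {field!r}")
        else:
            result[field] = default
    return result


old_record = {"user_id": "user-42", "zip": "94107"}  # written by a v1 writer
new_record = {"user_id": "user-7", "zip": "10001",
              "favorite_genres": ("news",), "opted_in": True}  # written by a v2 writer

print(resolve(old_record, USER_V2))
# {'user_id': 'user-42', 'zip': '94107', 'favorite_genres': (), 'opted_in': False}
print(resolve(new_record, USER_V1))
# {'user_id': 'user-7', 'zip': '10001'}
```

Giving every newly added field a default is what lets both directions work. A new reader can fill gaps in old records, and an old reader simply never asks for the new fields.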

Tools of the trade
• Abstract computational model away from MapReduce
• Support computation over all users… or one user at a time
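Abstracting the computation away from MapReduce can mean writing logic against a single user's data and letting a runner decide how to apply it. The runner either applies it to every user (batch) or to one user on demand (real time). The following is a minimal sketch under that assumption. `top_channel`, `run_all`, and `run_one` are hypothetical names, and the plain dict of rows stands in for an entity-centric table. In a real deployment the batch runner would be a distributed job over the whole table.

```python
from collections import Counter


def top_channel(user_row):
    """Per-user computation: the channel this user watched most, or None.

    It sees only one user's data, so the same function serves both paths below.
    """
    views = user_row.get("views", [])
    if not views:
        return None
    return Counter(views).most_common(1)[0][0]


def run_all(rows, fn):
    """Batch path: apply fn to every user (the role a MapReduce job would play)."""
    return {user_id: fn(row) for user_id, row in rows.items()}


def run_one(rows, user_id, fn):
    """Real-time path: apply the same fn to a single user's row on request."""
    return fn(rows.get(user_id, {}))


rows = {
    "user-42": {"views": ["news", "sports", "news"]},
    "user-7": {"views": ["movies"]},
    "user-9": {},
}
print(run_all(rows, top_channel))  # {'user-42': 'news', 'user-7': 'movies', 'user-9': None}
print(run_one(rows, "user-7", top_channel))  # movies
```

The design choice is that the per-user function never sees more than one row. That restriction is what makes it equally usable in a bulk pass and in a single low-latency lookup.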

: for set-top boxes
Libraries
Device and User Analysis

Viewing/recording history

Personalized offers and recommendations

Analysis for product roadmap

Tech support portal

Improved reports for advertisers

The future
• More personalization
• Adaptive UIs (self-arranging dashboards)
• Targeted content, ads
• More effective customer service

Conclusions
• Applications are becoming increasingly user-centric
• Data drives this capability, but harnessing it requires a new distributed architecture
• The biggest challenge is allowing data scientists to effectively leverage the data

www.wibidata.com / @wibidata
Aaron Kimball – aaron@wibidata.com
