16. Large-scale data science is hard
• What does a user look like?
– What data is available about the user?
– Which features are important?
– Which features are correlated?
• How do I model this in MapReduce?
• How do I serve results in a timely fashion?
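One way to start on "Which features are correlated?" is a plain Pearson correlation between two per-user features. This is a minimal sketch, not a method taken from the slides; the feature names (hours_watched, recordings_made) and their values are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length features."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-user features, one value per user.
hours_watched = [1.0, 2.0, 3.0, 4.0]
recordings_made = [2.0, 4.0, 6.0, 8.0]

r = pearson(hours_watched, recordings_made)
```

At real scale the sums above would be computed in a distributed job rather than in one process; the arithmetic stays the same.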
18. Tools of the trade
• Store all data about a user in one place
• Support real-time get/put, as well as MapReduce
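The idea of one row per user, readable both cell by cell and in bulk, can be sketched with a toy in-memory table. This is an illustration under stated assumptions, not the system behind the slides: the UserStore class, its method names, and the "family:qualifier" column names are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, Tuple


class UserStore:
    """Toy in-memory stand-in for an entity-centric table: one row per user."""

    def __init__(self) -> None:
        # row key (user id) -> column name -> value
        self._rows: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def put(self, user_id: str, column: str, value: Any) -> None:
        """Real-time write of a single cell."""
        self._rows[user_id][column] = value

    def get(self, user_id: str, column: str, default: Any = None) -> Any:
        """Real-time read of a single cell; does not create missing rows."""
        return self._rows.get(user_id, {}).get(column, default)

    def scan(self) -> Iterator[Tuple[str, Dict[str, Any]]]:
        """Batch access: visit every user row, as a map phase would."""
        for user_id, row in self._rows.items():
            yield user_id, dict(row)


store = UserStore()
store.put("user-1", "info:name", "Ada")
store.put("user-1", "history:last_viewed", "channel-7")
store.put("user-2", "info:name", "Grace")

name = store.get("user-1", "info:name")
user_count = sum(1 for _ in store.scan())
```

Because everything about a user lives under one key, a serving layer can call get() for a single user while a batch job iterates scan() over the same data.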
19. Tools of the trade
• Use complex data types to model complex data
• Support extended data models over time
• Retain support for legacy systems using older models
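Extending a data model while keeping legacy readers working can be sketched as projecting each stored record onto the schema the reader expects. This is a simplified illustration, not the mechanism used in the talk: the schema dictionaries, field names, and read_with_schema are hypothetical.

```python
from typing import Any, Dict

# Hypothetical schema versions: field name -> default value.
USER_SCHEMA_V1: Dict[str, Any] = {"name": "", "zip_code": ""}
USER_SCHEMA_V2: Dict[str, Any] = {"name": "", "zip_code": "", "favorite_genres": []}


def read_with_schema(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a stored record onto the reader's schema.

    Fields the reader does not know are dropped; fields missing from the
    record are filled with the reader's default value.
    """
    result: Dict[str, Any] = {}
    for field, default in reader_schema.items():
        if field in record:
            result[field] = record[field]
        else:
            # Copy list defaults so records never share one mutable list.
            result[field] = list(default) if isinstance(default, list) else default
    return result


old_record = {"name": "Ada", "zip_code": "94107"}
new_record = {"name": "Grace", "zip_code": "10001", "favorite_genres": ["sci-fi"]}

# A new reader sees old data with the new field defaulted.
upgraded = read_with_schema(old_record, USER_SCHEMA_V2)
# A legacy reader sees new data without the field it does not know.
legacy_view = read_with_schema(new_record, USER_SCHEMA_V1)
```

The design choice is that records are never rewritten when the model grows; compatibility is resolved at read time.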
20. Tools of the trade
• Abstract computational model away from MapReduce
• Support computation over all users… or one user at a time
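Separating the per-user logic from how it is executed might look like the sketch below: the computation is written once against a single user's row, then run either for one user or across the whole table. All names here (count_recordings, run_for_one, run_for_all, the sample table) are hypothetical, and a plain dict comprehension stands in for the batch job.

```python
from typing import Callable, Dict, List

UserRow = Dict[str, List[str]]
Table = Dict[str, UserRow]


def count_recordings(row: UserRow) -> int:
    """Per-user computation, written with no reference to MapReduce."""
    return len(row.get("recordings", []))


def run_for_one(table: Table, user_id: str, compute: Callable[[UserRow], int]) -> int:
    """Serving path: apply the computation to a single user's row."""
    return compute(table[user_id])


def run_for_all(table: Table, compute: Callable[[UserRow], int]) -> Dict[str, int]:
    """Batch path: apply the same computation to every user's row."""
    return {user_id: compute(row) for user_id, row in table.items()}


table: Table = {
    "user-1": {"recordings": ["news", "movie"]},
    "user-2": {"recordings": []},
    "user-3": {},
}

one_result = run_for_one(table, "user-1", count_recordings)
all_results = run_for_all(table, count_recordings)
```

In a real deployment run_for_all would be replaced by a distributed job, but count_recordings itself would not need to change.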
26. : for set-top boxes
Libraries
Device and User Analysis
Viewing/recording history
Personalized offers and recommendations
29. : for set-top boxes
Libraries
Device and User Analysis
Viewing/recording history
Personalized offers and recommendations
Analysis for product roadmap
Tech support portal
Improved reports for advertisers
30. The future
• More personalization
• Adaptive UIs (self-arranging dashboards)
• Targeted content, ads
• More effective customer service
31. Conclusions
• Applications are becoming increasingly user-centric
• Data drives this capability, but harnessing it requires a new distributed architecture
• The biggest challenge is allowing data scientists to effectively leverage the data