Watching Pigs Fly with the Netflix Hadoop Toolkit
Hadoop Summit 2013, San Jose, CA


Overview of the data platform as a service architecture at Netflix. We examine the tools and services built around the Netflix Hadoop platform that are designed to make access to big data at Netflix easy, efficient, and self-service for our users.

From the perspective of a platform user, we walk through how various services in the architecture can be used to build a recommendation engine. We discuss in depth Sting, a tool for fast in-memory aggregation and data visualization, and Lipstick, our workflow visualization and monitoring tool for Apache Pig. Lipstick is now part of Netflix OSS; clone it on GitHub, or learn more from our tech blog post at http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html.
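
To make the idea of fast in-memory aggregation concrete, here is a minimal Python sketch of slicing and aggregating cached query results. It is not Sting's API: the `CachedResult` class, the column names, and the rows are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class CachedResult:
    """Holds query result rows in memory (hypothetical; not Sting's API)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def slice(self, **filters):
        """Return a new CachedResult keeping rows that match every filter."""
        kept = [r for r in self.rows
                if all(r.get(k) == v for k, v in filters.items())]
        return CachedResult(kept)

    def aggregate(self, group_by, measure):
        """Sum `measure` for each distinct value of `group_by`."""
        totals = defaultdict(float)
        for r in self.rows:
            totals[r[group_by]] += r[measure]
        return dict(totals)


# Made-up rows standing in for the output of a finished Hive or Pig job.
rows = [
    {"title": "A", "hour": 20, "device": "tv", "views": 3.0},
    {"title": "A", "hour": 21, "device": "tv", "views": 5.0},
    {"title": "B", "hour": 20, "device": "phone", "views": 2.0},
    {"title": "B", "hour": 21, "device": "tv", "views": 4.0},
]

cache = CachedResult(rows)
by_title = cache.aggregate("title", "views")
tv_by_hour = cache.slice(device="tv").aggregate("hour", "views")
```

Because the rows stay in memory after the job finishes, each follow-up slice or aggregation is a single pass over the cached list rather than a new cluster job, which is the property that makes interactive exploration practical.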

Published in: Technology, Business


  1. Watching Pigs Fly with the Netflix Hadoop Toolkit. Hadoop Summit 2013, San Jose, CA
  2. Our Motivation: Data should be accessible, easy to discover, and easy to process for everyone.
  3. Our Users: Analysts, Engineers
  4. Hadoop Platform as a Service
  5. Hadoop Platform as a Service: S3
  6. Hadoop Platform as a Service: Data Platform
  7. Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization), Forklift (Data Movement), Looper (Backloading), Ignite (A/B Test Analytics), Spock (Data Auditing), Genie (Hadoop PaaS), Lipstick (Pig Workflow Visualization), Event Service (Orchestration), Hadoop, S3, Other Processing
  8. Let’s solve a problem using the data!
  9. Build a recommender.
  10. But, what makes good recommendations? Similarity, Personalization
  11. COLORS!
  12. COLORS! Box art is colorful…
  13. We’re Sorry. COLORS! Box art is colorful…
  14. Where can I find the data?
  15. Hadoop Platform as a Service: S3
  16. Hadoop Platform as a Service: S3, Cassandra, Teradata, Redshift, RDS
  17. Data Platform as a Service: Franklin (Metadata API); S3, Cassandra, Teradata, Redshift, RDS
  18. Data Platform as a Service: Franklin (Metadata API)
  19. Create a dataset for box art and color.
  20. Whether your dataset is large or small, being able to visualize it makes it easier to explain.
  21. Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)
  22. Sting
     • Allows users to cache the results of a Genie job in memory
     • Sub-second response to OLAP-style operations (slicing, dicing, aggregations)
     • Adhoc / recurring schedule
     • Easy to use!
  23. Hive Query Schema
  24. % Content Consumed / Hour
  25. Hemlock Grove, House of Cards, Arrested Development
  26. Similarity
  27. House of Cards, Macbeth
  28. Toddlers & Tiaras, Star Trek: Voyager
  29. Personalization
  30. Big Data: # of subscribers X # of titles = ???,000,…,000 (big data)
  31. Netflix Apache Pig
  32. Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)
  33. Lipstick
     • Allows users to visualize their data flow
     • Allows users to see common errors
     • Allows users to easily monitor their jobs
     • Empowers users to support themselves
     • Facilitates communication between infrastructure team and users
  34. Lipstick
  35. Overall Job Progress
  36. Logical Plan, Overall Job Progress
  37. Logical Operator (reduce side), Logical Operator (map side), Map/Reduce Job, Intermediate Row Count, Records Loaded
  38. Hadoop Counters
  39. Common Problem #1: My job has stalled.
  40. Unoptimized/Optimized Logical Plan Toggle, Dangling Operator
  41. Common Problem #2: I didn’t get the data I was expecting.
  42. Common Problem #3: I don’t understand why my job failed.
  43. Failed Job (light red background), Successful Job (light blue background)
  44. Wrapping up
     • Demos at the Netflix booth in the exhibit hall (see more Lipstick, Sting, and Genie).
     • Lipstick is part of Netflix OSS.
     • Clone it on GitHub at http://github.com/Netflix/Lipstick
     • We welcome feedback and contributions!
  45. Thank you!
     • Charles Smith: charsmith@netflix.com
     • Jeff Magnusson: jmagnusson@netflix.com
     • Jobs: http://jobs.netflix.com
     • Netflix OSS: http://netflix.github.io
     • Tech Blog: http://techblog.netflix.com/
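
The recommender walkthrough in the slides uses box-art color as a similarity signal, but the transcript does not say how similarity is computed. One common option is cosine similarity between color histograms. The Python sketch below assumes that choice; the title names, the three color bins, and the values are invented for illustration and are not taken from the talk.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, histograms):
    """Rank every other title by color similarity to `target`, best first."""
    scores = [
        (other, cosine_similarity(histograms[target], hist))
        for other, hist in histograms.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical (red, green, blue) shares of each title's box art.
histograms = {
    "title_a": [0.7, 0.2, 0.1],
    "title_b": [0.6, 0.3, 0.1],
    "title_c": [0.1, 0.1, 0.8],
}

ranking = most_similar("title_a", histograms)
```

Here `ranking` lists `title_b` before `title_c`, since its mostly red box art is closer to that of `title_a`. At the scale the slides describe (subscribers times titles), a pairwise loop like this would be replaced by a distributed job, for example in Pig.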
