The Secrets of Building Realtime Big Data Systems

141,813 views

The architectural principles behind building systems that scale to vast amounts of data and operate on that data in realtime.

Presented at POSSCON '11.

Published in: Technology

  1. The Secrets of Building Realtime Big Data Systems
     Nathan Marz (@nathanmarz)
  2. Who am I?
  3. Who am I?
  4. Who am I?
  5. Who am I? (Upcoming book)
  6. BackType
     • >30 TB of data
     • Process 100M messages / day
     • Serve 300 requests / sec
     • 100 to 200 machine cluster
     • 3 full-time employees, 2 interns
  7. Built on open source
     Thrift, Cascading, Scribe, ZeroMQ, ZooKeeper, Pallet
  8. What is a data system? (diagram: raw data feeding View 1, View 2, and View 3)
  9. What is a data system? (diagram: tweets feeding views such as # tweets / URL, influence scores, and trending topics)
  10. Everything else (schemas, databases, indexing, etc.) is implementation
  11. Essential properties of a data system
  12. 1. Robust
  13. 1. Robust to machine failure
  14. 1. Robust to machine failure and human error
  15. 2. Low latency reads and updates
  16. 3. Scalable
  17. 4. General
  18. 5. Extensible
  19. 6. Allows ad-hoc analysis
  20. 7. Minimal maintenance
  21. 8. Debuggable
  22. Layered Architecture (diagram: a speed layer and a batch layer)
  23. Let’s temporarily pretend that update latency doesn’t matter
  24. Let’s pretend it’s OK for a view to lag by a few hours
  25. Batch layer
      • Arbitrary computation
      • Horizontally scalable
      • High latency
  26. Batch layer
      Not the end-all-be-all of batch computation, but the most general
  27. Hadoop (diagram: input files on a distributed filesystem are processed by MapReduce, which writes output files back to a distributed filesystem)
  28. Hadoop
      • Express your computation in terms of MapReduce
      • Get parallelism and scalability “for free”
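To make the MapReduce model on the slide above concrete, here is a minimal single-process Python simulation of its three phases (map, shuffle, reduce). It is not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_mapreduce`, and the tweets-per-URL job, are illustrative assumptions. Hadoop runs the same phases in parallel across a cluster, reading from and writing to the distributed filesystem.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record and collect its (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group values by key, as the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Call the reducer once per key with all of that key's values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def run_mapreduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)

# Hypothetical job: count tweets per URL.
def url_mapper(tweet):
    for url in tweet["urls"]:
        yield (url, 1)

def sum_reducer(url, counts):
    return sum(counts)

tweets = [
    {"id": 1, "urls": ["a.com", "b.com"]},
    {"id": 2, "urls": ["a.com"]},
    {"id": 3, "urls": []},
]
print(run_mapreduce(tweets, url_mapper, sum_reducer))  # {'a.com': 2, 'b.com': 1}
```

Because the mapper sees one record at a time and the reducer sees one key at a time, the framework is free to spread both across many machines; that independence is where the “free” parallelism comes from.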
  29. Batch layer
      • Store master copy of dataset
      • Master dataset is append-only
  30. Batch layer
      view = fn(master dataset)
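The two batch-layer rules above, an append-only master dataset and views computed as pure functions of it, can be sketched in Python as follows. `MasterDataset` and `tweets_per_url` are hypothetical names chosen for illustration; a real batch layer would keep the dataset on a distributed filesystem and run the view function as a MapReduce job.

```python
import copy
from collections import Counter
from typing import Iterator, List

class MasterDataset:
    """Append-only store: records can be added, never updated or deleted."""

    def __init__(self) -> None:
        self._records: List[dict] = []

    def append(self, record: dict) -> None:
        # Store a deep copy so later changes to the caller's object cannot alter history.
        self._records.append(copy.deepcopy(record))

    def __iter__(self) -> Iterator[dict]:
        # Hand out deep copies so readers cannot modify stored records either.
        return (copy.deepcopy(r) for r in self._records)

def tweets_per_url(master: MasterDataset) -> Counter:
    """A batch view: a pure function of the entire master dataset."""
    counts: Counter = Counter()
    for record in master:
        counts.update(record["urls"])
    return counts

master = MasterDataset()
master.append({"tweet_id": 1, "urls": ["a.com"]})
master.append({"tweet_id": 2, "urls": ["a.com", "b.com"]})
view = tweets_per_url(master)  # recomputed from scratch on each batch run
print(view["a.com"], view["b.com"])  # 2 1
```

Since the view is derived entirely from the stored records, fixing a bug in the view function and rerunning it repairs the view; no stored state has to be patched.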
  31. Batch layer (diagram: the master dataset is processed by MapReduce jobs into Batch View 1, Batch View 2, and Batch View 3)
  32. Batch layer
      • In practice, it is too expensive to fully recompute each view to get updates
      • A production batch workflow adds the minimum amount of incrementalization necessary for performance
  33. Incremental batch layer (diagram: new data is appended to all data; a batch view maintenance workflow updates Batch Views 1, 2, and 3, which serve queries)
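One common form of the “minimum amount of incrementalization” described above is sketched here, assuming the view is a per-key count (a hypothetical example, not the talk's actual workflow): counts computed over only the newly appended data are merged into the existing view instead of recomputing over all data. This works because counts combine by addition; views without that property may still need a full recomputation.

```python
from collections import Counter
from typing import Iterable

def count_urls(records: Iterable[dict]) -> Counter:
    """Full computation: count URL mentions across the given records."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record["urls"])
    return counts

def incremental_update(view: Counter, new_records: Iterable[dict]) -> Counter:
    """Fold only the newly appended records into an existing view.

    Valid here because counts combine by addition, so
    count(old + new) == count(old) + count(new).
    """
    return view + count_urls(new_records)

old_data = [{"urls": ["a.com"]}, {"urls": ["b.com"]}]
new_data = [{"urls": ["a.com", "c.com"]}]

view = count_urls(old_data)                # initial full run
view = incremental_update(view, new_data)  # later runs touch only new data
print(view["a.com"], view["c.com"])  # 2 1
```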
  34. Batch layer
      • Robust and fault-tolerant to both machine and human error
      • Low latency reads
      • Low latency updates
      • Scalable to increases in data or traffic
      • Extensible to support new features or related services
      • Generalizes to diverse types of data and requests
      • Allows ad hoc queries
      • Minimal maintenance
      • Debuggable: can trace how any value in the system came to be
  35. Speed layer
      Compensate for the high latency of updates to the batch layer
  36. Speed layer
      Key point: only needs to compensate for data not yet absorbed in the serving layer
  37. Speed layer
      Key point: only needs to compensate for data not yet absorbed in the serving layer
      Hours of data instead of years of data
  38. Application-level queries (diagram: each query is sent to both the batch layer and the speed layer, and the two results are merged)
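The merge step in the diagram above can be sketched as follows, assuming (hypothetically) that both layers expose per-URL tweet counts as dictionaries: the batch view covers data up to the last batch run, the realtime view covers only what arrived since, and the application adds the two.

```python
from typing import Mapping

def query_tweet_count(url: str,
                      batch_view: Mapping[str, int],
                      realtime_view: Mapping[str, int]) -> int:
    """Answer an application query by merging both layers' results."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)

batch_view = {"a.com": 1200, "b.com": 40}  # as of the last batch run
realtime_view = {"a.com": 15, "c.com": 3}  # only data since that run
print(query_tweet_count("a.com", batch_view, realtime_view))  # 1215
```

Addition is the right merge here only because the two views cover disjoint time ranges; a view such as “latest value” would instead prefer the realtime result whenever one exists.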
  39. Speed layer
      Once data is absorbed into the batch layer, the speed layer results can be discarded
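One way to make speed layer results discardable is sketched below, assuming realtime counts are kept in hourly buckets (a hypothetical design, not one prescribed by the talk). Each message updates the view incrementally, and once the batch layer has absorbed everything before some cutoff, every bucket that ends at or before that cutoff is dropped. `RealtimeView` and `batch_cutoff` are illustrative names.

```python
from collections import defaultdict
from typing import DefaultDict

SECONDS_PER_HOUR = 3600

class RealtimeView:
    """Speed layer counts, bucketed by hour so absorbed data can be dropped."""

    def __init__(self) -> None:
        # hour -> url -> count
        self._buckets: DefaultDict[int, DefaultDict[str, int]] = defaultdict(
            lambda: defaultdict(int))

    def record(self, url: str, timestamp: int) -> None:
        """Incrementally update the view as each message arrives."""
        self._buckets[timestamp // SECONDS_PER_HOUR][url] += 1

    def count(self, url: str) -> int:
        return sum(bucket.get(url, 0) for bucket in self._buckets.values())

    def expire(self, batch_cutoff: int) -> None:
        """Drop every bucket that ends at or before batch_cutoff.

        Assumes the batch layer has absorbed all data with timestamp < batch_cutoff.
        """
        cutoff_hour = batch_cutoff // SECONDS_PER_HOUR
        for hour in [h for h in self._buckets if h < cutoff_hour]:
            del self._buckets[hour]

view = RealtimeView()
view.record("a.com", 5 * SECONDS_PER_HOUR + 10)  # lands in hour 5
view.record("a.com", 7 * SECONDS_PER_HOUR + 20)  # lands in hour 7
view.expire(batch_cutoff=6 * SECONDS_PER_HOUR)   # batch now covers hours 0 to 5
print(view.count("a.com"))  # 1
```

Bucketing keeps the speed layer's state proportional to the hours of data not yet absorbed, rather than to the full history.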
  40. Speed layer
      • Message passing
      • Incremental algorithms
      • Read/write databases
          • Riak
          • Cassandra
          • HBase
          • etc.
  41. Speed layer
      Significantly more complex than the batch layer
  42. Speed layer
      But the batch layer eventually overrides the speed layer
  43. Speed layer
      So that complexity is transient
  44. Flexibility in layered architecture
      • Do slow and accurate algorithm in batch layer
      • Do fast but approximate algorithm in speed layer
      • “Eventual accuracy”
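As an illustration of this split (a hypothetical example, not one from the talk), the batch layer below computes an exact count of unique visitors, while the speed layer uses linear counting, a classic fixed-memory approximation: hash each visitor into a bitmap of m positions and estimate the distinct count as m · ln(m / z), where z is the number of positions still zero. The bitmap size of 1024 and the SHA-1 hash are arbitrary assumptions.

```python
import hashlib
import math
from typing import Iterable

def exact_unique(visitors: Iterable[str]) -> int:
    """Batch layer: slow and exact; remembers every distinct visitor."""
    return len(set(visitors))

class LinearCounter:
    """Speed layer: fixed memory, approximate distinct count (linear counting)."""

    def __init__(self, m: int = 1024) -> None:
        self.m = m
        self.bits = bytearray(m)  # one byte per bit position, for simplicity

    def add(self, visitor: str) -> None:
        # Hash the visitor to one of m positions; repeats hit the same position.
        digest = hashlib.sha1(visitor.encode("utf-8")).digest()
        self.bits[int.from_bytes(digest[:8], "big") % self.m] = 1

    def estimate(self) -> float:
        zeros = self.m - sum(self.bits)
        if zeros == 0:
            return math.inf  # bitmap saturated: m was too small for this load
        return self.m * math.log(self.m / zeros)

visitors = [f"user{i}" for i in range(300)] * 2  # 300 distinct ids, each seen twice
counter = LinearCounter()
for v in visitors:
    counter.add(v)
print(exact_unique(visitors))     # 300
print(round(counter.estimate()))  # close to 300, but not exact
```

When the batch layer's exact result for a period becomes available, it replaces the speed layer's estimate, which is the “eventual accuracy” the slide describes.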
  45. Data model
      Every record is a single, discrete fact at a moment in time
  46. Data model
      • Alice lives in San Francisco as of time 12345
      • Bob and Gary are friends as of time 13723
      • Alice lives in New York as of time 19827
  47. Data model
      • Remember: master dataset is append-only
      • A person can have multiple location records
      • “Current location” is a view on this data: pick location with most recent timestamp
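The location facts and the “current location” view from the slides above can be sketched as follows. The `LocationFact` tuple layout is an assumption for illustration; the point is that facts are only ever appended, and the current value is derived by picking the fact with the most recent timestamp.

```python
from typing import Dict, List, NamedTuple

class LocationFact(NamedTuple):
    person: str
    location: str
    timestamp: int

def current_locations(facts: List[LocationFact]) -> Dict[str, str]:
    """View over the facts: each person's location with the latest timestamp."""
    latest: Dict[str, LocationFact] = {}
    for fact in facts:
        best = latest.get(fact.person)
        if best is None or fact.timestamp > best.timestamp:
            latest[fact.person] = fact
    return {person: fact.location for person, fact in latest.items()}

# Facts from the slides; the older fact is kept, not overwritten.
master_dataset = [
    LocationFact("Alice", "San Francisco", 12345),
    LocationFact("Alice", "New York", 19827),
]
print(current_locations(master_dataset))  # {'Alice': 'New York'}
```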
  48. Data model
      • Having the full history for each entity is extremely useful for:
          • Doing analytics
          • Recovering from mistakes (like writing bad data)
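Because nothing in the master dataset is overwritten, a mistake such as a buggy job writing bad data can be undone by excluding the bad facts and recomputing the view. A minimal sketch, assuming each fact carries a hypothetical `written_at` ingestion time and that the bad job's time window is known; `recompute_without`, `bad_start`, and `bad_end` are illustrative names.

```python
from typing import Dict, Iterable, List, NamedTuple

class LocationFact(NamedTuple):
    person: str
    location: str
    timestamp: int   # when the fact became true
    written_at: int  # hypothetical: when the fact was ingested

def current_locations(facts: Iterable[LocationFact]) -> Dict[str, str]:
    latest: Dict[str, str] = {}
    # Visit facts oldest first so later facts overwrite earlier ones.
    for fact in sorted(facts, key=lambda f: f.timestamp):
        latest[fact.person] = fact.location
    return latest

def recompute_without(facts: List[LocationFact], bad_start: int, bad_end: int) -> Dict[str, str]:
    """Rebuild the view, ignoring facts ingested in the bad window [bad_start, bad_end)."""
    good = [f for f in facts if not (bad_start <= f.written_at < bad_end)]
    return current_locations(good)

facts = [
    LocationFact("Alice", "San Francisco", 12345, written_at=12350),
    LocationFact("Alice", "New York", 19827, written_at=19830),
    LocationFact("Alice", "Atlantis", 20000, written_at=20005),  # from a buggy job
]
print(current_locations(facts))                # {'Alice': 'Atlantis'}
print(recompute_without(facts, 20000, 21000))  # {'Alice': 'New York'}
```

Had the view been maintained by overwriting a single “current location” field, the San Francisco and New York history would be gone, and there would be nothing correct to fall back to.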
  49. Data model (diagram: a graph with nodes Alice, Bob, Tweet: 123, and Tweet: 456; edges labeled Reactor and Reaction; properties Gender: female, Reshare: true, Content: "Data is fun!", and Content: "RT @bob Data is fun!")
  50. Questions?
      Twitter: @nathanmarz
      Email: nathan.marz@gmail.com
      Web: http://nathanmarz.com
