Polyglot Processing - An Introduction 1.0

  1. 1. POLYGLOT PROCESSING – AN INTRODUCTION Dr. Mohan K. Bavirisetty Chief Scientist Modern Renaissance
  2. 2. Agenda 1. Big Data Landscape 2. Lambda vs. Kappa Architecture 3. Spark vs. Storm vs. Flink 4. Demo 1 – Apache Spark 5. Demo 2 – Storm, Kafka and Redis 6. Demo 3 – Flink with Data Stream API? 7. Summary 8. Questions The purpose of computing is insight not data – Richard Hamming
  4. 4. What is Big Data? Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Source: Gartner Research
  5. 5. What is a Real-time Analytics Platform?
  6. 6. 3 Common Kinds of Workloads • 1. Batch Operations • 2. Micro-batch Operations • 3. Real-time Streaming “Evidence-based decision-making (aka Big Data) is not just the latest fad, it's the future of how we are going to guide and grow business.” – Kristen Hammond, CTO, Narrative Sciences
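The three workload kinds can be contrasted on a toy problem. The following is a minimal sketch, not taken from the slides: the function names, the integer events, and the batch size of 2 are all hypothetical, and summation stands in for real processing.

```python
from typing import Iterable, Iterator, List


def batch_total(events: List[int]) -> int:
    """Batch: the whole data set is available before processing starts."""
    return sum(events)


def micro_batch_totals(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch: buffer a small group of events, then process each group."""
    buffer: List[int] = []
    for event in events:
        buffer.append(event)
        if len(buffer) == batch_size:
            yield sum(buffer)
            buffer = []
    if buffer:  # flush the final, possibly smaller, group
        yield sum(buffer)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming: update the result as each individual event arrives."""
    running = 0
    for event in events:
        running += event
        yield running


events = [3, 1, 4, 1, 5]  # hypothetical event values
print(batch_total(events))                  # 14
print(list(micro_batch_totals(events, 2)))  # [4, 5, 5]
print(list(streaming_totals(events)))       # [3, 4, 8, 9, 14]
```

The batch version produces one answer once all data is present, the micro-batch version produces one answer per small group, and the streaming version produces an updated answer per event.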
  7. 7. 8 Requirements of Real-time Computing • Keep Data Moving • Allow SQL Queries • Handle Stream Imperfections • Generate Predictable Outcomes • Integrate Streaming Data and Stored Data • Guarantee Data Safety and Availability • Partition and Scale Applications Automatically • Process and Respond Instantaneously
  8. 8. How do major data engines compare?
  9. 9. Real-time Streaming Architecture
  10. 10. Berkeley Data Analytics Stack
  11. 11. Polyglot • Polyglot: one who is versed in many languages • Polyglot Programming: different languages, frameworks and services; for example, Java with Scala, Clojure inside Trident • Polyglot Persistence: capacity to store data in multiple formats (structured, document, log, GPS) • Polyglot Processing: the capability to process any kind of data, any kind of workload, any kind of workflow
  13. 13. Lambda Architecture
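The transcript keeps only the title of this slide. As the Lambda Architecture is commonly described, an append-only master data set feeds a batch layer that periodically recomputes a view, a speed layer covers events that arrived since the last batch run, and queries merge the two. The sketch below illustrates that idea under stated assumptions: the `LambdaCounter` class, its method names, and the page-view counting task are all hypothetical and do not come from the deck.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Toy page-view counter with a batch view and a speed view."""

    def __init__(self) -> None:
        self.master_log: List[str] = []        # append-only master data set
        self.batch_view: Counter = Counter()   # recomputed from scratch by the batch layer
        self.speed_view: Counter = Counter()   # incremental counts since the last batch run

    def ingest(self, page: str) -> None:
        """New events go to the master log and to the speed layer."""
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self) -> None:
        """Batch layer: rebuild the view from the full log, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, page: str) -> int:
        """Serving layer: merge the batch view with the speed view."""
        return self.batch_view[page] + self.speed_view[page]


counter = LambdaCounter()
for page in ["home", "docs", "home"]:
    counter.ingest(page)
print(counter.query("home"))  # 2, served entirely from the speed view

counter.run_batch()
counter.ingest("home")
print(counter.query("home"))  # 3, two from the batch view plus one from the speed view
```

A real deployment would run the batch and speed layers on separate engines; keeping both in one class only shows how the views are merged at query time.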
  14. 14. What is Apache Storm? Apache Storm is a free and open source distributed real-time computation system that makes it easy to reliably process unbounded streams of data.
  15. 15. Why Apache Storm? Storm is fast, horizontally scalable, fault-tolerant, easy to set up and operate, and programming-language agnostic.
  16. 16. Apache Storm
  17. 17. Apache Storm can be used to realize an APM Use Case
  18. 18. Apache Spark Apache Spark is a fast and general engine for large-scale data processing. • Spark is fast • Spark is easy • Spark is extensible
  19. 19. Lambda Implementation with Spark
  20. 20. Kappa Architecture
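The transcript keeps only the title of this slide. As the Kappa Architecture is commonly described, there is no separate batch layer: every view is derived by a stream job reading a retained event log, and a logic change is rolled out by replaying the log through the new job and switching readers to the new view. The sketch below illustrates that idea under stated assumptions: `replay`, `total_v1`, `total_v2`, and the sample events are hypothetical and do not come from the deck.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]   # (user, amount), a made-up event shape
View = Dict[str, int]


def replay(log: List[Event], update: Callable[[View, Event], None]) -> View:
    """Build a view from scratch by streaming every retained event through a job."""
    view: View = {}
    for event in log:
        update(view, event)
    return view


def total_v1(view: View, event: Event) -> None:
    """Original job: sum all amounts per user."""
    user, amount = event
    view[user] = view.get(user, 0) + amount


def total_v2(view: View, event: Event) -> None:
    """Revised job: ignore non-positive amounts."""
    user, amount = event
    if amount > 0:
        view[user] = view.get(user, 0) + amount


log: List[Event] = [("ann", 5), ("bob", 2), ("ann", -3), ("bob", 4)]

serving = replay(log, total_v1)
print(serving)    # {'ann': 2, 'bob': 6}

# Reprocessing: replay the same log with the new logic, then swap the views.
candidate = replay(log, total_v2)
serving = candidate
print(serving)    # {'ann': 5, 'bob': 6}
```

The same code path serves both live processing and reprocessing, which is the main contrast with maintaining separate batch and speed implementations.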
  21. 21. Apache Flink
  22. 22. Apache Flink has a unified runtime engine
  24. 24. SUMMARY
  25. 25. Summary • Big Data challenges are being met with new and innovative approaches and architectures. • Lambda Architecture is a pragmatic near-term solution. Fidelity is already implementing it. • Kappa Architecture could turn out to be an elegant long-term solution for Polyglot Processing. • Apache Spark, Storm and Flink each have their strengths and niche areas of applicability. • Apache SAMOA, Apache Zeppelin and Tachyon add further value by providing additional capabilities.
  26. 26. Working Toward Analytics Mastery [Figure: maturity plotted against time, with stages labelled Descriptive, Predictive, and Preventive/Prescriptive]
  27. 27. Next Stage of Data Explosion
  28. 28. QUESTIONS? We do not learn by inference and deduction and the application of mathematics to philosophy, but by direct intercourse … - Henry David Thoreau
  29. 29. THANK YOU
  30. 30. Appendix: References and Resources • 8 Requirements of Real-time Stream Processing http://cs.brown.edu/~ugur/8rulesSigRec.pdf • Design Patterns for Real-Time Streaming Analytics http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/38774 • Big Data: Principles and Best Practices of Scalable Real-time Data Systems http://bit.ly/1LscB7z • Real-time Stream Processing: The Next Step for Apache Flink http://www.confluent.io/blog/2015/05/06/real-time-stream-processing-the-next-step-for-apache-flink/ • SAMOA – Scalable Advanced Massive Online Analysis http://jmlr.csail.mit.edu/papers/volume16/morales15a/morales15a.pdf • Lambda Architecture http://lambda-architecture.net/ • Kappa Architecture http://www.kappa-architecture.com/ • Apache Spark http://spark.apache.org/ • Apache Storm https://storm.apache.org/ • Apache Flink https://flink.apache.org/ • Apache SAMOA https://samoa.incubator.apache.org/ • Apache Zeppelin https://zeppelin.incubator.apache.org/ • Tachyon http://tachyon-project.org/