Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
© 2019 Ververica
Presenter Name & Title
PRESENTATION TITLE
The Apache Flink® Conference
Stream Processing | Event Driven |...
© 2019 Ververica2
A big thanks to our Sponsors
Platinum
Gold
Silver
Community
Flink Fest
Media
© 2019 Ververica3
A big thanks to our Program Committee
Fabian Hueske Dean Wampler
Sonali SharmaEric SammerStefan RichterT...
© 2019 Ververica4
A big thanks to our Speakers
© 2019 Ververica5
Flink Forward App
Rate speakers and sessions with
a chance to win a Flybrix drone!
Get involved!
Apache ...
© 2019 Ververica
Kostas Tzoumas
INTRODUCING VERVERICA
© 2019 Ververica7
Founded in 2014 by the original creators of Apache Flink to
commercialize the open source project and su...
© 2019 Ververica8
+
=
© 2019 Ververica9
Why?
• Alibaba has been the largest user of Flink and second largest
contributor for years
• Deeply comm...
© 2019 Ververica10
Flink at Alibaba (few examples)
Taobao is the largest e-commerce platform globally with more than 600 m...
© 2019 Ververica11
What’s in a name?
verum (“real” in Latin)
Understanding the truth about the world by
getting the real-t...
© 2019 Ververica12
What is Ververica?
1. Double down on the open source community and improve its health
and diversity
2. ...
© 2019 Ververica13
Ververica Commercial Products
Full continuation of our commercial products and
services
• Ververica Pla...
© 2019 Ververica14
Announcing: Ververica Partner Program
We are looking for partners to help us develop the broader Flink
...
© 2019 Ververica
Stephan Ewen
Xiaowei Jiang
Robert Metzger
From Stream Processor to
Unified Data Processing System
© 2019 Ververica16
Use Cases Presented Today
© 2019 Ververica17
Apache Flink and Public Clouds
cloud
on-prem
© 2019 Ververica
Data Processing Applications
© 2019 Ververica
more lag time
Batch
Processing
Continuous
Processing
Event-driven
Applications
Transactional
Processing
m...
© 2019 Ververica
more lag time
Batch
Processing
Continuous
Processing
Event-driven
Applications
Transactional
Processing
m...
© 2019 Ververica
more lag time
Batch
Processing
Continuous
Processing
Event-driven
Applications
Transactional
Processing
m...
© 2019 Ververica
more lag time
Batch
Processing
Continuous
Processing
Event-driven
Applications
Transactional
Processing
m...
© 2019 Ververica
more lag time
Batch
Processing
Continuous
Processing
Event-driven
Applications
Transactional
Processing
m...
© 2019 Ververica
more lag time
Batch
Processing
Continuous
Processing
Event-driven
Applications
Transactional
Processing
m...
© 2019 Ververica
The Relationship between
Batch and Streaming
© 2019 Ververica26
Everything Streams
That is about 60% of the truth…
© 2019 Ververica27
The remaining 40% of the truth
Continuous
Streaming
Batch
Processing
Data is incomplete
Latency SLAs
Co...
© 2019 Ververica28
The remaining 40% of the truth
Continuous
Streaming
Batch
Processing
Data is incomplete
Latency SLAs
Co...
© 2019 Ververica29
Streaming versus Batch Join
© 2019 Ververica30
Streaming versus Batch Join
2x RocksDB
LSM-Trees 1x Hybrid Hash Join
DataStream API DataSet API
push-ba...
© 2019 Ververica31
Exploiting the Batch Special Case
Planner/Optimizer
See also: "Towards Flink 2.0: Rethinking the stack ...
© 2019 Ververica
Stream Processing,
Analytics, and Applications
© 2019 Ververica33
Process Function (events, state, time)
DataStream API (streams, windows)
Stream SQL / Tables (dynamic t...
© 2019 Ververica
Applications
(physical)
Analytics
(declarative)
DataStream API Table API
Types are Java / Scala classes L...
© 2019 Ververica35
Rethinking the Flink Stack
Stream Operator & DAG API
Runtime
DataSet
(deprecated)
DataStream Table / SQ...
© 2019 Ververica
SQL, Notebooks,
and Machine Learning
© 2019 Ververica
JobGraphPhysical PlanTable API & SQL Logical Plan
Pluggable
Catalog
Query
Optimizer
SubQuery
Decorrelatio...
© 2019 Ververica
Sub-QueryANSI Syntax Data Type Join
Advanced
Aggregates
Window
Operator
Over Window TPC-H/TPC-DS
Function...
© 2019 Ververica
Performant
Operators
Operator codegen
HashAgg/Local-global
Agg
Improved HashJoin
Semi/Anti join
Vectoriza...
© 2019 Ververica
Blink
Spark
1T TPC-DS Queries
Blink
Spark
10T TPC-DS Queries
Blink
Spark
30T TPC-DS Queries
2000S
20000S
...
© 2019 Ververica
1.7B10K 10K
Sub-
Second 100TB
Flink SQL in Production at Alibaba
© 2019 Ververica
Common
Implementation
Modular/
Composable
Applications
Dynamic
Query Logic
Interactive
Programming
Ease o...
© 2019 Ververica
June, 2019
TableAPI Refactor
FLIP-32
July, 2019
Initial Blink Runner Merge
Flink 1.9 Release
Oct, 2019
Fu...
© 2019 Ververica
Flink Hive
+
For More?
Integrate Flink with Hive Ecosystem
Xuefu Zhang & Bowen Li, Alibaba 12:20pm - 1:00...
© 2019 Ververica
• Zeppelin Integration
Zeppelin support for Flink
© 2019 Ververica
Unified Interface ML algorithms Common Utilities
For More?
When Table meets AI: Build Flink AI Ecosystem ...
© 2019 Ververica
The Apache Flink Community
© 2019 Ververica48
A growing Apache Flink Community
… not only Flink’s codebase that is growing massively …
230 Emails / w...
© 2019 Ververica49
© 2019 Ververica50
Launch of a new Chinese language user support mailing list
© 2019 Ververica51
Growing the Contributors Community
• Cleanup & reorganization of the Jira
components
• Flinkbot: Improv...
© 2019 Ververica52
Flink Community Packages
© 2019 Ververica
Closing
© 2019 Ververica
The Apache Flink community is
more active than ever
Apache Flink continues to evolve with the
Stream Proc...
© 2019 Ververica
Thank you!
Prochain SlideShare
Chargement dans…5
×

KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified Data Processing System - Stephan Ewen & Xiaowei Jiang & Robert Metzger

238 vues

Publié le

The Apache Flink community has pushed (and continues to push) the boundary for Stream Processing over the last years, following the understanding that Stream Processing is unifying paradigm to build data processing applications, beyond real-time analytics.

The latest major effort in the Flink community is nothing less then re-architecting the API and runtime stack, with the goal to naturally support the spectrum of analytics and data-driven applications, to unify the APIs for batch and streaming (Table API and DataStream API), and to build a streaming runtime that is not only state-of-the-art in stream processing, but also in batch processing performance.

In this keynote, we give an overview of the goals and technology behind the above effort, and look at the adoption of Apache Flink for Stream Processing and "beyond streaming" use cases, as well as various efforts in the community to support the growth in users, applications, and ecosystem.

Publié dans : Technologie
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Répondre 
    Voulez-vous vraiment ?  Oui  Non
    Votre message apparaîtra ici
  • Soyez le premier à aimer ceci

KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified Data Processing System - Stephan Ewen & Xiaowei Jiang & Robert Metzger

  1. 1. © 2019 Ververica Presenter Name & Title PRESENTATION TITLE The Apache Flink® Conference Stream Processing | Event Driven | Real Time San Francisco 1-2, 2019
  2. 2. © 2019 Ververica2 A big thanks to our Sponsors Platinum Gold Silver Community Flink Fest Media
  3. 3. © 2019 Ververica3 A big thanks to our Program Committee Fabian Hueske Dean Wampler Sonali SharmaEric SammerStefan RichterTyler Akidau Jamie Grier
  4. 4. © 2019 Ververica4 A big thanks to our Speakers
  5. 5. © 2019 Ververica5 Flink Forward App Rate speakers and sessions with a chance to win a Flybrix drone! Get involved! Apache Flink User Survey Win a trip to one of the next Flink Forward conferences of your choice! Flink Forward Survey Help us improve the quality of Flink Forward. We appreciate your feedback! Community Contribution Sign up as a content contribut or for blog posts or speaking oppor tunities. Flink Forward Social Feed - View & Engage! Get involved
  6. 6. © 2019 Ververica Kostas Tzoumas INTRODUCING VERVERICA
  7. 7. © 2019 Ververica7 Founded in 2014 by the original creators of Apache Flink to commercialize the open source project and support the community
  8. 8. © 2019 Ververica8 + =
  9. 9. © 2019 Ververica9 Why? • Alibaba has been the largest user of Flink and second largest contributor for years • Deeply committed to open source and creating technological impact • Joining forces made a lot of sense for the two teams in order to collaborate even closer and accelerate their contributions to Flink
  10. 10. © 2019 Ververica10 Flink at Alibaba (few examples) Taobao is the largest e-commerce platform globally with more than 600 million monthly active users. Every time a user logs into the Taobao app they see a different landing page personalized for the user and depending on the latest real- time activity in the platform. Using Flink for real-time machine learning at Taobao has resulted in over 20% increase in purchase conversion rate. At peak during Singles Day last year, the system processed over 1.7 billion events/sec. In the Hangzhou City Brain Project, Flink is used to process in real-time data from a variety of sensors (traffic cameras, map applications, etc), and manage traffic signals in 128 intersections. The City Brain project has halved traveling times for ambulances and commuters. Traffic accidents can be detected immediately, and help can reach the accident site within 5 minutes.
  11. 11. © 2019 Ververica11 What’s in a name? verum (“real” in Latin) Understanding the truth about the world by getting the real-time view
  12. 12. © 2019 Ververica12 What is Ververica? 1. Double down on the open source community and improve its health and diversity 2. Contribute a number of innovations to the open source project starting with Alibaba’s Blink for batch processing 3. Create an ecosystem and foundation for the commercial success of Flink projects and products across the world Our #1 goal is to position Apache Flink for the next 10 years of its life
  13. 13. © 2019 Ververica13 Ververica Commercial Products Full continuation of our commercial products and services • Ververica Platform including Apache Flink, Application Manager, and Streaming Ledger • Apache Flink Training and Consulting Services • Enterprise Support A lot of innovation coming here as well leveraging existing work in Alibaba Cloud
  14. 14. © 2019 Ververica14 Announcing: Ververica Partner Program We are looking for partners to help us develop the broader Flink ecosystem • Ververica Platform Partner Preferred partners of our commercial products around the globe • Ververica Services Partner Service provider on Apache Flink certified by Ververica Sign up here! ververica.com/partner-program
  15. 15. © 2019 Ververica Stephan Ewen Xiaowei Jiang Robert Metzger From Stream Processor to Unified Data Processing System
  16. 16. © 2019 Ververica16 Use Cases Presented Today
  17. 17. © 2019 Ververica17 Apache Flink and Public Clouds cloud on-prem
  18. 18. © 2019 Ververica Data Processing Applications
  19. 19. © 2019 Ververica more lag time Batch Processing Continuous Processing Event-driven Applications Transactional Processing more real time Stream Processing Data Pipelines Streaming Analytics
  20. 20. © 2019 Ververica more lag time Batch Processing Continuous Processing Event-driven Applications Transactional Processing more real time Stream Processing Data Pipelines Streaming Analytics Batch Processing & Continuous Streaming Analytics & Applications
  21. 21. © 2019 Ververica more lag time Batch Processing Continuous Processing Event-driven Applications Transactional Processing more real time Stream Processing Data Pipelines Streaming Analytics Flink community's focus over the last releases
  22. 22. © 2019 Ververica more lag time Batch Processing Continuous Processing Event-driven Applications Transactional Processing more real time Recent Features Data Pipelines Streaming Analytics Time-versioned Joins MATCH_RECOGNIZE Schema Upgrades SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rideTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S M{2,} E) DEFINE S AS S.isStart = true, M AS M.rideId <> S.rideId, E AS E.isStart = false AND E.rideId = S.rideId) A B A X Y
  23. 23. © 2019 Ververica more lag time Batch Processing Continuous Processing Event-driven Applications Transactional Processing more real time Stream Processing Data Pipelines Streaming Analytics "Steam Processing takes on ACID" by Seth Wiesman 11am, Nikko I
  24. 24. © 2019 Ververica more lag time Batch Processing Continuous Processing Event-driven Applications Transactional Processing more real time Stream Processing Data Pipelines Streaming Analytics SQL and SQL Ecosystem / tools Machine Learning Graphs Batch Performance Batch Fault Tolerance Interactive Queries Dashboards
  25. 25. © 2019 Ververica The Relationship between Batch and Streaming
  26. 26. © 2019 Ververica26 Everything Streams That is about 60% of the truth…
  27. 27. © 2019 Ververica27 The remaining 40% of the truth Continuous Streaming Batch Processing Data is incomplete Latency SLAs Completeness and Latency is a tradeoff Data is as complete as it gets within the job No Low Latency SLAs
  28. 28. © 2019 Ververica28 The remaining 40% of the truth Continuous Streaming Batch Processing Data is incomplete Latency SLAs Completeness and Latency is a tradeoff Data is as complete as it gets within the job No Low Latency SLAs
  29. 29. © 2019 Ververica29 Streaming versus Batch Join
  30. 30. © 2019 Ververica30 Streaming versus Batch Join 2x RocksDB LSM-Trees 1x Hybrid Hash Join DataStream API DataSet API push-based operators low-latency minimize in-flight data pull-based operators flexible data flow control high latency no checkpoints
  31. 31. © 2019 Ververica31 Exploiting the Batch Special Case Planner/Optimizer See also: "Towards Flink 2.0: Rethinking the stack and APIs to unify Batch & Stream" by Aljoscha Krettek, 2pm, Nikko II/III Continuous Operators Streaming Scheduler Rules Additional Bounded Operators Additional Scheduling Strategies if (bounded && non-incremental) activates additional optimizer choices Core operators, cover all cases
  32. 32. © 2019 Ververica Stream Processing, Analytics, and Applications
  33. 33. © 2019 Ververica33 Process Function (events, state, time) DataStream API (streams, windows) Stream SQL / Tables (dynamic tables) Stream- & Batch Data Processing High-level Analytics API Stateful Event- Driven Applications val stats = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum((a, b) -> a.add(b)) def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = { // work with event and state (event, state.value) match { … } out.collect(…) // emit events state.update(…) // modify state // schedule a timer callback ctx.timerService.registerEventTimeTimer(event.timestamp + 500) } How we showed the API stack in the past…
  34. 34. © 2019 Ververica Applications (physical) Analytics (declarative) DataStream API Table API Types are Java / Scala classes Logical Schema for Tables Transformation Functions Declarative Language (SQL, Table DSL) State implicit in operationsExplicit control over State Explicit control over Time SLAs define when to trigger Executes as described Automatic Optimization
  35. 35. © 2019 Ververica35 Rethinking the Flink Stack Stream Operator & DAG API Runtime DataSet (deprecated) DataStream Table / SQL Still possible to mix and match within a program
  36. 36. © 2019 Ververica SQL, Notebooks, and Machine Learning
  37. 37. © 2019 Ververica JobGraphPhysical PlanTable API & SQL Logical Plan Pluggable Catalog Query Optimizer SubQuery Decorrelation Filter/Project Push-Down Join Reorder … Adding a new Table API / SQL Query Processor (Blink)
  38. 38. © 2019 Ververica Sub-QueryANSI Syntax Data Type Join Advanced Aggregates Window Operator Over Window TPC-H/TPC-DS Functional Improvements
  39. 39. © 2019 Ververica Performant Operators Operator codegen HashAgg/Local-global Agg Improved HashJoin Semi/Anti join Vectorization Resource Optimizations Stats based estimation Dynamic memory allocation Expression Optimizations Record Format Operate binary data JVM intrinsics Hot method codegen Rich Stats NDV NULL count Avg length Max length Min Max Cost Based Join order Join type Agg strategy …… Advanced Rules Subplan reuse Join condition expansion Shuffle removal Distinct Agg rewrite …. Query Execution Query Optimizer Performance Improvements
  40. 40. © 2019 Ververica Blink Spark 1T TPC-DS Queries Blink Spark 10T TPC-DS Queries Blink Spark 30T TPC-DS Queries 2000S 20000S 50000S ? Batch SQL Benchmark
  41. 41. © 2019 Ververica 1.7B10K 10K Sub- Second 100TB Flink SQL in Production at Alibaba
  42. 42. © 2019 Ververica Common Implementation Modular/ Composable Applications Dynamic Query Logic Interactive Programming Ease of Use Table API
  43. 43. © 2019 Ververica June, 2019 TableAPI Refactor FLIP-32 July, 2019 Initial Blink Runner Merge Flink 1.9 Release Oct, 2019 Full Merge Table API Layer Flink Runner Blink Runner Blink SQL Merge Plan
  44. 44. © 2019 Ververica Flink Hive + For More? Integrate Flink with Hive Ecosystem Xuefu Zhang & Bowen Li, Alibaba 12:20pm - 1:00pm Carmel HMS Data FlinkHive Hive Integration
  45. 45. © 2019 Ververica • Zeppelin Integration Zeppelin support for Flink
  46. 46. © 2019 Ververica Unified Interface ML algorithms Common Utilities For More? When Table meets AI: Build Flink AI Ecosystem on Table API Shaoxuan Wang, Alibaba 4:30pm - 5:10pm Nikko II & II High performance ML library based on Flink Xu Yang, Alibaba 2:50pm - 3:10pm Carmel Clustering K-means Latent Dirichlet allocation (LDA) Bisecting k-means Gaussian Mixture Model (GMM) Regression Linear regression Lasso regression Ridge regression Generalized linear regression Survival regression Isotonic regression Classifier Binomial logistic regression Multinomial logistic regression Multilayer perceptron classifier Linear Support Vector Machine Naive Bayes Random Forest GBDT Decision Tree Others Collaborative filtering FP-Growth PrefixSpan Proposal for Machine Learning
  47. 47. © 2019 Ververica The Apache Flink Community
  48. 48. © 2019 Ververica48 A growing Apache Flink Community … not only Flink’s codebase that is growing massively … 230 Emails / week
  49. 49. © 2019 Ververica49
  50. 50. © 2019 Ververica50 Launch of a new Chinese language user support mailing list
  51. 51. © 2019 Ververica51 Growing the Contributors Community • Cleanup & reorganization of the Jira components • Flinkbot: Improve pull request reviews and labeling • Discussions about improving the contribution workflow • PMC mentoring new committer candidates • Flink Community Packages website
  52. 52. © 2019 Ververica52 Flink Community Packages
  53. 53. © 2019 Ververica Closing
  54. 54. © 2019 Ververica The Apache Flink community is more active than ever Apache Flink continues to evolve with the Stream Processing space. Seamlessly integrate analytics, machine learning, applications and very fast batch processing on top of stream processing
  55. 55. © 2019 Ververica Thank you!

×