Huawei Cloud
Flink real-time analysis
in Cloud Stream Service
Jinkui Shi
Radu Tudoran
2018/04
Speakers
Jinkui Shi
Principal Engineer @ Huawei Cloud
shijinkui@huawei.com
Radu Tudoran
Staff Engineer @ Huawei Cloud
Radu.Tudoran@huawei.com
Background about Huawei Cloud
❖ Cloud BU
❖ Founded in 2017/06
❖ Huawei Cloud
❖ HUAWEI CLOUD services let enterprises use ICT services the same way they use water and electricity utilities.
Why choose Flink
❖ Graceful runtime framework
❖ Rich Stream SQL functions
❖ Lightweight asynchronous checkpoints
❖ Truly low latency and high throughput
❖ Extensibility: ML, Graph, Edge
Cloud Stream Service
❖ Cloud Stream Service (CS):
A real-time big data stream analysis service on Huawei Cloud.
Compatible with the Apache Flink and Spark APIs, CS also provides fully
managed computing clusters. Users just focus on StreamSQL or
UDFs and run jobs in real time.
❖ CS is the first public cloud native service in the world to choose
Flink as its runtime computing engine.
https://www.huaweicloud.com/en-us/product/cs.html
CS Overview
- Industrial IoT
- Internet of Vehicles
- Exchanges (Bitcoin/stock)
- Banking/insurance industry
- E-commerce …
Make the computing easier
- Unified batch and stream
- SQL and job visualization
- Streaming monitoring
Connect everything
- Open source source/sink
- Cloud service source/sink
Features
Easy to use, serverless, fully managed, safe, cost-effective
Cost Comparison (Reference)
Item | Offline Environment Buildup | CS | Saved Cost
Hardware cost | 80,000 x 3 = 240,000 CNY. The hardware cost of a single physical machine is 80 thousand CNY. The cost is for reference only. | 0.5 x 20 x 24 x 30 x 12 x 3 = 259,200 CNY (about 259,000 CNY). Users are charged 0.5 CNY per hour for a single SPU. 20 SPUs are purchased. |
O&M manpower cost | 200,000 CNY/man-year | 0 |
Water/Electricity/DC maintenance | 76,300 CNY/year | 0 |
Total | 516,300 CNY | 259,000 CNY | 42.9%
To achieve the same computing capability, CS saves 42.9% of the cost.
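The arithmetic behind the comparison can be replayed as a small sketch. The function and variable names below are illustrative assumptions, and the inputs are the slide's reference figures (0.5 CNY per SPU-hour, 20 SPUs, 30-day months, 3 years, 3 machines at 80,000 CNY). With these inputs the totals come to 259,200 CNY for CS and 516,300 CNY offline, which corresponds to a saving of roughly 49.8%; the slide reports 42.9%, and the basis for that figure is not stated.

```python
# Inputs copied from the slide; the function names are illustrative only.
HOURS_PER_YEAR = 24 * 30 * 12  # the slide counts 30-day months, i.e. 8,640 hours


def cs_cost(price_per_spu_hour: float, spus: int, years: int) -> float:
    """Pay-per-use cost of CS: hourly SPU price x SPU count x hours x years."""
    return price_per_spu_hour * spus * HOURS_PER_YEAR * years


def offline_cost(machine_price: float, machines: int, om: float, facility: float) -> float:
    """Self-built cluster cost: hardware plus O&M manpower plus facility costs."""
    return machine_price * machines + om + facility


cs = cs_cost(0.5, 20, 3)                            # 259,200 CNY
offline = offline_cost(80_000, 3, 200_000, 76_300)  # 516,300 CNY
saving = 1 - cs / offline                           # fraction of the offline cost saved
print(f"CS: {cs:,.0f} CNY, offline: {offline:,.0f} CNY, saving: {saving:.1%}")
```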
Job types
❖ Flink SQL: first-class citizen, for ease of use
❖ Flink Jar job: FlinkML, Gelly, CEP, SQL
❖ Spark Streaming and Structured Streaming Jar job
❖ PySpark Jar job
❖ Edge computing job: currently in beta
Connect to Ecosystem
❖ Open source connectors (Flink connectors and Bahir Flink)
❖ Connect to cloud native services in Huawei Cloud
Problems of the connection API adapter:
1. Define a unified connector API between Flink and Spark, such as Kafka and JDBC connectors.
2. Define a general connector API for cloud services, such as object bucket storage.
Apache Bahir needs more contributions.
Online Stream SQL editor
SPU: Stream Processing Unit, 1 core and 4 GB of memory
https://console.huaweicloud.com/cs
Visualization[vɪʒʊəlɪ'zeʃən]
❖ runtime monitoring
❖ for dev: editor, notebook
❖ for prod: pipeline, DSL
Flink Benchmark - "chicken ribs" (of little value, yet a pity to discard)
❖ Problems with standard benchmarks:
❖ They focus only on performance and assumed use cases
❖ They can't cover all APIs and features
❖ Performance results show only the best case, never the worst case
❖ Enterprises care more about reliability and best practices
Flink Reliability benchmark
❖ Test metric dimensions for every API:
❖ Overall source generating rate:
❖ fixed rate, rapid rate, index rate
❖ Data skew and backpressure
❖ Job.ratio = max{Vertex.ratio | Vertex ∈ Job}
❖ Vertex.ratio = max{SubTask.ratio | SubTask ∈ Vertex}
❖ Latency
❖ Job latency: source generating rate versus job processing rate
❖ Event latency: the time elapsed between source and sink
❖ Throughput and GC …
Automatically run large-scale tests to find cases where Flink may encounter runtime memory overflow,
incorrect calculation results, or run-time reliability problems, and collect metrics on backpressure,
latency, throughput, memory, CPU, and rate to analyze the causes of the reliability problems.
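The Job.ratio and Vertex.ratio definitions above amount to a nested maximum: a job is rated by its worst vertex, and a vertex by its worst subtask. A minimal sketch follows, assuming the ratio is a per-subtask backpressure ratio between 0 and 1; the dictionary layout and vertex names are hypothetical and not taken from the slide.

```python
from typing import Dict, List


def vertex_ratio(subtask_ratios: List[float]) -> float:
    """Vertex.ratio = max{SubTask.ratio | SubTask in Vertex}."""
    return max(subtask_ratios)


def job_ratio(job: Dict[str, List[float]]) -> float:
    """Job.ratio = max{Vertex.ratio | Vertex in Job}."""
    return max(vertex_ratio(ratios) for ratios in job.values())


# Hypothetical job: vertex name -> ratios sampled from its parallel subtasks.
job = {
    "source": [0.0, 0.1],
    "window": [0.4, 0.9],  # one skewed subtask dominates this vertex
    "sink": [0.2],
}
print(job_ratio(job))
```

Taking the maximum rather than the mean means a single skewed subtask is enough to flag the whole job, which fits the slide's pairing of data skew with backpressure.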
Flink ReliabilityBench project
❖ The generated report includes all APIs
❖ In the next half year, we'll publish the Flink reliability bench and
standard benchmark to Cloud Stream Service
❖ Users just set the needed resources; the bench then runs
automatically and generates a final report for tuning and a best
practice guide
Everyone, including the Flink community, is welcome to try it then.
Some problems
❖ In SQL, how to express JSON, OpenTSDB, and other data formats?
❖ SQL WITH clause:
❖ How to make a general and extensible rule to support all connectors?
❖ How to support general and extensible cloud standards, such as object
bucket storage?
❖ API server?
❖ Manage job lifetime and metrics
❖ For a job: input the source data, …, output the sink data with the Streaming
API
❖ Sink reliability support for an external write-ahead log framework:
❖ source1 - processing - sink1 - source2 - processing - sink -
source2 - …
data may be lost
Intelligent Streaming Computing
❖ Open source frameworks
❖ Streaming+ML: Spark MLlib, PySpark, Flink ML
❖ Streaming+Graph: Spark GraphX, Flink Gelly
❖ SQL: binding the above together through UDFs
Stream analysis is not enough; an intelligent framework is needed.
If we make less effort, we may be surpassed by others quickly.
Stay hungry
Scenario 1: streaming trading analysis
An example diagram for illustration only, from the Sohu site.
1. Out-of-order stream data for K-line
charts of 5 min, 15 min, 30 min, and 60 min
2. Aggregate streaming data over windows
3. Low latency
Bitcoin trading pain points
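To illustrate the first two points, here is a minimal sketch of building K-line (open/high/low/close) bars from out-of-order trades with tumbling windows. It is plain Python rather than a Flink job; the `kline` function and the `(event_time_seconds, price)` tuple layout are assumptions made for illustration. It keeps every window in memory and ignores watermarks and late-data eviction, which a real streaming job would need.

```python
from typing import Dict, Iterable, Tuple


def kline(trades: Iterable[Tuple[int, float]], window_s: int = 300) -> Dict[int, Tuple[float, float, float, float]]:
    """Aggregate (event_time_seconds, price) trades into OHLC bars per tumbling window."""
    bars: Dict[int, dict] = {}
    for ts, price in trades:
        start = ts - ts % window_s  # start of the tumbling window this trade falls into
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"open_ts": ts, "open": price, "high": price,
                           "low": price, "close_ts": ts, "close": price}
            continue
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        if ts < bar["open_ts"]:    # an earlier trade arrived late: it becomes the open
            bar["open_ts"], bar["open"] = ts, price
        if ts >= bar["close_ts"]:  # the latest event time seen so far defines the close
            bar["close_ts"], bar["close"] = ts, price
    return {start: (b["open"], b["high"], b["low"], b["close"])
            for start, b in sorted(bars.items())}


# Trades arrive out of order; the default 300 s window gives the 5 min chart.
trades = [(10, 100.0), (250, 105.0), (120, 95.0), (5, 101.0), (310, 110.0)]
bars_5min = kline(trades)
print(bars_5min)
```

The 15, 30, and 60 minute charts follow by passing `window_s=900`, `1800`, or `3600`.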
[Diagram: Huawei Cloud solution. Labels: Cloud Stream, DIS, Kafka, Flink, Spark, Cloud Table, OpenTSDB, HBase, DCS (Redis)]
Scenario 3: Stream Analysis and ETL
CS uses Flink SQL, Flink, and Spark Streaming jobs to perform
exception detection, real-time alarm reporting, and CEP-based
processing on stream data.
Feedback/decision-making/monitoring: Based on positive feedback
during service running and on monitoring information, CS provides
guidance for product optimization, stop-loss, quantization, and
visualization.
Enhanced Statistics and ML Features Extraction
Design Principles
• Incremental computation
• Fixed-size memory
• Constant to sub-linear time complexity
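Enhanced Statistics and ML Features Extraction
In general:
$$S^2 = \sum_i \left( y_i - f(x_i, \beta_1, \beta_2, \beta_3, \dots, \beta_n) \right)^2$$
For the linear fit:
$$S^2 = \sum_i \left( y_i - (\beta_1 + \beta_2 x_i) \right)^2$$
Regression parameters:
$$\beta_2 = \frac{s_{xy,t}}{m_{2,t}}, \qquad \beta_1 = \bar{y} - \beta_2 \bar{x}$$
Incremental mean:
$$\bar{x}_t = \bar{x}_{t-1} + \frac{1}{t} (x_t - \bar{x}_{t-1})$$
Incremental variance (2nd central moment):
$$m_{2,t} = m_{2,t-1} + (x_t - \bar{x}_{t-1})(x_t - \bar{x}_t)$$
Incremental covariance:
$$s_{xy,t} = \frac{t-2}{t-1} s_{xy,t-1} + \frac{1}{t} (x_t - \bar{x}_{t-1})(y_t - \bar{y}_{t-1})$$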
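The incremental mean, second central moment, and covariance updates on this slide can be combined into a single-pass linear regression learner that uses constant memory. The sketch below is an illustration under stated assumptions, not the speakers' implementation: the class and method names are hypothetical, and it assumes that the accumulated second moment is divided by t - 1 to obtain the sample variance before forming the slope, so that it matches the normalization of the covariance recurrence.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OnlineLinearRegression:
    """Single-pass fit of y ~ beta1 + beta2 * x using O(1) memory."""
    t: int = 0           # number of samples seen so far
    mean_x: float = 0.0  # running mean of x
    mean_y: float = 0.0  # running mean of y
    m2: float = 0.0      # running sum of squared deviations of x
    s_xy: float = 0.0    # running sample covariance of x and y

    def update(self, x: float, y: float) -> None:
        self.t += 1
        t = self.t
        dx = x - self.mean_x  # x_t minus the previous mean of x
        dy = y - self.mean_y  # y_t minus the previous mean of y
        self.mean_x += dx / t
        self.mean_y += dy / t
        self.m2 += dx * (x - self.mean_x)  # (x_t - old mean) * (x_t - new mean)
        if t >= 2:
            self.s_xy = (t - 2) / (t - 1) * self.s_xy + dx * dy / t

    def coefficients(self) -> Tuple[float, float]:
        if self.t < 2 or self.m2 == 0.0:
            raise ValueError("need at least two samples with distinct x values")
        var_x = self.m2 / (self.t - 1)  # assumption: normalise m2 to the sample variance
        beta2 = self.s_xy / var_x
        beta1 = self.mean_y - beta2 * self.mean_x
        return beta1, beta2


model = OnlineLinearRegression()
for x in range(5):
    model.update(float(x), 2.0 * x + 1.0)  # noiseless points on y = 2x + 1
print(model.coefficients())
```

Each update touches only five scalars, which is what lets the learner meet the fixed-size memory and constant-time design principles.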
Online Linear Regression Learner
[Two plots: Latency analysis and Throughput analysis. Axis labels: Execution time (s), Throughput (ev), Time range (ms), Events]
Geospatial
• DDL for Time Geospatial
  • ST_Point
  • ST_Line
  • ST_Polygon
• SQL Geospatial Scalar Functions
  • ST_CONTAINS
  • ST_COVERS
  • ST_DISJOINT
  • ST_BUFFER
  • ST_INTERSECTION
  • ST_ENVELOPE
  • ST_DISTANCE
  • ST_PERIMETER
  • ST_AREA (polygon)
  • ST_OVERLAPS
  • ST_INTERSECTS
  • ST_WITHIN
• SQL Time Geospatial
  • AGG_DISTANCE
  • AVG_SPEED
  • … on HOP/TUMBLE/OVER/SESSION windows
  • … on count/time windows
  • … on rowtime/proctime windows
• Huawei offers complete coverage of the geospatial standard plus extra time-based functions
Realtime IoT Analytics
[Architecture diagram. Labels: Stream SQL Time GeoSpatial Analytics, Submit, Continuous data, Flink IoT Stream Engine, Stream SQL IoT Translation, Optimization, IoT Op. Library, SQL IoT Fun., Deploy, Execute, Stream Topology, User Define Function, Geometry Engine, GeoSpatial function]
SQL IoT Functions
• ST_DISTANCE
• ST_PERIMETER
• ST_AREA (polygon)
• ST_OVERLAPS
• ST_INTERSECTS
• ST_WITHIN
• ST_CONTAINS
• ST_COVERS
• ST_DISJOINT
• ST_BUFFER
• ST_INTERSECTION
• ST_ENVELOPE
• …
Stream IoT Operators
• Window Tumble Count/Time
• Window Hop Count/Time
• Window Session Count/Time
• Process Function
• Map
• FlatMap
Geospatial Examples
Select if cars deviate from road (the predicate below matches cars within the road buffer; negate it to find cars that deviate)
SELECT carId FROM CarStream
WHERE ST_WITHIN(
ST_POINT(lat, lon),
ST_BUFFER(ST_ROAD_FROM_FILE(file), 2.0))
Compute Time Aggregates over Spatial Data
SELECT timestampa, lat, lon,
AGG_DISTANCE( ST_POINT(lat, lon)) OVER (
PARTITION BY carid ORDER BY proctime RANGE BETWEEN
INTERVAL '1' HOUR PRECEDING AND CURRENT ROW),
AVG_SPEED( ST_POINT(lat, lon)) OVER (
PARTITION BY carid ORDER BY proctime RANGE BETWEEN
INTERVAL '1' HOUR PRECEDING AND CURRENT ROW)
FROM CarStream
Filter by region
SELECT timestampr, lat, lon, speed
FROM CarStream
WHERE ST_WITHIN( ST_POINT(lat, lon), ST_POLYGON( ARRAY[
ST_POINT(53.454326,7.334517),
ST_POINT(53.682480, 13.906822),
ST_POINT(47.761194, 12.607594),
ST_POINT(47.722358, 7.601213),
ST_POINT(53.454326,7.334517)]))
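The "Filter by region" query relies on a point-in-polygon test. The sketch below shows one common way to evaluate such a predicate, the ray-casting rule, on the polygon from the query. It is an assumption-laden illustration: the `point_in_polygon` helper is hypothetical, coordinates are treated as planar (lat, lon) pairs, and nothing here is claimed to reflect how the ST_WITHIN function of the service is implemented.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon), treated as planar coordinates

# Closed ring copied from the query: the last vertex repeats the first.
REGION: List[Point] = [
    (53.454326, 7.334517),
    (53.682480, 13.906822),
    (47.761194, 12.607594),
    (47.722358, 7.601213),
    (53.454326, 7.334517),
]


def point_in_polygon(lat: float, lon: float, ring: List[Point]) -> bool:
    """Ray casting: count edges crossed by a ray of constant latitude towards larger longitude."""
    inside = False
    for (lat1, lon1), (lat2, lon2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge reaches the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside  # each crossing flips inside/outside
    return inside


print(point_in_polygon(50.0, 10.0, REGION))  # a point between the listed vertices
print(point_in_polygon(40.0, 10.0, REGION))  # a point south of every vertex
```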
Flink CEP on SQL enhancements
SQL CEP Syntax
SELECT * FROM stream...
MATCH_RECOGNIZE (
[row_pattern_partition_by ]
[row_pattern_order_by ]
[row_pattern_measures ]
[row_pattern_rows_per_match ]
[row_pattern_skip_to ]
PATTERN (row_pattern) [with_in clause]
[duration clause]
[row_pattern_subset_clause]
DEFINE row_pattern_definition_list )
Define pattern matching computation
Offer complete syntax coverage for real-time CEP analytics
SELECT * FROM Ticker
MATCH_RECOGNIZE (
PARTITION BY symbol
MEASURES
FINAL FIRST(A.price) AS firstAPrice,
FINAL FIRST(B.price) AS firstBPrice,
FINAL FIRST(C.price) AS firstCPrice,
FINAL LAST(A.price) AS lastAPrice,
FINAL LAST(B.price) AS lastBPrice,
FINAL LAST(C.price) AS lastCPrice
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN ((A B C){2})
DEFINE A AS A.price < 50, B AS B.price < 30,
C AS C.price < 70 )
# Events: ~2.5M
# Matched events: ~100K
# Stocks: 7
Average latency: ~27.13 ms
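To make the semantics of the Ticker query concrete, here is a plain-Python sketch that evaluates the same pattern over the ordered prices of one symbol. It is an illustration only and says nothing about how Flink or CS evaluates MATCH_RECOGNIZE; the `match_prices` function and the sample prices are hypothetical. Because PATTERN ((A B C){2}) always spans exactly six rows, a fixed-length scan is enough here.

```python
from typing import Callable, Dict, List

# DEFINE clause from the query, one predicate per pattern variable.
DEFINE: Dict[str, Callable[[float], bool]] = {
    "A": lambda price: price < 50,
    "B": lambda price: price < 30,
    "C": lambda price: price < 70,
}
# PATTERN ((A B C){2}) expands to six consecutive rows: A B C A B C.
PATTERN: List[str] = ["A", "B", "C"] * 2


def match_prices(prices: List[float]) -> List[Dict[str, float]]:
    """Return one measures row per match for a single symbol's ordered prices."""
    matches = []
    i = 0
    while i + len(PATTERN) <= len(prices):
        window = prices[i:i + len(PATTERN)]
        if all(DEFINE[var](p) for var, p in zip(PATTERN, window)):
            row = {}
            for var in "ABC":
                mapped = [p for v, p in zip(PATTERN, window) if v == var]
                row[f"first{var}Price"] = mapped[0]  # FINAL FIRST(var.price)
                row[f"last{var}Price"] = mapped[-1]  # FINAL LAST(var.price)
            matches.append(row)  # ONE ROW PER MATCH
            i += len(PATTERN)    # AFTER MATCH SKIP PAST LAST ROW
        else:
            i += 1               # no match starts here; try the next row
    return matches


# Hypothetical prices for one symbol (one PARTITION BY group).
result = match_prices([80.0, 40.0, 20.0, 60.0, 45.0, 25.0, 65.0])
print(result)
```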
Thanks

Contenu connexe

Tendances

Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYCNeo4j
 
Introducing r3 corda™ a distributed ledger designed for financial services
Introducing r3 corda™  a distributed ledger designed for financial servicesIntroducing r3 corda™  a distributed ledger designed for financial services
Introducing r3 corda™ a distributed ledger designed for financial servicesRazi Rais
 
Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...
Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...
Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...Edureka!
 
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)Lucas Jellema
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022HostedbyConfluent
 
From distributed caches to in-memory data grids
From distributed caches to in-memory data gridsFrom distributed caches to in-memory data grids
From distributed caches to in-memory data gridsMax Alexejev
 
Infinispan, a distributed in-memory key/value data grid and cache
 Infinispan, a distributed in-memory key/value data grid and cache Infinispan, a distributed in-memory key/value data grid and cache
Infinispan, a distributed in-memory key/value data grid and cacheSebastian Andrasoni
 
Reddit/Quora Software System Design
Reddit/Quora Software System DesignReddit/Quora Software System Design
Reddit/Quora Software System DesignElia Ahadi
 
Smart Contracts - The Blockchain Beyond Bitcoin
Smart Contracts - The Blockchain Beyond BitcoinSmart Contracts - The Blockchain Beyond Bitcoin
Smart Contracts - The Blockchain Beyond BitcoinJim McKeeth
 
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
 Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S... Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...HostedbyConfluent
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxyconfluent
 
Aave General Presentation
Aave General PresentationAave General Presentation
Aave General PresentationVanessa Lošić
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...confluent
 
A Secure Model of IoT Using Blockchain
A Secure Model of IoT Using BlockchainA Secure Model of IoT Using Blockchain
A Secure Model of IoT Using BlockchainAltoros
 
How Blockchain Can Be Used In Big Data Analytics
How Blockchain Can Be Used In Big Data AnalyticsHow Blockchain Can Be Used In Big Data Analytics
How Blockchain Can Be Used In Big Data AnalyticsBibrainia
 
PoW vs. PoS - Key Differences
PoW vs. PoS - Key DifferencesPoW vs. PoS - Key Differences
PoW vs. PoS - Key Differences101 Blockchains
 
Prudential Financial – Insurer Innovation Award 2023
Prudential Financial – Insurer Innovation Award 2023Prudential Financial – Insurer Innovation Award 2023
Prudential Financial – Insurer Innovation Award 2023The Digital Insurer
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekDatabricks
 

Tendances (20)

Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYC
 
Introducing r3 corda™ a distributed ledger designed for financial services
Introducing r3 corda™  a distributed ledger designed for financial servicesIntroducing r3 corda™  a distributed ledger designed for financial services
Introducing r3 corda™ a distributed ledger designed for financial services
 
Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...
Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...
Blockchain 101 | Blockchain Tutorial | Blockchain Smart Contracts | Blockchai...
 
Open Banking APIs on AWS
Open Banking APIs on AWSOpen Banking APIs on AWS
Open Banking APIs on AWS
 
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
 
From distributed caches to in-memory data grids
From distributed caches to in-memory data gridsFrom distributed caches to in-memory data grids
From distributed caches to in-memory data grids
 
Infinispan, a distributed in-memory key/value data grid and cache
 Infinispan, a distributed in-memory key/value data grid and cache Infinispan, a distributed in-memory key/value data grid and cache
Infinispan, a distributed in-memory key/value data grid and cache
 
Reddit/Quora Software System Design
Reddit/Quora Software System DesignReddit/Quora Software System Design
Reddit/Quora Software System Design
 
Smart Contracts - The Blockchain Beyond Bitcoin
Smart Contracts - The Blockchain Beyond BitcoinSmart Contracts - The Blockchain Beyond Bitcoin
Smart Contracts - The Blockchain Beyond Bitcoin
 
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
 Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S... Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
 
Aave General Presentation
Aave General PresentationAave General Presentation
Aave General Presentation
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
 
A Secure Model of IoT Using Blockchain
A Secure Model of IoT Using BlockchainA Secure Model of IoT Using Blockchain
A Secure Model of IoT Using Blockchain
 
How Blockchain Can Be Used In Big Data Analytics
How Blockchain Can Be Used In Big Data AnalyticsHow Blockchain Can Be Used In Big Data Analytics
How Blockchain Can Be Used In Big Data Analytics
 
PoW vs. PoS - Key Differences
PoW vs. PoS - Key DifferencesPoW vs. PoS - Key Differences
PoW vs. PoS - Key Differences
 
Prudential Financial – Insurer Innovation Award 2023
Prudential Financial – Insurer Innovation Award 2023Prudential Financial – Insurer Innovation Award 2023
Prudential Financial – Insurer Innovation Award 2023
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a Week
 

Similaire à Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-time analysis in CloudStream Service of Huawei Cloud"

XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonVMware Tanzu
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieSpring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieVMware Tanzu
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Zhenxiao Luo
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Spring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour DallasSpring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour DallasVMware Tanzu
 
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...VMware Tanzu
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneDataWorks Summit
 
Devoxx 2018 - Pivotal and AxonIQ - Quickstart your event driven architecture
Devoxx 2018 -  Pivotal and AxonIQ - Quickstart your event driven architectureDevoxx 2018 -  Pivotal and AxonIQ - Quickstart your event driven architecture
Devoxx 2018 - Pivotal and AxonIQ - Quickstart your event driven architectureBen Wilcock
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Bowen Li
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...Lightbend
 
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...VMware Tanzu
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastDatabricks
 
Microservices with kubernetes @190316
Microservices with kubernetes @190316Microservices with kubernetes @190316
Microservices with kubernetes @190316Jupil Hwang
 

Similaire à Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-time analysis in CloudStream Service of Huawei Cloud" (20)

XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - Boston
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieSpring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Spring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour DallasSpring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour Dallas
 
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Devoxx 2018 - Pivotal and AxonIQ - Quickstart your event driven architecture
Devoxx 2018 -  Pivotal and AxonIQ - Quickstart your event driven architectureDevoxx 2018 -  Pivotal and AxonIQ - Quickstart your event driven architecture
Devoxx 2018 - Pivotal and AxonIQ - Quickstart your event driven architecture
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
 
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
Microservices with kubernetes @190316
Microservices with kubernetes @190316Microservices with kubernetes @190316
Microservices with kubernetes @190316
 

Plus de Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Flink Forward San Francisco 2018: Jinkui Shi and Radu Tudoran, "Flink real-time analysis in CloudStream Service of Huawei Cloud"

  • 1. Huawei Cloud Flink real-time analysis in Cloud Stream Service Jinkui Shi Radu Tudoran 2018/04
  • 2. Speakers Jinkui Shi Principal Engineer @ Huawei Cloud shijinkui@huawei.com Radu Tudoran Staff Engineer @ Huawei Cloud Radu.Tudoran@huawei.com
  • 3. Background about Huawei Cloud ❖ Cloud BU ❖ Founded in 2017/06 ❖ Huawei Cloud ❖ HUAWEI CLOUD services let enterprises use ICT services the same way they use water and electric utilities.
  • 4. Why choose Flink ❖ Graceful runtime framework ❖ Rich stream SQL functions ❖ Lightweight asynchronous checkpoints ❖ Truly low latency and high throughput ❖ Extensibility: ML, graph, edge
  • 5. Cloud Stream Service ❖ Cloud Stream Service (CS): a real-time big data stream analysis service on Huawei Cloud. CS is compatible with the Apache Flink and Spark APIs and provides fully managed computing clusters. Users just focus on StreamSQL or UDFs and run jobs in real time. ❖ CS is the first public cloud native service in the world to choose Flink as its runtime computing engine. https://www.huaweicloud.com/en-us/product/cs.html
  • 6. CS Overview. Use cases: Industrial IoT; Internet of Vehicles; exchanges (Bitcoin/stock); banking and insurance; e-commerce … Make the computing easier: unified batch and stream; SQL and job visualization; streaming monitoring. Connect everything: open source sources/sinks; cloud service sources/sinks.
  • 8. Cost Comparison (Reference)
      Item | Offline environment buildup | CS
      Hardware cost | 80,000 x 3 = 240,000 CNY (a single physical machine costs 80 thousand CNY; for reference only) | 0.5 x 20 x 24 x 30 x 12 x 3 ≈ 259,000 CNY (users are charged 0.5 CNY per hour per SPU; 20 SPUs are purchased)
      O&M manpower cost | 200,000 CNY/man-year | 0
      Water/electricity/DC maintenance | 76,300 CNY/year | 0
      Total | 516,300 CNY | 259,000 CNY
      Saved cost: 42.9%. To achieve the same computing capability, CS saves 42.9% of the cost.
  • 9. Job types ❖ Flink SQL: first-class citizen for ease of use ❖ Flink Jar job: FlinkML, Gelly, CEP, SQL ❖ Spark Streaming and Structured Streaming Jar job ❖ PySpark Jar job ❖ Edge computing job: currently in beta
  • 10. Connect to Ecosystem ❖ Open source connectors (Flink connectors and Bahir Flink) ❖ Connect to cloud native services in Huawei Cloud. Problems with connector API adapters: 1. Define a unified connector API between Flink and Spark, for example for the Kafka and JDBC connectors. 2. Define a general connector API for cloud services such as object bucket storage. Apache Bahir needs more contributions.
  • 11. Online Stream SQL editor. SPU: Stream Processing Unit, 1 core and 4 GB of memory. https://console.huaweicloud.com/cs
  • 12. Visualization ❖ Runtime monitoring ❖ For dev: editor, notebook ❖ For prod: pipeline, DSL
  • 13. Flink Benchmark - chicken ribs ❖ Problems with standard benchmarks: ❖ they focus only on performance and an assumed use case ❖ they can't cover every API and feature ❖ performance numbers only show the best case, never the worst ❖ Enterprises care more about reliability and best practices
  • 14. Flink Reliability benchmark ❖ Test metric dimensions for every API: ❖ overall source generating rate: fixed rate, rapid rate, index rate ❖ data skew and backpressure: Job.ratio = max{Vertex.ratio | Vertex ∈ Job}; Vertex.ratio = max{SubTask.ratio | SubTask ∈ Vertex} ❖ latency: job latency (source generating rate versus job processing rate) and event latency (the time an event takes between source and sink) ❖ throughput and GC … Automatically run large-scale tests to find cases where Flink may hit runtime memory overflow, incorrect calculation results, or other runtime reliability problems, and collect backpressure, latency, throughput, memory, CPU, and rate metrics to analyze the causes.
  • 15. Flink ReliabilityBench project ❖ The generated report covers every API ❖ In the next half year, we'll publish the Flink reliability bench and a standard benchmark to Cloud Stream Service ❖ Users just set the needed resources; the bench then runs automatically and generates a final report for tuning and a best practice guide. Everyone in the Flink community is welcome to try it then.
  • 16. Some problems ❖ In SQL, how to express JSON, OpenTSDB, and other data formats? ❖ SQL WITH clause: ❖ how to make a general and extensible rule that supports all connectors? ❖ how to support a general and extensible cloud standard, such as object bucket storage? ❖ API server? ❖ manage job lifetime and metrics ❖ for a job: input the source data, …, output the sink data with the Streaming API ❖ Sink reliability: support for an external write-ahead log framework: ❖ source1 - processing – sink1 – source2 - processing - sink – source2 - … may lose data
  • 17. Intelligent Streaming Computing ❖ Open source frameworks ❖ Streaming + ML: Spark MLlib, PySpark, Flink ML ❖ Streaming + graph: Spark GraphX, Flink Gelly ❖ SQL: bonding the above together through UDFs. Stream analysis is not enough; an intelligent framework is needed. If we put in less effort, we may quickly be surpassed by others. Stay hungry.
  • 18. Scenario 1: streaming trading analysis. (Example diagram for illustration only, from the Sohu site.) Bitcoin trading pain points: 1. Out-of-order stream data for K-line charts of 5 min, 15 min, 30 min, and 60 min 2. Aggregating streaming data over windows 3. Low latency. Huawei Cloud solution: Cloud Stream, DIS, Kafka, Flink, Cloud Table, OpenTSDB, HBase, Spark, DCS (Redis)
  • 19. Scenario 3: Stream Analysis and ETL. CS uses Flink SQL, Flink, and Spark Streaming jobs to conduct exception detection, real-time alarm reporting, and CEP-based processing on stream data. Feedback/decision-making/monitoring: based on the positive feedback during service running and on monitoring information, CS provides guidance for product optimization, loss stop, quantization, and visualization.
  • 20. Enhanced Statistics and ML Features Extraction. Design principles: • Incremental computation • Fixed-size memory • Constant to sub-linear time complexity
  • 21. Enhanced Statistics and ML Features Extraction. Online linear regression learner. In general: $S^2 = \sum_i \left( y_i - f(x_i, \beta_1, \beta_2, \beta_3, \ldots, \beta_n) \right)^2$. For the linear fit: $S^2 = \sum_i \left( y_i - (\beta_1 + \beta_2 x_i) \right)^2$. Regression parameters: $\beta_2 = s_{xy,t} / m_{2,t}$ and $\beta_1 = \bar{y}_t - \beta_2 \bar{x}_t$. Incremental mean: $\bar{x}_t = \bar{x}_{t-1} + \frac{1}{t}(x_t - \bar{x}_{t-1})$. Incremental variance (2nd central moment): $m_{2,t} = m_{2,t-1} + (x_t - \bar{x}_{t-1})(x_t - \bar{x}_t)$. Incremental covariance: $s_{xy,t} = \frac{t-2}{t-1} s_{xy,t-1} + \frac{1}{t}(x_t - \bar{x}_{t-1})(y_t - \bar{y}_{t-1})$. Plots: latency analysis (events, time range in ms) and throughput analysis (throughput in events, execution time in s).
  • 22. Geospatial • DDL for time geospatial: ST_Point, ST_Line, ST_Polygon • SQL geospatial scalar functions: ST_DISTANCE, ST_PERIMETER, ST_AREA (polygon), ST_OVERLAPS, ST_INTERSECTS, ST_WITHIN, ST_CONTAINS, ST_COVERS, ST_DISJOINT, ST_BUFFER, ST_INTERSECTION, ST_ENVELOPE, … • SQL time geospatial: AGG_DISTANCE, AVG_SPEED, … on HOP/TUMBLE/OVER/SESSION windows, on count/time windows, on rowtime/proctime windows • Huawei offers complete coverage of the geospatial standard plus extra time-based functions. Diagram (Realtime IoT Analytics), components: continuous data; Stream SQL time geospatial analytics; submit; Flink IoT stream engine; IoT translation and optimization; IoT operator library; SQL IoT functions; stream topology; deploy and execute; user-defined functions; geometry engine with geospatial functions. Stream IoT operators: Window Tumble Count/Time, Window Hop Count/Time, Window Session Count/Time, Process Function, Map, FlatMap.
  • 23. Geospatial Examples
      Select if cars deviate from road:
        SELECT carId
        FROM CarStream
        WHERE ST_WITHIN(
          ST_POINT(car.lat, car.lon),
          ST_BUFFER(ST_ROAD_FROM_FILE(file), 2.0))
      Compute time aggregates over spatial data:
        SELECT timestampa, lat, lon,
          AGG_DISTANCE(ST_POINT(lat, lon)) OVER (
            PARTITION BY carid ORDER BY proctime
            RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW),
          AVG_SPEED(ST_POINT(lat, lon)) OVER (
            PARTITION BY carid ORDER BY proctime
            RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW)
        FROM CarStream
      Filter by region:
        SELECT timestampr, lat, lon, speed
        FROM CarStream
        WHERE ST_WITHIN(
          ST_POINT(lat, lon),
          ST_POLYGON(ARRAY[
            ST_POINT(53.454326, 7.334517),
            ST_POINT(53.682480, 13.906822),
            ST_POINT(47.761194, 12.607594),
            ST_POINT(47.722358, 7.601213),
            ST_POINT(53.454326, 7.334517)]))
  • 24. Flink CEP on SQL enhancements. Define pattern matching computation. Offer complete syntax coverage for real-time CEP analytics.
      SQL CEP syntax:
        SELECT * FROM stream...
        MATCH_RECOGNIZE (
          [row_pattern_partition_by]
          [row_pattern_order_by]
          [row_pattern_measures]
          [row_pattern_rows_per_match]
          [row_pattern_skip_to]
          PATTERN (row_pattern)
          [with_in clause]
          [duration clause]
          [row_pattern_subset_clause]
          DEFINE row_pattern_definition_list
        )
      Example:
        SELECT * FROM Ticker
        MATCH_RECOGNIZE (
          PARTITION BY symbol
          MEASURES
            FINAL FIRST(A.price) AS firstAPrice,
            FINAL FIRST(B.price) AS firstBPrice,
            FINAL FIRST(C.price) AS firstCPrice,
            FINAL LAST(A.price) AS lastAPrice,
            FINAL LAST(B.price) AS lastBPrice,
            FINAL LAST(C.price) AS lastCPrice
          ONE ROW PER MATCH
          AFTER MATCH SKIP PAST LAST ROW
          PATTERN ((A B C){2})
          DEFINE
            A AS A.price < 50,
            B AS B.price < 30,
            C AS C.price < 70
        )
      Results: # Events: ~2.5M; # Matched events: ~100K; # Stocks: 7; average latency: ~27.13 ms
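
The incremental recurrences on slide 21 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Huawei implementation: the class and method names are hypothetical, and the slope is computed as sample covariance divided by sample variance, with the running second moment normalized by t - 1 (an assumption, since the slide does not spell out the normalization). It keeps a fixed number of fields and does constant work per event, in line with the design principles on slide 20.

```python
from dataclasses import dataclass


@dataclass
class OnlineLinearRegression:
    """Fits y = beta1 + beta2 * x one event at a time (hypothetical sketch)."""
    t: int = 0            # number of events seen so far
    mean_x: float = 0.0   # incremental mean of x
    mean_y: float = 0.0   # incremental mean of y
    m2_x: float = 0.0     # running sum of squared deviations of x (m_{2,t})
    s_xy: float = 0.0     # running sample covariance of x and y (s_{xy,t})

    def update(self, x: float, y: float) -> None:
        self.t += 1
        t = self.t
        dx = x - self.mean_x  # x_t minus the previous mean of x
        dy = y - self.mean_y  # y_t minus the previous mean of y
        if t > 1:
            # Incremental covariance, using the means from step t-1.
            self.s_xy = (t - 2) / (t - 1) * self.s_xy + dx * dy / t
        # Incremental means.
        self.mean_x += dx / t
        self.mean_y += dy / t
        # Second moment update: (x_t - old mean) * (x_t - new mean).
        self.m2_x += dx * (x - self.mean_x)

    def coefficients(self) -> tuple[float, float]:
        if self.t < 2 or self.m2_x == 0.0:
            raise ValueError("need at least two events with distinct x values")
        var_x = self.m2_x / (self.t - 1)  # assumed normalization
        beta2 = self.s_xy / var_x
        beta1 = self.mean_y - beta2 * self.mean_x
        return beta1, beta2


# Events generated from y = 1 + 2x; the fit recovers these values
# up to floating-point error.
model = OnlineLinearRegression()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]:
    model.update(x, y)
beta1, beta2 = model.coefficients()
```

Because each update touches only five scalar fields, the same state could live in per-key operator state in a streaming job; that mapping is left out of the sketch.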

Editor's notes

  1. What we have done with Flink over the past year.
  2. I'm Jinkui Shi, from Huawei's Hangzhou office. I now work on the CloudStream Service of Huawei Cloud. I previously worked at Sohu and Alibaba, where I became interested in Spark and microservices. For the past two years, I have focused on developing products with Flink and Spark Streaming.
  3. Huawei Cloud BU was founded in June 2017. Cloud BU is a top-level business division. In the past half year, hundreds of new services have been created on Huawei Cloud. I'm in Huawei Cloud EI (Enterprise Intelligence), which includes big data, machine learning, and AI services.
  4. Before I joined Huawei, there was a streaming product called StreamSmart, written in C++, which supported CQL. The first question is why we chose Flink. There are lots of streaming frameworks, such as Storm, JStorm, Heron, Kafka Streams, Apex, Samza, NiFi, Akka Streams, Beam, and so on. In the end we chose Apache Flink as our runtime execution engine because Flink has a graceful dataflow runtime framework, rich stream SQL functions, a lightweight asynchronous checkpoint mechanism, and truly low latency and high throughput. Beyond these basic abilities, Flink also supports machine learning and graph processing, and can run on edge devices. We also developed a Huawei release of Flink, to which we added advanced features such as GeoSpatial, which Dr. Radu will introduce later.
  5. CS is the first cloud native service in the world to choose Flink as its runtime computing engine. Cloud Stream Service is a new service on Huawei Cloud. It was designed in May 2017, and after less than three months of development we released a beta on Huawei Cloud. On March 7th 2017 we released it officially. CloudStream's basic ability is streaming analysis, such as ETL and anomaly detection. CloudStream uses Flink as its main runtime engine and also supports Spark Streaming and other parts of Spark. First of all, we provide a SQL editor.
  6. CloudStream has three parts. The first is use cases and industries: we provide templates for different use cases. The second is making the computing easier: as runtime execution engines we support Flink and Spark at the same time, so users can run Flink SQL; machine learning algorithms with Spark MLlib, FlinkML, and the Dl4j framework; graph frameworks including Spark GraphX and Flink Gelly; and the enhanced Flink IoT and CEP features. Third, having rich connectors at runtime is very important. CloudStream now supports Flink open source connectors through VPC clusters, as well as Huawei Cloud service connectors. Apache Bahir is a good connector toolkit, but it needs more improvement and needs to keep API compatibility with Spark.
  7. These are the main features of Cloud Stream Service. Easy to use: we provide a SQL editor for writing the business logic online and either submitting the job directly or just testing the SQL. Every job has a runtime monitor that visualizes the execution graph and data stream statistics. Pay-as-you-go: users pay only for what their running jobs cost. The billing unit is the SPU (Stream Processing Unit), which includes 1 CPU core and 4 GB of memory; each SPU costs just half a Chinese yuan per hour. Secure and reliable: CloudStream provides two kinds of fully managed cluster. One is a shared cluster for Flink SQL without UDFs; the other is an exclusive cluster for Flink Jar jobs and Spark Jar jobs. An exclusive cluster costs six extra SPUs for its management. An exclusive cluster runs the jobs of only one user or tenant, so it is the safest option.
  8. We compared the costs of Cloud Stream Service and an offline deployment. For the same CPU and memory resources, CS saves 42.9% of the cost, which is very exciting.
  9. CloudStream now supports five kinds of job: Flink SQL, Flink Jar jobs including any UDF, Spark jobs including any UDF, PySpark Jar jobs, and edge jobs, which are in beta.
  10. CloudStream's connectors cover open source connectors and Huawei Cloud native services. We also found some problems that need improvement. The first is defining a unified connector API for the same connector in Flink and Spark, such as Kafka and JDBC. The other is defining a general cloud service standard, for example for object bucket storage. This would help users move between Flink, Spark, and other frameworks. I think the Apache Bahir framework needs more effort.
  11. The user creates a job from a predefined template and modifies the parameters and business logic. After that, the user chooses the number of SPUs, sets the checkpoint storage in the object bucket service, and clicks the submit button. The job is then submitted to the cluster. In the second half of this year, we'll publish a resource cost estimation feature with which CloudStream can automatically estimate how many SPUs the current job needs.
  12. Visualization includes the SQL editor and job monitoring. SQL visualization, notebook, job pipeline, and DSL are in progress. Visualization covers job development, job runtime metric monitoring, streaming data samples, and sink data display.
  13. "Chicken ribs" comes from a Chinese story. It refers to something that has too little value to keep but is still a pity to give up. The benchmark result is just a reference.
  14. When Flink computes backpressure on request, the JobManager sends a TriggerStackTraceSample message to each TaskManager through Akka. By default, the TaskManager takes 100 stack trace samples, 50 ms apart (so one backpressure check takes at least 5 seconds), and returns the results of these 100 samples to the JobManager, which computes the backpressure ratio (number of samples showing backpressure / total number of samples). We created a private project called FlinkReliabilityBench. It tests every Flink API along four measures: data skew and backpressure, latency, throughput, and GC. It simulates realistic source streaming data, automatically sets different parameter combinations, and then aggregates the metrics to find the best parameter combination, the worst combination, and even the crash cases.
  15. extensible
  16. Training model in mini-batch or streaming mode.
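
The backpressure ratio described in note 14 and on slide 14 (per-subtask ratio from stack trace samples, then the maximum over subtasks and over vertices) can be sketched as follows. This is a minimal illustration, not Flink's internal code: the function names and the sample data are hypothetical, and each sample is assumed to be a boolean that is true when the sampled stack trace showed the task blocked by backpressure.

```python
SAMPLES_PER_CHECK = 100    # default number of stack trace samples (from note 14)
SAMPLE_INTERVAL_MS = 50    # default delay between samples (from note 14)


def subtask_ratio(samples):
    """Fraction of samples in which the subtask was backpressured."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for blocked in samples if blocked) / len(samples)


def vertex_ratio(subtasks):
    """Vertex.ratio = max{SubTask.ratio | SubTask in Vertex}."""
    return max(subtask_ratio(samples) for samples in subtasks)


def job_ratio(job):
    """Job.ratio = max{Vertex.ratio | Vertex in Job}."""
    return max(vertex_ratio(subtasks) for subtasks in job.values())


# Hypothetical job: vertex name -> list of per-subtask sample lists.
job = {
    "source": [[True] * 10 + [False] * 90, [False] * 100],
    "map": [[True] * 60 + [False] * 40],
}
ratio = job_ratio(job)

# Lower bound on the duration of one check: 100 samples, 50 ms apart.
min_check_seconds = SAMPLES_PER_CHECK * SAMPLE_INTERVAL_MS / 1000
```

Taking the maximum at both levels means a single slow subtask determines the ratio reported for the whole job, which suits a reliability bench that looks for the worst case rather than the average.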