6/3/2017
Alerting Real-time Irregular Traffic Patterns in Spark
with Automatic Data-profiling and Machine Learning
Cognitive insight from spatio-temporal big data
Smarter Transportation
Abstract
With the growing use of real-time data, it is highly desirable to alert both the public
and the city's traffic management administration to irregular traffic patterns.
In this session, we present a Spark application that alerts on irregular events in
real time, with a predicted event type, using a pre-built prediction model for the
transportation network of a large city in China. Real-time traffic data (such as
GPS, RFID, and surveillance video) is fed to Spark via Spark Streaming and
automatically profiled into discrete traffic indicators and patterns per road
section. The resulting patterns are compared against historic patterns to determine
the irregularity of the incoming pattern, based on the data profile of the road
section. An event type is then predicted for each irregular event found, using
a pre-built spatio-temporal prediction algorithm.
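Goals
KPIs for Congestion
• Road-section travel speed
• Road-section travel time
• Congestion point location
• Head and tail positions of the congestion point
• Congestion queue length
• Road-section congestion status (status change)
• Share of low-speed mileage on the road section
• Share of low-speed time on the road section
• Locations of congested intervals on the road section
Historic Pattern
• Overall congestion trend
• Identification of morning and evening peaks
• Description of the congestion dispersal pattern
• Daily cumulative congested time
• Interval between congestion status transitions
• Road-section status frequency statistics
• Free-flow speed statistics
• Congestion frequency per interval
• Judging whether a congestion point has improved (status change)
Traffic Prediction
• Future speed prediction
• Future congestion status prediction
• Future traffic volume (throughput) prediction
• Indicator changes versus historic trends
Anomaly Alert
• Description of historic speed
• Near-real-time detection of traffic indicators (e.g., volume, speed)
• Recording and reporting anomalies: monitor and alert on traffic anomalies
against historic patterns and predicted status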
The approach
1. Define a new set of KPIs for road congestion and its dispersal patterns,
and use data profiling to automate the process.
2. Uncover the historic patterns in the formation, peaking, and dispersal of
road congestion, and their correlated factors.
3. Predict traffic status using spatio-temporal prediction algorithms and alert
the traffic administration so that it can prepare to ease the upcoming
congestion.
4. Monitor real-time traffic events and compare them with historic patterns and
predicted status to determine whether an anomaly has occurred, and alert on it.
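Step 4 of the approach can be illustrated with a minimal sketch. It assumes, purely for illustration, that the historic pattern of a road section is summarized as the mean and standard deviation of speed for one time slot, and that a reading is flagged when it deviates from the mean by more than a chosen number of standard deviations. The slides do not state the actual rule; the names `HistoricProfile`, `build_profile`, `is_anomalous`, and `z_threshold` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class HistoricProfile:
    """Hypothetical summary of historic speeds for one road section and time slot."""
    mean_speed: float
    std_speed: float


def build_profile(historic_speeds):
    """Summarize historic speeds (km/h) observed in the same time slot."""
    return HistoricProfile(mean(historic_speeds), pstdev(historic_speeds))


def is_anomalous(observed_speed, profile, z_threshold=3.0):
    """Flag a reading that deviates from the historic mean by more than
    z_threshold standard deviations (an assumed rule, not the authors')."""
    if profile.std_speed == 0:
        # No historic variation: any difference from the mean counts as a deviation.
        return observed_speed != profile.mean_speed
    z = abs(observed_speed - profile.mean_speed) / profile.std_speed
    return z > z_threshold


# Made-up speeds for illustration only, not data from the slides.
profile = build_profile([42.0, 40.0, 44.0, 41.0, 43.0])
print(is_anomalous(41.5, profile))  # close to the historic speeds
print(is_anomalous(12.0, profile))  # far below the historic speeds
```

A production system would presumably keep one such profile per road section and time slot and would also compare against the predicted status, which this sketch leaves out.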
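Data-profiled KPIs for traffic congestion
Data-profiled KPIs per road section:
• Average speed
• Share of low-speed mileage
• Road-section congestion status
• Share of low-speed time
• Congestion point position
• Congested interval length (labeled "Duration" on the slide)
• Average travel time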
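As a rough illustration of how raw observations might be reduced to the per-section indicators listed above, the following sketch computes average speed, the low-speed mileage and time ratios, and a congestion status. The `Sample` record, the 20 km/h threshold, and the two-valued status are assumptions made for this example; the slides do not give the actual definitions.

```python
from dataclasses import dataclass

LOW_SPEED_KMH = 20.0  # assumed threshold; the slides do not give one


@dataclass
class Sample:
    """One hypothetical probe-vehicle observation on a road section."""
    distance_m: float  # distance covered during the observation
    duration_s: float  # time taken to cover that distance


def speed_kmh(sample):
    """Convert metres per second to kilometres per hour."""
    return sample.distance_m / sample.duration_s * 3.6


def profile_section(samples, low_speed_kmh=LOW_SPEED_KMH):
    """Reduce raw samples to discrete indicators for one road section."""
    total_m = sum(s.distance_m for s in samples)
    total_s = sum(s.duration_s for s in samples)
    slow = [s for s in samples if speed_kmh(s) < low_speed_kmh]
    avg_speed = total_m / total_s * 3.6
    return {
        "avg_speed_kmh": avg_speed,
        "low_speed_mileage_ratio": sum(s.distance_m for s in slow) / total_m,
        "low_speed_time_ratio": sum(s.duration_s for s in slow) / total_s,
        "status": "congested" if avg_speed < low_speed_kmh else "free",
    }


# Made-up samples for illustration only.
samples = [Sample(100.0, 10.0), Sample(50.0, 30.0), Sample(50.0, 20.0)]
kpis = profile_section(samples)
print(kpis)
```

Congestion point position and interval length would need positional data along the section, so they are omitted here.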
Historic pattern mining – congested roads
[Figure: bar chart of congestion frequency share (y-axis, 0 to 0.7) by road section ID (x-axis): 435, 539, 848, 818, 153, 680, 146, 321, 236, 642, 453, 229, 737, 567.]
Historic pattern mining – weekdays vs. weekends
[Figure: congestion frequency share (y-axis, 0 to 0.35) by time of day (x-axis, 0:05 to 23:30) for Road-737 and Road-146.]
Analysis steps shown on the slide:
• Road-section average speed
• Road-section status frequency statistics
• Compute daily cumulative congested time
• Correlation test of congested time between workdays and rest days
Finding: congestion differs between workdays and rest days.
[Figure: panels for HS Road, Road A, and Road B; series: Road A workday, Road A weekend, Road B workday, Road B weekend.]
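The workday versus rest-day comparison can be sketched as follows, assuming the daily cumulative congested time is already available per date. This sketch only splits the days and compares group means; the statistical test used by the authors is not specified in the slides and is not reproduced here. Treating Saturday and Sunday as the only rest days is an assumption, and `split_by_day_type` is a hypothetical helper.

```python
from datetime import date
from statistics import mean


def split_by_day_type(daily_minutes):
    """Split {date: congested minutes} into workday and rest-day lists.

    Assumes rest days are Saturday and Sunday; public holidays are ignored.
    """
    workdays, rest_days = [], []
    for day, minutes in daily_minutes.items():
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        (rest_days if day.weekday() >= 5 else workdays).append(minutes)
    return workdays, rest_days


# Made-up values for illustration only, not data from the slides.
daily_minutes = {
    date(2015, 11, 2): 120,  # Monday
    date(2015, 11, 3): 110,  # Tuesday
    date(2015, 11, 7): 30,   # Saturday
    date(2015, 11, 8): 40,   # Sunday
}
workdays, rest_days = split_by_day_type(daily_minutes)
print(mean(workdays), mean(rest_days))
```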
Historic pattern mining – congestion trends
[Figure: three charts of daily cumulative congested time (minutes): Road-453 (y-axis 0 to 90), Road-539 (y-axis 0 to 60), Road-737 (y-axis 0 to 800).]
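Prediction (medium to long term) – road-section average speed: predicted speed (expressways)
• Data from 2015-07-01 to 2016-01-31 is used to predict the average road speed
on 2015-11-05 (Thursday).
• The chart on the right of the slide shows the prediction result for one expressway section.
• Weather forecast data was also used in the prediction.
[Figure: prediction result for YL Blvd.]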
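Traffic status prediction: short term, 5-30 minutes
[Figure: Road-642 average speed on 2015/8/8; y-axis: road-section average speed (km/h), 0 to 35; series: predicted speed, smoothed speed, raw speed.]
Processing flow shown on the slide:
• Input: speed, weather, timestamp
• Fill missing values with the nearest-neighbor method (condition shown on the slide: Time < T)
• Mean filter
• Compute the historic speed for each time of day (rest days handled separately)
• Merge (speed, historic speed, weather)
• STP prediction
• Output: speed 5 to 30 minutes ahead
• Historic pattern: next 1 day, 3 days, one week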
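The two preprocessing steps in this flow, nearest-neighbor gap filling and mean filtering, can be sketched as below. Reading "Time < T" as "fill a gap only when the nearest known value is fewer than T time steps away" is an interpretation, not something the slides state; the function names, the `max_gap` parameter, and the window size are likewise assumptions.

```python
def fill_nearest(speeds, max_gap):
    """Replace None with the nearest known value, if it is closer than max_gap steps.

    Ties between an earlier and a later neighbor go to the earlier one.
    """
    known = [i for i, v in enumerate(speeds) if v is not None]
    filled = list(speeds)
    for i, v in enumerate(speeds):
        if v is None and known:
            j = min(known, key=lambda k: abs(k - i))
            if abs(j - i) < max_gap:
                filled[i] = speeds[j]
    return filled


def mean_filter(speeds, window=3):
    """Centered moving average; the window shrinks at the edges and skips None."""
    half = window // 2
    smoothed = []
    for i in range(len(speeds)):
        neighbors = [v for v in speeds[max(0, i - half): i + half + 1]
                     if v is not None]
        smoothed.append(sum(neighbors) / len(neighbors) if neighbors else None)
    return smoothed


# Made-up speeds (km/h) for illustration only; None marks a missing reading.
raw = [30.0, None, 24.0, 27.0, None, None, None, 21.0]
filled = fill_nearest(raw, max_gap=2)
smoothed = mean_filter(filled)
print(filled)
print(smoothed)
```

The later steps (historic speed per time of day, merging with weather, and the STP model itself) depend on details the slides do not provide, so they are not sketched.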
THANKS




Spark detects real-time traffic anomalies

  • 1. 6/3/2017. Alerting Real-time Irregular Traffic Patterns in Spark with Automatic Data-profiling and Machine Learning. Cognitive insight from spatial temporal big data. Smarter Transportation.
    Abstract: With the growing use of real-time data, it is highly desirable to alert both the public and the city's traffic management administration to irregular traffic patterns. In this session, we present a Spark application that raises real-time alerts on irregular events, each with a predicted event type from a pre-built prediction model, for the transportation network of a large city in China. Real-time traffic data (such as GPS, RFID, and surveillance video) are fed to Spark via Spark Streaming and automatically profiled into discrete traffic indicators and patterns per road section. The results are categorized against historic patterns to determine, based on the data profile of the road section, whether the incoming pattern is irregular. An event type is then predicted for each irregular event found, using a pre-built spatial-temporal prediction algorithm.
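The abstract describes comparing each incoming pattern with the historic profile of its road section. A minimal plain-Python sketch of that comparison follows; it is not the authors' Spark code. The z-score rule, the threshold of 3.0, and the names `build_profile` and `is_irregular` are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def build_profile(history):
    """Summarise historic speeds per road section as (mean, std).

    history: {road_id: [speed_kmh, ...]} (hypothetical input shape).
    """
    return {road: (mean(v), pstdev(v)) for road, v in history.items()}


def is_irregular(profile, road_id, speed_kmh, z_threshold=3.0):
    """Flag a speed that deviates strongly from the historic profile.

    The z-score test is an assumed stand-in for the slide's
    "categorized against historic patterns" step.
    """
    mu, sigma = profile[road_id]
    if sigma == 0:
        # No historic variation: any different value counts as irregular.
        return speed_kmh != mu
    return abs(speed_kmh - mu) / sigma > z_threshold


# Illustrative values only, not data from the slides.
history = {"Road-737": [42.0, 40.0, 44.0, 41.0, 43.0]}
profile = build_profile(history)
```

In a streaming setting the same check would run once per road section and time window; a profile keyed by time of day as well as by road would likely match the slides more closely.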
  • 2. Goals
    KPIs for Congestion
      • Road-section travel speed
      • Road-section travel time
      • Congestion point position
      • Head and tail positions of the congestion point
      • Congestion queue length
      • Road-section congestion status (status change)
      • Share of low-speed mileage on the road section
      • Share of low-speed time on the road section
      • Positions of congested intervals on the road section
    Historic Pattern
      • Overall congestion trend
      • Identification of morning and evening peaks
      • Description of congestion dispersal patterns
      • Daily cumulative congested time
      • Interval between congestion status transitions
      • Frequency statistics of road-section status
      • Free-flow speed statistics
      • Congestion frequency per interval
      • Judging whether a congestion point has improved (finding status change)
    Traffic Prediction
      • Predict future speed
      • Predict future congestion status
      • Predict future traffic volume (throughput)
      • Indicator changes versus historic trends
    Anomaly Alert
      • Description of historic speeds
      • Near-real-time detection of traffic indicators such as volume and speed (KPIs for real-time traffic)
      • Record and report anomalies: monitor traffic and alert on anomalies against historic patterns and predicted status
    The approach
      1. Define a new set of KPIs for road congestion and its dispersal patterns, and use data profiling to automate the process.
      2. Uncover the historic patterns of formation, peaking, and dispersal of road congestion, and their correlated factors.
      3. Predict traffic status using spatio-temporal prediction algorithms and alert the traffic administration so that it can prepare to ease upcoming congestion.
      4. Monitor real-time traffic events and compare them with historic patterns and predicted status to determine whether an anomaly has occurred, then raise an alert.
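Several of the congestion KPIs listed above (travel speed, share of low-speed time, share of low-speed mileage) can be derived from per-vehicle traversal samples. The sketch below shows one possible way to compute them in plain Python. The input shape, the 20 km/h low-speed threshold, and the function name `profile_section` are hypothetical; the slides do not state the actual definitions.

```python
def profile_section(samples, low_speed_kmh=20.0):
    """Compute three congestion KPIs for one road section.

    samples: non-empty list of (duration_s, distance_m) pairs, one per
    observed vehicle segment (assumed input shape).
    low_speed_kmh: assumed threshold below which a segment is "low speed".
    """
    total_time = sum(t for t, _ in samples)
    total_dist = sum(d for _, d in samples)
    # Segments whose own speed (m/s converted to km/h) is below the threshold.
    low = [(t, d) for t, d in samples if t > 0 and (d / t) * 3.6 < low_speed_kmh]
    return {
        "avg_speed_kmh": (total_dist / total_time) * 3.6,
        "low_speed_time_ratio": sum(t for t, _ in low) / total_time,
        "low_speed_mileage_ratio": sum(d for _, d in low) / total_dist,
    }


# Illustrative samples only: one fast segment and two slow ones.
samples = [(10, 100.0), (10, 30.0), (20, 70.0)]
kpis = profile_section(samples)
```

Weighting the ratios by time and distance, rather than by number of samples, keeps a long slow segment from counting the same as a short one.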
  • 3. Data-profiled KPIs for traffic congestion
    Data-profiled KPIs per road section: average speed; share of low-speed mileage; road-section congestion status; share of low-speed time; congestion point position; congested-interval length (labelled "Duration" on the slide); average travel time.
    Historic pattern mining: frequently congested road sections
    [Figure: share of congestion count (y-axis) by road section ID (x-axis).]
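The chart of frequently congested road sections ranks roads by their share of congestion counts. A rough sketch of such a ranking is given below. It assumes the share is the fraction of a road's own observations that were congested, which the slide does not confirm; `congestion_share` and the record format are hypothetical.

```python
from collections import Counter


def congestion_share(records):
    """Rank road sections by the fraction of congested observations.

    records: iterable of (road_id, is_congested) pairs (assumed format).
    Returns a list of (road_id, share) sorted from most to least congested.
    """
    total = Counter()
    congested = Counter()
    for road_id, is_congested in records:
        total[road_id] += 1
        if is_congested:
            congested[road_id] += 1
    shares = {road: congested[road] / total[road] for road in total}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative records only; the road IDs are placeholders.
records = (
    [("435", True)] * 3 + [("435", False)]
    + [("567", True)] + [("567", False)] * 3
)
ranking = congestion_share(records)
```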
  • 4. Historic patterns: workdays vs rest days (weekdays vs weekend)
    [Figure: share of congestion frequency (y-axis) by time of day (x-axis) for Road-737 and Road-146.]
    Analysis steps shown on the slide: compute daily cumulative congested time; road-section average speed; road-section status frequency statistics; correlation test of congested time between workdays and rest days. Finding: congestion differs between workdays and rest days.
    [Figure: series for Road A workday, Road A weekend, Road B workday, and Road B weekend; roads labelled HS Road, Road A, and Road B.]
    Historic pattern mining: congestion trends
    [Figure: daily cumulative congested time in minutes (y-axis), one panel each for Road-453, Road-539, and Road-737.]
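The daily cumulative congested time and the workday versus rest-day comparison can be illustrated with a small plain-Python sketch. It assumes one status observation every 5 minutes and treats only Saturday and Sunday as rest days (public holidays are ignored). It compares simple means and does not implement the correlation test named on the slide. All names here are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_congested_minutes(observations, interval_min=5):
    """Sum congested minutes per calendar day.

    observations: iterable of (datetime, is_congested) pairs, assumed to be
    spaced interval_min minutes apart.
    """
    per_day = defaultdict(int)
    for ts, congested in observations:
        per_day[ts.date()] += interval_min if congested else 0
    return dict(per_day)


def workday_vs_rest(per_day):
    """Return (mean workday minutes, mean rest-day minutes)."""
    work = [m for d, m in per_day.items() if d.weekday() < 5]
    rest = [m for d, m in per_day.items() if d.weekday() >= 5]
    return (mean(work) if work else 0.0, mean(rest) if rest else 0.0)


# Illustrative observations: 2015-08-07 is a Friday, 2015-08-08 a Saturday.
observations = [
    (datetime(2015, 8, 7, 8, 0), True),
    (datetime(2015, 8, 7, 8, 5), True),
    (datetime(2015, 8, 7, 8, 10), True),
    (datetime(2015, 8, 8, 8, 0), True),
    (datetime(2015, 8, 8, 8, 5), False),
]
per_day = daily_congested_minutes(observations)
work_mean, rest_mean = workday_vs_rest(per_day)
```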
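  • 5. Prediction (medium to long term): average road-section speed (expressways), YL Blvd
      • Data from 15.7.1 to 16.1.31 are used to predict the average road speed on 20151105 (Thursday).
      • The figure on the right shows the prediction result for one section of an expressway.
      • Weather forecast data were also used in the prediction.
    Traffic status prediction: short term, 5-30 minutes
    [Figure: average road-section speed in km/h (y-axis) for Road-642 on 2015/8/8; series: average speed, predicted speed, smoothed speed.]
    Pipeline labels on the slide: raw speed; fill missing values with the nearest-neighbour method; "Time <T"; mean filter; compute the historic speed at each time of day (distinguishing rest days); merge (speed, historic speed, weather); STP prediction; speed for the next 5 to 30 minutes. Other labels: speed, weather, timestamp; historic patterns; the next 1 day, 3 days, one week.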
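The preprocessing named in the short-term pipeline, nearest-neighbour filling of missing values followed by a mean filter, could look roughly like the sketch below. The trailing three-sample window is only one reading of the slide's "Time <T" label, and the tie-breaking rule and function names are assumptions; the STP prediction step itself is not reproduced here.

```python
def fill_nearest(series):
    """Replace None with the nearest observed value (ties go to the left)."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    return [
        v if v is not None
        else series[min(known, key=lambda k: (abs(k - i), k))]
        for i, v in enumerate(series)
    ]


def mean_filter(series, window=3):
    """Smooth with a trailing mean over the last `window` samples.

    Only current and past samples are used, so the filter can run on
    a live stream (assumed interpretation of "Time <T").
    """
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Illustrative speeds in km/h with gaps, not data from the slides.
raw = [30.0, None, 24.0, 27.0, None, None, 21.0]
filled = fill_nearest(raw)
smoothed = mean_filter(filled)
```

The smoothed series would then be merged with the historic speed for the same time of day and with weather data before being passed to the predictor.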