DATA CENTRIC HPC FOR NUMERICAL WEATHER FORECASTING 
James Faeldon 
Delfin Jay Sabido III 
Karen España 
IBM Philippines, STG Labs
Extreme Weather Events 
•The Philippines is home to devastating typhoons. 
•19 typhoons a year and intense monsoon rains that can cause widespread flooding. 
•Research collaboration by the Philippine Government, University of the Philippines and IBM (2013). 
[Figure: Typhoon tracks, Eastern Hemisphere. The strongest typhoons group near the Philippines. Image courtesy of NOAA.]
[Figure: Before and after images, Super Typhoon Haiyan (Nov 2013). Image courtesy of DigitalGlobe.]
Coupled Models for Pre-Disaster Planning 
•Numerical weather model forecasts typhoon track and intensity.
•Machine learning model predicts affected population and damages.
•Optimization model recommends pre-positioning and allocation of relief supplies.
Typhoons can be forecast a few days in advance, but we need more reports, better visualization and data exploration tools to reduce analysis cycles and facilitate timely decisions.
Operations Center
Operational Forecasting Schedule Runs 
Data-Intensive 
Compute-Intensive 
Data-intensive processes are increasingly becoming the bottleneck in the operational forecasting workflow.
Drivers for Increased Data Processing 
Analytics 
Big Data
Operational Forecasting Data Challenges
•Processing steps: ETL, quality control, sampling, verification, machine learning, ensemble forecasts, model output statistics.
•663 GB per forecast, including 7 historical days.
•6-hour processing and analysis window.
•Update relief operations plan based on new forecast.

Real-time Sensor Data
Source                             Qty   Unit Size    Total Size
AWS (automated weather stations)   733   7 KB/day     5 MB/day
Satellite                          1     480 MB/day   480 MB/day
Radar                              7     9 GB/day     63 GB/day

Forecast Data
Resolution   Cell Count   Grid Dimensions   Total Size
12 km        5.2 M        307 x 481 x 35    81 GB/forecast
4 km         8.8 M        619 x 406 x 35    138 GB/forecast
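
The figures on this slide can be cross-checked with a little arithmetic. The sketch below is not from the original deck. It assumes the sizes are decimal kilobytes, megabytes and gigabytes, and that the "+ 7 historical days" label refers to seven days of sensor data added to the two forecast grids. Under those assumptions the per-forecast total comes out close to the quoted 663, and the last line estimates the sustained rate needed to read every byte once within the 6-hour window.

```python
# Cross-check of the data volumes quoted on the slide.
# Assumption: sizes are decimal kilobytes/megabytes/gigabytes.
KB, MB, GB = 1e3, 1e6, 1e9

# (quantity, bytes per unit per day) for each real-time sensor source
sensors = {
    "AWS": (733, 7 * KB),
    "Satellite": (1, 480 * MB),
    "Radar": (7, 9 * GB),
}
daily_sensor_bytes = sum(qty * size for qty, size in sensors.values())

# Grid dimensions and quoted output size for each forecast resolution
forecasts = {
    "12km": ((307, 481, 35), 81 * GB),
    "4km": ((619, 406, 35), 138 * GB),
}
cells = {res: nx * ny * nz for res, ((nx, ny, nz), _) in forecasts.items()}
forecast_bytes = sum(size for _, size in forecasts.values())

# Assumption: "+ 7 historical days" means seven days of sensor data.
total_bytes = forecast_bytes + 7 * daily_sensor_bytes

# Sustained rate needed to touch every byte once in the 6-hour window.
required_rate = total_bytes / (6 * 3600)

print(f"sensor data per day: {daily_sensor_bytes / GB:.1f} GB")
print(f"cells: {cells}")
print(f"total per forecast: {total_bytes / GB:.0f} GB")
print(f"required throughput: {required_rate / MB:.1f} MB/s")
```

The cell counts computed from the grid dimensions (about 5.2 million and 8.8 million) match the table, which suggests the "Cells" column is simply the product of the three grid dimensions.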
Project Goals 
•Manage and process time-sensitive data arriving from remote sensors and weather forecasts.
•Reduce data analysis cycles to facilitate timely decisions.
Automated End-to-End Process
•Components: Weather Sensors; Stream Pre-Processing; Numerical Weather Model; Data Warehouse, OLAP Database; MapReduce, NoSQL Database; Post-Processing; Decision Support Tool; Reports.
•Data flows: Observations, Structured Data, Data Assimilation, Forecast Data.
1. Remote sensor data in various formats.
2. Quality control, interpolation, sampling, filtering, classification.
3. High performance computing.
4. Store structured and unstructured data for analysis and post-processing.
5. Business intelligence, data mining, visualization, verification.
6. Dashboards and reports.
Hardware Infrastructure
Platforms:
•Traditional HPC (BlueGene/P)
•Commodity servers (x86)
•Elastic cloud computing (virtual machines)
Workloads:
•Numerical weather models, MPP jobs
•In-situ big data, MapReduce
•Real-time data processing, OLAP, visualization
Weather Model 
•WRF ARW v3.5 limited area model 
•3.4 hours using 2,048 BlueGene/P cores (850 MHz).
Pre-Processing 
•Stream Processing, ETL, R, Python 
•Multi-stage quality control of remote sensor data. 
•Spatio-temporal interpolation and sampling. 
•Star-schema data warehouse. 
•NoSQL with MapReduce. 
•Inputs: observations and raw forecast data (NetCDF, image, CSV), staging files.
•Processing: low-latency stream processing, ETL, custom scripts (quality control, sampling, filtering).
•Outputs: NoSQL, data warehouse, BI cubes.
Data stores for post processing…
•Structured point or topological data (small, <1 TB): emphasis on data consistency.
•Gridded high-resolution data (big, >1 TB): emphasis on availability and scalability. Input to coupled models down the line.
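
As an illustration of what multi-stage quality control on a sensor stream can look like, here is a minimal sketch. It is not the project's implementation: the two stages (a physical range check followed by a sliding-window spike check), the function name `quality_control`, and the thresholds are all assumptions chosen for the example.

```python
from collections import deque
from statistics import median

def quality_control(readings, lo, hi, window=5, max_jump=5.0):
    """Yield (value, flag) pairs for a stream of sensor readings.

    Stage 1 flags values outside the physical range [lo, hi].
    Stage 2 flags values that deviate from the median of the last
    `window` accepted values by more than `max_jump`.
    All names and thresholds here are hypothetical.
    """
    recent = deque(maxlen=window)  # sliding window of accepted values
    for value in readings:
        if value is None or not (lo <= value <= hi):
            yield value, "range_fail"
            continue
        if len(recent) == window and abs(value - median(recent)) > max_jump:
            yield value, "spike_fail"
            continue
        recent.append(value)
        yield value, "ok"

# Example: hourly air temperature in degrees Celsius from one station.
temps = [27.1, 27.4, 27.0, 26.8, 27.2, 45.0, 27.3, -99.0, 27.5]
flags = [flag for _, flag in quality_control(temps, lo=-10.0, hi=50.0)]
print(flags)
```

Only accepted values enter the window, so a single bad reading does not distort the reference median used for the readings that follow. The spike check stays inactive until the window is full, which is one simple way to handle the start of a stream.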
Post Processing 
•Business Intelligence Cubes 
•Multi-dimensional analysis 
•Dashboards and reports 
•GIS Integration 
•MapReduce Views (NoSQL) 
•Model Verification 
•Ensemble Forecasts/MOS 
•Ad-Hoc Data Mining 
[Diagram labels: Multi-Dimensional Cubes, MapReduce Views, Custom Scripts, Coupled Models, Model Output Statistics, Reports and Dashboards]
Reports and visualizations are generated using BI and data visualization tools.
Downstream predictive models use MapReduce views as their data source.
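
Model verification, one of the post-processing tasks on this slide, comes down to comparing forecast values against observations at matching places and times. The sketch below shows two common scores, mean bias and root-mean-square error, together with a simple ensemble mean. The data values and function names are invented for illustration and do not come from the project.

```python
import math

def bias(forecast, observed):
    """Mean error: positive means the forecast runs high."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)

def rmse(forecast, observed):
    """Root-mean-square error between paired forecasts and observations."""
    sq = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(sq) / len(sq))

def ensemble_mean(members):
    """Element-wise mean across ensemble members of equal length."""
    return [sum(vals) / len(vals) for vals in zip(*members)]

# Hypothetical 6-hourly rainfall (mm) at one station.
observed = [2.0, 10.0, 35.0, 12.0]
members = [
    [1.0, 12.0, 30.0, 10.0],
    [3.0, 14.0, 40.0, 16.0],
]
mean_fc = ensemble_mean(members)
print(mean_fc, bias(mean_fc, observed), rmse(mean_fc, observed))
```

In an operational setting these scores would be computed over many stations and forecast lead times, which is where the multi-day historical data becomes the dominant cost.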
Current Challenges and Future Directions 
•Improvements in geostatistics: gridded data to topological features.
•River basins, flood-prone areas, political boundaries and other locations of interest.
•Generating statistics makes for very data-intensive processing.
•Potential for parallelization.
•Efficient stream processing engine for larger tuples with longer sliding windows.
•Complex quality control and verification require longer time-series statistics spanning multi-day historical observed and forecast data.
•Strategy: can we keep all data processing in memory (caching, etc.)?
•Efficient MapReduce views on array-based data models and other approaches.
•Improvements to the data warehousing schema.
•Ongoing improvements for handling spatio-temporal data.
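
The first challenge on this slide, turning gridded data into statistics for topological features such as river basins, parallelizes naturally because each grid cell can be handled independently. The following is a minimal sketch in the map/reduce style using only the Python standard library. The basin lookup `cell_to_basin`, the rainfall values, and the choice of mean and maximum as the statistics are all assumptions made for the example; a real system would derive the cell-to-feature mapping from GIS boundaries.

```python
from collections import defaultdict

# Hypothetical mapping from (row, col) grid cell to a river basin id.
cell_to_basin = {
    (0, 0): "basin_A", (0, 1): "basin_A",
    (1, 0): "basin_B", (1, 1): "basin_B", (1, 2): "basin_B",
}

def map_cell(cell, rainfall_mm):
    """Map step: emit (basin, value) for cells that fall inside a basin."""
    basin = cell_to_basin.get(cell)
    if basin is not None:
        yield basin, rainfall_mm

def reduce_basin(values):
    """Reduce step: summarize all values emitted for one basin."""
    return {"mean": sum(values) / len(values), "max": max(values)}

def zonal_stats(grid):
    """Run map, group by key, then reduce. `grid` maps cell -> value."""
    groups = defaultdict(list)
    for cell, value in grid.items():
        for key, emitted in map_cell(cell, value):
            groups[key].append(emitted)
    return {key: reduce_basin(vals) for key, vals in groups.items()}

# Forecast rainfall (mm) on a tiny grid; cell (0, 2) lies outside any basin.
grid = {(0, 0): 10.0, (0, 1): 30.0, (0, 2): 99.0,
        (1, 0): 5.0, (1, 1): 15.0, (1, 2): 40.0}
stats = zonal_stats(grid)
print(stats)
```

Because the map step looks at one cell at a time and the reduce step only needs the values for one basin, both can be distributed across workers, which is the property that makes MapReduce views a candidate for this kind of aggregation.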
Summary 
•Planning for extreme weather events is a time-critical workflow that involves complex analysis of large data-sets from various sources. 
•Recent advances in Big Data and HPC enable the architecture of real-world disaster planning applications.
•Current integration schemes use intermediary staging files and ETL-like scripts.
•Better algorithms and techniques are needed to improve performance and integration.
James Faeldon 
jafaeldon@ph.ibm.com 
IBM Philippines, STG Labs

Data Centric HPC for Numerical Weather Forecasting

  • 1. DATA CENTRIC HPC FOR NUMERICAL WEATHER FORECASTING James Faeldon Delfin Jay Sabido III Karen España IBM Philippines, STG Labs
  • 2. Extreme Weather Events
      • The Philippines is home to devastating typhoons.
      • 19 typhoons a year and intense monsoon rains that can cause widespread flooding.
      • Research collaboration by the Philippine Government, University of the Philippines and IBM (2013).
      Figure: Typhoon Tracks, Eastern Hemisphere. The strongest typhoons group near the Philippines. Image courtesy of NOAA.
      Figure: Super Typhoon Haiyan (Nov 2013), before and after. Image courtesy of DigitalGlobe.
  • 3. Coupled Models for Pre-Disaster Planning
      • Numerical weather model forecasts typhoon track and intensity.
      • Machine learning model predicts affected population and damages.
      • Optimization model recommends relief supplies pre-positioning and allocation.
      Typhoons can be forecast a few days in advance, but we need more reports, better visualization and data exploration tools to reduce analysis cycles and facilitate timely decisions.
      Diagram label: Operations Center
  • 4. Operational Forecasting Schedule Runs
      Diagram labels: Data-Intensive, Compute-Intensive
      Data-intensive processes are increasingly becoming the bottleneck in the operational forecasting workflow.
  • 5. Drivers for Increased Data Processing: Analytics, Big Data
  • 6. Operational Forecasting Data Challenges
      Diagram labels: ETL, Quality Control, Sampling, Verification, Machine Learning, Ensemble Forecasts, Model Output Statistics
      Update relief operations plan based on new forecast.
      6-hour processing and analysis window.
      663 Gb per forecast (+ 7 historical days).

      Real-time Sensor Data
      Source      Qty    Unit Size     Total Size
      AWS         733    7Kb/day       5Mb/day
      Satellite   1      480Mb/day     480Mb/day
      Radar       7      9Gb/day       63Gb/day

      Forecast Data
      Res     Cells    Grid Cells        Total Size
      12km    5.2 M    307 x 481 x 35    81Gb/forecast
      4km     8.8 M    619 x 406 x 35    138Gb/forecast
  • 7. Project Goals
      • Manage and process time-sensitive data arriving from remote sensors and weather forecasts.
      • Reduce data analysis cycles to facilitate timely decisions.
  • 8. Automated End-to-End Process (diagram)
      Components: Weather Sensors, Observations, Stream Pre-Processing, Data Assimilation, Numerical Weather Model, Forecast Data, Structured Data, Data Warehouse / OLAP Database, MapReduce / NoSQL Database, Post-Processing, Decision Support Tool, Reports.
      1. Remote sensor data in various formats.
      2. Quality control, interpolation, sampling, filtering, classification.
      3. High performance computing.
      4. Store structured and unstructured data for analysis and post-processing.
      5. Business intelligence, data mining, visualization, verification.
      6. Dashboards and reports.
  • 9. Hardware Infrastructure
      Platforms: Traditional HPC (BlueGene/P); Commodity Servers (x86); Elastic Cloud Computing (Virtual Machines).
      Workloads: In-situ Big Data; MapReduce; Real-time Data Processing; OLAP; Visualization; Numerical Weather Models; MPP Jobs.
  • 10. Weather Model
      • WRF ARW v3.5 limited area model.
      • 3.4 hours using 2048 BlueGene/P cores (850 MHz).
  • 11. Pre-Processing
      • Stream Processing, ETL, R, Python
      • Multi-stage quality control of remote sensor data.
      • Spatio-temporal interpolation and sampling.
      • Star-schema data warehouse.
      • NoSQL with MapReduce.
      Data flow (diagram): Observations and forecast raw data (NetCDF, image, CSV staging files) -> low-latency stream processing, ETL, custom scripts (quality control, sampling, filtering) -> data stores for post processing…
      Data Warehouse, BI Cubes: structured point or topological data (small, <1TB), emphasis on data consistency.
      NoSQL: gridded high-resolution data (big, >1TB), emphasis on availability and scalability. Input to coupled models down the line.
  • 12. Post Processing
      • Business Intelligence Cubes
      • Multi-dimensional analysis
      • Dashboards and reports
      • GIS Integration
      • MapReduce Views (NoSQL)
      • Model Verification
      • Ensemble Forecasts/MOS
      • Ad-Hoc Data Mining
      Diagram labels: Multi-Dimensional Cubes, MapReduce Views, Custom Scripts, Coupled Models, Model Output Statistics, Reports and Dashboards
      Reports and visualizations are generated using BI and data visualization tools.
      Downstream predictive models use MapReduce views as their data source.
  • 13. Current Challenges and Future Directions
      • Improvements in geostatistics: gridded data to topological features.
      • River basins, flood-prone areas, political boundaries and other locations of interest.
      • Generating statistics makes for very data-intensive processing.
      • Potential for parallelization.
      • Efficient stream processing engine for larger tuples with longer sliding windows.
      • Complex quality control and verification require longer time-series statistics spanning multi-day historical observed and forecast data.
      • Strategy: can we keep all data processing in memory, with caching, etc.?
      • Efficient MapReduce views on array-based data models and other approaches.
      • Improvements on data warehousing schema.
      • Ongoing improvements for handling spatio-temporal data.
  • 14. Summary
      • Planning for extreme weather events is a time-critical workflow that involves complex analysis of large data sets from various sources.
      • Recent advances in Big Data and HPC enable the architecture of a real-world disaster planning application.
      • Current integration schemes use intermediary staging files and ETL-like scripts.
      • Better algorithms and techniques are needed to improve performance and integration.
  • 15. James Faeldon jafaeldon@ph.ibm.com IBM Philippines, STG Labs
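
The data volumes on slide 6 can be cross-checked with a few lines of arithmetic. The sketch below uses only figures from the slide and assumes decimal prefixes (1 Gb = 1000 Mb), which the slides do not state. Reading the 663 Gb figure as "both forecast grids plus seven days of sensor data" is an interpretation, not something the slides spell out, although the numbers are consistent with it.

```python
# Figures copied from slide 6; units are kept as written on the slide.
# Assumption (not stated in the slides): decimal prefixes,
# i.e. 1 Gb = 1_000 Mb = 1_000_000 Kb.
KB_PER_GB = 1_000_000
MB_PER_GB = 1_000

# Real-time sensor data, in Gb per day: quantity * unit size.
sensors_gb_per_day = {
    "AWS": 733 * 7 / KB_PER_GB,        # 733 sources at 7Kb/day
    "Satellite": 1 * 480 / MB_PER_GB,  # 1 source at 480Mb/day
    "Radar": 7 * 9,                    # 7 sources at 9Gb/day
}

# Forecast output per run, in Gb, and the grid dimensions behind it.
forecast_gb = {"12km": 81, "4km": 138}
grids = {"12km": (307, 481, 35), "4km": (619, 406, 35)}


def cell_count(shape):
    """Number of grid cells for an (nx, ny, nz) grid."""
    nx, ny, nz = shape
    return nx * ny * nz


cells = {name: cell_count(shape) for name, shape in grids.items()}
daily_sensor_gb = sum(sensors_gb_per_day.values())

# Interpretation: one forecast cycle handles both grids plus
# seven historical days of sensor data.
total_gb = sum(forecast_gb.values()) + 7 * daily_sensor_gb

for name, count in cells.items():
    print(f"{name}: {count:,} cells")
print(f"sensor data: {daily_sensor_gb:.1f} Gb/day")
print(f"per forecast cycle: {total_gb:.0f} Gb")
```

Under these assumptions the cell counts round to the 5.2 M and 8.8 M quoted on the slide, and the per-cycle total rounds to 663 Gb.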
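
Slides 11 and 13 mention multi-stage quality control of sensor data and stream processing over sliding windows, without giving the actual checks. A minimal sketch of what such a stage might look like follows: a range check, then a comparison against the median of recently accepted readings. The function name, the thresholds and the window length are all hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import median


def quality_control(readings, window_seconds=3600,
                    valid_range=(-10.0, 50.0), max_jump=5.0):
    """Yield (timestamp, value, flag) for each (timestamp, value) reading.

    Hypothetical two-stage check, not the authors' method:
      1. "range": value lies outside the plausible physical range.
      2. "spike": value differs from the median of the accepted
         readings in the sliding window by more than max_jump.
    Readings are assumed to arrive in timestamp order (seconds).
    """
    window = deque()  # accepted (timestamp, value) pairs inside the window
    low, high = valid_range
    for ts, value in readings:
        # Evict readings that have fallen out of the time window.
        while window and ts - window[0][0] > window_seconds:
            window.popleft()

        if not (low <= value <= high):
            flag = "range"
        elif len(window) >= 3 and abs(value - median(v for _, v in window)) > max_jump:
            flag = "spike"
        else:
            flag = "ok"
            window.append((ts, value))  # only accepted readings feed the window
        yield ts, value, flag


# Example: temperature readings every 10 minutes with two bad values.
stream = [(0, 25.0), (600, 25.5), (1200, 26.0),
          (1800, 99.0), (2400, 40.0), (3000, 26.5)]
flags = [flag for _, _, flag in quality_control(stream)]
print(flags)
```

Keeping only accepted readings in the window stops a single bad value from skewing later checks. The cost is that longer windows hold more state in memory, which is the trade-off slide 13 raises.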
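
Slide 11 lists spatio-temporal interpolation but does not name a method. As an illustration of the spatial part only, the sketch below uses inverse distance weighting (IDW), a common technique swapped in here as an assumption. Coordinates are treated as planar, and the function name and the `power` parameter are hypothetical.

```python
import math


def idw(stations, target, power=2.0):
    """Estimate a value at `target` from nearby station observations.

    stations: list of ((x, y), value) pairs in planar coordinates.
    target:   (x, y) location of the grid point to fill.
    Inverse distance weighting: closer stations get larger weights.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two stations equally far from the target contribute equally.
observations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw(observations, (1.0, 0.0))
print(midpoint)
```

A production version would work in geographic coordinates and limit the search to stations within a radius. Both are left out to keep the sketch short.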
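
Slides 12 and 13 describe MapReduce views over gridded data and the goal of turning grid cells into statistics for topological features such as river basins. The sketch below shows the general shape of such a view in plain Python. It is not tied to any NoSQL product, and the cell-to-region lookup table (`region_of`) and all function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_cell(cell, region_of):
    """Map step: emit (region, partial statistic) for one grid cell."""
    (i, j), rainfall = cell
    region = region_of.get((i, j))
    if region is None:
        return []  # cell lies outside every region of interest
    return [(region, (rainfall, 1, rainfall))]  # (sum, count, max)


def combine(a, b):
    """Reduce step: merge two partial statistics.

    The merge is associative, so partials from different chunks of
    the grid could be combined in any grouping, including in parallel.
    """
    return (a[0] + b[0], a[1] + b[1], max(a[2], b[2]))


def region_statistics(cells, region_of):
    """Group mapped cells by region, then reduce each group."""
    groups = defaultdict(list)
    for cell in cells:
        for key, partial in map_cell(cell, region_of):
            groups[key].append(partial)
    stats = {}
    for key, partials in groups.items():
        total, count, peak = reduce(combine, partials)
        stats[key] = {"mean": total / count, "max": peak, "cells": count}
    return stats


# Hypothetical lookup from grid index to river basin.
region_of = {(0, 0): "basin_a", (0, 1): "basin_a", (1, 0): "basin_b"}
cells = [((0, 0), 10.0), ((0, 1), 30.0), ((1, 0), 5.0), ((1, 1), 99.0)]
stats = region_statistics(cells, region_of)
print(stats)
```

Carrying (sum, count, max) rather than a running mean keeps the reduce step associative, which is what makes the "potential for parallelization" noted on slide 13 practical.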