Open-source Scientific Computing and Data Analytics using HDF
Aashish Chaudhary (aashish.chaudhary@kitware.com), Technical Leader
with Patrick O’Leary, Dr. Rama Nemani (NASA), Chris Harris, Chris Kotfila, Doruk Aztek, Andrew Michaelis (NASA)
ESIP Summer, July 24th 2017
What We Do at Kitware
Open source and open data are strongly encouraged and practiced at Kitware
It started with VTK
Parallel Processing and Rendering - ParaView
Computer Vision
- Images, Video, Point Clouds
- Recognition by Function
- Content-based Retrieval
- Event & Activity Recognition
- Anomaly Detection
- 3D Extraction and Compression
- Detection & Tracking
Medical Computing
- Quantitative imaging
- Electronic health records
- Vascular analysis
- Surgical guidance and simulation
- Digital pathology
- Orthopedic analysis
- Longitudinal and population shape analysis
- Interactive medical applications and visualizations
Community Adaptation
HDF at Kitware
High Performance Computing: Extensible Data Model and Format (XDMF)
- Developed to exchange scientific data between HPC codes and tools
- Heavy data is stored using HDF5
Climate Community: Network Common Data Form (NetCDF)
- Most projects use NetCDF4
Medical Community, Vision Community
- Leading-edge algorithms for registering and segmenting multidimensional data
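
As a rough illustration of the Extensible Data Model and Format (XDMF) idea mentioned on this slide, the sketch below writes the small "light" description as XML and leaves the "heavy" array in HDF5, referenced only by file name and dataset path. The file name `simulation.h5`, the dataset path `/fields/temperature`, the field name, and the grid size are all hypothetical; this is not taken from the talk and uses only the Python standard library.

```python
import xml.etree.ElementTree as ET


def build_xdmf(h5_file, dataset_path, dims, name="temperature"):
    """Return XDMF XML text describing one scalar field on a uniform grid.

    h5_file, dataset_path and name are hypothetical placeholders; no HDF5
    file is opened here, the XML only points at where the values live.
    """
    dims_text = " ".join(str(d) for d in dims)

    root = ET.Element("Xdmf", Version="3.0")
    domain = ET.SubElement(root, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="mesh", GridType="Uniform")

    # Regular grid described by an origin and a spacing (light data, inline).
    ET.SubElement(grid, "Topology", TopologyType="3DCoRectMesh",
                  Dimensions=dims_text)
    geometry = ET.SubElement(grid, "Geometry", GeometryType="ORIGIN_DXDYDZ")
    origin = ET.SubElement(geometry, "DataItem", Format="XML", Dimensions="3")
    origin.text = "0.0 0.0 0.0"
    spacing = ET.SubElement(geometry, "DataItem", Format="XML", Dimensions="3")
    spacing.text = "1.0 1.0 1.0"

    # Field values (heavy data) stay in HDF5; the XML holds only a reference.
    attr = ET.SubElement(grid, "Attribute", Name=name,
                         AttributeType="Scalar", Center="Node")
    item = ET.SubElement(attr, "DataItem", Format="HDF", Dimensions=dims_text,
                         NumberType="Float", Precision="4")
    item.text = f"{h5_file}:{dataset_path}"

    return ET.tostring(root, encoding="unicode")


xdmf_text = build_xdmf("simulation.h5", "/fields/temperature", (16, 32, 32))
```

Keeping the description separate from the arrays means a tool can inspect the grid layout without reading the large dataset.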
ACME
The Accelerated Climate Modeling for Energy
(ACME) project is sponsored by the Earth System
Modeling (ESM) program (Biological and
Environmental Research) with eight national
laboratories and six partner institutions to develop
and apply the most complete, leading-edge climate
and Earth system models to challenging and
demanding climate-change research imperatives.
- Most commonly used data format: NetCDF4
- Data streaming using OPeNDAP
- Python interface for most of the tools
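
To make the OPeNDAP streaming point concrete, here is a minimal sketch of how a client might ask a server for only part of a NetCDF4 variable using a DAP2 constraint expression. The server URL, the variable name `tasmax`, and the index ranges are hypothetical; the helper only builds the request URL and does not contact any server.

```python
from urllib.parse import quote


def opendap_subset_url(base_url, variable, ranges, response="ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    ranges holds one (start, stride, stop) triple per dimension; the stop
    index is inclusive in DAP2 constraint expressions.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges
    )
    # Keep brackets and colons readable; escape anything else unusual.
    constraint = quote(f"{variable}{hyperslab}", safe="[]:,")
    return f"{base_url}.{response}?{constraint}"


# Hypothetical dataset: first time step, a 20 x 20 window of grid cells.
url = opendap_subset_url(
    "https://example.org/opendap/tasmax_day.nc",
    "tasmax",
    [(0, 1, 0), (100, 1, 119), (200, 1, 219)],
)
```

Because the subset is selected on the server, only the requested window travels over the network, which is the practical appeal of streaming for large climate files.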
OpenNEX
NEX is a platform for scientific
collaboration, knowledge sharing and
research for the Earth science
community
- Global Daily Downscaled Projections (NEX-GDDP, NetCDF4)
- MODIS-Land and Atmosphere (HDF)
Gaia
Data processing, Web visualization
Pure JS?
HDF5 File Organization
Preprocessing Simulation Postprocessing
Possible Improvements
Streaming and Big Data analytics
- Any useful ingestion of HDF data into a cluster requires an ETL pipeline
- For some tools, computation cannot move close to the data; streaming support is necessary in such cases
- Optimal read/write on cloud storage
Web-Support
- More tools and projects are moving
to support web-enabled data
analysis and visualization
- Pure JS implementation if possible
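
The ETL and streaming points above can be sketched as a loop that reads a large array one slab at a time, so that no step needs the whole dataset in memory. This is an illustrative outline only: the function names, the slab size, and the in-memory stand-in for a dataset are assumptions, not something described in the talk.

```python
def iter_slabs(shape, rows_per_slab):
    """Yield tuples of slices that tile an array along its first axis."""
    if rows_per_slab <= 0:
        raise ValueError("rows_per_slab must be positive")
    n_rows = shape[0]
    rest = tuple(slice(0, n) for n in shape[1:])
    for start in range(0, n_rows, rows_per_slab):
        stop = min(start + rows_per_slab, n_rows)
        yield (slice(start, stop),) + rest


def extract_transform_load(read_slab, transform, load, shape, rows_per_slab):
    """Run a minimal ETL loop, one slab at a time; return the slab count."""
    count = 0
    for slab in iter_slabs(shape, rows_per_slab):
        load(transform(read_slab(slab)))
        count += 1
    return count


# Stand-in for an HDF5 dataset: a nested list indexed by the first slice only.
# With h5py, read_slab could instead be `lambda slab: dataset[slab]`.
fake_data = [[day * 10 + col for col in range(4)] for day in range(10)]
loaded = []
n_slabs = extract_transform_load(
    read_slab=lambda slab: fake_data[slab[0]],
    transform=lambda rows: [sum(row) / len(row) for row in rows],  # row means
    load=loaded.extend,
    shape=(10, 4),
    rows_per_slab=4,
)
```

Aligning the slab size with the file's chunk layout would usually matter for read performance, especially on cloud storage, but that tuning is outside this sketch.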
Summary
● HDF is a widely used data format at Kitware for scientific computing,
climate/geospatial visualization, and other domains
● Recently we have started using HDF for information visualization
● We are looking forward to HDF usage in cloud and web environments
● Kitware is always looking for strong open-source collaborations and is
committed to pushing open-source scientific computing to its next level
Information
Aashish Chaudhary: aashish.chaudhary@kitware.com
LinkedIn: www.linkedin.com/in/aachaudhary
Kitware: http://www.kitware.com
NASA-NEX: https://nex.nasa.gov/nex
Kitware-AIST: https://github.com/OpenGeoscience/nex
HPC Cloud: http://www.kitware.com/publications/item/view/1784
HPCloud Github: https://github.com/Kitware/HPCCloud
