Unidata’s Common Data Model

John Caron
Unidata/UCAR
Nov 2006
Goals / Overview
• Look at the landscape of scientific datasets from a few thousand feet up.
• What semantics are needed to make these useful?
– georeferencing
– specialized subsetting
What’s a Data Model?
• An Abstract Data Model describes data objects and what methods you can use on them.
• An API is the interface to the Data Model for a specific programming language.
• A file format is a way to persist the objects in the Data Model.
• An Abstract Data Model removes the details of any particular API and the persistence format.
Common Data Model Layers
• Scientific Datatypes: Point, Trajectory, Radial, Grid, Station, Swath, Profile
• Coordinate Systems
• Data Access
NetCDF-Java version 2.2 architecture
[Architecture diagram; labels only, layout not recoverable]
• Application
• Scientific Datatypes, Datatype Adapter
• NetcdfDataset, CoordSystem Builder
• NetcdfFile
• I/O service provider
• Formats: NetCDF-3, NetCDF-4, HDF5, GRIB, NIDS, GINI, Nexrad, DMSP, …
• Other labels: ADDE, OPeNDAP, THREDDS, Catalog.xml, NcML
NetCDF-4 and Common Data Model (Data Access Layer)
I/O Service Provider Implementations
• General: NetCDF, HDF5, OPeNDAP
• Gridded: GRIB-1, GRIB-2
• Radar: NEXRAD level 2 and 3, DORADE
• Point: BUFR, ASCII
• Satellite: DMSP, GINI
• In development
– NOAA: GOES (Knapp/Nelson), many others
Coordinate Systems needed
• NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems
– so georeferencing is not part of the API
– Need conventions to specify them (e.g. CF-1, COARDS)
• Contrast GRIB, HDF-EOS, other specialized formats
NetCDF Coordinate Variables
dimensions:
lat = 64;
lon = 128;
variables:
float lat(lat);
float lon(lon);
double temperature(lat,lon);
Coordinate Variables
– One-dimensional variable with the same name as its dimension
– Strictly monotonic values
– No missing values
The coordinates of the point (i,j,k) are {CV1(i), CV2(j), CV3(k)}
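A minimal Python sketch of the coordinate-variable rules above, not the NetCDF-Java implementation. The helper names `is_coordinate_variable` and `point_coordinates` are hypothetical, plain lists stand in for netCDF variables, and `None` stands in for a missing value.

```python
from typing import Optional, Sequence, Tuple


def is_coordinate_variable(name: str, dims: Sequence[str],
                           values: Sequence[Optional[float]]) -> bool:
    """Check the three rules: 1-D and named after its dimension,
    no missing values, strictly monotonic."""
    if list(dims) != [name]:
        return False
    if any(v is None for v in values):
        return False
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


def point_coordinates(index: Sequence[int],
                      axes: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Return {CV1(i), CV2(j), ...}: each axis is read with its own index only."""
    return tuple(axis[i] for i, axis in zip(index, axes))


# Illustrative values (assumed, much shorter than the 64 x 128 grid on the slide).
lat = [10.0, 20.0, 30.0]
lon = [100.0, 110.0, 120.0, 130.0]

lat_ok = is_coordinate_variable("lat", ["lat"], lat)
two_d_ok = is_coordinate_variable("lat", ["y", "x"], lat)
coords = point_coordinates((1, 2), [lat, lon])
```

Here `lat_ok` is true, `two_d_ok` is false because a two-dimensional `lat(y, x)` cannot be a coordinate variable, and `coords` is `(20.0, 120.0)`.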
Limitations of 1D Coordinate Variables
• Non lat/lon horizontal grids:
float temperature(y,x);
float lat(y,x);
float lon(y,x);
• Trajectory data:
float NKoreaRadioactivity(pt);
float lat(pt);
float lon(pt);
float altitude(pt);
float time(pt);
General Coordinates in CF-1.0
float P(y,x);
P:coordinates = "lat lon";
float lat(y,x);
float lon(y,x);
float Sr90(pt);
Sr90:coordinates = "lat lon altitude time";
Coordinate Systems (abstract)
• A Coordinate System for a data variable is a set of Coordinate Variables such that the coordinates of the (i,j,k) data point are
{CV1(i,j,k), CV2(i,j,k), CV3(i,j,k), CV4(i,j,k), …}
(previously {CV1(i), CV2(j), CV3(k)})
• The dimensions of each Coordinate Variable must be a subset of the dimensions of the data variable.
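The generalized definition can be sketched in Python as follows. This is an illustration under stated assumptions, not the CDM implementation: the `Variable` class and the functions `is_valid_coordinate` and `coordinates_of` are hypothetical, and nested lists stand in for array storage.

```python
from typing import Dict, Sequence


class Variable:
    """Hypothetical stand-in for a netCDF variable: named dimensions plus nested lists."""

    def __init__(self, dims: Sequence[str], data):
        self.dims = tuple(dims)   # dimension names, outermost first
        self.data = data          # one nesting level per dimension

    def at(self, index_by_dim: Dict[str, int]):
        """Read the element addressed by a {dimension name: index} mapping."""
        value = self.data
        for dim in self.dims:
            value = value[index_by_dim[dim]]
        return value


def is_valid_coordinate(coord: Variable, data_var: Variable) -> bool:
    """The subset rule: coordinate dimensions must all belong to the data variable."""
    return set(coord.dims) <= set(data_var.dims)


def coordinates_of(data_var: Variable, coords: Dict[str, Variable],
                   index: Sequence[int]) -> Dict[str, float]:
    """Evaluate every coordinate variable at the data point's index."""
    index_by_dim = dict(zip(data_var.dims, index))
    result = {}
    for name, coord in coords.items():
        if not is_valid_coordinate(coord, data_var):
            raise ValueError(f"{name}{coord.dims} is not a subset of {data_var.dims}")
        result[name] = coord.at(index_by_dim)
    return result


# P(y,x) with 2-D lat/lon, using made-up values.
P = Variable(("y", "x"), [[1.0, 2.0], [3.0, 4.0]])
lat = Variable(("y", "x"), [[40.0, 40.1], [41.0, 41.1]])
lon = Variable(("y", "x"), [[-105.0, -104.0], [-105.2, -104.2]])

where = coordinates_of(P, {"lat": lat, "lon": lon}, (1, 0))
```

With these values `where` is `{"lat": 41.0, "lon": -105.2}`. A 1-D coordinate variable such as `time(t)` fits the same code, because it simply ignores the indices of the dimensions it does not use.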
Need Coordinate Axis Types
float gridData(t,z,y,x);
float time(t);
float y(y);
float x(x);
float lat(y,x);
float lon(y,x);
float height(t,z,y,x);

float radialData(radial, gate);
float distance(gate);
float azimuth(radial);
float elevation(radial);
float time(radial);
The same??
float stationObs(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);

float trajectory(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);
Revised Coordinate Systems
1. Specify Coordinate Variables
2. Specify Coordinate Types (time, lat, lon, projection x, y, height, pressure, z, radial, azimuth, elevation)
3. Specify connectivity (implicit or explicit) between data points
– Implicit: Neighbors in index space are (connected) neighbors in coordinate space. Allows efficient searching.
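One way to read "allows efficient searching": when a dimension is connected and its coordinate values are strictly increasing, the nearest index can be found by binary search instead of a full scan. The sketch below assumes exactly that; `nearest_index` is a hypothetical helper, not a NetCDF-Java method.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_index(axis: Sequence[float], target: float) -> int:
    """Index of the axis value closest to target, in O(log n).

    Assumes axis is non-empty and strictly increasing.
    """
    if not axis:
        raise ValueError("axis must not be empty")
    pos = bisect_left(axis, target)      # first index with axis[pos] >= target
    if pos == 0:
        return 0
    if pos == len(axis):
        return len(axis) - 1
    before, after = axis[pos - 1], axis[pos]
    # Only the two index-space neighbors need comparing, because they are
    # also the coordinate-space neighbors of the target.
    return pos if (after - target) < (target - before) else pos - 1


heights = [0.0, 100.0, 250.0, 500.0, 1000.0]   # made-up height axis
level = nearest_index(heights, 300.0)
```

Here `level` is `2`, the index of the 250.0 level. For an unconnected dimension such as `pt` in point data, no such shortcut exists and every point has to be examined.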
Gridded Data
float gridData(t,z,y,x);
float time(t); // Time
float y(y); // GeoY
float x(x); // GeoX
float z(t,z,y,x); // Height or Pressure
• Cartesian coordinates
• All dimensions are connected
Connected means: neighbors in index space are neighbors in coordinate space
Coordinate Systems UML
[UML diagram not reproduced]
Scientific Data Types
• Based on datasets Unidata is familiar with
– APIs are evolving
• How are data points connected?
• Intended to scale to large, multifile collections
• Intended to support “specialized queries”
– Space, Time
• Corresponding “standard” NetCDF file conventions
Gridded Data
• Cartesian coordinates
• All dimensions are connected
• x, y, z, time
• recently added runtime and ensemble
• refactored into GridDatatype interface
float gridData(t,z,y,x);
float time(t);
float y(y);
float x(x);
float lat(y,x);
float lon(y,x);
float height(t,z,y,x);
GridDatatype methods
CoordinateAxis getTaxis();
CoordinateAxis getXaxis();
CoordinateAxis getYaxis();
CoordinateAxis getZaxis();
Projection getProjection();
int[] findXYindexFromCoord(double x_coord, double y_coord);
LatLonRect getLatLonBoundingBox();
Array getDataSlice (Range[] …)
GridDatatype makeSubset (Range[] …)
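To make the intent of a coordinate-to-index lookup and a range-based subset concrete, here is a small Python analogue. It is a sketch under assumptions, not the GridDatatype code: it assumes regularly spaced 1-D x and y axes with at least two points each, returns `None` outside the grid (the real method's behavior there is not stated on the slide), and uses Python slices where the slide uses `Range[]`.

```python
from typing import List, Optional, Sequence, Tuple


def _index_on(axis: Sequence[float], value: float) -> Optional[int]:
    """Nearest index on a regularly spaced axis, or None if outside it."""
    step = axis[1] - axis[0]
    i = round((value - axis[0]) / step)
    return i if 0 <= i < len(axis) else None


def find_xy_index(x_axis: Sequence[float], y_axis: Sequence[float],
                  x_coord: float, y_coord: float) -> Optional[Tuple[int, int]]:
    """Hypothetical analogue of a coordinate-to-index lookup: (x index, y index)."""
    xi = _index_on(x_axis, x_coord)
    yi = _index_on(y_axis, y_coord)
    if xi is None or yi is None:
        return None
    return (xi, yi)


def subset(grid: List[List[float]], y_range: slice, x_range: slice) -> List[List[float]]:
    """Cut a [y][x] grid with one range per dimension (start, stop, stride)."""
    return [row[x_range] for row in grid[y_range]]


x_axis = [0.0, 10.0, 20.0, 30.0]     # made-up projection coordinates
y_axis = [100.0, 105.0, 110.0]
grid = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]

hit = find_xy_index(x_axis, y_axis, 19.0, 104.0)
miss = find_xy_index(x_axis, y_axis, 50.0, 104.0)
window = subset(grid, slice(0, 2), slice(1, 4, 2))
```

Here `hit` is `(2, 1)`, `miss` is `None`, and `window` is `[[1, 3], [11, 13]]`. The index arithmetic is only valid because every grid dimension is connected and regularly spaced; irregular axes would need a search instead.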
Radial Data
• Polar coordinates
• All dimensions are connected
• No separate time dimension
radialData(radial, gate):
distance(gate)
azimuth(radial)
elevation(radial)
time(radial)
Swath
• lat/lon coordinates
• No separate time dimension
• All dimensions are connected
swathData(line,cell)
lat(line,cell)
lon(line,cell)
time(line)
z(line,cell) ??
Point Observation Data
• Set of measurements at the same point in space and time
• Point dimension not connected
float obs1(pt);
float obs2(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);
Structure {
  lat, lon, z, time;
  v1, v2, ...
} obs(pt);
PointObsDataset Methods
// Iterator<StructureData>
Iterator getData(
LatLonRect boundingBox,
Date start, Date end);
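A space/time query over point observations might look like the following Python sketch. It is an assumption-laden illustration rather than the PointObsDataset code: `PointObs`, `BoundingBox` and `get_data` are hypothetical names, the bounding box ignores dateline crossing, and the scan is linear because the point dimension is not connected.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class PointObs:
    lat: float
    lon: float
    z: float
    time: datetime
    values: Tuple[float, ...]      # v1, v2, ...


@dataclass(frozen=True)
class BoundingBox:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simplification: no handling of boxes that cross the dateline.
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


def get_data(observations: Iterable[PointObs], box: BoundingBox,
             start: datetime, end: datetime) -> Iterator[PointObs]:
    """Lazily yield observations inside the box and the closed time interval."""
    for obs in observations:
        if box.contains(obs.lat, obs.lon) and start <= obs.time <= end:
            yield obs


all_obs = [
    PointObs(40.0, -105.0, 1600.0, datetime(2006, 11, 1, 12), (280.5,)),
    PointObs(50.0, -105.0, 300.0, datetime(2006, 11, 1, 12), (275.0,)),   # outside box
    PointObs(38.0, -102.0, 1200.0, datetime(2006, 11, 3, 0), (282.0,)),   # outside time
]
box = BoundingBox(35.0, 45.0, -110.0, -100.0)
selected = list(get_data(all_obs, box, datetime(2006, 11, 1), datetime(2006, 11, 2)))
```

Only the first observation passes both filters, so `selected` holds one element. Returning an iterator rather than a list mirrors the slide's signature and avoids materializing a large result.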
Time series Station Data
Structure {
  name;
  lat, lon, z;
  Structure {
    time;
    v1, v2, ...
  } obs(*); // connected
} stn(stn); // not connected
StationObs Methods
// List<Station>
List getStations(
LatLonRect boundingBox);
// Iterator<StructureData>
Iterator getData(
Station s,
Date start, Date end);
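The two station methods can be sketched in Python as below. The names `Station`, `get_stations` and `get_data` are hypothetical, and the sketch assumes each station's observations are stored sorted by time. It shows why connectivity matters: the unconnected `stn` dimension is scanned linearly, while the connected time series inside a station can be cut with a binary search.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class Station:
    name: str
    lat: float
    lon: float
    z: float
    times: List[datetime] = field(default_factory=list)   # sorted ascending
    values: List[float] = field(default_factory=list)     # one value per time


def get_stations(stations: List[Station], lat_min: float, lat_max: float,
                 lon_min: float, lon_max: float) -> List[Station]:
    """Stations are not connected, so every station is tested."""
    return [s for s in stations
            if lat_min <= s.lat <= lat_max and lon_min <= s.lon <= lon_max]


def get_data(station: Station, start: datetime,
             end: datetime) -> List[Tuple[datetime, float]]:
    """The time series is connected (sorted), so slice it by binary search."""
    lo = bisect_left(station.times, start)
    hi = bisect_right(station.times, end)
    return list(zip(station.times[lo:hi], station.values[lo:hi]))


hours = [datetime(2006, 11, 1, h) for h in range(4)]
boulder = Station("Boulder", 40.0, -105.3, 1655.0, hours, [1.0, 2.0, 3.0, 4.0])
miami = Station("Miami", 25.8, -80.2, 2.0, hours, [5.0, 6.0, 7.0, 8.0])

found = get_stations([boulder, miami], 35.0, 45.0, -110.0, -100.0)
series = get_data(found[0], datetime(2006, 11, 1, 1), datetime(2006, 11, 1, 2))
```

With this made-up data `found` contains only the Boulder station, and `series` holds its 01:00 and 02:00 observations with values 2.0 and 3.0.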
Trajectory Data
• pt dimension is connected
• Collection dimension not connected
Structure {
  lat, lon, z, time;
  v1, v2, ...
} obs(pt); // connected
Structure {
  name;
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(*); // connected
} traj(traj); // not connected
Profiler/Sounding Station Data
Structure {
  name;
  lat, lon, time;
  Structure {
    z;
    v1, v2, ...
  } obs(*); // connected
} loc(nloc); // not connected
Structure {
  name;
  lat, lon;
  Structure {
    time;
    Structure {
      z;
      v1, v2, ...
    } obs(*); // connected
  } time(*); // connected
} stn(stn); // not connected
Unstructured Grid
• Pt dimension not connected
• Looks the same as point data
• Need to specify the connectivity explicitly
float unstructGrid(t,z,pt);
float lat(pt);
float lon(pt);
float time(t);
float height(z);
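Explicit connectivity can be pictured as an edge list over the `pt` indices. The Python sketch below is one possible representation, assumed for illustration only; the slides do not say how the CDM stores it, and `build_neighbors` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_neighbors(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Turn an undirected edge list over point indices into an adjacency map."""
    neighbors: Dict[int, Set[int]] = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


# Four mesh nodes with made-up edges. Points 0 and 2 are connected although
# they are not adjacent in index space; points 0 and 3 are not connected.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
neighbors = build_neighbors(edges)
around_two = sorted(neighbors[2])
```

Here `around_two` is `[0, 1, 3]`. Unlike a grid, the index of a point says nothing about which points surround it, so the adjacency has to be carried alongside the data.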
Data Types Summary
• Data access through a standard API
• Convenient georeferencing
• Specialized subsetting methods
– Efficiency for large datasets
Payoff
N + M instead of N * M things on your TODO List!
[Diagram: File Format #1, File Format #2, … File Format #N on one side of the CDM; Visualization & Analysis, NetCDF file, OPeNDAP Server, WCS Service, Web Service on the other]
THREDDS Data Server
[Diagram labels: HTTP Tomcat Server (hostname.edu); THREDDS Server with Catalog.xml and services OPeNDAP, HTTPServer, WCS; NetCDF-Java library; Datasets; IDD Data; Application]
Next: DataType Aggregation
• Work at the CDM DataType level, know (some) data semantics
• Forecast Model Collection
– Combine multiple model forecasts into single dataset with two time dimensions
– With NOAA/IOOS (Steve Hankin)
• Point/Station/Trajectory/Profile Data
– Allow space/time queries, return nested sequences
– Start from / standardize “Dapper conventions”
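As a rough illustration of a forecast collection with two time dimensions, the Python sketch below indexes values by (run time, valid time). This is an assumption about what the two dimensions are, since the slide does not name them, and `build_collection` and `latest_run_value` are hypothetical helpers rather than CDM functions.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

Key = Tuple[datetime, datetime]   # (run time, valid time)


def build_collection(runs: Dict[datetime, Dict[int, float]]) -> Dict[Key, float]:
    """Merge several model runs into one table with two time coordinates.

    runs maps each run time to {forecast offset in hours: value}.
    """
    table: Dict[Key, float] = {}
    for run_time, forecasts in runs.items():
        for offset_hours, value in forecasts.items():
            valid_time = run_time + timedelta(hours=offset_hours)
            table[(run_time, valid_time)] = value
    return table


def latest_run_value(table: Dict[Key, float], valid_time: datetime) -> Optional[float]:
    """Value for valid_time taken from the most recent run that covers it."""
    candidates = [(run, value) for (run, valid), value in table.items()
                  if valid == valid_time]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


run00 = datetime(2006, 11, 1, 0)
run06 = datetime(2006, 11, 1, 6)
table = build_collection({
    run00: {0: 280.0, 6: 281.5, 12: 283.0},   # made-up temperatures
    run06: {0: 281.0, 6: 282.5},
})
at_06 = latest_run_value(table, datetime(2006, 11, 1, 6))
at_12 = latest_run_value(table, datetime(2006, 11, 1, 12))
```

The table has five entries. `at_06` is 281.0 (the 06Z analysis supersedes the 6-hour forecast from 00Z) and `at_12` is 282.5.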
Forecast Model Collections
Conclusion
• Standardized Data Access in good shape
– HDF5, NetCDF, OPeNDAP
– Write an IOSP for proprietary formats (Java)
• But that’s not good enough!
• To do:
– Standard representations of coordinate systems
– Classifications of data types, standard services for them
Unidata's Common Data Model

  • 1. Unidata’s Common Data Model John Caron Unidata/UCAR Nov 2006
  • 2. Goals / Overview • Look at the landscape of scientific datasets from a few thousand feet up. • What semantics are needed to make these useful? – georeferencing – specialized subsetting
  • 3. What’s a Data Model? • An Abstract Data Model describes data objects and what methods you can use on them. • An API is the interface to the Data Model for a specific programming language • A file format is a way to persist the objects in the Data Model. • An Abstract Data Model removes the details of any particular API and the persistence format.
  • 4. Common Data Model Layers Scientific Datatypes Point Trajectory Radial Grid Station Swath Coordinate Systems Data Access Profile
  • 5. Application Scientific Datatypes Datatype Adapter NetCDF-Java version 2.2 architecture NetcdfDataset ADDE CoordSystem Builder NetcdfFile THREDDS I/O service provider OPeNDAP Catalog.xml NcML NcML NetCDF-3 NIDS NetCDF-4 GRIB HDF5 GINI Nexrad … DMSP
  • 6. NetCDF-4 and Common Data Model (Data Access Layer)
  • 7. I/O Service Provider Implementations • • • • • • General: NetCDF, HDF5, OPeNDAP Gridded: GRIB-1, GRIB-2 Radar: NEXRAD level 2 and 3, DORADE Point: BUFR, ASCII Satellite: DMSP, GINI In development – NOAA: GOES (Knapp/Nelson), many others
  • 8. Coordinate Systems needed • NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems – so georeferencing not part of API – Need conventions to specify (eg CF-1, COARDS, etc) • Contrast GRIB, HDF-EOS, other specialized formats
  • 9. NetCDF Coordinate Variables dimensions: lat = 64; lon = 128; variables: float lat(lat); float lon(lon); double temperature(lat,lon);
  • 10. Coordinate Variables – One-dimension variable with same name as its dimension – Strictly monotonic values – No missing values The coordinates of a point (i,j,k) is {CV1(i), CV2(j), CV3(k)}
  • 11. Limitations of 1D Coordinate Variables • Non lat/lon horizontal grids: float temperature(y,x) float lat(y, x); float lon(y, x); • Trajectory data: float NKoreaRadioactivity(pt); float lat(pt); float lon(pt); float altitude(pt); float time(pt)
  • 12. General Coordinates in CF-1.0 float P(y,x); P:coordinates = “lat lon”; float lat(y, x); float lon(y, x); float Sr90(pt); Sr90:coordinates = “lat lon altitude time”;
  • 13. Coordinate Systems (abstract) • A Coordinate System for a data variable is a set of Coordinate Variables2 such that the coordinates of the (i,j,k) data point is {CV1(i,j,k),CV2(i,j,k),CV3(i,j,k),CV4(i,j,k)…} previous was {CV1(i), CV2(j), CV3(k)} • The dimensions of each Coordinate Variable must be a subset of the dimensions of the data variable.
  • 14. Need Coordinate Axis Types float gridData(t,z,y,x); float time(t); float y(y); float x(x); float lat(y,x); float lon(y,x); float height(t,z,y,x); float radialData(radial, gate) float distance(gate) float azimuth(radial) float elevation(radial) float time(radial)
  • 15. The same?? float stationObs(pt); float lat(pt); float lon(pt); float z(pt); float time(pt); float trajectory(pt); float lat(pt); float lon(pt); float z(pt); float time(pt);
  • 16. Revised Coordinate Systems 1. Specify Coordinate Variables 2. Specify Coordinate Types (time, lat, lon, projection x, y, height, pressure, z, radial, azimuth, elevation) 3. Specify connectivity (implicit or explicit) between data points – Implicit: Neighbors in index space are (connected) neighbors in coordinate space. Allows efficient searching.
  • 17. Gridded Data float gridData(t,z,y,x); float time(t); // Time float y(y); // GeoX float x(x); // GeoY float z(t,z,y,x); // Height or Pressure • Cartesian coordinates • All dimensions are connected Connected means Neighbors in index space are neighbors in coordinate space
  • 19. Scientific Data Types • Based on datasets Unidata is familiar with – APIs are evolving • How are data points connected? • Intended to scale to large, multifile collections • Intended to support “specialized queries” – Space, Time • Corresponding “standard” NetCDF file conventions
  • 20. Gridded Data • Cartesian coordinates • All dimensions are connected • x, y, z, time • recently added runtime and ensemble • refactored into GridDatatype interface float gridData(t,z,y,x); float time(t); float y(y); float x(x); float lat(y,x); float lon(y,x); float height(t,z,y,x);
  • 21. GridDatatype methods CoordinateAxis getTaxis(); CoordinateAxis getXaxis(); CoordinateAxis getYaxis(); CoordinateAxis getZaxis(); Projection getProjection(); int[] findXYindexFromCoord( double x_coord, double y_coord); LatLonRect getLatLonBoundingBox(); Array getDataSlice (Range[] …) GridDatatype makeSubset (Range[] …)
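
To make the intent of `findXYindexFromCoord` concrete, here is a sketch for the simplest case, a regularly spaced x/y grid. It is not the NetCDF-Java implementation: the class `RegularGridIndexSketch`, its constructor parameters, and the choice to return `{-1, -1}` for points outside the grid are all assumptions made for illustration.

```java
final class RegularGridIndexSketch {
    private final double x0, dx, y0, dy; // first coordinate and spacing on each axis
    private final int nx, ny;            // number of points on each axis

    RegularGridIndexSketch(double x0, double dx, int nx, double y0, double dy, int ny) {
        this.x0 = x0; this.dx = dx; this.nx = nx;
        this.y0 = y0; this.dy = dy; this.ny = ny;
    }

    /** Returns {xIndex, yIndex} of the nearest grid point, or {-1, -1} if it falls outside the grid. */
    int[] findXYindexFromCoord(double xCoord, double yCoord) {
        int i = (int) Math.round((xCoord - x0) / dx);
        int j = (int) Math.round((yCoord - y0) / dy);
        if (i < 0 || i >= nx || j < 0 || j >= ny) {
            return new int[] {-1, -1};
        }
        return new int[] {i, j};
    }
}
```

Because the grid is connected and evenly spaced, the lookup is constant-time arithmetic; an irregular axis would need a search along each axis instead.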
  • 22. Radial Data • Polar coordinates • All dimensions are connected • No separate time dimension radialData(radial, gate): distance(gate) azimuth(radial) elevation(radial) time(radial)
  • 23. Swath • lat/lon coordinates • no separate time dimension • all dimensions are connected swathData(line,cell) lat(line,cell) lon(line,cell) time(line) z(line,cell) ??
  • 24. Point Observation Data • Set of measurements at the same point in space and time • Point dimension not connected float obs1(pt); float obs2(pt); float lat(pt); float lon(pt); float z(pt); float time(pt); Structure { lat, lon, z, time; v1, v2, ... } obs( pt);
  • 25. PointObsDataset Methods // Iterator<StructureData> Iterator getData( LatLonRect boundingBox, Date start, Date end);
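
A space/time query of this kind can be sketched as a filter over point observations. Everything here is a stand-in: `PointObsSketch` and `Obs` are hypothetical, plain `double` bounds replace `LatLonRect`, epoch milliseconds replace `Date`, a `List` replaces the `Iterator`, and longitude wrap-around at the date line is ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class PointObsSketch {

    /** One observation: a measurement at a single point in space and time. */
    static final class Obs {
        final double lat, lon;
        final long timeMillis;
        final double value;

        Obs(double lat, double lon, long timeMillis, double value) {
            this.lat = lat; this.lon = lon; this.timeMillis = timeMillis; this.value = value;
        }
    }

    /** Returns the observations inside the bounding box and time range (all bounds inclusive). */
    static List<Obs> getData(Obs[] all, double latMin, double latMax,
                             double lonMin, double lonMax, long startMillis, long endMillis) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : all) { // the pt dimension is not connected, so every point is examined
            boolean inBox = o.lat >= latMin && o.lat <= latMax && o.lon >= lonMin && o.lon <= lonMax;
            boolean inTime = o.timeMillis >= startMillis && o.timeMillis <= endMillis;
            if (inBox && inTime) hits.add(o);
        }
        return hits;
    }
}
```

The linear scan reflects the unconnected point dimension; a real implementation working on large collections would presumably rely on some form of indexing.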
  • 26. Time series Station Data Structure { name; lat, lon, z; Structure{ time; v1, v2, ... } obs(*); // connected } stn(stn); // not connected
  • 27. StationObs Methods // List<Station> List getStations( LatLonRect boundingBox); // Iterator<StructureData> Iterator getData( Station s, Date start, Date end);
  • 28. Trajectory Data • pt dimension is connected • Collection dimension not connected Structure { lat, lon, z, time; v1, v2, ... } obs(pt); // connected Structure { name; Structure { lat, lon, z, time; v1, v2, ... } obs(*); // connected } traj(traj) // not connected
  • 29. Profiler/Sounding Station Data Structure { name; lat, lon, time; Structure { z; v1, v2, ... } obs(*); // connected } loc(nloc); // not connected Structure { name; lat, lon; Structure { time, Structure { z; v1, v2, ... } obs(*); // connected } time(*); // connected } stn(stn); // not connected
  • 30. Unstructured Grid • Pt dimension not connected • Looks the same as point data • Need to specify the connectivity explicitly float unstructGrid(t,z,pt); float lat(pt); float lon(pt); float time(t); float height(z);
  • 31. Data Types Summary • Data access through a standard API • Convenient georeferencing • Specialized subsetting methods – Efficiency for large datasets
  • 32. Payoff N + M instead of N * M things on your TODO list! [Diagram: File Format #1, File Format #2, … File Format #N connect through the CDM to Visualization & Analysis, NetCDF file, OPeNDAP Server, WCS Service, Web Service]
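
A worked instance of the payoff, using made-up counts (N = 10 file formats and M = 5 client applications or services are illustrative values, not figures from the talk):

```latex
\underbrace{N \times M}_{\text{one adapter per format--client pair}} = 10 \times 5 = 50
\qquad\text{versus}\qquad
\underbrace{N + M}_{\text{one CDM adapter per format and per client}} = 10 + 5 = 15
```

Each new format then costs one additional adapter instead of M.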
  • 33. THREDDS Data Server [Diagram labels: Application; HTTP; Tomcat Server on hostname.edu; THREDDS Server with Catalog.xml and services OPeNDAP, HTTPServer, WCS; NetCDF-Java library; Datasets; IDD Data]
  • 34. Next: DataType Aggregation • Work at the CDM DataType level, know (some) data semantics • Forecast Model Collection – Combine multiple model forecasts into single dataset with two time dimensions – With NOAA/IOOS (Steve Hankin) • Point/Station/Trajectory/Profile Data – Allow space/time queries, return nested sequences – Start from / standardize “Dapper conventions”
  • 36. Conclusion • Standardized Data Access in good shape – HDF5, NetCDF, OPeNDAP – Write an IOSP for proprietary formats (Java) • But that’s not good enough! • To do: – Standard representations of coordinate systems – Classifications of data types, standard services for them

Editor's notes

  1. Diversity of formats:
  2. Appropriate design decision for General formats
  3. Need a more dynamic system for real-time and very large datasets. A catalog is a file, but these are services, that is, code. Show IDD Server catalog: show satellite DQC, then show radar DQC