1. AGO (Awesemantic-Geo): Towards a More Efficient Paradigm of
Storing and Querying Spatial Data on the Semantic Web
ESIP SemTech Telecon - 2019-05-28
Blake Regalia¹
2019/05/28
¹ STKO Lab, University of California, Santa Barbara, USA
blake.regalia@gmail.com
2. On the Web of Linked Data:
Geometry data beyond simple points and bounding boxes (e.g., multi/polylines,
multi/polygons, collections, etc.) should not always be encoded using RDF
Literals for several reasons including readability, performance, and reusability.
Instead, geometric representations of features should be referenced using RDF
Named Nodes (i.e., IRIs – specifically, URLs) that support content-negotiation
for the raw geometry data.
Geometric attributes and topological relations should be precomputed and
materialized as RDF in the triplestore alongside the source data.
3. The W3C Basic Geo Vocabulary, circa 2003, set out to:
“[explore] the possibilities of representing mapping/location data in RDF, and does not
attempt to address many of the issues covered in the professional GIS world”
Established a simple, minimalistic vocabulary for describing points with a latitude
and longitude value using the WGS84 reference datum.
Considered good enough for the basic needs of many!
4. Resource about London Heathrow Airport using W3C Basic Geo in 2003-01-10.
5. Needs of geospatial data not met by W3C Basic Geo:
• Support for Coordinate Reference Systems (CRS)
• Geometries beyond Point such as polylines, polygons, etc.
• Separation of entity and geometry, which have cardinality of 1-to-many
• Merging of geometries (would result in cross product of lat/lngs)
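The last point can be illustrated with a minimal sketch. It assumes a hypothetical resource ex:Place that carries two Basic Geo points after a merge; the coordinates are invented for illustration. Pairing every geo:lat with every geo:long, as a basic graph pattern would, yields four candidate points even though only two were asserted.

```python
from itertools import product

# Hypothetical triples after merging two descriptions of the same resource.
# The subject name and the coordinates are invented for illustration only.
triples = [
    ("ex:Place", "geo:lat", 34.41),
    ("ex:Place", "geo:long", -119.85),
    ("ex:Place", "geo:lat", 34.42),
    ("ex:Place", "geo:long", -119.70),
]


def candidate_points(triples, subject):
    """Pair every latitude with every longitude of a subject, the way a
    pattern like `?s geo:lat ?lat ; geo:long ?lng` would bind them."""
    lats = [o for s, p, o in triples if s == subject and p == "geo:lat"]
    lngs = [o for s, p, o in triples if s == subject and p == "geo:long"]
    return list(product(lats, lngs))


points = candidate_points(triples, "ex:Place")
for lat, lng in points:
    print(lat, lng)
```

Nothing in the triples records which latitude belongs to which longitude, so a consumer cannot tell the two original points from the two spurious pairings.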
6. ‘NeoGeo’ drafted a vocabulary that extended W3C Basic Geo to support more
geometry types. It encoded a geometry’s entire structure in RDF:
8. dbr:Perth rdfs:label "Perth"@en ;
       geosparql:hasGeometry [
           geosparql:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(115.85888671875 -31.952222824097)"^^geosparql:wktLiteral
       ] .
GeoSPARQL addressed all the issues because it:
• separated places from their geometric representations via a blank node
• joined CRS, latitude and longitude values (in fact the entire geometry) into a
single RDF literal.
9. Geometries from the Web are dirty, especially given the heterogeneous nature of Linked Data.
This makes querying topology less straightforward than GeoSPARQL intends:
10. Computing topology on demand (e.g., using GeoSPARQL functions) can be very
expensive given complex geometries:
12. The National Map as Linked Data
Convert National Map datasets (e.g., GNIS, NHD, Transportation data, and so
on) to Linked Data.
DLG data consists of many high-resolution geometries: multi/polylines and
multi/polygons.
The NHD alone consists of more than 100 GB of geometry data in
binary format.
Complex geometries pose a challenge to computing spatial queries on demand
(e.g., querying topology across all features).
13. Figure 2: National Hydrography Dataset (NHD) coverage for California
14. In addition to storing the raw text of an RDF Literal (including all
whitespace, coordinate order, etc.), GeoSPARQL-enabled triplestores
also perform spatial indexing, often creating a copy of the geometries in a
binary format.
The National Hydrography Dataset (NHD) geometries for California take up 4.0 GB in
binary format. To be GeoSPARQL compatible, this dataset alone requires 10.1
GB, roughly a 2.5x increase in storage.
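Where the overhead comes from can be sketched by serializing the same ring both ways. This is a rough illustration, not a measurement of any triplestore: the ring coordinates are invented, and the resulting ratio depends on coordinate precision and on how a given store keeps its indexed copy.

```python
import struct


def ring_to_wkt(ring):
    """Serialize one closed ring as a WKT POLYGON string."""
    coords = ", ".join(f"{x!r} {y!r}" for x, y in ring)
    return f"POLYGON(({coords}))"


def ring_to_wkb(ring):
    """Serialize one closed ring as little-endian WKB (geometry type 3 = Polygon)."""
    header = struct.pack("<BII", 1, 3, 1)  # byte order, geometry type, ring count
    body = struct.pack("<I", len(ring))    # number of points in the ring
    for x, y in ring:
        body += struct.pack("<dd", x, y)   # two 8-byte doubles per point
    return header + body


# Invented closed ring (first point repeated at the end), lon/lat order.
ring = [
    (-119.8489123, 34.4139876),
    (-119.8401234, 34.4139876),
    (-119.8401234, 34.4201234),
    (-119.8489123, 34.4201234),
    (-119.8489123, 34.4139876),
]

wkt = ring_to_wkt(ring)
wkb = ring_to_wkb(ring)

# A store that keeps the literal text and a binary copy pays for both.
text_plus_binary = len(wkt.encode("utf-8")) + len(wkb)
print("binary bytes:", len(wkb))
print("text bytes:", len(wkt.encode("utf-8")))
print("text + binary vs. binary:", round(text_plus_binary / len(wkb), 2))
```

The binary form costs a fixed 16 bytes per point, while the text form grows with the number of digits written for each coordinate.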
15. GeoSPARQL explicitly allows geometry data to be stored in RDF Literals as either
WKT or GML (two human-readable formats).
However, beyond simple points and bounding boxes, these geometries are not
human-readable:
16. Shortcomings of using RDF Literals to store geometry data:
• Geometries have interesting attributes that are static and easily
precomputed, but RDF Literals cannot store properties (i.e., they are leaf
nodes)
• RDF Literals are not linkable across the Web of Linked Data, making them
less reusable
• Client applications are able to make use of binary formats but are forced
to cope with less efficient formats required by GeoSPARQL
• Server-side triplestores that support GeoSPARQL will usually store a
redundant copy of the geometry in a binary format anyway for spatial
operations
17. Raw Geometry in RDF
Storing human-readable serializations of geometry:
• Consumes approximately 2.5 times as much storage space as binary, because of
duplication and overhead
• Does little for transparency, since long strings of coordinates are not human-readable
• Does not facilitate spatial querying as a data format; systems rely on copies of
geometry in binary formats
• Is not optimal for storage or transmission (e.g., to web clients)
Furthermore, using RDF Literals:
• Obstructs the ability to include metadata about the geometry itself (cannot assign
properties to RDF Literals)
• Prevents the reuse of geometries within a dataset (e.g., cannot create composite
geometries)
• Limits the interlinking of geometries across the Web of Linked Data (Bob cannot
simply reference geometries in Alice’s dataset)
19. osm:relation_2316598
        rdfs:label "Western Australia"@en ;
        geosparql:hasGeometry [
            geosparql:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POLYGON((128.9999986 -14.4290140, 128.9999714 -14.8798443, ...))"^^geosparql:wktLiteral
        ] ;
        # instead... get rid of blank node and use URI
        ago:hasGeometry ex:WesternAustraliaPolygon ;
20. Our approach (nicknamed AGO) is to:
• Eliminate the need to store human-readable representations of geometries
beyond simple points and bounding boxes
• Require that each geometry be represented using an IRI (i.e., an RDF Named
Node)
• Make those IRIs URLs that support content negotiation, so that a geometry can be
dereferenced for either (a) its RDF metadata or (b) the raw geometry data in a
format requested by the client and supported by the server.
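On the server side, the negotiation step could look roughly like the sketch below. It is a simplified illustration rather than the AGO implementation: the function name `negotiate` is hypothetical, `text/turtle` is assumed as the RDF metadata format, the geometry MIME types are the ones listed later in the talk, and partial wildcards such as `text/*` are ignored.

```python
# Formats this hypothetical server can produce for a geometry IRI.
# text/turtle (RDF metadata) is an assumption; the rest appear later in the talk.
SUPPORTED = [
    "text/turtle",
    "text/plain",
    "application/gml+xml",
    "application/vnd.geo+json",
    "application/octet-stream",
]


def negotiate(accept_header, supported=SUPPORTED):
    """Return the supported MIME type with the highest q-value, or None.

    Simplified: handles exact matches and `*/*` only.
    """
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        mime, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if mime == "*/*":
            candidates = supported
        else:
            candidates = [m for m in supported if m == mime]
        if candidates and q > best_q:
            best, best_q = candidates[0], q
    return best


print(negotiate("text/plain;q=0.5, application/vnd.geo+json"))
print(negotiate("image/png"))
```

A `None` result corresponds to the case where the server supports none of the requested formats and would answer with an HTTP 406 response.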
21. Additionally, we recommend to:
• Precompute (static) geometric attributes and materialize them for each
geometry (area, perimeter, centroid, etc.)
• Leverage geometry data to aid in precomputing topological relations
between features (using the open-world assumption, OWA) and materialize them on the features
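The first recommendation can be sketched for a simple polygon as follows. This is only an illustration under stated assumptions: the ring is planar (x, y) data, so longitude/latitude input would first need a projected CRS or geodesic formulas, and the function name `polygon_attributes` and the dictionary keys are hypothetical, not AGO vocabulary terms.

```python
from math import hypot


def polygon_attributes(ring):
    """Compute area, perimeter, centroid and bounding box of a simple
    polygon given as a closed ring of planar (x, y) coordinates.

    Assumes a non-degenerate ring (non-zero area); the centroid
    division fails otherwise.
    """
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += hypot(x1 - x0, y1 - y0)
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return {
        "area": abs(twice_area) / 2.0,
        "perimeter": perimeter,
        # Centroid = sum / (6 * signed area) = sum / (3 * twice_area)
        "centroid": (cx / (3.0 * twice_area), cy / (3.0 * twice_area)),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


# Unit square as a closed ring (first point repeated at the end).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
attrs = polygon_attributes(square)
print(attrs)
```

Because these values depend only on the geometry, they can be computed once at load time and stored as properties of the geometry's IRI instead of being recomputed per query.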
22. Attributes of a geometry are easily accessible:
23. What a query looks like using AGO:
24. Compatibility:
• Allow geometry data to persist in a spatially indexed geodatabase that is
accessible to the triplestore for on-demand spatial querying (e.g., for
computing distances, unions, intersections, etc.) and accessible to clients
through the dereferencing server
• Remain compatible with GeoSPARQL query functions in practice!
29. Content-Negotiation for Geometry Data
Dereference the IRI of a geometry to fetch its data in a variety of formats.
curl "https://usgs.link/geometry/feature?id=42" -H "Accept:$MIME_TYPE"
MIME Type                  Description         Returns
text/html                  Web interface       <!DOCTYPE html><html lang="en">...
text/plain                 Well-Known Text     POLYGON((113.1016 -38.062 ...))
application/gml+xml        GML                 <gml:Polygon><gml:Exterior>...
application/vnd.geo+json   GeoJSON             {"type":"Polygon","coordinates":...}
application/octet-stream   Well-Known Binary   01 06 00 00 20 E6 10 00 00 01...
Client web applications may choose a format that streamlines downloading and rendering
geometries on a map.
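From a client's point of view, the curl call above could be mirrored as in the sketch below. It only builds the request and does not contact the server. The helper names are hypothetical, and the header decoder assumes the binary response follows the PostGIS-style extended WKB convention (an SRID flag in the type word), which is what the sample bytes in the table suggest.

```python
import struct
from urllib.request import Request

GEOMETRY_URL = "https://usgs.link/geometry/feature?id=42"  # URL from the slide


def geometry_request(mime_type, url=GEOMETRY_URL):
    """Build (but do not send) a request for a geometry in the given format."""
    return Request(url, headers={"Accept": mime_type})


def decode_wkb_header(data):
    """Return (geometry_type, srid) from the first bytes of a WKB payload.

    Assumes the extended-WKB convention where bit 0x20000000 of the type
    word signals that a 4-byte SRID follows; srid is None otherwise.
    """
    fmt = "<" if data[0] == 1 else ">"      # 1 = little-endian byte order
    (raw_type,) = struct.unpack_from(fmt + "I", data, 1)
    geometry_type = raw_type & 0xFF         # 3 = Polygon, 6 = MultiPolygon, ...
    srid = None
    if raw_type & 0x20000000:
        (srid,) = struct.unpack_from(fmt + "I", data, 5)
    return geometry_type, srid


req = geometry_request("application/octet-stream")
print(req.get_header("Accept"))

# First nine bytes of the binary sample shown in the table above.
sample = bytes.fromhex("0106000020E6100000")
print(decode_wkb_header(sample))
```

Sending the request (for example with `urllib.request.urlopen(req)`) requires the server to be reachable, so that step is left out here.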
30. In summary, a comparison of strategies for storing and using geometry:
Trait                                     GeoSPARQL   NeoGeo   AGO
Efficient geometry storage
Geometry can persist externally [1]
Content-negotiation for geometry format
Uniform RDF structure
Composite geometries
Determine geometry type [2]
Access bounding box [2]
Access raw geometry [2]
[1] Geometry can persist in a local geodatabase or even on a remote system and without copies.
[2] From the triples’ RDF data alone (e.g., without using SPARQL).
31. Publications
Regalia, B., Janowicz, K., and McKenzie, G. (2019) Computing and Querying
Strict, Approximate, and Metrically-Refined Topological Relations in Linked
Geographic Data. Transactions in GIS, 2019
Regalia, B., Janowicz, K., Mai, G., Varanka, D., and Usery, L. E. (2018)
GNIS-LD: Serving and Visualizing the Geographic Names Information
System As Linked Data. ESWC 2018
Regalia, B., Janowicz, K., and McKenzie, G. (2017) Revisiting the
Representation of and Need for Raw Geometries on the Linked Data Web.
LDOW@WWW 2017
Regalia, B., Janowicz, K. and Gao, S. (2016) VOLT: A Provenance Producing,
Transparent SPARQL Proxy for the On-Demand Computation of Linked Data
and its Applications to Spatiotemporally Dependent Data. ESWC 2016