Harvard Hypermap:
An Open Source Framework for Making the World's
Geospatial Information more Accessible
http://worldmap.harvard.edu
Benjamin Lewis, Paolo Corti, Wendy Guan
American Association of Geographers
Boston, Massachusetts
April, 2017
Where are the geo-data?
Web Map Services
• A powerful, modern way to provide geo-data
via the Internet
• Distributed across thousands of servers
globally
• No central directory
• Tremendous variety in publishing service
formats (WMS, WFS, Esri REST, KML, etc.)
• Difficult to find, and difficult to use
Solution: build a registry
[Architecture diagram. Labels: Distributed Map Services; Service Crawler (Common Crawl); User submitted services; Map Service Registry; Uptime Checker; Time Miner; Service Caching; Re-projection; Service API; clients: WorldMap, OpenLayers, Leaflet, Esri clients, any map client]
Harvard HHypermap Highlights
• Supports comprehensive search
– Visualize distribution of results
– Support space and time
– Fast
• Facilitates sharing between registries
• Accepts user contributions
• Improves over time
• Software is open source
Spatial Data Infrastructure (SDI)
with Modern Search
Apache Lucene: a search engine library
• Elasticsearch
• Solr
Features: spatial facet, temporal facet, topic cloud, result highlighting, query spellcheck, relevance tuning...
Implementation in WorldMap
HHypermap as WorldMap’s search engine
• Provides access to about 100,000 layers
• 25,000 are from WorldMap itself, the rest are
on remote servers
• Accesses 13,000 remote map services
• A fraction of existing online map services
• More are being added
Current search on WorldMap
• Supports faceted search by
– Geographic space (heatmap for any number of
results)
– Time (temporal histogram)
– Subject matter (topic clouds)
– Source (data owner, publisher)
• and potentially other dimensions
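As an illustration of how the spatial heatmap and temporal histogram facets might be requested from Solr, the sketch below builds a parameter dictionary using Solr's documented `facet.heatmap` and `facet.range` options. The field names `bbox` and `layer_date` and the function `build_facet_params` are assumptions made for this example; they are not taken from the HHypermap schema.

```python
from typing import Dict


def build_facet_params(text: str, west: float, south: float,
                       east: float, north: float,
                       start: str, end: str,
                       gap: str = "+1YEAR") -> Dict[str, str]:
    """Build Solr query parameters for a heatmap facet plus a date histogram.

    "bbox" and "layer_date" are hypothetical field names.
    """
    # Solr expects the heatmap region as a rectangle range:
    # ["minX minY" TO "maxX maxY"]
    geom = f'["{west} {south}" TO "{east} {north}"]'
    return {
        "q": text or "*:*",            # match everything when no text is given
        "facet": "true",
        "facet.heatmap": "bbox",       # spatial facet (heatmap grid of counts)
        "facet.heatmap.geom": geom,
        "facet.range": "layer_date",   # temporal facet (histogram buckets)
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }


params = build_facet_params("roads", -180, -90, 180, 90,
                            "1900-01-01T00:00:00Z", "2016-01-01T00:00:00Z")
```

The resulting dictionary could be passed as the query string of a request to a Solr `select` handler; the response would then carry grid counts for the heatmap and per-year counts for the histogram.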
Video demo (3 minutes): https://vimeo.com/164167343 (narrated by Ben Lewis)
HHypermap Architecture
Built on open source
software:
• Celery
• RabbitMQ
• Django
• Lucene
– Solr
– Elasticsearch
• MapProxy
• Memcached
• OWSLib
• PostgreSQL
• PostGIS
• pycsw
Automated Gathering of Map Service
Endpoints to Harvest
• Search web for signatures using the Common
Crawl (CC) archive
• Store as compressed Web Archive (WARC)
formatted files on Amazon S3.
• Employ multiple machines to process the data in
parallel on Amazon EC2
• Use Hadoop/YARN framework, execute
Map/Reduce functions to aggregate information
about URLs to spatial assets
• Collect URLs for later processing and harvesting
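The map and reduce steps can be pictured with a small in-memory sketch. This is not the Hadoop job itself: it is plain Python, and the predicate `looks_spatial` and the function names are hypothetical stand-ins chosen for illustration. The map step emits a (host, URL) pair for each link that looks like a spatial asset, and the reduce step groups the unique URLs by host.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple
from urllib.parse import urlparse


def looks_spatial(href: str) -> bool:
    """Hypothetical stand-in for the real signature checks."""
    lowered = href.lower()
    return ("/arcgis/rest/services" in lowered
            or lowered.endswith((".kml", ".kmz")))


def map_links(hrefs: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Map step: emit (host, url) for every link that looks spatial."""
    for href in hrefs:
        if looks_spatial(href):
            yield urlparse(href).netloc.lower(), href


def reduce_by_host(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reduce step: collect the unique URLs found for each host."""
    grouped = defaultdict(set)
    for host, href in pairs:
        grouped[host].add(href)
    return {host: sorted(urls) for host, urls in grouped.items()}


links = [
    "http://a.example/arcgis/rest/services/X/MapServer",
    "http://a.example/data/p.kml",
    "http://b.example/index.html",
    "http://a.example/data/p.kml",  # duplicate, dropped in the reduce step
]
result = reduce_by_host(map_links(links))
```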
Signatures for Hadoop search
• OGC Services
– Look for "?request getcapabilities" and not “test” in the href
URL
• Esri REST Services
– Look for “/arcgis/rest/services” in the target-DOMAIN-URI of
the WARC Response Header text
• KML or KMZ files
– Look for an href URL ending in .kml or .kmz
• Compressed shapefiles
– Look for “shape” or “shp” and string ending with “.zip” in the
href URL
• Tile Servers
– Look for “tile” or “tiles” and string ending with “.png” in the href
URL
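The signatures above can be sketched as a single URL classifier. This is a minimal illustration, not the project's Hadoop code. Several details are assumptions: the function name `classify_url` and its labels are invented here, the OGC signature is read as a `request=GetCapabilities` query parameter, the Esri check is applied to the URL itself rather than to a WARC header, and "ending with" is tested against the URL path.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_url(url: str) -> Optional[str]:
    """Return a service-type label for a URL, or None if nothing matches."""
    lowered = url.lower()
    path = urlparse(lowered).path

    # OGC services: a GetCapabilities request, skipping "test" URLs.
    if "request=getcapabilities" in lowered and "test" not in lowered:
        return "ogc"
    # Esri REST services: the standard services directory path.
    if "/arcgis/rest/services" in lowered:
        return "esri_rest"
    # KML or KMZ files.
    if path.endswith((".kml", ".kmz")):
        return "kml"
    # Compressed shapefiles: "shape" or "shp" somewhere, ending in .zip.
    if ("shape" in lowered or "shp" in lowered) and path.endswith(".zip"):
        return "shapefile"
    # Tile servers: "tile" (which also covers "tiles"), ending in .png.
    if "tile" in lowered and path.endswith(".png"):
        return "tile"
    return None
```

Substring checks this loose will produce false positives, so a classifier like this is best treated as a first-pass filter whose hits are verified during harvesting.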
User-Submitted Service Endpoints to Harvest
Time Miner – to enrich metadata
• Temporal metadata for geospatial datasets is often
weak.
• In a crowd-sourced data repository, data creators and
contributors often do not create detailed metadata.
• Many data sets have temporal properties, but time is
often ambiguously defined, mentioned as
unstructured text in the title, abstract, and elsewhere.
• Time is often not referred to using a standard
date/time format such as ISO 8601, but as descriptive
text.
Time Miner Logic Implementation
1. Look for date in the date range (lower) section of the
metadata and choose the earlier date. (Date: from
Metadata)
2. If there is no #1 above, look in top date in metadata but
only use it if it is 2010 or earlier. (Date: from Metadata)
3. If there is no #2 above, look for 4 digit numbers in title
first, then abstract, which are less than or equal to 2016
(present year) (Date: Detected)
4. If there IS a date in #3 above, check to see whether there
is a CE or AD or BCE or BC after it and apply math
accordingly (Date: Detected)
5. If there is no #3 above, look for 1, 2, or 3 digit numbers
with associated CE, AD, BCE, BC, and apply math
accordingly (Date: Detected)
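Steps 3 to 5 (the "Date: Detected" cases) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Time Miner source: the function name `detect_year` is invented, steps 1 and 2 (structured metadata dates) are left out, and "apply math accordingly" is read as returning a negative year for BCE or BC.

```python
import re
from typing import Optional

PRESENT_YEAR = 2016  # the "present year" named in step 3

# A four-digit number, optionally followed by an era marker.
FOUR_DIGIT = re.compile(r"\b(\d{4})\b(?:\s*(BCE|BC|CE|AD)\b)?", re.IGNORECASE)
# A one- to three-digit number that must be followed by an era marker.
SHORT_WITH_ERA = re.compile(r"\b(\d{1,3})\s*(BCE|BC|CE|AD)\b", re.IGNORECASE)


def _signed(year: int, era: Optional[str]) -> int:
    """Assumption: BCE/BC years are represented as negative integers."""
    if era and era.upper() in ("BCE", "BC"):
        return -year
    return year


def detect_year(title: str, abstract: str) -> Optional[int]:
    """Detect a year in free text, searching the title before the abstract."""
    # Steps 3 and 4: four-digit numbers up to the present year,
    # adjusted by any era marker that follows them.
    for text in (title, abstract):
        for match in FOUR_DIGIT.finditer(text):
            year = int(match.group(1))
            if year <= PRESENT_YEAR:
                return _signed(year, match.group(2))
    # Step 5: shorter numbers count only when an era marker is attached.
    for text in (title, abstract):
        match = SHORT_WITH_ERA.search(text)
        if match:
            return _signed(int(match.group(1)), match.group(2))
    return None
```

For example, `detect_year("Settlements 1200 BCE", "")` yields `-1200`, while a title mentioning only a future year such as 2050 yields `None`.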
Time Miner: in addition to dates,
recognize some periods
• ca. 2100-1600 BCE Xia, Hsia
• ca. 1600-1050 BCE Shang
• ca. 1046-256 BCE Zhou, Chou
• 221-206 BCE Qin, Ch'in
• 206 BCE-220 CE Han
• 581-618 CE Sui
• 618-906 Tang, T'ang
• 960-1279 Song, Sung
• 1279-1368 Yuan
• 1368-1644 Ming
• 1644-1912 Qing, Ch'ing
Source: http://afe.easia.columbia.edu/timelines/china_timeline.htm
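One way to recognize these periods is a lookup table keyed by name, with both spellings from the list above. The sketch below is an assumption about how such a lookup could work, not the Time Miner implementation; the function name `period_range`, the negative-year convention for BCE, and the choice of case-sensitive whole-word matching are all illustrative.

```python
import re
from typing import Optional, Tuple

# (names, start year, end year); negative years are BCE.
# Values are copied from the period list above ("ca." dates are approximate).
PERIODS = [
    (("Xia", "Hsia"), -2100, -1600),
    (("Shang",), -1600, -1050),
    (("Zhou", "Chou"), -1046, -256),
    (("Qin", "Ch'in"), -221, -206),
    (("Han",), -206, 220),
    (("Sui",), 581, 618),
    (("Tang", "T'ang"), 618, 906),
    (("Song", "Sung"), 960, 1279),
    (("Yuan",), 1279, 1368),
    (("Ming",), 1368, 1644),
    (("Qing", "Ch'ing"), 1644, 1912),
]


def period_range(text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) for the first period name found in the text."""
    for names, start, end in PERIODS:
        for name in names:
            # Whole-word, case-sensitive match so "song" or "Qing" does not
            # trigger "Song" or "Qin".
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                return (start, end)
    return None
```

Short names such as "Han" or "Song" also occur as ordinary words and surnames, so a match is best treated as a hint to confirm against other metadata.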
Each layer monitored for uptime
Scalable to support virtually
any number of datasets
Another project, the “Billion Object Platform” (BOP),
demonstrates how robust this platform is!
http://terranodo.io/angular-search/#/search
Some examples of catalogues
that could be brought together
• ArcGIS Open Data – Esri collection of 44,000 open datasets and growing.
• Geodata.gov, Geoplatform.gov – The U.S. Federal government has built a data
sharing platform for U.S. data using CKAN software and ArcGIS Online.
• INSPIRE Geoportal – Spatial data portal of the European Commission.
• GEOSS registry – Group on Earth Observations registry of 850 map service
collections.
• Geopole.org – CSW catalogue service providing access to 400,000 layers.
• Geoblacklight – Platform developed by Stanford and other universities to provide
fast search access to geospatial library holdings.
• OpenGeoPortal – Platform developed by Tufts and other universities to provide
fast search access to geospatial library holdings.
• Geonetwork – Geospatial catalogue maintained by the Food and Agriculture
Organization of the United Nations.
• Spatineo.com – Commercial service which is currently monitoring 40,000 web
services containing 899,000 layers.
• New York Public Library Collection
• David Rumsey Collection
• Many CKAN portals
• Many Thredds servers
Help us improve the system
• Try it out. If you can’t find services you know are out
there, submit them to us and we will add them.
• If you would like to harvest metadata from other
systems, write a connector. We will provide guidance.
• If you would like to bring such search into other
applications besides WorldMap, write a client. We will
help.
• If you would like to set up your own HHypermap
registry instance, the code is available.
• If you have other features you would like, let us know.
Contenu connexe

Similaire à Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial Information more Accessible

WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...Micah Altman
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Ian Foster
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 PresentationsAna Rebelo
 
Integrating Data for Archaeology
Integrating Data for ArchaeologyIntegrating Data for Archaeology
Integrating Data for Archaeologyariadnenetwork
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Oscar Corcho
 
Exploratory querying of the Dutch GeoRegisters
Exploratory querying of the Dutch GeoRegistersExploratory querying of the Dutch GeoRegisters
Exploratory querying of the Dutch GeoRegistersStanislav Ronzhin
 
Library Mashups & APIs
Library Mashups & APIsLibrary Mashups & APIs
Library Mashups & APIslibrarywebchic
 
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar   Intro to Linked Data and SemanticsINSPIRE Hackathon Webinar   Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar Intro to Linked Data and Semanticsplan4all
 
One day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic WebOne day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic WebVictor de Boer
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Amazon Web Services
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesMatthew Critchlow
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Domingo Suarez Torres
 
ARIADNE Registry - towards interoperability
ARIADNE Registry - towards interoperabilityARIADNE Registry - towards interoperability
ARIADNE Registry - towards interoperabilityariadnenetwork
 
How to Create the Google for Earth Data (XLDB 2015, Stanford)
How to Create the Google for Earth Data (XLDB 2015, Stanford)How to Create the Google for Earth Data (XLDB 2015, Stanford)
How to Create the Google for Earth Data (XLDB 2015, Stanford)Rainer Sternfeld
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructurePaolo Corti
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrJake Mannix
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkJake Mannix
 

Similaire à Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial Information more Accessible (20)

WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 Presentations
 
Integrating Data for Archaeology
Integrating Data for ArchaeologyIntegrating Data for Archaeology
Integrating Data for Archaeology
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Exploratory querying of the Dutch GeoRegisters
Exploratory querying of the Dutch GeoRegistersExploratory querying of the Dutch GeoRegisters
Exploratory querying of the Dutch GeoRegisters
 
Library Mashups & APIs
Library Mashups & APIsLibrary Mashups & APIs
Library Mashups & APIs
 
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar   Intro to Linked Data and SemanticsINSPIRE Hackathon Webinar   Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
 
One day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic WebOne day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic Web
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
 
Data-Driven Civic Innovation
Data-Driven Civic InnovationData-Driven Civic Innovation
Data-Driven Civic Innovation
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository Services
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016
 
ARIADNE Registry - towards interoperability
ARIADNE Registry - towards interoperabilityARIADNE Registry - towards interoperability
ARIADNE Registry - towards interoperability
 
How to Create the Google for Earth Data (XLDB 2015, Stanford)
How to Create the Google for Earth Data (XLDB 2015, Stanford)How to Create the Google for Earth Data (XLDB 2015, Stanford)
How to Create the Google for Earth Data (XLDB 2015, Stanford)
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data Infrastructure
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+Solr
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
 

Plus de Paolo Corti

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019Paolo Corti
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesPaolo Corti
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016Paolo Corti
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitariePaolo Corti
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Paolo Corti
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Paolo Corti
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demoPaolo Corti
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionPaolo Corti
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...Paolo Corti
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Paolo Corti
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Paolo Corti
 

Plus de Paolo Corti (12)

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queues
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze Umanitarie
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demo
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk Reduction
 
Geonode 2.0
Geonode 2.0Geonode 2.0
Geonode 2.0
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1
 

Dernier

Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 

Dernier (20)

Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 

Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial Information more Accessible

  • 1. Harvard Hypermap: An Open Source Framework for Making the World's Geospatial Information more Accessible http://worldmap.harvard.edu Benjamin Lewis, Paolo Corti, Wendy Guan American Association of Geographers Boston, Massachusetts April, 2017
  • 2. Where are the geo-data?
  • 3. Web Map Services • A powerful, modern way to provide geo-data via the Internet • Distributed across thousands of servers globally • No central directory • Tremendous variety in publishing service formats (WMS, WFS, Esri REST, KML, etc.) • Difficult to find, and difficult to use
  • 4. WorldMap OpenLayers, Leaflet Esri clients Any map client Distributed Map Services Service Crawler (common crawl) Uptime Checker Map Service Registry Service Caching Re- projection Service API User submitted services Time Miner Solution: build a registry
  • 5. Harvard HHypermap Highlights • Supports comprehensive search – Visualize distribution of results – Support space and time – Fast • Facilitates sharing between registries • Accepts user contributions • Improves over time • Software is open source
  • 6. Spatial Data Infrastructure (SDI) with Modern Search Apache Lucene: a search engine library ElasticSearchSolr Spatial facet Temporal facet Topic cloud Result highlight Query spellcheck Relevance tuning...
  • 8. HHypermap as WorldMap’s search engine • Provides access to about 100,000 layers • 25,000 are from WorldMap itself, the rest are on remote servers • Accesses 13,000 remote map services • A fraction of existing online map services • More are being added
  • 9. Current search on WorldMap • Supports faceted search by – Geographic space (heatmap for any number of results) – Time (temporal histogram) – Subject matter (topic clouds) – Source (data owner, publisher) • and potentially other dimensions
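
The facets on slide 9 (space, time, subject) could be requested in a single query. The sketch below assumes a Solr backend and builds a query string with Solr's heatmap, range, and field facet parameters. The field names `bbox`, `layer_date`, and `keywords`, the default date range, and the function name are hypothetical, not taken from the HHypermap schema.

```python
from urllib.parse import urlencode


def build_facet_query(text,
                      bbox=(-180.0, -90.0, 180.0, 90.0),
                      start="1900-01-01T00:00:00Z",
                      end="2017-01-01T00:00:00Z"):
    """Build a Solr query string asking for spatial, temporal and topic facets.

    Field names (bbox, layer_date, keywords) are illustrative assumptions.
    """
    min_x, min_y, max_x, max_y = bbox
    params = [
        ("q", text),
        ("rows", 0),                      # facets only, no documents
        ("facet", "true"),
        # Spatial facet: grid of counts used to draw a heatmap.
        ("facet.heatmap", "bbox"),
        ("facet.heatmap.geom",
         f'["{min_x} {min_y}" TO "{max_x} {max_y}"]'),
        # Temporal facet: counts per time bucket for a histogram.
        ("facet.range", "layer_date"),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", "+10YEARS"),
        # Topic facet: term counts that can feed a topic cloud.
        ("facet.field", "keywords"),
        ("wt", "json"),
    ]
    return urlencode(params)
```

The returned string would be appended to a core's `/select` URL; `rows=0` keeps the response limited to facet counts.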
  • 10. Video demo (3 minutes): https://vimeo.com/164167343 (narrated by Ben Lewis)
  • 11. HHypermap Architecture Built on open source software: • Celery • RabbitMQ • Django • Lucene – Solr – Elasticsearch • MapProxy • Memcached • OWSLib • PostgreSQL • PostGIS • pycsw
  • 12. Automated Gathering of Map Service Endpoints to Harvest • Search web for signatures using the Common Crawl (CC) archive • Store as compressed Web Archive (WARC) formatted files on Amazon S3. • Employ multiple machines to process the data in parallel on Amazon EC2 • Use Hadoop/YARN framework, execute Map/Reduce functions to aggregate information about URLs to spatial assets • Collect URLs for later processing and harvesting
  • 13. Signatures for Hadoop search • OGC Services – Look for "?request getcapabilities" and not "test" in the href URL • Esri REST Services – Look for "/arcgis/rest/services" in the target-DOMAIN-URI of the WARC Response Header text • KML or KMZ files – Look for an href URL ending in .kml or .kmz • Compressed shapefiles – Look for "shape" or "shp" and a string ending with ".zip" in the href URL • Tile Servers – Look for "tile" or "tiles" and a string ending with ".png" in the href URL
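
The signatures on slide 13 can be sketched as a small URL classifier. This is an illustration only, not the project's Hadoop job: it applies every rule to a single URL string (the slide checks the Esri rule against the WARC header rather than an href), it assumes the OGC signature appears as `request=getcapabilities`, and the function name and returned labels are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def classify_endpoint(url: str) -> Optional[str]:
    """Return a label for the first matching signature, or None."""
    lowered = url.lower()
    path = urlsplit(lowered).path

    # OGC services: a GetCapabilities request that does not mention "test".
    # Assumption: the signature is spelled "request=getcapabilities".
    if "request=getcapabilities" in lowered and "test" not in lowered:
        return "ogc"
    # Esri REST services.
    if "/arcgis/rest/services" in lowered:
        return "esri_rest"
    # KML or KMZ files.
    if path.endswith((".kml", ".kmz")):
        return "kml"
    # Compressed shapefiles: "shape" or "shp" plus a .zip ending.
    if path.endswith(".zip") and ("shape" in lowered or "shp" in lowered):
        return "shapefile_zip"
    # Tile servers: "tile" (which also covers "tiles") plus a .png ending.
    if path.endswith(".png") and "tile" in lowered:
        return "tiles"
    return None
```

Matching on the lowercased URL keeps the checks case-insensitive; matching file endings against the path rather than the full URL ignores any query string, which is a design choice of this sketch.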
  • 14. Users Submit Service Endpoints to Harvest
  • 15. Time Miner – to enrich metadata • Temporal metadata for geospatial datasets is often weak. • In a crowd-sourced data repository, data creators and contributors often do not create detailed metadata. • Many data sets have temporal properties, but time is often ambiguously defined, mentioned as unstructured text in the title, abstract, and elsewhere. • Time is often not referred to using a standard date/time format such as ISO 8601, but as descriptive text.
  • 16. Time Miner Logic Implementation 1. Look for a date in the date range (lower) section of the metadata and choose the earlier date. (Date: from Metadata) 2. If step 1 finds nothing, look at the top date in the metadata, but only use it if it is 2010 or earlier. (Date: from Metadata) 3. If step 2 finds nothing, look for 4-digit numbers less than or equal to 2016 (present year), first in the title, then in the abstract. (Date: Detected) 4. If step 3 finds a date, check whether it is followed by CE, AD, BCE, or BC and apply math accordingly. (Date: Detected) 5. If step 3 finds nothing, look for 1-, 2-, or 3-digit numbers with an associated CE, AD, BCE, or BC, and apply math accordingly. (Date: Detected)
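
Steps 3 to 5 of slide 16 can be sketched with regular expressions. This is a minimal illustration, not the Time Miner code itself: it skips the metadata lookups in steps 1 and 2, and it assumes that "apply math accordingly" means representing BCE/BC years as negative integers. The function names are hypothetical.

```python
import re
from typing import Optional

PRESENT_YEAR = 2016  # the "present year" named on the slide
ERA = r"(BCE|BC|CE|AD)"
# A 4-digit number, optionally followed by an era marker.
FOUR_DIGITS = re.compile(r"\b(\d{4})\b(?:\s*" + ERA + r"\b)?", re.IGNORECASE)
# A 1- to 3-digit number that must be followed by an era marker.
SHORT_WITH_ERA = re.compile(r"\b(\d{1,3})\s*" + ERA + r"\b", re.IGNORECASE)


def signed_year(number: int, era: Optional[str]) -> int:
    """Assumption: BCE/BC become negative; CE/AD or no marker stay positive."""
    if era and era.upper() in ("BCE", "BC"):
        return -number
    return number


def detect_year(title: str, abstract: str = "") -> Optional[int]:
    """Return a detected year, searching the title before the abstract."""
    # Steps 3 and 4: 4-digit numbers no later than the present year.
    for text in (title, abstract):
        for match in FOUR_DIGITS.finditer(text):
            number = int(match.group(1))
            if number <= PRESENT_YEAR:
                return signed_year(number, match.group(2))
    # Step 5: shorter numbers count only when an era marker follows.
    for text in (title, abstract):
        match = SHORT_WITH_ERA.search(text)
        if match:
            return signed_year(int(match.group(1)), match.group(2))
    return None
```

For example, `detect_year("Settlements 1200 BCE")` yields a negative year under the sign convention assumed here, while a title containing only a future year such as 2050 yields no date.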
  • 17. Time Miner: in addition to dates, recognize some periods • ca. 2100-1600 BCE Xia, Hsia • ca. 1600-1050 BCE Shang • ca. 1046-256 BCE Zhou, Chou • 221-206 BCE Qin, Ch'in • 206 BCE-220 CE Han • 581-618 CE Sui • 618-906 Tang, T'ang • 960-1279 Song, Sung • 1279-1368 Yuan • 1368-1644 Ming • 1644-1912 Qing, Ch'ing Source: http://afe.easia.columbia.edu/timelines/china_timeline.htm
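
The period list on slide 17 lends itself to a lookup table. The sketch below transcribes the slide's names and years, stores BCE years as negative integers (an assumption about representation), and ignores the "ca." qualifiers. The function name and the case-sensitive whole-word matching are illustrative choices, not the project's implementation.

```python
import re
from typing import Optional, Tuple

# (start_year, end_year) per period; BCE years are negative.
PERIODS = {
    ("Xia", "Hsia"): (-2100, -1600),
    ("Shang",): (-1600, -1050),
    ("Zhou", "Chou"): (-1046, -256),
    ("Qin", "Ch'in"): (-221, -206),
    ("Han",): (-206, 220),
    ("Sui",): (581, 618),
    ("Tang", "T'ang"): (618, 906),
    ("Song", "Sung"): (960, 1279),
    ("Yuan",): (1279, 1368),
    ("Ming",): (1368, 1644),
    ("Qing", "Ch'ing"): (1644, 1912),
}


def find_period(text: str) -> Optional[Tuple[int, int]]:
    """Return the year range of the first period named in the text."""
    for names, span in PERIODS.items():
        for name in names:
            # Whole-word, case-sensitive match so "Shanghai" does not
            # trigger "Shang" and "song" does not trigger "Song".
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                return span
    return None
```

Short names such as "Han" or "Song" can still collide with personal names or ordinary capitalized words, so a production version would likely need extra context checks.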
  • 18. Each layer monitored for uptime
  • 19. Scalable to support virtually any number of datasets • Another project, the “Billion Object Platform” (BOP), demonstrates how robust this platform is! http://terranodo.io/angular-search/#/search
  • 20. Some examples of catalogues that could be brought together • ArcGIS Open Data – Esri collection of 44,000 open datasets and growing. • Geodata.gov, Geoplatform.gov – The U.S. Federal government has built a data sharing platform for U.S. data using CKAN software and ArcGIS Online. • INSPIRE Geoportal – Spatial data portal of the European Commission. • GEOSS registry – Group on Earth Observations registry of 850 map service collections. • Geopole.org – CSW catalogue service providing access to 400,000 layers. • GeoBlacklight – Platform developed by Stanford and other universities to provide fast search access to geospatial library holdings. • OpenGeoPortal – Platform developed by Tufts and other universities to provide fast search access to geospatial library holdings. • GeoNetwork – Geospatial catalogue maintained by the Food and Agriculture Organization of the United Nations. • Spatineo.com – Commercial service which is currently monitoring 40,000 web services containing 899,000 layers. • New York Public Library Collection • David Rumsey Collection • Many CKAN portals • Many THREDDS servers
  • 21. Help us improve the system • Try it out. If you can’t find services you know are out there, submit them to us and we will add them. • If you would like to harvest metadata from other systems, write a connector. We will provide guidance. • If you would like to bring such search into other applications besides WorldMap, write a client. We will help. • If you would like to set up your own HHypermap registry instance, the code is available. • If you have other features you would like, let us know.
  • 22. Harvard Hypermap: An Open Source Framework for Making the World's Geospatial Information more Accessible http://worldmap.harvard.edu Benjamin Lewis, Paolo Corti, Wendy Guan