This document discusses how FME can be used to work with big data. It provides examples of using FME to load data into and export data from Amazon DynamoDB and MarkLogic databases. The DynamoDB demo shows how to spatially index and store any type of data in DynamoDB. The MarkLogic demos show how to use FME to convert data to XML/GML and load it into MarkLogic, and how to query MarkLogic and export XML results for conversion to other formats. Big data provides new opportunities for FME, and the cloud model is a natural fit for working with large datasets.
2. Agenda
What is Big Data
Big Data Challenges
FME and Big Data
Big Data Technologies
DynamoDB Workflow
MarkLogic Workflow
4. Big Data and Cloud
Big Data needs big resources
Big data stores
Big processing power
Big bandwidth
Cloud technology gives you this for a fraction of the traditional cost!
5. Big Data and FME
Big Data is a new data “classification” for FME.
Big Data is no different from other data to FME.
FME Cloud is a natural fit for data in the Cloud.
FME makes it easy to leverage the power of Big Data
6. Big Data and FME Support
Amazon S3
Limitless internet-based storage
Amazon RDS
See blog article on Amazon RDS (PostGIS/SQL Server/Oracle)
Amazon DynamoDB
NoSQL limitless database service
Amazon Redshift
Petabyte-scale data warehouse service
Google BigQuery
Superfast append-only tables
MarkLogic
Large XML-based NoSQL database
7. Big Data Challenges
Loading Data
Lacks Spatial Support
Big Data Analysis
Querying and Exporting Data
8. Why Demo FME with MarkLogic and DynamoDB?
Different from other databases supported by FME
15. DynamoDB Demo – Spatially Locate and Store Any Document or Web Resource
Generate geohash index from location
Write document to S3 and link to DynamoDB
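The "generate geohash index" step turns a location into a short string whose prefixes identify progressively larger grid cells, which is what makes it usable as a key in a store with no native spatial support. A minimal sketch of the standard geohash encoding in Python; this is the public geohash algorithm, not necessarily the exact implementation used in the demo workspace.

```python
# Base32 alphabet used by the standard geohash encoding (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0      # bits accumulated for the current base32 character
    value = 0     # 5-bit value of the current character
    even = True   # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # upper half of the interval
            rng[0] = mid
        else:
            value = value << 1        # lower half of the interval
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:                 # every 5 bits become one character
            chars.append(_BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)
```

Because each extra character only subdivides the previous cell, a shorter geohash is always a prefix of a longer one for the same point, which is the property a prefix-based lookup relies on.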
16. DynamoDB Demo – Retrieve Any Stored Document
Generate geohash index from location
Write URI link to DynamoDB
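One way to link a URI to a geohash in DynamoDB is to use a geohash prefix as the partition key and the full geohash as the start of the sort key, so a lookup becomes a key query on a cell. A sketch under stated assumptions: the table name `SpatialDocs`, the attribute names `gh_prefix`, `gh_sort` and `uri`, and the 5-character prefix are all hypothetical choices, not the demo's actual schema. The dictionaries use DynamoDB's low-level `{"S": ...}` attribute format.

```python
import uuid
from typing import Dict, Optional

PARTITION_LEN = 5  # hypothetical: geohash prefix length used as the partition key


def make_item(geohash: str, uri: str, doc_id: Optional[str] = None) -> Dict:
    """Build a DynamoDB item (low-level format) linking a document URI to a location."""
    doc_id = doc_id or str(uuid.uuid4())
    return {
        "gh_prefix": {"S": geohash[:PARTITION_LEN]},  # hypothetical partition key
        "gh_sort": {"S": f"{geohash}#{doc_id}"},       # hypothetical sort key, unique per document
        "uri": {"S": uri},                             # link to the document, e.g. in S3
    }


def make_query(table: str, geohash: str, precision: int) -> Dict:
    """Build Query parameters for all items inside the geohash cell of `precision` characters."""
    if precision < PARTITION_LEN:
        raise ValueError("precision must be at least the partition prefix length")
    return {
        "TableName": table,
        "KeyConditionExpression": "gh_prefix = :p AND begins_with(gh_sort, :q)",
        "ExpressionAttributeValues": {
            ":p": {"S": geohash[:PARTITION_LEN]},
            ":q": {"S": geohash[:precision]},
        },
    }
```

With boto3 these dictionaries would be passed as `client.put_item(TableName="SpatialDocs", Item=make_item(...))` and `client.query(**make_query(...))`. Note that a single cell query misses documents just across a cell boundary; a fuller design would also query neighbouring cells.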
17. What is MarkLogic?
NoSQL database – XML optimized
Powerful search and analysis
Native spatial support
XML-based data model (GML, XML, etc.)
Deploy on Hadoop HDFS
18. FME and MarkLogic – A Natural Fit
Convert data to XML/GML*
Easily load XML into MarkLogic with FME
Process and convert XML results
FME 2014: New schema-based GML writer
19. Demo #1a - Loading MarkLogic
Convert GIS / CAD data to GML (XML)
Compose REST request to PUT to MarkLogic database
20. Loading GIS to MarkLogic
1. Convert GIS / CAD data into valid GML
2. Generate key fields
3. Build insert message
4. Execute PUT REST call
MarkLogic accepts any valid XML – just PUT it!
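Steps 2 and 4 of the loading process above can be sketched outside FME with the Python standard library: generate a unique document URI as the key field, then compose a REST PUT to MarkLogic's `/v1/documents` endpoint. The host and port come from the examples later in this deck; the `/docs/myXML_<uuid>.xml` naming follows the document URI shown there. The sketch only builds the request and does not send it.

```python
import urllib.parse
import urllib.request
import uuid
from typing import Optional

# Assumption: the MarkLogic REST instance used in the examples in this deck.
BASE_URL = "http://localhost:8003"


def make_doc_uri(doc_id: Optional[str] = None) -> str:
    """Key field: a unique document URI, similar to what UUIDGenerator provides in FME."""
    return f"/docs/myXML_{doc_id or uuid.uuid4()}.xml"


def build_put_request(xml_body: str, doc_uri: str) -> urllib.request.Request:
    """Compose (but do not send) a REST PUT that stores xml_body at doc_uri."""
    query = urllib.parse.urlencode({"uri": doc_uri}, safe="/")  # keep "/" readable in the URI
    return urllib.request.Request(
        f"{BASE_URL}/v1/documents?{query}",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )
```

Sending it would be `urllib.request.urlopen(req)`; a real MarkLogic REST instance normally requires authentication (for example digest auth), which this sketch leaves out.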
22. Demo #1b Exporting from MarkLogic
GET query to find URIs for features of interest
GET query using URIs to get feature XML/GML
Conversion to format of choice (CAD, GIS …) or WFS
23. Exporting XML from MarkLogic
1. Query database via GET request
2. Parse search result and compose GET feature request
3. Extract attributes and geometry from result
4. Validate and write XML Result
24. Exporting XML from MarkLogic
Search GET request to find URI based on query:
http://localhost:8003/v1/keyvalue?element=comment&value=AIXM.Chicago
Document Retrieval GET request based on URI:
http://localhost:8003/v1/documents?uri=/docs/myXML_653c46c3-fdfb-4837-ae1c-49735dd29356.xml
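The two GET requests above can be composed programmatically, which is essentially what the StringConcatenator steps in the export workspace do. A minimal sketch using only the standard library; the base URL is the one from the examples, and `fetch` is an illustrative helper that is not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the MarkLogic REST instance used in the examples above.
BASE_URL = "http://localhost:8003"


def search_url(element: str, value: str) -> str:
    """Search GET request: find document URIs whose `element` has the given value."""
    return f"{BASE_URL}/v1/keyvalue?{urlencode({'element': element, 'value': value})}"


def document_url(uri: str) -> str:
    """Document retrieval GET request for a URI returned by the search."""
    return f"{BASE_URL}/v1/documents?{urlencode({'uri': uri}, safe='/')}"


def fetch(url: str) -> str:
    """Illustrative helper: send the GET request and return the response body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Using `urlencode` rather than plain string concatenation means values with spaces or reserved characters are escaped correctly.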
25. AIXM from MarkLogic via FME Server
http://UHURA/fmedatastreaming/Demos/QueryMarkLogicDB.fmw?Element=airportCode&Value=CYVR
27. Summary
Big Data = big new opportunities
FME great for working with Big Data
Cloud model is a natural fit for Big Data
This is just the beginning - more to come!
Video plays here: what is big data? It is a fuzzy term, sort of like “cloud”. What does big data look like?

As a catch-all term, “big data” can be pretty nebulous, in the same way that the term “cloud” covers diverse technologies. Input data to big data systems could be chatter from social networks, web server logs, traffic flow sensors, satellite imagery, broadcast audio streams, banking transactions, MP3s of rock music, the content of web pages, scans of government documents, GPS trails, telemetry from automobiles, financial market data; the list goes on. Are these all really the same thing?

To clarify matters, the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data. They’re a helpful lens through which to view and understand the nature of the data and the software platforms available to exploit it. Most likely you will contend with each of the Vs to one degree or another.
Big data holds all of it
Loading Data: big data stores are not spatially friendly, so CAD and GIS data need conversion; uploading and downloading is expensive.
Lacks Spatial Support: georeferencing and spatial indexing are needed, since most big data repositories have limited geospatial support.
Big Data Analysis
Querying and Exporting Data: stored data is tricky to find and access, so appropriate keys need to be generated on load.
Demo #1 Limitless Spatially Indexed Database:
Geohash spatial index
Store vector data
Store raster data
Store lidar data
Store geotagged images by location
Store and associate any document with a location
Big data repository: scale as big as you want
NoSQL database optimized for XML / GML
Powerful search and analysis (BI, semantic queries)
Stores location, not just geohash
XML-based data model: rapid XML export
Store any documents: GML, XML (metadata)
Deploy on Hadoop HDFS
*As applicable (e.g. you can't convert raster to GML!)
It is important to emphasize FME 2014’s new schema-based GML writer, which allows FME to convert almost any CAD / GIS or even BIM data to GML or CityGML. This makes FME a very powerful loader tool for MarkLogic.
FME - A Natural Fit to support MarkLogic:
Converts almost any spatial data to GML
Writes almost any XML with XMLTemplater
Loading XML into MarkLogic is a simple HTTP PUT operation, easily done with HTTPUploader
Query, process and reconvert XML results
Converting features to GML/XML usually involves a GeometryExtractor transformer, or some combination of CoordinateExtractor and XMLTemplater.
Key fields can be captured from the source data, or UUIDGenerator can generate unique IDs for URIs, etc.
Build the insert message with XMLTemplater.
Execute the REST PUT call with HTTPUploader.
Example update message:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <docID>{fme:get-attribute("_uuid")}</docID>
  <docAuthor>{fme:get-attribute("user")}</docAuthor>
  <modType>{fme:get-attribute("updateType")}</modType>
  <UpdateDate>{fme:get-attribute("_timestamp")}</UpdateDate>
  <filePath>{fme:get-attribute("filePath")}</filePath>
  <comment>{fme:get-attribute("comment")}</comment>
  <doc_xml>{fme:get-xml-attribute("_file_contents")}</doc_xml>
</xml>
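For readers without FME at hand, the XMLTemplater update message above can be mirrored with Python's `xml.etree.ElementTree`. A sketch that assumes the feature attributes arrive as a plain dictionary keyed by the same names used in the template (`_uuid`, `user`, and so on); as with `fme:get-xml-attribute`, the file contents are embedded as XML, not as escaped text.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Element name in the message -> attribute name used in the XMLTemplater template.
_FIELDS = [
    ("docID", "_uuid"),
    ("docAuthor", "user"),
    ("modType", "updateType"),
    ("UpdateDate", "_timestamp"),
    ("filePath", "filePath"),
    ("comment", "comment"),
]


def build_update_message(attrs: Dict[str, str], file_contents: str) -> bytes:
    """Build the update message; missing attributes become empty elements."""
    root = ET.Element("xml")
    for tag, key in _FIELDS:
        ET.SubElement(root, tag).text = str(attrs.get(key, ""))
    doc_xml = ET.SubElement(root, "doc_xml")
    doc_xml.append(ET.fromstring(file_contents))  # embed the document as XML, not text
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

One difference from XMLTemplater: ElementTree rewrites namespace prefixes (for example to `ns0`) unless they are registered with `ET.register_namespace`, so GML content should register its prefixes first.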
…
Emphasize that this workspace can support the retrieval of any type of XML/GML regardless of schema. The same query workspace can be used to retrieve AIXM, INSPIRE or any other type of XML/GML.
StringConcatenator composes the search GET request based on input parameters.
HTTPFetcher sends the search GET request to MarkLogic.
XMLFlattener flattens the response so result.uri can be exposed.
A second StringConcatenator composes the document GET request based on the matching URI.
A second HTTPFetcher sends the document retrieval GET request to MarkLogic.
XMLFragmenter pulls out the doc_xml from the MarkLogic response.
The XML writer outputs the XML as a file, or streams it to the FME Server client once the workspace is published.
Search GET request to find URI based on query:
http://localhost:8003/v1/keyvalue?element=comment&value=AIXM.Chicago
Document retrieval GET request based on URI:
http://localhost:8003/v1/documents?uri=/docs/myXML_653c46c3-fdfb-4837-ae1c-49735dd29356.xml
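The XMLFlattener and XMLFragmenter steps above amount to two small parsing jobs: read the `uri` of each result from the search response, then pull the contents of `doc_xml` out of the document response. A sketch with ElementTree, assuming the search response contains `result` elements carrying a `uri` attribute; matching on local names keeps it independent of the exact namespace MarkLogic uses.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def result_uris(search_response: str) -> List[str]:
    """Collect the uri attribute of every <result> element (namespace-agnostic)."""
    root = ET.fromstring(search_response)
    return [
        el.get("uri")
        for el in root.iter()
        if local_name(el.tag) == "result" and el.get("uri")
    ]


def extract_doc_xml(document_response: str) -> List[ET.Element]:
    """Return the child elements of the first <doc_xml>, i.e. the stored feature XML."""
    root = ET.fromstring(document_response)
    for el in root.iter():
        if local_name(el.tag) == "doc_xml":
            return list(el)
    return []
```

Because neither function looks at the feature's own schema, the same code handles AIXM, INSPIRE or any other XML/GML, which matches the point this slide emphasizes.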
Emphasize that for this demo the previous workspace was published to FME Server, creating a feature service hosted by FME Server on top of MarkLogic. The example here supports a simple REST-based XML data stream. We could easily use this approach to build an FME Server hosted WFS on top of MarkLogic.
It's important to emphasize what is going on here, especially if you are not playing any of the demo movies. This demo shows Inspector reading AIXM 5 GML directly from the GET query:
http://UHURA/fmedatastreaming/Demos/QueryMarkLogicDB.fmw?Element=airportCode&Value=CYVR
The query goes to FME Server’s data streaming service.
FME Server uses the URL parameters to run the published QueryMarkLogicDB.fmw workspace.
QueryMarkLogicDB.fmw uses the values of Element and Value to build a search request and send it to MarkLogic.
QueryMarkLogicDB.fmw uses the URI from MarkLogic’s search result to compose and submit a document request to MarkLogic.
QueryMarkLogicDB.fmw extracts the feature XML from MarkLogic’s document response and streams it back to the FME Server client.
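The data streaming URL in this demo is just the repository and workspace path plus one query parameter per published parameter. A small sketch of that mapping; the host, repository and workspace come from the demo URL, and `published_parameters` is a hypothetical helper that illustrates which values the workspace receives, not FME Server's internal logic. Authentication, if enabled on the server, is not shown.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def datastreaming_url(host: str, repository: str, workspace: str, **params: str) -> str:
    """Build a data streaming URL; each query parameter feeds a published parameter."""
    return f"http://{host}/fmedatastreaming/{repository}/{workspace}?{urlencode(params)}"


def published_parameters(url: str) -> Dict[str, str]:
    """Hypothetical helper: recover the parameter values carried by the URL."""
    query = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in query.items()}
```

Changing `Value` to another airport code is all a client needs to do to stream a different feature, since the workspace builds its MarkLogic search from these parameters.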
Extra slides hidden for the short version of the presentation. They are included here since most presenters will not have immediate access to DynamoDB or MarkLogic. This just shows how FME can read XML from MarkLogic and use the GeometryReplacer to convert it to virtually any format FME supports.
Shows how FME can be used to integrate MarkLogic and ArcGIS Server. These are the steps to move data from MarkLogic to an ArcGIS Server feature service.
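Moving a feature from MarkLogic to an ArcGIS Server feature service means turning its GML geometry into the Esri JSON geometry format. A minimal sketch for a 2D `gml:LineString`, assuming the GML 3.2 namespace; the `swap_axes` flag is there because GML in EPSG:4326 is often written latitude first, while Esri JSON expects x (longitude) first. Other geometry types and 3D coordinates are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict

GML_NS = "http://www.opengis.net/gml/3.2"  # assumption: GML 3.2 input


def gml_linestring_to_esri(gml: str, wkid: int = 4326, swap_axes: bool = False) -> Dict:
    """Convert a 2D gml:LineString into an Esri JSON polyline geometry."""
    root = ET.fromstring(gml)
    pos_list = next(root.iter(f"{{{GML_NS}}}posList"), None)
    if pos_list is None or not pos_list.text:
        raise ValueError("no gml:posList found")
    values = [float(v) for v in pos_list.text.split()]
    if len(values) % 2:
        raise ValueError("expected an even number of ordinates (2D coordinates)")
    pairs = list(zip(values[0::2], values[1::2]))
    if swap_axes:
        # lat/lon order in the GML -> x/y (lon/lat) order for Esri JSON
        pairs = [(b, a) for a, b in pairs]
    return {
        "paths": [[[x, y] for x, y in pairs]],
        "spatialReference": {"wkid": wkid},
    }
```

A geometry like this can then be wrapped in a feature and sent to the feature service through its REST addFeatures or applyEdits operations.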
Shows how FME can be used to integrate MarkLogic and ArcGIS Server. These are the steps to move data from an ArcGIS Server feature service to MarkLogic. Note that this workflow could be event-driven, real-time, or a scheduled update.
Workspace showing the data flow from ArcGIS Server to MarkLogic:
A REST call to the feature service retrieves the feature of interest.
The JSON is extracted, and GeometryReplacer generates an FME geometry from it.
GeometryExtractor renders the FME geometry as GML.
The GML is added to an XML update message and posted to MarkLogic.
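The GeometryReplacer plus GeometryExtractor pair in this workspace converts Esri JSON into GML. A minimal Python sketch of the same conversion for points and single-path polylines, assuming GML 3.2 output with coordinates written in x y order; depending on the target schema (for example strict EPSG:4326 axis order) the coordinates may need to be swapped.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

GML_NS = "http://www.opengis.net/gml/3.2"  # assumption: GML 3.2 output
ET.register_namespace("gml", GML_NS)       # serialize with the conventional gml: prefix


def _fmt(coords: Iterable[float]) -> str:
    """Format ordinates as a space-separated GML coordinate string."""
    return " ".join(repr(float(c)) for c in coords)


def esri_to_gml(geometry: Dict) -> str:
    """Convert an Esri JSON point or single-path polyline to a GML 3.2 string."""
    if "x" in geometry and "y" in geometry:
        el = ET.Element(f"{{{GML_NS}}}Point")
        ET.SubElement(el, f"{{{GML_NS}}}pos").text = _fmt([geometry["x"], geometry["y"]])
    elif "paths" in geometry and len(geometry["paths"]) == 1:
        el = ET.Element(f"{{{GML_NS}}}LineString")
        flat = [c for pt in geometry["paths"][0] for c in pt[:2]]  # keep x, y only
        ET.SubElement(el, f"{{{GML_NS}}}posList").text = _fmt(flat)
    else:
        raise ValueError("only points and single-path polylines are handled in this sketch")
    wkid = geometry.get("spatialReference", {}).get("wkid")
    if wkid is not None:
        el.set("srsName", f"EPSG:{wkid}")
    return ET.tostring(el, encoding="unicode")
```

The returned string is what would be placed inside the XML update message before it is posted to MarkLogic.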