SlideShare une entreprise Scribd logo
1  sur  28
VECTOR SPATIAL DATA
STORAGE SCHEME BASED
ON HADOOP
Presented By :-
ANANT KUMAR
Mtech-CSE
1450006
OBJECTIVE
• Cloud computing technology is changing the mode of the spatial information
industry which is applied and provides new ideas for it.
• Since Hadoop platform provides easy expansion, high performance, high fault
tolerance and other advantages, we propose a novel vector spatial data storage
schema based on it to solve the problems on how to use cloud computing
technology to directly manage spatial data and present data topological relations.
• Firstly, vector spatial data storage schema is designed based on column-oriented
storage structures and key/value mapping to express spatial topological relations.
• Secondly, we design middleware and merge with vector spatial data storage
schema in order to directly store spatial data and present geospatial data access
refinement schemes based on GeoTools toolkit.
• Thirdly, we verify the middleware and the data storage schema through Hadoop
cluster experiments. Comprehensive experiments demonstrate that our proposal is
efficient and applicable to directly storing large-scale vector spatial data and
timely express spatial topological relations.
INDEX
 INTRODUCTION
 CLOUND COPUTING
 CLOUD PROPERTIES
 CLOUD COMPUTING INFRASTRUCTURE
 CLASSIFICATION OF CLOUD COMPUTING BASED ON SERVICE PROVIDED
 WHAT IS HADOOP
 HADOOP COMPNENTS
 HADOOP DISTRIBUTION FILE SYSTEM
 DATA STORAGE BASED ON HADOOP
 HBASE DATABASES’S STORAGE MECHANISM
 SPATIAL DATA
 DESIGINING VECTOR SPATIAL DATA STORAGE SCHEME
 VECTOR SPATIAL OBJECT MODEL
 VECTOR SPATIAL DATA LOGICAL STORAGE
 VECTOR SPATIAL DATA PHYSICAL STORAGE
 DEVELOPING MIDDILEWARE BASED ON GEO
 EXPRIMENTAL RESULTS
 CONCLUSIONS AND FUTURE WORK
 REFERENCES
INTRODUCTION
• Spatial data is the basis of GIS applications.
• GIS. With the advancements of data acquisition techniques, large amounts of geospatial
data have been collected from multiple data sources, such as satellite observations,
remotely sensed imagery, aerial photography, and model simulations.
• The geospatial data are growing exponentially to PB (Petabyte) scale even EB (Exabyte)
scale . As this presents a great challenge to the traditional database storage, especially in
terms of vector spatial data storage due to its complex structure, the traditional spatial
database storage is facing a series of questions such as poor scalability and low efficiency
of data storage.
• With the superiority in scalability and data storage efficiency, Hadoop, and large-scale
distributed data management platform in general, provides an efficient way for large-scale
vector spatial data storage.
• Many scholars have done data storage based on Cloud computing technology.
• Applied the MapReduce model to process spatial data. Researched geospatial data
storage, geospatial data index based on Hadoop platform.
• Hadoop platform with Oracle Spatial database in attribute data query and concluded that
Hadoop is more efficient in data query.
• Jifeng Cui, etc the heterogeneous geospatial data organization storage based on Google's
GFS to solve the problem of multi-source geospatial data storage and query efficiency.
• Relevant challenge ,how to use unstructured database to directly store spatial data.
CLOUD COMPUTING
• What is the “cloud”?
• Easier to explain with examples:
• Gmail is in the cloud
• Amazon (AWS) EC2 and S3 are the cloud
• Google AppEngine is the cloud
• SimpleDB is in the cloud
• “Cloud computing is the delivery of computing as a service rather
than a product, whereby shared resources, software, and information
are provided to computers and other devices as a utility (like the
electricity grid) over a network (typically the Internet). “
CLOUD PROPERTIES
• Cloud offers:
• Scalability : means that you (can) have infinite resources, can handle unlimited
number of users
• Reliability (hopefully!)
• Availability (24x7)
• Elasticity : you can add or remove computer nodes and the end user will not be
affected/see the improvement quickly.
• Multi-tenancy : enables sharing of resources and costs across a large pool of
users. Lower cost, higher utilization… but other issues: e.g. security.
CLOUD COMPUTING INFRASTRUCTURE
• Computation model: Map Reduce* partion the joband reduce it
• Storage model: HDFS*
• Other computation models: HPC/Grid Computing
• Network structure
Types of cloud
• Public Cloud: Computing infrastructure is hosted at the vendor’s premises.
• Private Cloud: Computing architecture is dedicated to the customer and is not shared with other
organizations.
• Hybrid Cloud: Organizations host some critical, secure applications in private clouds. The not so
critical applications are hosted in the public cloud
• Cloud bursting: the organization uses its own infrastructure for normal usage, but cloud is used for peak loads.
• Community Cloud
CLASSIFICATION OF CLOUD COMPUTING
BASED ON SERVICE PROVIDED
• Infrastructure as a service (IaaS)
• Offering hardware related services using the principles of cloud computing. These could include
storage services (database or disk storage) or virtual servers.
• Amazon EC2, Amazon S3, Rackspace Cloud Servers and Flexiscale.
• Platform as a Service (PaaS)
• Offering a development platform on the cloud.
• Google’s Application Engine, Microsofts Azure, Salesforce.com’s force.com
.
• Software as a service (SaaS)
• Including a complete software offering on the cloud. Users can access a
software application hosted by the cloud vendor on pay-per-use basis. This
is a well-established sector.
• Salesforce.coms’ offering in the online Customer Relationship Management
(CRM) space, Googles gmail and Microsofts hotmail, Google docs.
WHAT IS HADOOP??
• Hadoop is a software framework for distributed processing of large datasets
across large clusters of computers
• Large datasets  Terabytes or petabytes of data
• Large clusters  hundreds or thousands of nodes
• Hadoop is open-source implementation for Google MapReduce
• Hadoop is based on a simple programming model called MapReduce
• Hadoop is based on a simple data model, any data will fit
• Download from hadoop.apache.org
• To install locally, unzip and set JAVA_HOME
• Details: hadoop.apache.org/core/docs/current/quickstart.html
• Three ways to write jobs:
• Java API
• Hadoop Streaming (for Python, Perl, etc)
• Pipes API (C++)
HADOOP COMPONENTS
• Distributed file system (HDFS)
• Single namespace for entire cluster
• Replicates data 3x for fault-tolerance
• MapReduce framework
• Executes user jobs specified as “map” and “reduce” functions
• Manages work distribution & fault-tolerance
HADOOP DISTRIBUTION FILE SYSYEM
• Files split into 128MB blocks
• Blocks replicated across several
datanodes (usually 3)
• Single namenode stores metadata
(file names, block locations, etc)
• Optimized for large files,
sequential reads
• Files are append-only
Namenode
Datanodes
1
2
3
4
1
2
4
2
1
3
1
3
3
2
4
DISTRIBUTED FILE SYSTEM HDFS
• HDFS, and Hadoop Distributed File System,
is the main application of the Hadoop
distributed file system and is shown in
Figure . And it stores data in blocks.
• Each block is the same size and its default
size is 64 MB.
• A HDFS cluster is the composition of
NameNode and a number of DataNodes.
• Clients firstly read metadata information
through NameNode, and then access the
data through the appropriate DataNode.
• If clients NameNode would record the
metadata store data in the DataNode,
information of data
The Architecture of HDFS [9]
NameNode
Recording metadata information: directory,
the number of copies, file name, location,
the number of copies, file name, location,
Recording
metadata
Writing
data
Writing
Data
Node
reading
data
Data block
Data Node
Data block
Data Node
Data block
Data NOde
Client
Client
Reading
metadata
DATA STORAGE BASED ON HADOOP
• Hadoop, and large-scale distributed data processing in general, is
rapidly becoming an important skill sets for many programmers .
• It is the core composition of distributed file system HDFS, distributed
unstructured database
• HBase and distributed parallel computing framework MapReduce.
• The key to the distinctions of Hadoop is accessible, robust and
scalable.
• Today, Hadoop is a core part of the computing infrastructure for many
web companies, such as Yahoo, Facebook, LinkedIn, and Twitter.
Many more traditional businesses, such as media and telecom, are
beginning to adopt this system, too.
HBASE DATABASE’S STORAGE MECHANISM
• It can use the local file system and HDFS.
• But HBase has the ability of processing data and improving data reliability and the robustness of the system if
it uses HDFS as its file system..
• HBase stores data on disk in a column -oriented format, and it is distinctly different from traditional columnar
databases:
• HBase excels at providing key-based access to a specific cell of data, or a sequential range of cells.
• Each row of the same table can have very different column and each column has a time version that is called
“timestamp”.
• Timestamp records the update of database that indicates an updated version.
• The logical view of the HBase database has two columns family: c1 and c2 that are shown by the Table 1.
• Each row of data expresses the update through timestamp.
• Each “column cluster” is saved through Several files and different column clusters are stored separately.
• The feature that is different from traditionally row-oriented database.
• In a row-oriented system, indexes tables as additional structures are built to get fast results for queries.
• HBase database does not need “additional storage for data indexes”, because data is stored with the indexes
itself.
• Data loading is executed faster in a column-oriented system than in a row-oriented one.
• all data of each row is stored together in a row-oriented and all data of in a column is stored together in a
column-oriented database.
• column-oriented system makes all columns parallel load to reduce the time of data loading.
HBASE DATABASE’S STORAGE
MECHANISM(CONTD.)
RowKey
Time Column Family:c1 Column Family:c2
Stamp info value Attribute value
t6 c1:1 Value1
r1
t5 c1:2 Value2 c2:1 Value1
t4 c2:2 Value2
t3
r2
t2 c1:1 Value1
t1 c1:2 Value2
Table 1. HBase Storage Data Logic View
Figure The Relationship between HBase and HDFS
Hadoop Map Reduce
Hbase
HDFS
Zookeeper
DESIGNING VECTOR SPATIAL DATA
STORAGE SCHEMA
• vector data model:A representation of the world using points, lines, and
polygons.
• Vector models are useful for storing data that has discrete boundaries, such
as country borders, land parcels, and streets.
• Vector data is more complex than raster data (grid of cells. Raster models
are useful for storing data that varies continuously, photograph,
a satellite image, a surface of chemical concentrations etc) organization
because it not only considers the scale, layers, point, line, surface and other
factors also involves “complex spatial topological relations”.
• We should design vector data storage schema that fits on the Hadoop
distributed platform in order to take advantage of Hadoop storage.
• This schema would offer an efficient organization and complete the storage
of vector spatial data in the unstructured database platform.
WHAT IS SPATIAL DATA ??
• Spatial Refers to space
• Spatial data refers to all types of data objects or elements that are present in a
geographical space or horizon.
• It enables the global finding and locating of individuals or devices anywhere in
the world.
• Spatial data is also known as geospatial data, spatial information or geographic
information.
• Spatial data is used in geographical information system (GIS) and other
geoloaction or positioning services.
• Spatial Consists of points, lines,polygon and other geographic data and geometric
premitives which can be mapped location,stored with an object as meta data or
used by a communication system to locate the user devices
• Spatial data may be classified as “scaler or vector data”.
VECTOR SPATIAL OBJECT MODEL
• OGC simple factor model that is proposed by International Association of Open
GIS is shown in Figure to share geospatial information and geospatial services.
• we use OGC simple factor model to design vector object model
• order to achieve the better interoperability of heterogeneous spatial database.
VECTOR SPATIAL DATA LOGICAL STORAGE
• Vector data consists of coordinate data, attribute data, topology data .
• We designed vector spatial data storage schema based on the HBase database
storage model according to the characteristics of vector data.
• Vector spatial data logical storage schema is shown by the Table 2 and it
contains three columns family which are coordinate, attribute, topology,
respectively recording coordinate information, attribute information and
topology information of data.
• Each data type in the storage system is string and is parsed into appropriate
data type in accordance with the Table 3 (The dictionary storage structure of
vector data type)
• We can cleverly design “RowKey” in accordance with the actual situation
and usage scenarios to obtain a collection of the results in the query and
good performance.
• The variable “tableName” represents the table’s name and the variable
familys contains some columns family in a table:
Vector Spatial Data Logical Storage(Contd.)
public static void creatTable(String tableName, String[] familys) throws Exception {
HBaseAdmin admin = new HBaseAdmin(conf);
if (admin.tableExists(tableName)) { System.out.println("table already exists!");
} else{
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
For (int i=o;i<familys.length;i++) {
tableDesc.addFamily(new HcolumnDescriptor(familys[i])); }
admin CreateTable(tableDesc);
Vector Spatial Data Logical Storage(Contd.)
Column Family: Column Family: Column Family:
RowKey TimeStamp Coordinate Attribute topology
Info value Attribute value Topology value
T8 Info:1 (x, y)
T7 Info:2 coordinate
T6 Attribute:1 Value1
T5 Attribute:2 Value2
T4
T3 Topo:1 Value1
T2 Info:1 (x, y)
T1 Info:2 coordinate
Time
Column Family: Column Family: Column Family:
Coordinate Attribute Topology
Stamp
info Value Attribute value Topology value
T8 “(x, y)”
“double
”
T7
“Coordinate
“string”
”
“Attribute1
T6 “int”
”
T5
“Attribute2
“string”
”T3 “Topo1” “int”
Table 3. The Dictionary Storage Structure for Vector Data Types
Table 2. Vector Spatial Data Storage View
VECTOR SPATIAL DATA PHYSICAL STORAGE
• HBase stores data on disk in a column-oriented format although the logical view
consists of many lines.
• The physical storage that the RowKey is Fea_ID1 in the Table 2 is shown by the
Table 4, Table 5 and Table 6.
• From these tables, it concluded that the blank column on the logic view in Table 2
could not be stored on the physical model actually.
• It is different from “relational database” when we design data storage model and
develop procedure.
• In HBase database, we don’t need built additional indexes, so data is stored
within indexes themselves.
• In the data query, Vector spatial data storage model based on HBase database only
read the columns required in a query.
• This queried way makes it provide a better performance for analytical request.
Vector Spatial Data Physical Storage(Contd.)
Row Key Time Stamp
Column Family: Coordinate
info value
Fea_ID1
T8 Info:1 (x, y)
T7 Info:2 coordinate
Row Key Time Stamp Column Family: Attribute
Attribute value
Fea_ID1 T6 “Attribute1” “int”
T5 “Attribute2” “string”
Row Key
Time Column Family: Topology
Stamp Topology value
Fea_ID1 T3 Topo:1 Value1
Table 4. The Physical Storage of Coordinate Column
Table 5. The Physical Storage of Attribute Column
Table 6. The Physical Storage of Technical Column
Developing Middleware Based on GeoTools
Due to expensive cost of commercial GIS(Geographic Information System) software.
GeoTools is an open source GIS toolkit developed from free soft foundation with the Java language
code.
 it contains a lot of open source GIS projects and standards-based GIS interface, provides a lot of
GIS algorithm, and possesses a good performance in the reading and writing of various data formats.
 In this experiment, we use GeoTools-2.7.5 open source project to read shapefile data from client
and make the appropriate conversion to use the “put ()” method to import data into HBase database
by Data Store, Feature Source, Feature Collection class libraries.
we design middleware and develop these methods and vector spatial data storage schema to access
and display the vector spatial data based on GeoTools toolkit.
According to; HBase database query mechanism, we use “get ( ) and san ( )” method to search data
from database. Get ( ) method acquires a single record and scan ( ) method requires range queries on
spatial data by limiting setStartRow ( ) and setStopRow ( ).
//Get() reading a single record HTable table = new HTable(conf, tableName);
Get get = new Get(rowKey.getBytes()); Result rs = table.get(get);
• Bytes[] ret=rs.getValue((Familys+":"+Column)) ; //Scan() requiring range queries
International Journal of Database Theory and Application
HTable table = new HTable(conf, tableName);
Scan s = new Scan(); s.setStartRow(startRow); s.setStopRow(stopRow);
ResultScanner rs = table.getScanner(s);
Experimental Results
 In Hadoop client, we use the scan ( ) method to query the
J48E023023 road layer data from 1:50000 vector data.
To complete the inquiry process Hadoop platform takes
1.26 seconds, while the Oracle Spatial platform takes 1.34
seconds.
Due to Hadoop platform is designed to manage large files,
it suffers performance penalty while managing a large
amount of data.
It can be seen that the efficiency of data storage is not too
high because the amount of data is too small.
HBase database is used to manage massive spatial data
efficiently
 And it expands nodes to obtain more storage space and
improve computational efficiency.
Finally, we use the “middleware” to develop the Feature (
), FeatureBuilder ( ), FeatureCollection ( ),
ShapefileDataStore ( )
class libraries to create shapefile for the reading data in
order to show this data by the middleware.
Road Layer Data Shown
CONCLUSION AND FUTURE WORK
we analyze HDFS distributed file system and HBase distributed database storage
mechanism and offer the vector spatial data storage schema based on Hadoop open
source distributed cloud storage platform.
Finally, we design middleware to merge with vector spatial data storage schema and
verify the effectiveness and availability of the vector spatial data storage schema
through the experiment.
 it also provides an effective way for large-scale vector spatial data storage and for
many companies which are committed to study Hadoop to store large-scale data.
Theoretically, according to Hadoop data storage strategy, we overcome poor
scalability, low efficiency and other problems of traditional relational database to
provide unlimited storage space and high reading and writing performance for large-
scale spatial data.
 we should design “excellent spatial data partition strategy and build distributed
spatial index structure with high performance to efficiently manage large-scale spatial
data”.
 Future work should look at spatial data partition strategy and distributed spatial index
structure with the goal of further enhancing data management effectiveness.
REFERENCES
• Y. Zhong, J. Han, T. Zhang and J. Fang, “A distributed geospatial data storage and processing framework for large-scale WebGIS”, 20th
International Conference on Geoinformatics (GEOINFORMATICS), Hong Kong China, (2012) June 15-17.
• S. Sakr, A. Liu, D. M. Batista and M. Alomari, “A Survey of Large Scale Data Management Approaches in Cloud Environments”,
Communications Surveys & Tutorials, vol. 13, no. 3, (2011), pp. 311-336.
• X. H. Liu, J. Han and Y. Zhong, “Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS”,
IEEE International Conference on Cluster Computing and Workshops, New Orleans, Louisiana, (2009) August 31- September 4, pp.
1- 8.
• D.-W. Zhang, F.-Q. Sun, X. Cheng and C. Liu, “Research on hadoop-based enterprise file cloud storage system”, 3rd International
Conference on in Awareness Science and Technology (iCAST), Dalian China, (2011) September 27-30, pp. 434-437.
• A. Cary, Z. Sun, V. Hristidis and N. Rishe, “Experiences on Processing Spatial Data with MapReduce, the 21st International
Conference on Scientific and Statistical Database Management”, New Orleans, LA, USA, (2009) June 02-04, pp. 1-18.
• Y. Gang Wang and S. Wang, “Research and Implementation on Spatial Data Storage and Operation Based on Hadoop Platform”,
Second IITA International Conference on Geoscience and Remote Sensing, Qingdao China, (2010) August 28-31, pp. 275-278.
• J. Cui, C. Li, C. Xing and Y. Zhang, “The framework of a distributed file system for geospatial data management”, Proceedings of IEEE
CCIS, (2011), pp. 183-187.
• C. Lam, “Hadoop in Action”, Manning Publications, (2010).
• K. Shvachko, K. Hairong, S. Radia and R. Chansler, “The Hadoop Distributed File System”, IEEE 26th Symposium on Mass Storage
Systems and Technologies (MSST), Washington, DC: IEEE Computer Society, (2010) May 3-7, pp. 1-10
• D. Borthakur, “The Hadoop Distributed File System: Architecture and design”, (2008).
• M. Loukides and J. Steele, “HBase: The Definitive Guide”, Published by O’Reilly Media: First Edition, (2009) September.
• OGC, http://www.opengeospatial.org/standards, (2012) April 31.
THANK YOU

Contenu connexe

Tendances

The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothAdaryl "Bob" Wakefield, MBA
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture PatternsMaynooth University
 
Ingestion and Historization in the Data Lake
Ingestion and Historization in the Data LakeIngestion and Historization in the Data Lake
Ingestion and Historization in the Data Lakeiamtodor
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersAdaryl "Bob" Wakefield, MBA
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Managementsameerfaizan
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational DatabasesUdi Bauman
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksGrega Kespret
 
Database awareness
Database awarenessDatabase awareness
Database awarenesskloia
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
KSnow: Getting started with Snowflake
KSnow: Getting started with SnowflakeKSnow: Getting started with Snowflake
KSnow: Getting started with SnowflakeKnoldus Inc.
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Data Con LA
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakeDatabricks
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveIBM Cloud Data Services
 

Tendances (20)

The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture Patterns
 
Ingestion and Historization in the Data Lake
Ingestion and Historization in the Data LakeIngestion and Historization in the Data Lake
Ingestion and Historization in the Data Lake
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
RDBMS vs NoSQL
RDBMS vs NoSQLRDBMS vs NoSQL
RDBMS vs NoSQL
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
Database awareness
Database awarenessDatabase awareness
Database awareness
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
KSnow: Getting started with Snowflake
KSnow: Getting started with SnowflakeKSnow: Getting started with Snowflake
KSnow: Getting started with Snowflake
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 

En vedette

Components of Spatial Data Quality in GIS
Components of Spatial Data Quality in GISComponents of Spatial Data Quality in GIS
Components of Spatial Data Quality in GISKaium Chowdhury
 
Spatial vs non spatial
Spatial vs non spatialSpatial vs non spatial
Spatial vs non spatialSumant Diwakar
 
ppt spatial data
ppt spatial datappt spatial data
ppt spatial dataRahul Kumar
 
Black Arrow Trading 2014
Black Arrow Trading 2014Black Arrow Trading 2014
Black Arrow Trading 2014BLACK ARROW
 
2 141015202430-conversion-gate01
2 141015202430-conversion-gate012 141015202430-conversion-gate01
2 141015202430-conversion-gate01Dadu Ggt
 
Untitled Presentation
Untitled PresentationUntitled Presentation
Untitled Presentationvictor closa
 
Pocket Editor View vFinal3
Pocket Editor View vFinal3Pocket Editor View vFinal3
Pocket Editor View vFinal3Sandhya Simhan
 
No touch porfis de fernando jose duarte tipton
No touch porfis de fernando jose duarte tiptonNo touch porfis de fernando jose duarte tipton
No touch porfis de fernando jose duarte tipton12345tyuiop
 
D&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3b
D&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3bD&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3b
D&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3bvbenner
 
Take Ten From Our PARCC Place
Take Ten From Our PARCC PlaceTake Ten From Our PARCC Place
Take Ten From Our PARCC PlaceKaren Simmons
 

En vedette (17)

Utility Dispatch Phone System Upgrade
Utility Dispatch Phone System UpgradeUtility Dispatch Phone System Upgrade
Utility Dispatch Phone System Upgrade
 
Components of Spatial Data Quality in GIS
Components of Spatial Data Quality in GISComponents of Spatial Data Quality in GIS
Components of Spatial Data Quality in GIS
 
Spatial Data Model
Spatial Data ModelSpatial Data Model
Spatial Data Model
 
Spatial databases
Spatial databasesSpatial databases
Spatial databases
 
Spatial vs non spatial
Spatial vs non spatialSpatial vs non spatial
Spatial vs non spatial
 
ppt spatial data
ppt spatial datappt spatial data
ppt spatial data
 
Black Arrow Trading 2014
Black Arrow Trading 2014Black Arrow Trading 2014
Black Arrow Trading 2014
 
2 141015202430-conversion-gate01
2 141015202430-conversion-gate012 141015202430-conversion-gate01
2 141015202430-conversion-gate01
 
tugas
tugastugas
tugas
 
Untitled Presentation
Untitled PresentationUntitled Presentation
Untitled Presentation
 
Pocket Editor View vFinal3
Pocket Editor View vFinal3Pocket Editor View vFinal3
Pocket Editor View vFinal3
 
wku rugby, 2014-2015
wku rugby, 2014-2015wku rugby, 2014-2015
wku rugby, 2014-2015
 
No touch porfis de fernando jose duarte tipton
No touch porfis de fernando jose duarte tiptonNo touch porfis de fernando jose duarte tipton
No touch porfis de fernando jose duarte tipton
 
English word
English wordEnglish word
English word
 
D&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3b
D&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3bD&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3b
D&co wkshp channels_7breakthroughideas_preview_slides_26mar15_v3b
 
SS City Plots Sector 85 Gurgaon
SS City Plots Sector 85 Gurgaon SS City Plots Sector 85 Gurgaon
SS City Plots Sector 85 Gurgaon
 
Take Ten From Our PARCC Place
Take Ten From Our PARCC PlaceTake Ten From Our PARCC Place
Take Ten From Our PARCC Place
 

Similaire à Vector Spatial Data Storage Scheme Based on Hadoop

VTU 6th Sem Elective CSE - Module 4 cloud computing
VTU 6th Sem Elective CSE - Module 4  cloud computingVTU 6th Sem Elective CSE - Module 4  cloud computing
VTU 6th Sem Elective CSE - Module 4 cloud computingSachin Gowda
 
module4-cloudcomputing-180131071200.pdf
module4-cloudcomputing-180131071200.pdfmodule4-cloudcomputing-180131071200.pdf
module4-cloudcomputing-180131071200.pdfSumanthReddy540432
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsrishavkumar1402
 
Module-2_HADOOP.pptx
Module-2_HADOOP.pptxModule-2_HADOOP.pptx
Module-2_HADOOP.pptxShreyasKv13
 
BIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxBIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxVishalBH1
 
Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptxRATISHKUMAR32
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdataTom Rogers
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataDebajani Mohanty
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Imviplav
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptxbetalab
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL David Smelker
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 

Similaire à Vector Spatial Data Storage Scheme Based on Hadoop (20)

VTU 6th Sem Elective CSE - Module 4 cloud computing
VTU 6th Sem Elective CSE - Module 4  cloud computingVTU 6th Sem Elective CSE - Module 4  cloud computing
VTU 6th Sem Elective CSE - Module 4 cloud computing
 
module4-cloudcomputing-180131071200.pdf
module4-cloudcomputing-180131071200.pdfmodule4-cloudcomputing-180131071200.pdf
module4-cloudcomputing-180131071200.pdf
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering students
 
Module-2_HADOOP.pptx
Module-2_HADOOP.pptxModule-2_HADOOP.pptx
Module-2_HADOOP.pptx
 
BIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxBIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptx
 
Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptx
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptx
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Anju
AnjuAnju
Anju
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 

Dernier

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 

Dernier (20)

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 

Vector Spatial Data Storage Scheme Based on Hadoop

  • 1. VECTOR SPATIAL DATA STORAGE SCHEME BASED ON HADOOP Presented By :- ANANT KUMAR Mtech-CSE 1450006
  • 2. OBJECTIVE • Cloud computing technology is changing the mode of the spatial information industry which is applied and provides new ideas for it. • Since Hadoop platform provides easy expansion, high performance, high fault tolerance and other advantages, we propose a novel vector spatial data storage schema based on it to solve the problems on how to use cloud computing technology to directly manage spatial data and present data topological relations. • Firstly, vector spatial data storage schema is designed based on column-oriented storage structures and key/value mapping to express spatial topological relations. • Secondly, we design middleware and merge with vector spatial data storage schema in order to directly store spatial data and present geospatial data access refinement schemes based on GeoTools toolkit. • Thirdly, we verify the middleware and the data storage schema through Hadoop cluster experiments. Comprehensive experiments demonstrate that our proposal is efficient and applicable to directly storing large-scale vector spatial data and timely express spatial topological relations.
  • 3. INDEX  INTRODUCTION  CLOUND COPUTING  CLOUD PROPERTIES  CLOUD COMPUTING INFRASTRUCTURE  CLASSIFICATION OF CLOUD COMPUTING BASED ON SERVICE PROVIDED  WHAT IS HADOOP  HADOOP COMPNENTS  HADOOP DISTRIBUTION FILE SYSTEM  DATA STORAGE BASED ON HADOOP  HBASE DATABASES’S STORAGE MECHANISM  SPATIAL DATA  DESIGINING VECTOR SPATIAL DATA STORAGE SCHEME  VECTOR SPATIAL OBJECT MODEL  VECTOR SPATIAL DATA LOGICAL STORAGE  VECTOR SPATIAL DATA PHYSICAL STORAGE  DEVELOPING MIDDILEWARE BASED ON GEO  EXPRIMENTAL RESULTS  CONCLUSIONS AND FUTURE WORK  REFERENCES
  • 4. INTRODUCTION • Spatial data is the basis of GIS applications. • GIS. With the advancements of data acquisition techniques, large amounts of geospatial data have been collected from multiple data sources, such as satellite observations, remotely sensed imagery, aerial photography, and model simulations. • The geospatial data are growing exponentially to PB (Petabyte) scale even EB (Exabyte) scale . As this presents a great challenge to the traditional database storage, especially in terms of vector spatial data storage due to its complex structure, the traditional spatial database storage is facing a series of questions such as poor scalability and low efficiency of data storage. • With the superiority in scalability and data storage efficiency, Hadoop, and large-scale distributed data management platform in general, provides an efficient way for large-scale vector spatial data storage. • Many scholars have done data storage based on Cloud computing technology. • Applied the MapReduce model to process spatial data. Researched geospatial data storage, geospatial data index based on Hadoop platform. • Hadoop platform with Oracle Spatial database in attribute data query and concluded that Hadoop is more efficient in data query. • Jifeng Cui, etc the heterogeneous geospatial data organization storage based on Google's GFS to solve the problem of multi-source geospatial data storage and query efficiency. • Relevant challenge ,how to use unstructured database to directly store spatial data.
  • 5. CLOUD COMPUTING • What is the “cloud”? • Easier to explain with examples: • Gmail is in the cloud • Amazon (AWS) EC2 and S3 are the cloud • Google AppEngine is the cloud • SimpleDB is in the cloud • “Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet). “
  • 6. CLOUD PROPERTIES • Cloud offers: • Scalability : means that you (can) have infinite resources, can handle unlimited number of users • Reliability (hopefully!) • Availability (24x7) • Elasticity : you can add or remove computer nodes and the end user will not be affected/see the improvement quickly. • Multi-tenancy : enables sharing of resources and costs across a large pool of users. Lower cost, higher utilization… but other issues: e.g. security.
  • 7. CLOUD COMPUTING INFRASTRUCTURE • Computation model: Map Reduce* partion the joband reduce it • Storage model: HDFS* • Other computation models: HPC/Grid Computing • Network structure Types of cloud • Public Cloud: Computing infrastructure is hosted at the vendor’s premises. • Private Cloud: Computing architecture is dedicated to the customer and is not shared with other organizations. • Hybrid Cloud: Organizations host some critical, secure applications in private clouds. The not so critical applications are hosted in the public cloud • Cloud bursting: the organization uses its own infrastructure for normal usage, but cloud is used for peak loads. • Community Cloud
  • 8. CLASSIFICATION OF CLOUD COMPUTING BASED ON SERVICE PROVIDED • Infrastructure as a service (IaaS) • Offering hardware related services using the principles of cloud computing. These could include storage services (database or disk storage) or virtual servers. • Amazon EC2, Amazon S3, Rackspace Cloud Servers and Flexiscale. • Platform as a Service (PaaS) • Offering a development platform on the cloud. • Google’s Application Engine, Microsofts Azure, Salesforce.com’s force.com . • Software as a service (SaaS) • Including a complete software offering on the cloud. Users can access a software application hosted by the cloud vendor on pay-per-use basis. This is a well-established sector. • Salesforce.coms’ offering in the online Customer Relationship Management (CRM) space, Googles gmail and Microsofts hotmail, Google docs.
  • 9. WHAT IS HADOOP?? • Hadoop is a software framework for distributed processing of large datasets across large clusters of computers • Large datasets  Terabytes or petabytes of data • Large clusters  hundreds or thousands of nodes • Hadoop is open-source implementation for Google MapReduce • Hadoop is based on a simple programming model called MapReduce • Hadoop is based on a simple data model, any data will fit • Download from hadoop.apache.org • To install locally, unzip and set JAVA_HOME • Details: hadoop.apache.org/core/docs/current/quickstart.html • Three ways to write jobs: • Java API • Hadoop Streaming (for Python, Perl, etc) • Pipes API (C++)
  • 10. HADOOP COMPONENTS • Distributed file system (HDFS) • Single namespace for entire cluster • Replicates data 3x for fault-tolerance • MapReduce framework • Executes user jobs specified as “map” and “reduce” functions • Manages work distribution & fault-tolerance
  • 11. HADOOP DISTRIBUTION FILE SYSYEM • Files split into 128MB blocks • Blocks replicated across several datanodes (usually 3) • Single namenode stores metadata (file names, block locations, etc) • Optimized for large files, sequential reads • Files are append-only Namenode Datanodes 1 2 3 4 1 2 4 2 1 3 1 3 3 2 4
  • 12. DISTRIBUTED FILE SYSTEM HDFS • HDFS, and Hadoop Distributed File System, is the main application of the Hadoop distributed file system and is shown in Figure . And it stores data in blocks. • Each block is the same size and its default size is 64 MB. • A HDFS cluster is the composition of NameNode and a number of DataNodes. • Clients firstly read metadata information through NameNode, and then access the data through the appropriate DataNode. • If clients NameNode would record the metadata store data in the DataNode, information of data The Architecture of HDFS [9] NameNode Recording metadata information: directory, the number of copies, file name, location, the number of copies, file name, location, Recording metadata Writing data Writing Data Node reading data Data block Data Node Data block Data Node Data block Data NOde Client Client Reading metadata
  • 13. DATA STORAGE BASED ON HADOOP • Hadoop, and large-scale distributed data processing in general, is rapidly becoming an important skill sets for many programmers . • It is the core composition of distributed file system HDFS, distributed unstructured database • HBase and distributed parallel computing framework MapReduce. • The key to the distinctions of Hadoop is accessible, robust and scalable. • Today, Hadoop is a core part of the computing infrastructure for many web companies, such as Yahoo, Facebook, LinkedIn, and Twitter. Many more traditional businesses, such as media and telecom, are beginning to adopt this system, too.
  • 14. HBASE DATABASE’S STORAGE MECHANISM • It can use the local file system and HDFS. • But HBase has the ability of processing data and improving data reliability and the robustness of the system if it uses HDFS as its file system.. • HBase stores data on disk in a column -oriented format, and it is distinctly different from traditional columnar databases: • HBase excels at providing key-based access to a specific cell of data, or a sequential range of cells. • Each row of the same table can have very different column and each column has a time version that is called “timestamp”. • Timestamp records the update of database that indicates an updated version. • The logical view of the HBase database has two columns family: c1 and c2 that are shown by the Table 1. • Each row of data expresses the update through timestamp. • Each “column cluster” is saved through Several files and different column clusters are stored separately. • The feature that is different from traditionally row-oriented database. • In a row-oriented system, indexes tables as additional structures are built to get fast results for queries. • HBase database does not need “additional storage for data indexes”, because data is stored with the indexes itself. • Data loading is executed faster in a column-oriented system than in a row-oriented one. • all data of each row is stored together in a row-oriented and all data of in a column is stored together in a column-oriented database. • column-oriented system makes all columns parallel load to reduce the time of data loading.
  • 15. HBASE DATABASE’S STORAGE MECHANISM(CONTD.) RowKey Time Column Family:c1 Column Family:c2 Stamp info value Attribute value t6 c1:1 Value1 r1 t5 c1:2 Value2 c2:1 Value1 t4 c2:2 Value2 t3 r2 t2 c1:1 Value1 t1 c1:2 Value2 Table 1. HBase Storage Data Logic View Figure The Relationship between HBase and HDFS Hadoop Map Reduce Hbase HDFS Zookeeper
  • 16. DESIGNING VECTOR SPATIAL DATA STORAGE SCHEMA • vector data model:A representation of the world using points, lines, and polygons. • Vector models are useful for storing data that has discrete boundaries, such as country borders, land parcels, and streets. • Vector data is more complex than raster data (grid of cells. Raster models are useful for storing data that varies continuously, photograph, a satellite image, a surface of chemical concentrations etc) organization because it not only considers the scale, layers, point, line, surface and other factors also involves “complex spatial topological relations”. • We should design vector data storage schema that fits on the Hadoop distributed platform in order to take advantage of Hadoop storage. • This schema would offer an efficient organization and complete the storage of vector spatial data in the unstructured database platform.
  • 17. WHAT IS SPATIAL DATA ?? • Spatial Refers to space • Spatial data refers to all types of data objects or elements that are present in a geographical space or horizon. • It enables the global finding and locating of individuals or devices anywhere in the world. • Spatial data is also known as geospatial data, spatial information or geographic information. • Spatial data is used in geographical information system (GIS) and other geoloaction or positioning services. • Spatial Consists of points, lines,polygon and other geographic data and geometric premitives which can be mapped location,stored with an object as meta data or used by a communication system to locate the user devices • Spatial data may be classified as “scaler or vector data”.
  • 18. VECTOR SPATIAL OBJECT MODEL • OGC simple factor model that is proposed by International Association of Open GIS is shown in Figure to share geospatial information and geospatial services. • we use OGC simple factor model to design vector object model • order to achieve the better interoperability of heterogeneous spatial database.
  • 19. VECTOR SPATIAL DATA LOGICAL STORAGE • Vector data consists of coordinate data, attribute data, topology data . • We designed vector spatial data storage schema based on the HBase database storage model according to the characteristics of vector data. • Vector spatial data logical storage schema is shown by the Table 2 and it contains three columns family which are coordinate, attribute, topology, respectively recording coordinate information, attribute information and topology information of data. • Each data type in the storage system is string and is parsed into appropriate data type in accordance with the Table 3 (The dictionary storage structure of vector data type) • We can cleverly design “RowKey” in accordance with the actual situation and usage scenarios to obtain a collection of the results in the query and good performance. • The variable “tableName” represents the table’s name and the variable familys contains some columns family in a table:
  • 20. Vector Spatial Data Logical Storage(Contd.) public static void creatTable(String tableName, String[] familys) throws Exception { HBaseAdmin admin = new HBaseAdmin(conf); if (admin.tableExists(tableName)) { System.out.println("table already exists!"); } else{ HTableDescriptor tableDesc = new HTableDescriptor(tableName); For (int i=o;i<familys.length;i++) { tableDesc.addFamily(new HcolumnDescriptor(familys[i])); } admin CreateTable(tableDesc);
  • 21. Vector Spatial Data Logical Storage(Contd.) Column Family: Column Family: Column Family: RowKey TimeStamp Coordinate Attribute topology Info value Attribute value Topology value T8 Info:1 (x, y) T7 Info:2 coordinate T6 Attribute:1 Value1 T5 Attribute:2 Value2 T4 T3 Topo:1 Value1 T2 Info:1 (x, y) T1 Info:2 coordinate Time Column Family: Column Family: Column Family: Coordinate Attribute Topology Stamp info Value Attribute value Topology value T8 “(x, y)” “double ” T7 “Coordinate “string” ” “Attribute1 T6 “int” ” T5 “Attribute2 “string” ”T3 “Topo1” “int” Table 3. The Dictionary Storage Structure for Vector Data Types Table 2. Vector Spatial Data Storage View
  • 22. VECTOR SPATIAL DATA PHYSICAL STORAGE • HBase stores data on disk in a column-oriented format although the logical view consists of many lines. • The physical storage that the RowKey is Fea_ID1 in the Table 2 is shown by the Table 4, Table 5 and Table 6. • From these tables, it concluded that the blank column on the logic view in Table 2 could not be stored on the physical model actually. • It is different from “relational database” when we design data storage model and develop procedure. • In HBase database, we don’t need built additional indexes, so data is stored within indexes themselves. • In the data query, Vector spatial data storage model based on HBase database only read the columns required in a query. • This queried way makes it provide a better performance for analytical request.
  • 23. Vector Spatial Data Physical Storage(Contd.) Row Key Time Stamp Column Family: Coordinate info value Fea_ID1 T8 Info:1 (x, y) T7 Info:2 coordinate Row Key Time Stamp Column Family: Attribute Attribute value Fea_ID1 T6 “Attribute1” “int” T5 “Attribute2” “string” Row Key Time Column Family: Topology Stamp Topology value Fea_ID1 T3 Topo:1 Value1 Table 4. The Physical Storage of Coordinate Column Table 5. The Physical Storage of Attribute Column Table 6. The Physical Storage of Technical Column
  • 24. Developing Middleware Based on GeoTools Due to expensive cost of commercial GIS(Geographic Information System) software. GeoTools is an open source GIS toolkit developed from free soft foundation with the Java language code.  it contains a lot of open source GIS projects and standards-based GIS interface, provides a lot of GIS algorithm, and possesses a good performance in the reading and writing of various data formats.  In this experiment, we use GeoTools-2.7.5 open source project to read shapefile data from client and make the appropriate conversion to use the “put ()” method to import data into HBase database by Data Store, Feature Source, Feature Collection class libraries. we design middleware and develop these methods and vector spatial data storage schema to access and display the vector spatial data based on GeoTools toolkit. According to; HBase database query mechanism, we use “get ( ) and san ( )” method to search data from database. Get ( ) method acquires a single record and scan ( ) method requires range queries on spatial data by limiting setStartRow ( ) and setStopRow ( ). //Get() reading a single record HTable table = new HTable(conf, tableName); Get get = new Get(rowKey.getBytes()); Result rs = table.get(get); • Bytes[] ret=rs.getValue((Familys+":"+Column)) ; //Scan() requiring range queries International Journal of Database Theory and Application HTable table = new HTable(conf, tableName); Scan s = new Scan(); s.setStartRow(startRow); s.setStopRow(stopRow); ResultScanner rs = table.getScanner(s);
  • 25. Experimental Results  In Hadoop client, we use the scan ( ) method to query the J48E023023 road layer data from 1:50000 vector data. To complete the inquiry process Hadoop platform takes 1.26 seconds, while the Oracle Spatial platform takes 1.34 seconds. Due to Hadoop platform is designed to manage large files, it suffers performance penalty while managing a large amount of data. It can be seen that the efficiency of data storage is not too high because the amount of data is too small. HBase database is used to manage massive spatial data efficiently  And it expands nodes to obtain more storage space and improve computational efficiency. Finally, we use the “middleware” to develop the Feature ( ), FeatureBuilder ( ), FeatureCollection ( ), ShapefileDataStore ( ) class libraries to create shapefile for the reading data in order to show this data by the middleware. Road Layer Data Shown
  • 26. CONCLUSION AND FUTURE WORK we analyze HDFS distributed file system and HBase distributed database storage mechanism and offer the vector spatial data storage schema based on Hadoop open source distributed cloud storage platform. Finally, we design middleware to merge with vector spatial data storage schema and verify the effectiveness and availability of the vector spatial data storage schema through the experiment.  it also provides an effective way for large-scale vector spatial data storage and for many companies which are committed to study Hadoop to store large-scale data. Theoretically, according to Hadoop data storage strategy, we overcome poor scalability, low efficiency and other problems of traditional relational database to provide unlimited storage space and high reading and writing performance for large- scale spatial data.  we should design “excellent spatial data partition strategy and build distributed spatial index structure with high performance to efficiently manage large-scale spatial data”.  Future work should look at spatial data partition strategy and distributed spatial index structure with the goal of further enhancing data management effectiveness.
  • 27. REFERENCES • Y. Zhong, J. Han, T. Zhang and J. Fang, “A distributed geospatial data storage and processing framework for large-scale WebGIS”, 20th International Conference on Geoinformatics (GEOINFORMATICS), Hong Kong China, (2012) June 15-17. • S. Sakr, A. Liu, D. M. Batista and M. Alomari, “A Survey of Large Scale Data Management Approaches in Cloud Environments”, Communications Surveys & Tutorials, vol. 13, no. 3, (2011), pp. 311-336. • X. H. Liu, J. Han and Y. Zhong, “Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS”, IEEE International Conference on Cluster Computing and Workshops, New Orleans, Louisiana, (2009) August 31- September 4, pp. 1- 8. • D.-W. Zhang, F.-Q. Sun, X. Cheng and C. Liu, “Research on hadoop-based enterprise file cloud storage system”, 3rd International Conference on in Awareness Science and Technology (iCAST), Dalian China, (2011) September 27-30, pp. 434-437. • A. Cary, Z. Sun, V. Hristidis and N. Rishe, “Experiences on Processing Spatial Data with MapReduce, the 21st International Conference on Scientific and Statistical Database Management”, New Orleans, LA, USA, (2009) June 02-04, pp. 1-18. • Y. Gang Wang and S. Wang, “Research and Implementation on Spatial Data Storage and Operation Based on Hadoop Platform”, Second IITA International Conference on Geoscience and Remote Sensing, Qingdao China, (2010) August 28-31, pp. 275-278. • J. Cui, C. Li, C. Xing and Y. Zhang, “The framework of a distributed file system for geospatial data management”, Proceedings of IEEE CCIS, (2011), pp. 183-187. • C. Lam, “Hadoop in Action”, Manning Publications, (2010). • K. Shvachko, K. Hairong, S. Radia and R. Chansler, “The Hadoop Distributed File System”, IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), Washington, DC: IEEE Computer Society, (2010) May 3-7, pp. 1-10 • D. Borthakur, “The Hadoop Distributed File System: Architecture and design”, (2008). • M. Loukides and J. Steele, “HBase: The Definitive Guide”, Published by O’Reilly Media: First Edition, (2009) September. • OGC, http://www.opengeospatial.org/standards, (2012) April 31.