Data Migration in Schemaless NoSQL Databases
Version 1.0 Page 1
CS828-1501C-01
ThienSi (TS) Le
Colorado Technical University
Professor: Dr. Kathleen Hargiss
Phase 5: Individual Project
Data Migration in Schemaless NoSQL Databases
March 15, 2015
Abstract
This short research paper, written for Phase 5 (Individual Project) of the course
CS828-1501C-01 Advanced Topics in Database Systems, discusses the concepts of NoSQL
databases such as Cassandra, MongoDB, Neo4J, and Riak. These databases adopt the
aggregate data model: they support application-oriented aggregates, embrace schemaless
data, run on clusters in a distributed network, and often trade data consistency for other
useful properties. This paper describes the concepts associated with NoSQL’s
schemalessness and then focuses on data migration, especially on how to ensure that the
data stored in a database matches the implicit schema embedded in the applications when
that implicit schema changes. The in-depth discussion, which also covers the general
principles of conducting data migration and test strategy in NoSQL databases, consists of
four main sections:
A. The concept of NoSQL databases
This section discusses the lack of a formal definition of NoSQL databases (a
“noDefinition”) and their distinct characteristics, gives a brief comparison between
NoSQL and traditional relational databases, and describes the recent emergence of NoSQL
databases in Internet-centric services.
B. Aggregate Data Models
This section covers the aggregate data model and discusses its pros and cons.
C. Schemalessness and Implicit Schema
This section describes the central concepts of the schemaless database and the
implicit schema in NoSQL databases.
D. Data Migration in NoSQL databases with implicit schema
This section discusses data migration with implicit schema in depth. It covers the
principles, strategy, and test options of data migration in application code that contains
an implicit schema, with two demonstration examples.
A list of the references used in this individual project appears at the end of this
document.
Data Migration in Schemaless NoSQL Databases
In the modern era of data and information, novel standards and technologies in
computing and automation have produced enormous amounts of electronic data.
Corporations, governments, and the academic community in both the public and private
sectors have turned to database management systems (DBMS) to help them operate
enterprises and conduct business locally and globally in very competitive markets.
According to Bloomberg Businessweek (2011), many Fortune 500 companies have used
traditional relational DBMSs from various vendors to conduct and control their business.
However, with the vast amount of electronic, nonuniform data and custom data fields
generated by Web estates and services such as Cloud Computing, Business Intelligence,
and Science & Technology, the NoSQL database, a schema-free or schemaless database
with an aggregate data model, has emerged as a solution for handling big data (Chen,
Chiang and Storey, 2012). Data migration has become a primary issue for many
companies with many types of applications in Web services, e-commerce, business
intelligence, e-government and politics, smart health, and security and public safety.
A. NoSQL Databases
NoSQL is an acronym for Not Only Structured Query Language (Hargiss, 2015).
1. What is the definition of a NoSQL database?
According to Sadalage and Fowler (2012), NoSQL databases have a few distinct
characteristics:
- They do not use SQL (Structured Query Language).
- They are usually open-source projects.
- Most NoSQL databases are driven by enterprises’ need to run on clusters.
- They are based on the needs of early 21st-century Web estates.
- They support polyglot persistence, which means using different data stores in
different circumstances.
- Perhaps most unusually, a NoSQL database operates without a schema (i.e., it is
schema-free or schemaless and relies on an implicit schema).
Because this set of characteristics is only loosely shared, NoSQL has no formal
definition and no standard. Therefore, Sadalage and Fowler (2012) call it a
“noDefinition.”
2. NoSQL databases versus traditional relational DBMSs
A NoSQL system is a non-relational data storage system that does not require a
relational schema or the concept of joins, and that tolerates some relaxation of the ACID
properties. NoSQL database management systems have recently emerged as an alternative
to the traditional relational database management system (RDBMS) (Connolly and Begg,
2014) for several typical reasons:
a. A relational database is not a universal, be-all and end-all solution for complex relations.
b. There are other database languages with other data storage tools for databases.
c. A NoSQL solution is more acceptable and suitable for clients’ advanced
Internet-centric applications and services.
d. NoSQL databases provide more freedom, horizontal scalability, and flexibility.
3. The emergence of the NoSQL database
Sadalage et al. (2012) argue that the strictly structured tables of relations in an
RDBMS are a poor fit for the in-memory data structures of modern applications and for
services with large data needs such as Facebook and Twitter. In addition, cloud-based
services (e.g., Amazon S3), dynamically typed languages, and the open-source community
have driven the recent emergence of NoSQL DBMSs such as Cassandra, CouchDB,
Neo4J, and HBase. The NoSQL database has appeared as a solution for clients’ advanced
Web-based applications and services.
B. Aggregate Data Models
The NoSQL database offers developers and end users a friendly implementation
and usage as an alternative to traditional relational DBMSs. A NoSQL database requires
more programming but less database design. On the positive side, it offers a flexible
schema or no schema at all, allows quicker and cheaper setup, scales massively (vertically
or horizontally), and relaxes data consistency for higher performance and availability. On
the negative side, it often lacks a declarative query language, so more programming is
required to obtain needed information; and because it relaxes data consistency, there are
fewer guarantees of meaningful information. In addition, whereas traditional relational
databases struggle with the issues of big data, expandable horizontal scalability, complex
data formats, and sophisticated manageability, NoSQL databases employ map-reduce
computation (Date, 2006). MapReduce is a programming model that uses a parallel,
distributed algorithm to process and generate large data sets on big clusters of servers and
processors, using Mappers and Reducers. Note that the outputs of the Mappers and
Reducers can be stored as materialized views in cached memory (Sadalage et al., 2012).
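To make the Mapper/Reducer idea concrete, here is a minimal single-process sketch in Java (Java 9 or later), the language of the MongoDB examples later in this paper. It is not a distributed framework: the class name MapReduceSketch, the sample orders, and the choice of "total sales per product" as the computation are all hypothetical. The map phase emits (product, price) pairs from each order aggregate, and the reduce phase sums the values for each key; on a cluster, the same two functions would run in parallel on partitions of the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {
    // Map phase: flatten every order aggregate into (product, price) pairs.
    static List<Map.Entry<String, Double>> map(List<List<Map.Entry<String, Double>>> orders) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (List<Map.Entry<String, Double>> order : orders) {
            for (Map.Entry<String, Double> item : order) {
                emitted.add(Map.entry(item.getKey(), item.getValue()));
            }
        }
        return emitted;
    }

    // Reduce phase: group the emitted pairs by key and sum the values of each group.
    static Map<String, Double> reduce(List<Map.Entry<String, Double>> emitted) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, Double> pair : emitted) {
            totals.merge(pair.getKey(), pair.getValue(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two hypothetical orders; each inner list is one order aggregate.
        List<List<Map.Entry<String, Double>>> orders = List.of(
            List.of(Map.entry("Database Course", 2122.00)),
            List.of(Map.entry("Database Course", 2122.00), Map.entry("Research Course", 2214.00)));
        // The reduced result is the kind of output that could be kept as a materialized view.
        System.out.println(reduce(map(orders)));
    }
}
```

Because each reducer only needs the pairs for its own keys, the work can be split across machines, which is why the model suits large clusters.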
1. The NoSQL databases’ aggregate data model
In contrast with a traditional relational database using the strict entity-relationship
model, NoSQL databases use an aggregate data model that holds aggregate data.
Aggregate data is a complex structured record of nested data. An aggregate, as Evans
(2004) calls it, is a collection of related objects treated as a unit of data. NoSQL data
models fall into four categories: key-value, document, column-family, and graph
(Sadalage et al., 2012); the first three are aggregate-oriented. NoSQL databases are often
grouped into two broad families: key-value stores, or big hash tables (e.g., Amazon S3,
Voldemort, Scalaris), and schemaless stores (e.g., Cassandra, CouchDB, Neo4J).
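The difference between the two families can be sketched with plain Java collections standing in for a store. This is a toy model under stated assumptions, not any product’s API: the class AggregateStores is hypothetical, the key-value store is modeled as a HashMap from key to an opaque string (a hand-written JSON text), and the document store keeps the same order aggregate as nested maps whose fields can be inspected. In both cases the aggregate (customer, order, and order items) is written and read as one unit under one key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AggregateStores {
    // Key-value model: the value is an opaque blob; the store cannot see inside it.
    static final Map<String, String> keyValueStore = new HashMap<>();

    // Document model: the value is a nested structure whose fields are visible.
    static final Map<String, Map<String, Object>> documentStore = new HashMap<>();

    // Save the same aggregate under the same key in both stores.
    static void saveBoth(String id) {
        keyValueStore.put(id, "{\"customerid\":\"CTU_online\",\"order\":{\"orderItems\":"
            + "[{\"product\":\"Database Course\",\"price\":2122.0}]}}");

        Map<String, Object> item = Map.of("product", "Database Course", "price", 2122.0);
        Map<String, Object> order = Map.of("orderid", "18319888", "orderItems", List.of(item));
        documentStore.put(id, Map.of("customerid", "CTU_online", "order", order));
    }

    // In this sketch only the document model lets us reach a nested field without parsing a blob.
    @SuppressWarnings("unchecked")
    static Object firstItemPrice(String id) {
        Map<String, Object> order = (Map<String, Object>) documentStore.get(id).get("order");
        List<Map<String, Object>> items = (List<Map<String, Object>>) order.get("orderItems");
        return items.get(0).get("price");
    }

    public static void main(String[] args) {
        saveBoth("31415926AB47E98374D");
        System.out.println(keyValueStore.get("31415926AB47E98374D"));
        System.out.println(firstItemPrice("31415926AB47E98374D"));
    }
}
```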
2. Some Pros and Cons of the aggregate data models
There are pros and cons to these aggregate data models. In a key-value model,
the pros are that it is very fast, very scalable, simple, and able to distribute horizontally;
the con is that many data structures or objects cannot easily be modeled as key-value
pairs. In a schemaless model, on the other hand, the pros are that the data model is richer
than key-value pairs, it offers eventual consistency, many implementations are distributed,
and it still provides excellent performance and scalability. Its cons are that it typically
lacks ACID transactions and joins.
C. Schemalessness and Implicit Schema
A central theme of NoSQL databases is that they are schemaless. Schemalessness
has a big impact on how a database’s structure changes: users must control how data is
stored so that both old and new data remain accessible.
1. Main concept of schemalessness in NoSQL databases
A NoSQL database is ignorant of the schema (that is, a defined structure such as
tables, columns, and data types for storing data and its attributes). As a result, a NoSQL
database cannot use a schema to store and retrieve data efficiently, and it does not apply
schema validation to ensure that different applications do not manipulate the data in
inconsistent ways. However, a schemaless NoSQL database provides freedom and
flexibility in data storage (Moniruzzaman and Hossain, 2013). With schemalessness, a
NoSQL database allows users to store data casually. In advanced Internet-centric
e-commerce services in the digital market, aggregate records legitimately contain
nonuniform data, where each record may have a different set of fields. For example, a
key-value store allows users to store any data they desire. Users can easily store data and
comfortably change how it is stored as they learn more about their project, and they can
add new things as they discover them (Pankowski, 2002).
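A short Java sketch shows what storing data casually means in practice. The class SchemalessCollection is hypothetical and simply wraps a list of maps as a stand-in for a schemaless collection (no real database is involved). Two order records with different field sets are stored side by side without any schema change or validation; the only way to learn which fields exist is to scan the records. The giftWrap field is an invented example of a field added after the fact.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SchemalessCollection {
    // Each record is just a map of field name -> value; nothing enforces a shape.
    private final List<Map<String, Object>> records = new ArrayList<>();

    void insert(Map<String, Object> record) {
        records.add(new LinkedHashMap<>(record)); // no validation is performed
    }

    // Union of all field names seen so far: the de facto structure is only discoverable by scanning.
    Set<String> observedFields() {
        Set<String> fields = new TreeSet<>();
        for (Map<String, Object> r : records) {
            fields.addAll(r.keySet());
        }
        return fields;
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        SchemalessCollection orders = new SchemalessCollection();
        orders.insert(Map.of("product", "Database Course", "price", 2122.00));
        orders.insert(Map.of("product", "Research Course", "fullPrice", 2214.00,
                             "discountedPrice", 2122.00, "giftWrap", true));
        System.out.println(orders.size() + " records, fields: " + orders.observedFields());
    }
}
```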
2. Implicit schema in NoSQL databases
Since a NoSQL database is schema-free, users who need to access aggregate
records or nonuniform data must write programs, such as scripts, that usually rely on
some form of implicit schema. The implicit schema is the set of assumptions about the
data’s structure in the code that manipulates the data. A schemaless database shifts the
schema out of the database and into the application code that accesses the data. That
means users need to dig into the application code to understand the data and its associated
information (Sadalage et al., 2012). If the application code is well structured, users can
deduce the implicit schema for the data and its related information; otherwise, they may
be stuck when accessing the data. In other words, with an implicit schema, users need
more programming skill but less design experience.
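The implicit schema can be made visible with a small example. In the hypothetical reader below (the class ImplicitSchemaReader is illustrative only, with a plain Map standing in for a stored record), nothing in the data store says what an order item looks like; the assumptions live entirely in the code: the field is named "price", it is present, and it holds a number. A record that violates any of these assumptions is discovered only when the code runs.

```java
import java.util.Map;

class ImplicitSchemaReader {
    // These three assumptions ARE the implicit schema for an order item:
    //   1. the field is named "price", 2. it is present, 3. it is numeric.
    static double readPrice(Map<String, Object> item) {
        Object raw = item.get("price");
        if (!(raw instanceof Number)) { // also false when the field is missing (raw == null)
            throw new IllegalStateException("item does not match implicit schema: " + item.keySet());
        }
        return ((Number) raw).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(readPrice(Map.of("product", "Database Course", "price", 2122.00)));
        try {
            readPrice(Map.of("product", "Research Course", "cost", "2214"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```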
3. A primary problem of data access with the implicit schema
Since application code for a schemaless NoSQL database contains the implicit
schema, problems arise when multiple applications, developed by different developers,
access the same database. To reduce these problems, users can encapsulate all database
interaction within a single application and integrate it with other applications using Web
services. Another approach is to delineate different areas of an aggregate for access by
different applications.
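The first remedy can be read as: one component owns every read and write of the aggregate, so the implicit schema is written down in exactly one place. The sketch below assumes that idea; the OrderItemRepository class and its methods are hypothetical, an in-memory HashMap stands in for the database, and other applications would reach it through Web services in the paper’s description rather than through the direct method calls shown here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class OrderItemRepository {
    // The only place that knows the stored field names (the implicit schema).
    private static final String PRODUCT = "product";
    private static final String PRICE = "price";

    private final Map<String, Map<String, Object>> store = new HashMap<>();

    void save(String id, String product, double price) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(PRODUCT, product);
        doc.put(PRICE, price);
        store.put(id, doc);
    }

    String productOf(String id) {
        return (String) store.get(id).get(PRODUCT);
    }

    double priceOf(String id) {
        return ((Number) store.get(id).get(PRICE)).doubleValue();
    }

    public static void main(String[] args) {
        OrderItemRepository repo = new OrderItemRepository();
        repo.save("item-1", "Database Course", 2122.00);
        // Callers never touch field names, so renaming "price" later is a one-class change.
        System.out.println(repo.productOf("item-1") + " " + repo.priceOf("item-1"));
    }
}
```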
D. Data Migration in NoSQL databases with implicit schema
In general, data migration is the process of transferring data between storage types,
formats, databases, or computer systems. It is a key consideration in system
implementation, database integration, upgrade, or consolidation, and it is usually achieved
programmatically through automated migration (datamigrationpro.com, 2009).
1. Data migration with implicit schema in NoSQL databases
In NoSQL databases, schemalessness provides freedom and flexibility in data
migration within an aggregate record. When developing with NoSQL databases,
designers do not think about a schema, but they still consider other aspects, such as how
keys are assigned and what data structure sits inside a value object in key-value stores, or
what types of relationships exist in graph databases. Even though there is no fixed schema,
the stored data follows an implicit schema that is defined in the application code. If the
application code cannot parse the data from its database, a schema mismatch or data
inconsistency occurs (cisco.com). Note that when a migration must access multiple
aggregate records or change the aggregate boundaries, data migration with implicit
schema becomes as complex as it is in an RDBMS. It is even more complex when users
do not understand the set of assumptions about the data’s structure in the application code
that manipulates the data in aggregate records.
2. Principles of data migration in NoSQL databases with implicit schema
The data migration process in NoSQL databases is similar to other data migration
processes, except for some minor changes in requirements that come from the implicit
schema. An efficient data migration has several primary mapping phases, including data
extraction, data loading, and data verification, with minimal data loss and preserved
consistency. Data cleansing is commonly performed to improve data quality. Following
these principles (Katzoff, datamigrationpro.com, 2014), data migration in NoSQL
databases with implicit schema may consist of five phases (design, extraction, cleansing,
loading, and verification) for applications of moderate to high complexity, to match the
requirements of the implicit schema. Three of the five phases are essential and are
described below:
- Data extraction: the process of retrieving data from homogeneous or heterogeneous,
unstructured data sources for further processing.
- Data loading: the part of the ETL (extract, transform, load) process that loads data
into the final target database.
- Data verification: the process of checking the migrated data for accuracy and
inconsistencies after the migration is done.
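The three essential phases can be sketched end to end in Java. The in-memory source and target maps, the class name EtlSketch, and the verification rule (every source record must appear unchanged in the target under the same id) are assumptions chosen for illustration; a real migration would read from and write to actual databases and would usually cleanse and transform the data between extraction and loading.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EtlSketch {
    // Extraction: pull (id, record) pairs out of the source.
    static List<Map.Entry<String, Map<String, Object>>> extract(Map<String, Map<String, Object>> source) {
        return new ArrayList<>(source.entrySet());
    }

    // Loading: write a copy of each extracted record into the target under the same id.
    static void load(List<Map.Entry<String, Map<String, Object>>> rows,
                     Map<String, Map<String, Object>> target) {
        for (Map.Entry<String, Map<String, Object>> row : rows) {
            target.put(row.getKey(), new HashMap<>(row.getValue()));
        }
    }

    // Verification: ids from the source that are missing from the target or whose fields differ.
    static List<String> verify(Map<String, Map<String, Object>> source,
                               Map<String, Map<String, Object>> target) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : source.entrySet()) {
            if (!e.getValue().equals(target.get(e.getKey()))) {
                problems.add(e.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> source = new HashMap<>();
        source.put("31415926AB47E98374D", Map.of("product", "Database Course", "price", 2122.00));
        Map<String, Map<String, Object>> target = new HashMap<>();
        load(extract(source), target);
        System.out.println("mismatched ids: " + verify(source, target));
    }
}
```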
According to Katzoff (2014), an efficient data migration strategy may have ten
steps:
a. Planning: Identify the baseline and legal original.
b. Analysis and data discovery: Determine whether the metadata in the sources is
sufficient for the target document process.
c. Tool selection
d. Master data management: Harmonize key-value pairs and workflow process.
e. Tool configuration
f. Data cleansing
g. Dry runs
h. Formal testing
i. Production execution
j. Post production support
Several testing options can minimize migration errors in a NoSQL database with
implicit schema. The de facto approach to testing data and content migration is sampling,
in which a random subset of the data is selected and inspected. Other options include
pre-migration testing, formal design review, post-integration testing, user acceptance
testing, and production testing.
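The sampling approach can be sketched as follows. The SamplingCheck class, the sample size, the fixed seed (so that a failed check can be reproduced), and the rule that a sampled record must be identical in source and target are assumptions for illustration. The usage in main also shows the trade-off of sampling: a single corrupted record may or may not fall inside a small sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SamplingCheck {
    // Pick up to sampleSize random ids from the source and compare each record with the target.
    static List<String> sampleMismatches(Map<String, Map<String, Object>> source,
                                         Map<String, Map<String, Object>> target,
                                         int sampleSize, long seed) {
        List<String> ids = new ArrayList<>(source.keySet());
        Collections.sort(ids); // fixed starting order so the seed alone determines the sample
        Collections.shuffle(ids, new Random(seed));
        List<String> mismatches = new ArrayList<>();
        for (String id : ids.subList(0, Math.min(sampleSize, ids.size()))) {
            if (!source.get(id).equals(target.get(id))) {
                mismatches.add(id);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> source = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            source.put("order-" + i, Map.of("price", 10.0 * i));
        }
        Map<String, Map<String, Object>> target = new HashMap<>(source);
        target.put("order-7", Map.of("price", -1.0)); // simulate one corrupted record
        // A 10-record sample may or may not include order-7.
        System.out.println(sampleMismatches(source, target, 10, 42L));
    }
}
```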
3. Example 1 - MongoDB’s data migration
Data migration in a NoSQL database such as MongoDB shows that changes to the
implicit schema matter once there are deployed applications and existing production data.
Consider a document data store with a data model of customer, order, and orderItems.
The MongoDB document is shown below:
{
  "_id": "31415926AB47E98374D",
  "customerid": "CTU_online",
  "name": "CS828-1501C-01 Inc",
  "since": "01/04/2015",
  "order": {
    "orderid": "18319888",
    "orderdate": "01/04/2015",
    "orderItems": [
      {"product": "Database Course", "price": 2122.00}
    ]
  }
}
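The application code that writes this document structure to MongoDB, and
therefore embodies the implicit schema, is:
import com.mongodb.BasicDBObject;
// orderItems is the list stored under "order.orderItems"; the field names are the implicit schema.
BasicDBObject orderItem = new BasicDBObject();
orderItem.put("product", productName);
orderItem.put("price", price);
orderItems.add(orderItem);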
The code to read the document back from the MongoDB database is:
// orderItem is one element of the "orderItems" list read from MongoDB
BasicDBObject item = (BasicDBObject) orderItem;
String productName = item.getString("product");
Double price = item.getDouble("price");
Adding a preferredShippingType attribute to these objects does not require any
change in the database, because MongoDB does not care whether different documents
follow the same schema. Only the applications need to be deployed. The code does,
however, have to ensure that documents without the preferredShippingType attribute can
still be parsed.
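A minimal sketch of that tolerance, using a plain Map as a stand-in for a decoded MongoDB document: the reader falls back to a default when preferredShippingType is absent. The default value "STANDARD" and the class name ShippingReader are hypothetical; the paper does not specify what the application should assume for older documents.

```java
import java.util.Map;

class ShippingReader {
    // Hypothetical default for documents written before the attribute existed.
    static final String DEFAULT_SHIPPING = "STANDARD";

    // Older documents simply lack the field, so a missing value must not be an error.
    static String preferredShipping(Map<String, Object> customer) {
        Object value = customer.get("preferredShippingType");
        return value == null ? DEFAULT_SHIPPING : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> oldCustomer = Map.of("customerid", "CTU_online");
        Map<String, Object> newCustomer = Map.of("customerid", "CTU_offline",
                                                 "preferredShippingType", "EXPRESS");
        System.out.println(preferredShipping(oldCustomer) + " / " + preferredShipping(newCustomer));
    }
}
```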
Now suppose that discountedPrice is introduced and price is renamed to fullPrice.
A developer renames the price attribute to fullPrice and then adds the discountedPrice
attribute, as below:
{
  "_id": "261003OPOELALKJDK",
  "customerid": "CTU_offline",
  "name": "RES860-1501C-01 Inc",
  "since": "01/04/2015",
  "order": {
    "orderid": "18319888",
    "orderdate": "03/21/2015",
    "orderItems": [
      {"product": "Research Course",
       "fullPrice": 2214.00,
       "discountedPrice": 2122.00}
    ]
  }
}
Once the change is deployed, new customers and orders can be saved and read
back properly. However, the price of the product in existing orders cannot be read,
because the code now looks for fullPrice while the existing documents have only the
price attribute.
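The failure can be reproduced in a few lines. Here plain Maps stand in for the two documents above (an assumption; no MongoDB driver is used), and the hypothetical method readFullPrice plays the role of the updated reader that only knows about fullPrice.

```java
import java.util.Map;

class RenameBreakage {
    // Updated reader: it only knows the new field name and returns null when it is absent.
    static Double readFullPrice(Map<String, Object> item) {
        Object raw = item.get("fullPrice");
        return raw == null ? null : ((Number) raw).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Object> oldItem = Map.of("product", "Database Course", "price", 2122.00);
        Map<String, Object> newItem = Map.of("product", "Research Course",
                                             "fullPrice", 2214.00, "discountedPrice", 2122.00);
        System.out.println(readFullPrice(newItem)); // the new document reads correctly
        System.out.println(readFullPrice(oldItem)); // null: the old "price" value is invisible
    }
}
```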
4. Example 2 - Incremental migration
(Source: Chapter 12: Schema Migration from “NoSQL distilled: a brief guide to the
emerging world of polyglot persistence” by Sadalage & Fowler (2012))
Data migration with implicit schema risks data loss, schema mismatch, and
attribute removal in new aggregate records. When the application code changes, the
implicit schema changes with it, so new data may not have all the attributes that the old
data has. When the implicit schema changes, developers can use incremental migration to
ensure that the new code can still parse the old data. For the price and fullPrice attributes
from Example 1, the code that reads an order item checks for both attributes:
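BasicDBObject item = (BasicDBObject) orderItem;
String productName = item.getString("product");
// Prefer the new attribute; fall back to the legacy "price" in old documents.
// containsField is used because getDouble returns a primitive double and cannot signal null.
Double fullPrice = item.containsField("fullPrice")
        ? item.getDouble("fullPrice")
        : item.getDouble("price");
// Assumption: an old item without discountedPrice is sold at its full price.
Double discountedPrice = item.containsField("discountedPrice")
        ? item.getDouble("discountedPrice")
        : fullPrice;
When writing the document back, the old attribute price is not saved:
BasicDBObject orderItem = new BasicDBObject();
orderItem.put("product", productName);
// Only the new attribute names are written, so each save migrates the document.
orderItem.put("fullPrice", fullPrice);
orderItem.put("discountedPrice", discountedPrice);
orderItems.add(orderItem);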
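For readers who want to run the idea without a MongoDB server, here is the same incremental migration with java.util.Map standing in for BasicDBObject (an assumption made only so the sketch is self-contained). The class IncrementalMigration and its nested OrderItem are hypothetical. The read side prefers fullPrice and falls back to the legacy price; the write side always emits the new shape, so each document is migrated the next time it is saved. Treating a missing discountedPrice as equal to the full price is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IncrementalMigration {
    // In-memory object used by the application; it knows only the new schema.
    static final class OrderItem {
        final String product;
        final double fullPrice;
        final double discountedPrice;

        OrderItem(String product, double fullPrice, double discountedPrice) {
            this.product = product;
            this.fullPrice = fullPrice;
            this.discountedPrice = discountedPrice;
        }
    }

    // Read side: accept both the old shape ("price") and the new shape ("fullPrice").
    static OrderItem read(Map<String, Object> doc) {
        Object full = doc.containsKey("fullPrice") ? doc.get("fullPrice") : doc.get("price");
        double fullPrice = ((Number) full).doubleValue();
        Object disc = doc.get("discountedPrice");
        // Assumption: an old item with no recorded discount is treated as sold at full price.
        double discountedPrice = disc == null ? fullPrice : ((Number) disc).doubleValue();
        return new OrderItem((String) doc.get("product"), fullPrice, discountedPrice);
    }

    // Write side: always emit the new shape, so the legacy "price" field disappears on save.
    static Map<String, Object> write(OrderItem item) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("product", item.product);
        doc.put("fullPrice", item.fullPrice);
        doc.put("discountedPrice", item.discountedPrice);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = Map.of("product", "Database Course", "price", 2122.00);
        System.out.println(write(read(oldDoc)));
    }
}
```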
With incremental migration, the database may hold many versions of the object,
and the code translates each old version to the new schema when reading it. When the
object is saved back, it is written in the new schema. This gradual migration of data helps
the application evolve faster.
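The remark about many versions can be sketched as a chain of upgrade steps keyed by a version number stored in each document. The schemaVersion field, the version numbers, the two steps, and the VersionedUpgrader class are hypothetical; the paper only states that old versions are translated to the new schema and saved in the new form. A document with no schemaVersion is treated as version 0.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class VersionedUpgrader {
    static final int CURRENT_VERSION = 2;

    // STEPS.get(v) upgrades a document from version v to version v + 1.
    static final List<UnaryOperator<Map<String, Object>>> STEPS = List.of(
        // v0 -> v1: rename "price" to "fullPrice"
        doc -> { doc.put("fullPrice", doc.remove("price")); return doc; },
        // v1 -> v2: add "discountedPrice", defaulting to the full price (assumption)
        doc -> { doc.putIfAbsent("discountedPrice", doc.get("fullPrice")); return doc; });

    // Apply every step between the document's stored version and the current one.
    static Map<String, Object> upgrade(Map<String, Object> stored) {
        Map<String, Object> doc = new LinkedHashMap<>(stored); // mutable copy
        int version = ((Number) doc.getOrDefault("schemaVersion", 0)).intValue();
        for (int v = version; v < CURRENT_VERSION; v++) {
            doc = STEPS.get(v).apply(doc);
        }
        doc.put("schemaVersion", CURRENT_VERSION); // saved back in the new form
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(upgrade(Map.of("product", "Database Course", "price", 2122.00)));
    }
}
```

Keeping each step small means a document written under any older version can still be read, at the cost of carrying the upgrade code until all stored documents have been rewritten.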
Conclusion
This short research paper discusses the concepts of NoSQL databases, which
adopt the aggregate data model: they support application-oriented aggregates, embrace
schemaless data, run on clusters in a distributed network, and often trade data consistency
for other useful properties. It focuses on the associated concepts of NoSQL’s
schemalessness and emphasizes data migration in NoSQL databases with implicit
schema. The in-depth discussion, which also covers the general principles of conducting
data migration and test strategy in NoSQL databases, consists of four main sections: (1)
the concepts of NoSQL databases, (2) aggregate data models, (3) schemalessness and
implicit schema, and (4) data migration in NoSQL databases with implicit schema. A
final open question is whether NoSQL databases, with their implicit schemas, are able to
handle Big Data in the data-driven era of the early 21st century.
REFERENCES
1. Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business Intelligence and Analytics:
From Big Data to Big Impact. MIS Quarterly, 36(4), 1165-1188.
2. Connolly, T. M., & Begg, C. E. (2014). Database Systems: A Practical Approach to
Design, Implementation, and Management. New Jersey: Pearson.
3. Date, C. J. (2006). The relational database dictionary: A comprehensive glossary of
relational terms and concepts, with illustrative examples. O'Reilly Media.
ISBN 978-1-4493-9115-7.
4. Hargiss, K. (2015). Chat session 9 (Lecture): NoSQL databases [Presentation slides].
5. McNurlin, B. C., Sprague, R. H., Jr., & Bui, T. (2009). Information Systems
Management in Practice (8th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
6. Moniruzzaman, A. B. M., & Hossain, S. A. (2013). NoSQL database: New era of
databases for big data analytics - classification, characteristics and comparison. arXiv
preprint arXiv:1307.0191.
7. Pankowski, T. (2002). PathLog: a Query Language for Schemaless Databases of
Partially Labeled Objects. Fundamenta Informaticae, 49(4), 369.
8. Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging
world of polyglot persistence. Pearson Education.
9. http://www.datamigrationpro.com/data-migration-articles/2009/11/30/how-to-implement-an-effective-data-migration-testing-strateg.html.
10. http://en.wikipedia.org/wiki/Data_migration.
11. https://msdn.microsoft.com/en-us/library/ms174467.aspx.
12. http://www.cisco.com/c/en/us/td/docs/security/ise/1-3/migration_guide/b_ise_MigrationGuide/b_ise_MigrationGuide12_chapter_011.html.
13. http://www.computerweekly.com/feature/An-ABC-guide-to-data-migration.
14. http://www.laserfiche.com/support/webhelp/Laserfiche/9.0/en-US/AdminGuide/Content/Basic_Principles_of_the_Migration_Proc.
15. http://www.webopedia.com/TERM/D/data_migration.html.
APPENDIX
CS828 Phase 5 Individual Project: Grade: A Score: 200 pt 3/16/2015
Current Grade Average: A (955/955)
ThienSi...
Congratulations on a well written paper used to discuss the general principles of
conducting data migration in NoSQL databases. You clearly presented thoughts as
how to ensure the data stored in the databases matched with the “Implicit Schema”
embedded in the applications when the “Implicit Schema” has experienced a
change....excellent work!
Proficient: The submitted work exceeds the project criteria requirements. It
demonstrates a comprehensive understanding of course material and meets the
course objectives with proficiency.
Dr. Kathleen Hargiss.
I.J. Information Technology and Computer Science, 2016, 12, 59.docx
 
NoSQL
NoSQLNoSQL
NoSQL
 
Unit-10.pptx
Unit-10.pptxUnit-10.pptx
Unit-10.pptx
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
 
Analysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchAnalysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho bench
 
Trends in Computer Science and Information Technology
Trends in Computer Science and Information TechnologyTrends in Computer Science and Information Technology
Trends in Computer Science and Information Technology
 
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENTHYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
 
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENTHYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
 
Evaluation of graph databases
Evaluation of graph databasesEvaluation of graph databases
Evaluation of graph databases
 
Unit II -BIG DATA ANALYTICS.docx
Unit II -BIG DATA ANALYTICS.docxUnit II -BIG DATA ANALYTICS.docx
Unit II -BIG DATA ANALYTICS.docx
 
Know what is NOSQL
Know what is NOSQL Know what is NOSQL
Know what is NOSQL
 
SQL OR NoSQL DATABASES? CRITICAL DIFFERENCES.pdf
SQL OR NoSQL DATABASES? CRITICAL DIFFERENCES.pdfSQL OR NoSQL DATABASES? CRITICAL DIFFERENCES.pdf
SQL OR NoSQL DATABASES? CRITICAL DIFFERENCES.pdf
 
Assignment_4
Assignment_4Assignment_4
Assignment_4
 

CS828 P5 Individual Project v101

This section gives an in-depth discussion of data migration with implicit schema. It covers the principles, strategy, and test options for data migration in application code that contains an implicit schema, with two demonstration examples.
The paper also provides a list of the references used in this individual project at the end of this document.
Data Migration in Schemaless NoSQL Databases
In the modern era of data and information, new standards in computing, automation, and technology have produced enormous amounts of electronic data. Corporations, governments, and the academic community, in both the public and private sectors, have turned to database management systems (DBMS) to help them operate enterprises and conduct business locally and globally in a very competitive market. According to Bloomberg Businessweek (2011), many Fortune 500 companies have used traditional relational DBMSs from various vendors to conduct and control their business. However, Web estates and services such as Cloud Computing, Business Intelligence, and Science & Technology now generate vast amounts of nonuniform electronic data with custom data fields, and NoSQL databases, which are schema-free (schemaless) databases with an aggregate data model, have emerged as a solution for handling such big data (Chen, Chiang and Storey, 2012). Data migration has become a primary issue for many companies running multiple types of applications in web services, e-commerce, business intelligence, e-government and politics, smart health, and security and public safety.
A. NoSQL Databases
NoSQL is an acronym for Not Only Structured Query Language (Hargiss, 2015).
1. What is a definition of NoSQL database?
According to Sadalage and Fowler (2012), NoSQL databases have a few distinct characteristics:
- They do not use SQL (Structured Query Language).
- They are usually open-source projects.
- Most NoSQL databases are driven by enterprises' need to run on clusters.
- They are based on the needs of early 21st-century Web estates.
- They support polyglot persistence, which means using different data stores in different circumstances.
- Perhaps most unusually, a NoSQL database operates without a schema (i.e., it is schema-free or schemaless, relying on an implicit schema).
Given this crude set of characteristics, NoSQL is not definitional: there is no standard for NoSQL databases. Sadalage and Fowler (2012) therefore describe NoSQL as a "noDefinition."
2. NoSQL databases versus traditional relational DBMSs
A NoSQL system is a non-relational data storage system that does not require a relational schema or the concept of joins, and it tolerates some relaxation of the ACID properties. NoSQL database management systems have recently emerged as an alternative to the traditional relational database management system (RDBMS) (Connolly and Begg, 2014) for several typical reasons:
a. An RDBMS cannot contain universal, complex, be-all and end-all relations.
b. Other database languages and other data storage tools exist for databases.
c. A NoSQL solution is more acceptable and suitable for a client's advanced Internet-centric applications and services.
d. NoSQL databases provide more freedom, horizontal scalability, and flexibility.
3. The emergence of the NoSQL database
Sadalage and Fowler (2012) argue that the RDBMS, with its strictly structured tables of relations, is no longer suitable for the in-memory data structures of modern applications with large data needs, such as Facebook and Twitter. In addition, cloud-based applications (e.g., Amazon S3), dynamically typed languages, and the open-source community have driven the recent emergence of NoSQL DBMSs such as Cassandra, CouchDB, Neo4J, and HBase. NoSQL databases appear as a solution for a client's advanced Web-based applications and services.
B. Aggregate Data Models
The NoSQL database offers developers and end-users a friendly implementation and usage model as an alternative to the traditional relational DBMS. It requires more programming but less database design. On the positive side, it offers a flexible schema or no schema at all, allows quicker and cheaper setup, scales massively (vertically or horizontally), and relaxes data consistency for higher performance and availability. On the negative side, it has no declarative query language, so obtaining needed information requires more programming; and because it relaxes data consistency, there are fewer guarantees of meaningful information.
In addition, where traditional relational databases could not handle the issues of big data, expandable horizontal scalability, complex data formats, and sophisticated manageability, NoSQL databases employ map-reduce computation (Date, 2006). Map-reduce is a programming model that uses a parallel, distributed algorithm, made up of Mappers and Reducers, to process and generate large data sets on big clusters of servers and processors. The outputs of the Mappers and the Reducers can be stored as materialized views in cached memory (Sadalage and Fowler, 2012).
1. The NoSQL databases' aggregate data model
In contrast with a traditional relational database, which uses the strict entity-relationship model, NoSQL databases use an aggregate data model built from aggregate data. Aggregate data is a complex structured record of nested data. The aggregate, a term used by Evans (2004), is a collection of related objects treated as a single unit of data. The aggregate-oriented data model consists of four categories: key-value, document, column-family, and graph (Sadalage and Fowler, 2012). NoSQL databases usually use two primary aggregate data models: key-value, or the big hash table (e.g., Amazon S3, Voldemort, Scalaris), and schema-less (e.g., Cassandra, CouchDB, Neo4J).
2. Some pros and cons of the aggregate data models
In a key-value model, the pros are that it is very fast, very scalable, simple, and able to distribute horizontally. The con is that many data structures or objects cannot easily be modeled as key-value pairs. In a schema-less model, on the other hand, the pros are that the data model is richer than key-value pairs, it offers eventual consistency, many implementations are distributed, and it still provides excellent performance and scalability. Its cons are that there are no ACID transactions or joins.
C. Schemalessness and Implicit Schema
A central theme of NoSQL databases is that they are schemaless. Schemalessness has a big impact on changes to a database's structure: users must control how data is stored so that they can access both old and new data.
1. Main concept of schemalessness in NoSQL databases
A NoSQL database is ignorant of the schema (a defined structure, such as tables, columns, and data types, for storing data and its attributes). A NoSQL database therefore cannot use a schema to store and retrieve data efficiently, and it does not apply schema validation to ensure that different applications do not manipulate data in inconsistent ways. However, a schemaless NoSQL database provides freedom and flexibility in data storage (Moniruzzaman and Hossain, 2013). Because it is schemaless, a NoSQL database allows users to store data casually. In advanced Internet-centric e-commerce services in the digital market, aggregate records naturally contain nonuniform data, where each record in a schemaless database may have a different set of fields. For example, a key-value store allows users to store any data they desire under a key. Users can store data efficiently and change how it is stored as they learn more about their project, and they can add new things as they discover them (Pankowski, 2002).
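The nonuniform aggregate records just described, together with the map-reduce computation mentioned under Aggregate Data Models, can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the API of Cassandra, Riak, or MongoDB: a HashMap stands in for a key-value store, nested Map objects stand in for aggregates, and the class name, method names, and sample records (including "CTU_prospect" and the "coupon" field) are hypothetical. It runs the map and reduce phases sequentially on one machine, whereas a real cluster would distribute them. It assumes Java 9 or later (List.of, Map.entry).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a HashMap stands in for a key-value store whose values are
// nonuniform aggregates (nested maps). The store itself validates nothing.
public class AggregateStoreSketch {

    static Map<String, Object> item(String product, double price) {
        Map<String, Object> m = new HashMap<>();
        m.put("product", product);
        m.put("price", price);
        return m;
    }

    // Three aggregates, each with a different set of fields.
    static Map<String, Map<String, Object>> sampleStore() {
        Map<String, Map<String, Object>> store = new HashMap<>();

        Map<String, Object> a = new HashMap<>();
        a.put("customerid", "CTU_online");
        a.put("since", "01/04/2015");
        a.put("orderItems", List.of(item("Database Course", 2122.00)));
        store.put("order:1", a);

        Map<String, Object> b = new HashMap<>(); // no "since", extra "coupon"
        b.put("customerid", "CTU_offline");
        b.put("coupon", "SPRING15");
        b.put("orderItems", List.of(item("Database Course", 2000.00),
                                    item("Research Course", 2214.00)));
        store.put("order:2", b);

        Map<String, Object> c = new HashMap<>(); // a customer with no order items at all
        c.put("customerid", "CTU_prospect");
        store.put("customer:3", c);
        return store;
    }

    // Map phase: emit one (product, price) pair per order item in a single aggregate.
    // The mapper, not the store, decides what to do when "orderItems" is missing.
    static List<Map.Entry<String, Double>> mapper(Map<String, Object> aggregate) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        Object items = aggregate.get("orderItems");
        if (items instanceof List<?>) {
            for (Object o : (List<?>) items) {
                Map<?, ?> it = (Map<?, ?>) o;
                out.add(Map.entry((String) it.get("product"),
                                  ((Number) it.get("price")).doubleValue()));
            }
        }
        return out;
    }

    // Reduce phase: sum every value emitted under the same product key.
    static Map<String, Double> reducer(List<Map.Entry<String, Double>> pairs) {
        Map<String, Double> totals = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, Double> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Double::sum);
        }
        return totals;
    }

    // Runs the map phase over every aggregate, then the reduce phase, on one machine.
    static Map<String, Double> revenuePerProduct(Map<String, Map<String, Object>> store) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (Map<String, Object> aggregate : store.values()) {
            emitted.addAll(mapper(aggregate));
        }
        return reducer(emitted);
    }

    public static void main(String[] args) {
        System.out.println(revenuePerProduct(sampleStore()));
        // prints {Database Course=4122.0, Research Course=2214.0}
    }
}
```

The customer record without orderItems contributes nothing to the totals. That behavior is a choice made in the mapper code, which is exactly where an implicit schema ends up living in a schemaless database.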
2. Implicit schema in NoSQL databases
Since a NoSQL database is schema-free, users who access aggregate records or nonuniform data must write programs, such as scripts, that mostly rely on some form of implicit schema. The implicit schema is the set of assumptions about the data's structure in the code that manipulates the data. A schemaless database shifts the strict, fixed schema into the application code that accesses the data, so users need to dig into the application code to understand the data and its associated information (Sadalage and Fowler, 2012). If the application code is well structured, users can deduce the implicit schema for the data and its related information; otherwise, they may get stuck when accessing the data. In other words, with an implicit schema, users need more programming skill but less design experience.
3. A primary problem of data access with the implicit schema
Since the application code for a schemaless NoSQL database contains the implicit schema, problems arise when multiple applications, developed by different developers, access the same database. To reduce these problems, users can encapsulate all database interaction within a single application and integrate it with other applications using Web services. Another approach is to delineate different areas of an aggregate for access by different applications.
D. Data Migration in NoSQL databases with implicit schema
In general, data migration is the process of transferring data between storage types, formats, databases, or computer systems. Data migration is a key consideration in system implementation, database integration, upgrades, and consolidation, and it is usually achieved programmatically through automated migration (datamigrationpro.com, 2009).
1. Data migration with implicit schema in NoSQL databases
In a NoSQL database, schemalessness provides freedom and flexibility for data migration within an aggregate record. When developing with NoSQL databases, designers do not think about a schema; instead, they consider other aspects, such as how keys are assigned and what data structure goes inside a value object in key-value stores, or the types of relationships in graph databases. Even though there is no fixed schema, the data is stored according to an implicit schema that is defined in and contained by the application code. If the application code cannot parse the data from its database, a schema mismatch or data inconsistency occurs (cisco.com). Note that when a migration must access multiple aggregate records or change the aggregate boundaries, data migration with an implicit schema becomes as complex as it is in an RDBMS. It is even more complex when users do not understand the set of assumptions about the data's structure in the application code that manipulates the aggregate records.
2. Principles of data migration in NoSQL databases with implicit schema
The data migration process in NoSQL databases is similar to other data migration processes, except for some minor changes in requirements that come from the implicit schema. An efficient data migration has several primary mapping phases, including data extraction, data loading, and data verification, with a minimum of data loss and with consistency preserved. Data cleansing is commonly performed to improve data quality. Following these principles (Katzoff, datamigrationpro.com, 2014), data migration in NoSQL databases with an implicit schema may consist of five phases (design, extraction, cleansing, loading, and verification) for applications of moderate to high complexity, so as to match the requirements of the implicit schema. Three of the five phases are described below because they are essential:
- Data extraction: the process of retrieving data out of homogeneous or heterogeneous, unstructured data sources for further data processing.
- Data loading: the part of the ETL (extract, transform, load) process that loads data into the final target database.
- Data verification: the process of checking the different types of data for accuracy and inconsistencies after the data migration is done.
According to Katzoff (2014), an efficient data migration strategy may have ten steps, as shown below:
a. Planning: identify the baseline and the legal original.
b. Analysis and data discovery: determine whether the metadata in the sources is sufficient for the target document process.
c. Tool selection
d. Master data management: harmonize key-value pairs and the workflow process.
e. Tool configuration
f. Data cleansing
g. Dry runs
h. Formal testing
i. Production execution
j. Post-production support
After data migration is performed on a NoSQL database, there are several testing options to minimize migration errors. For data migration in a NoSQL database with an implicit schema, the de facto approach to testing data and content migration is based on sampling: a random subset of the data is selected and inspected. Other options are pre-migration testing, formal design review, post-integration testing, user acceptance testing, and production testing.
3. Example 1 - MongoDB's data migration
Data migration in a NoSQL database such as MongoDB shows that changes to the implicit schema do matter once there are deployed applications and existing production data. Consider a document data store with a data model of customer, order, and orderItems.
The MongoDB document is shown below:
    {
      "_id": "31415926AB47E98374D",
      "customerid": "CTU_online",
      "name": "CS828-1501C-01 Inc",
      "since": "01/04/2015",
      "order": {
        "orderid": "18319888",
        "orderdate": "01/04/2015",
        "orderItems": [{"product": "Database Course", "price": 2122.00}]
      }
    }
The application code, which embodies the implicit schema, writes this document structure to MongoDB as follows:
    import com.mongodb.BasicDBObject;

    // orderItems is the order's list of item documents being built.
    BasicDBObject orderItem = new BasicDBObject();
    orderItem.put("product", productName);
    orderItem.put("price", price);
    orderItems.add(orderItem);
The code to read the document back from the MongoDB database is:
    BasicDBObject item = (BasicDBObject) orderItem;
    String productName = item.getString("product");
    // Assumes every stored item has a "price" field (the implicit schema).
    double price = item.getDouble("price");
Adding a preferredShippingType attribute to the objects does not require any change in the database, because MongoDB does not care whether different documents follow the same schema; only the applications need to be deployed. The code, however, has to handle documents that do not have the preferredShippingType attribute.
Now suppose discountedPrice is introduced and price is renamed to fullPrice. A developer renames the price attribute to fullPrice and then adds the discountedPrice attribute, as below:
    {
      "_id": "261003OPOELALKJDK",
      "customerid": "CTU_offline",
      "name": "RES860-1501C-01 Inc",
      "since": "01/04/2015",
      "order": {
        "orderid": "18319888",
        "orderdate": "03/21/2015",
        "orderItems": [{"product": "Research Course", "fullPrice": 2214.00, "discountedPrice": 2122.00}]
      }
    }
Once the change is deployed, new customers and orders can be saved and read back properly. However, the product price of existing orders can no longer be read, because the code now looks for fullPrice while those documents have only the price attribute.
4. Example 2 - Incremental migration
(Source: Chapter 12: Schema Migration, from "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence" by Sadalage and Fowler (2012))
Data migration with an implicit schema carries a risk of data loss, schema mismatch, and attribute removal in new aggregate records. When the application code changes, the implicit schema changes too; consequently, new data may not have all the attributes that the old data has. When the implicit schema changes, developers can use incremental migration to ensure that the new code can still parse the old data. The code that reads documents carrying either the price or the fullPrice attribute from Example 1 is:
    BasicDBObject item = (BasicDBObject) orderItem;
    String productName = item.getString("product");
    // Old documents store "price"; new documents store "fullPrice".
    double fullPrice = item.containsField("price")
            ? item.getDouble("price")
            : item.getDouble("fullPrice");
    // Old documents have no "discountedPrice"; keep it null in that case.
    Double discountedPrice = item.containsField("discountedPrice")
            ? Double.valueOf(item.getDouble("discountedPrice"))
            : null;
When writing the document back, the old attribute price is not saved:
    BasicDBObject migratedItem = new BasicDBObject();
    migratedItem.put("product", productName);
    migratedItem.put("fullPrice", fullPrice);
    if (discountedPrice != null) {
        migratedItem.put("discountedPrice", discountedPrice);
    }
    orderItems.add(migratedItem);
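The read-with-fallback and write-in-new-schema pattern of Example 2 can also be exercised without a MongoDB server. The following is a minimal sketch, assuming plain java.util.Map objects as stand-ins for BasicDBObject documents and a HashMap as the collection; OrderItem, read, write, load, save, and oldDoc are hypothetical names chosen for illustration, not driver APIs. It shows the incremental aspect: only a document that the application loads and saves is rewritten, while untouched documents keep the old shape and remain readable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of incremental migration from {product, price}
// to {product, fullPrice, discountedPrice}. Plain maps stand in for documents.
public class IncrementalMigrationSketch {

    // The application's object under the NEW implicit schema.
    static final class OrderItem {
        final String product;
        final double fullPrice;
        final Double discountedPrice; // null when the stored document has none

        OrderItem(String product, double fullPrice, Double discountedPrice) {
            this.product = product;
            this.fullPrice = fullPrice;
            this.discountedPrice = discountedPrice;
        }
    }

    // Read path: understands BOTH schema versions.
    static OrderItem read(Map<String, Object> doc) {
        String product = (String) doc.get("product");
        // Prefer the new attribute name; fall back to the old "price" attribute.
        Object full = doc.containsKey("fullPrice") ? doc.get("fullPrice") : doc.get("price");
        if (full == null) {
            throw new IllegalStateException("schema mismatch: no price field for " + product);
        }
        Object discounted = doc.get("discountedPrice"); // absent in old documents
        return new OrderItem(product,
                ((Number) full).doubleValue(),
                discounted == null ? null : Double.valueOf(((Number) discounted).doubleValue()));
    }

    // Write path: emits ONLY the new schema; the old "price" attribute is not saved.
    static Map<String, Object> write(OrderItem item) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("product", item.product);
        doc.put("fullPrice", item.fullPrice);
        if (item.discountedPrice != null) {
            doc.put("discountedPrice", item.discountedPrice);
        }
        return doc;
    }

    static OrderItem load(Map<String, Map<String, Object>> store, String id) {
        return read(store.get(id));
    }

    // Saving rewrites the document in the new schema; untouched documents keep their old shape.
    static void save(Map<String, Map<String, Object>> store, String id, OrderItem item) {
        store.put(id, write(item));
    }

    // Builds a document as the OLD application code would have written it.
    static Map<String, Object> oldDoc(String product, double price) {
        Map<String, Object> d = new HashMap<>();
        d.put("product", product);
        d.put("price", price);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> store = new HashMap<>();
        store.put("item:1", oldDoc("Database Course", 2122.00));
        store.put("item:2", oldDoc("Research Course", 2214.00));

        // The application loads and saves item:1 only, so only item:1 is migrated.
        save(store, "item:1", load(store, "item:1"));

        System.out.println(store.get("item:1").containsKey("fullPrice")); // true
        System.out.println(store.get("item:1").containsKey("price"));     // false
        System.out.println(store.get("item:2").containsKey("price"));     // true
        System.out.println(load(store, "item:2").fullPrice);              // 2214.0
    }
}
```

Throwing on a document that has neither attribute is one possible design choice; it surfaces a schema mismatch immediately instead of silently returning a zero price.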
When using incremental migration, there could be many versions of the object on the application side that can translate the old schema to the new schema. When the object is saved back, it is saved using the new schema. This gradual migration of data helps the application evolve faster.
Conclusion
This short research paper discusses the concepts of NoSQL databases, which adopt the Aggregate Data Model: they support application-oriented aggregates, embrace schema-less data, run on cluster platforms in distributed networks, and often trade off data consistency against other useful properties. It focuses on the associated concepts of NoSQL's schemalessness and emphasizes data migration in NoSQL databases with implicit schema. The in-depth discussion, which also covers the general principles of conducting data migration and test strategy in NoSQL databases, consists of four main sections: (1) the concepts of NoSQL databases, (2) aggregate data models, (3) schemalessness and implicit schema, and (4) data migration in NoSQL databases with implicit schema. A final open question is whether NoSQL databases are able to handle Big Data with implicit schemas in the data-driven era of the early 21st century.
REFERENCES
1. Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly, 36(4), 1165-1188.
2. Connolly, T. M., & Begg, C. E. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management. New Jersey: Pearson.
3. Date, C. J. (2006). The Relational Database Dictionary: A Comprehensive Glossary of Relational Terms and Concepts, with Illustrative Examples. O'Reilly Media, Inc., p. 59. ISBN 978-1-4493-9115-7.
4. Hargiss, K. (2015). Chat session 9 (Lecture) of NoSQL database. Information retrieved from presentation slides.
5. McNurlin, B. C., Sprague, R. H., Jr., & Bui, T. (2009). Information Systems Management in Practice (8th ed.). Upper Saddle River: Pearson Prentice Hall.
6. Moniruzzaman, A. B. M., & Hossain, S. A. (2013). NoSQL Database: New Era of Databases for Big Data Analytics - Classification, Characteristics and Comparison. arXiv preprint arXiv:1307.0191.
7. Pankowski, T. (2002). PathLog: A Query Language for Schemaless Databases of Partially Labeled Objects. Fundamenta Informaticae, 49(4), 369.
8. Sadalage, P. J., & Fowler, M. (2012). NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education.
9. http://www.datamigrationpro.com/data-migration-articles/2009/11/30/how-to-implement-an-effective-data-migration-testing-strateg.html
10. http://en.wikipedia.org/wiki/Data_migration
11. https://msdn.microsoft.com/en-us/library/ms174467.aspx
12. http://www.cisco.com/c/en/us/td/docs/security/ise/1-3/migration_guide/b_ise_MigrationGuide/b_ise_MigrationGuide12_chapter_011.html
13. http://www.computerweekly.com/feature/An-ABC-guide-to-data-migration
14. http://www.laserfiche.com/support/webhelp/Laserfiche/9.0/en-US/AdminGuide/Content/Basic_Principles_of_the_Migration_Proc.
15. http://www.webopedia.com/TERM/D/data_migration.html
APPENDIX
CS828 Phase 5 Individual Project: Grade: A Score: 200 pt 3/16/2015
Current Grade Average: A (955/955)
ThienSi... Congratulations on a well written paper used to discuss the general principles of conducting data migration in NoSQL databases. You clearly presented thoughts as how to ensure the data stored in the databases matched with the “Implicit Schema” embedded in the applications when the “Implicit Schema” has experienced a change....excellent work!
Proficient: The submitted work exceeds the project criteria requirements. It demonstrates a comprehensive understanding of course material and meets the course objectives with proficiency.
Dr. Kathleen Hargiss.