Thesis presentation
1. Automatic Data Migration into the Cloud
MS Thesis: Computer Science & Software Engineering
By: Kushal Mehra
Supervisors: Dr. Yuhong Yan, Dr. Daniel Lemire
Concordia University
2. MOTIVATION
Famous applications (Facebook, Google Blogger, Twitter) depend on NoSQL.
Some advantages of NoSQL databases:
High scalability.
High read and write performance.
Availability at low cost.
Suitable applications:
Big data.
Geographical data.
3. Agenda
• Introduction
• Review of Previous Studies
• Proposed Model
• Experiment and Results
• Future work and Conclusion
5. INTRODUCTION AND PROBLEM
Relational Database vs. NoSQL Database
Cloud database definition.
Existing problems in relational databases:
• Scale up
• Scale out
8. INTRODUCTION AND PROBLEM
Data Migration
Enterprises seek to migrate their massive relational databases to NoSQL databases.
The process of transferring data between storage types, formats, or computer systems is called data migration.
9. INTRODUCTION AND PROBLEM
Importance of Data Migration
One survey estimated that the data migration market would reach $906 million by 2012.
10. REVIEW OF PREVIOUS STUDIES
Previous Studies
There is a large body of existing work on data migration.
Approaches include:
Schema conversion.
ETL.
Integrated model.
11. REVIEW OF PREVIOUS STUDIES
Previous Studies
Thakar et al. and Chanchary et al. migrated a large relational database to a cloud database (2010).
Calil et al. proposed SimpleSQL, a relational layer over Amazon SimpleDB (2012).
12. Limitations of Existing Work
Existing migration methods are not sufficient for data migration to the cloud. They do not address:
Migration strategy.
Application adaptation.
Sharding.
Existing methods migrate data from a legacy system to a relational database.
13. Amazon SimpleDB
SimpleDB is a web service that provides structured data storage in the cloud.
Multi-valued attributes.
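To make the multi-valued attribute idea concrete, here is a minimal in-memory sketch. It does not call the SimpleDB API; it only models an item as a mapping from attribute names to sets of string values. The helper name `put_attribute` and the sample book data are hypothetical.

```python
from collections import defaultdict

def put_attribute(item, name, value):
    """Add a value to an attribute; one attribute may hold several values."""
    item[name].add(str(value))

# One item, modeled as attribute name -> set of string values.
book = defaultdict(set)
put_attribute(book, "title", "Cloud Data Management")
put_attribute(book, "author", "A. Author")
put_attribute(book, "author", "B. Author")  # second value for the same attribute
```

In a relational schema the two authors would usually need a separate table; here both values sit under a single attribute of one item.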
15. Characteristics of NoSQL Databases
No Normalization.
No Joins.
Schemaless.
Data Type.
16. Characteristics of NoSQL Databases
Some cloud databases that share the same data model and characteristics:
Amazon SimpleDB
MongoDB
CouchDB
Oracle NoSQL
19. PROPOSED MODEL
Migration Methods
We propose four migration methods.
• Type 1: complete relational database to one domain.
• Type 2: multiple tables to one domain.
• Type 3: a table to one domain.
• Type 4: normalization to denormalization and tables to domain.
Each method is independent of the others and is capable of migrating an entire relational database.
25. PROPOSED MODEL
Type 1 Migration
Uses Mapping Strategy 2 (Ms2).
Migrates the entire relational database.
Only a single domain exists in the cloud database.
Number of items = number of rows in the entire relational database.
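A rough sketch of the Type 1 idea, under stated assumptions: the slides do not define Ms2, so the item naming scheme (`table:key`) and the table-prefixed attribute names below are hypothetical choices made only to keep rows from different tables distinct inside one domain. The sample tables are invented for illustration.

```python
def migrate_type1(database, key_columns):
    """Map every row of every table to one item in a single domain."""
    domain = {}
    for table, rows in database.items():
        key = key_columns[table]
        for row in rows:
            item_name = f"{table}:{row[key]}"  # hypothetical naming scheme
            # Prefix attributes with the table name; store values as strings.
            domain[item_name] = {f"{table}.{col}": str(val) for col, val in row.items()}
    return domain

database = {
    "books": [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}],
    "authors": [{"id": 1, "name": "X"}],
}
domain = migrate_type1(database, {"books": "id", "authors": "id"})
```

The resulting domain holds three items, one per source row, which matches the item count stated above.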
26. PROPOSED MODEL
Type 2 Migration
Uses Mapping Strategy 1 (Ms1) and Mapping
Strategy 2 (Ms2).
Migrate tables and their data to one domain.
Migrate a table to one domain.
27. PROPOSED MODEL
Type 3 Migration
Uses Mapping Strategy 1 (Ms1).
Migrate a table to one domain in a cloud database.
Implicit Conversion.
28. PROPOSED MODEL
Type 4 Migration
Uses Mapping Strategy 1 (Ms1) and Mapping Strategy 3 (Ms3).
Migrates denormalized tables to one domain in a cloud database.
Migrates a single table and its data to one domain.
Explicit conversion of columns.
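The denormalize-then-migrate flow of Type 4 might look roughly like the sketch below. The slides do not spell out Ms3 or what "explicit conversion" covers, so this assumes a simple join of two hypothetical tables (`books`, `authors`) followed by converting every column value to a string.

```python
def denormalize(books, authors):
    """Join books to authors on author_id, producing one wide row per book."""
    by_id = {a["id"]: a for a in authors}
    return [
        {"book_id": b["id"], "title": b["title"], "author_name": by_id[b["author_id"]]["name"]}
        for b in books
    ]

def to_domain(rows, key):
    """Turn each wide row into one item; values are explicitly converted to strings."""
    return {str(r[key]): {c: str(v) for c, v in r.items() if c != key} for r in rows}

books = [{"id": 1, "title": "A", "author_id": 7}]
authors = [{"id": 7, "name": "X"}]
domain = to_domain(denormalize(books, authors), key="book_id")
```

Because the join happens before migration, a query for a book and its author name touches a single item and needs no join on the cloud side.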
29. PROPOSED MODEL
Migration Method Usage
Type 1
Data size is less than 10 GB.
Type 2
Data size is more than 10 GB and joins are to be performed.
Type 3
Needs the same semantics as the relational database, and database size is more than 10 GB.
Type 4
Denormalization.
Data size is more than 10 GB and joins are to be performed.
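The usage guidance above can be read as a small decision rule. The sketch below is one possible encoding, not the thesis's own procedure: the function name, its flags, and the treatment of exactly 10 GB as "large" are assumptions, and it returns an empty list for cases the slide does not cover.

```python
TEN_GB = 10 * 1024**3  # threshold from the slide, expressed in bytes

def suggest_types(size_bytes, needs_joins=False, same_semantics=False, denormalize=False):
    """Return the migration types whose stated conditions match the inputs."""
    if size_bytes < TEN_GB:
        return ["Type 1"]
    matches = []
    if needs_joins and not denormalize:
        matches.append("Type 2")
    if same_semantics:
        matches.append("Type 3")
    if needs_joins and denormalize:
        matches.append("Type 4")
    return matches
```

For example, `suggest_types(20 * 1024**3, needs_joins=True, denormalize=True)` selects Type 4, while the same size without denormalization selects Type 2.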
30. Sharding and Redundancy in Migration Methods
Sharding: the process of storing data records across multiple domains.
Type 1 does not support sharding.
Type 2, Type 3, and Type 4 support sharding.
Redundancy: data redundancy is the superfluity of data.
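As an illustration of storing records across multiple domains, the sketch below assigns each item to a domain by hashing its name. Hash-based placement is an assumption made for this example; the slides do not say how the migration methods choose a domain. The domain naming (`base_0`, `base_1`, ...) is also hypothetical.

```python
import hashlib

def shard_for(item_name, num_domains):
    """Pick a domain index from a stable hash of the item name."""
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_domains

def shard_items(items, base_name, num_domains):
    """Distribute items over num_domains domains named base_name_0, base_name_1, ..."""
    domains = {f"{base_name}_{i}": {} for i in range(num_domains)}
    for name, attrs in items.items():
        domains[f"{base_name}_{shard_for(name, num_domains)}"][name] = attrs
    return domains

items = {f"order:{i}": {"total": str(i * 10)} for i in range(100)}
sharded = shard_items(items, "orders", 4)
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) keeps the item-to-domain mapping the same across runs, so a reader can locate an item without scanning every domain.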
31. EXPERIMENTS
Implementation Details
Source system: can be Oracle, MySQL, or Microsoft SQL Server.
Destination system: a cloud database that supports key-value pairs.
We use Microsoft .NET Framework 3.5, Microsoft IIS 7.0, and Microsoft SQL Server 2008 R2.
The C# library for SimpleDB performs all necessary actions for migrating the data.
32. EXPERIMENTS
Experiment
Migrated a relational database to Amazon SimpleDB.
The relational database belongs to an "online bookstore" application.
The sample database consists of thirteen tables and sample data.
37. Application Adaptation
Code Generation
We propose an interface that assists the developer in generating code automatically.
This includes the basic usage of:
Select.
Insert.
Delete.
Update queries.
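A small sketch of what generated query code could look like for the Select case. This is not the thesis's interface; the function names are hypothetical. It builds a SimpleDB-style select expression, quoting names with backticks and string values with single quotes (doubling any embedded quote). Insert, Update, and Delete are left out because in SimpleDB they go through the PutAttributes and DeleteAttributes operations rather than through a query string.

```python
def quote_name(name):
    """Quote a domain or attribute name with backticks."""
    return "`" + name.replace("`", "``") + "`"

def quote_value(value):
    """Quote a value with single quotes, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"

def generate_select(domain, columns=None, where=None):
    """Build a select expression from a domain, optional columns, and equality filters."""
    output = ", ".join(quote_name(c) for c in columns) if columns else "*"
    expr = f"select {output} from {quote_name(domain)}"
    if where:
        conds = " and ".join(f"{quote_name(k)} = {quote_value(v)}" for k, v in where.items())
        expr += f" where {conds}"
    return expr

query = generate_select("books", ["title"], {"author": "O'Brien"})
```

Here `query` is the string ``select `title` from `books` where `author` = 'O''Brien'``.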
42. Comparison of Migration Methods
Migration Methods: Type 1; Type 2; Type 3; Type 4
<10GB
>10GB
Sharding
Joins: Limited to one domain; Limited to one domain; Cross domain; Limited to one domain
Denormalized Data
Storage cost: Nearly the same as Type 2, Type 3; Nearly the same as Type 1, Type 3; Nearly the same as Type 2, Type 3; Less than Type 1, Type 2, Type 3
Storage Space
Computation Time: Smallest; Larger than Type 1; Highest; Larger than Type 2
44. CONCLUSION AND FUTURE WORK
Conclusion and Future Direction
This thesis proposes four diverse methods to migrate relational databases to cloud databases.
Each method is independent of the others.
A relational database was successfully migrated to a NoSQL database.
The thesis also proposes an interface for code generation.
45. CONCLUSION AND FUTURE WORK
Future Direction
Migration of:
Stored procedures.
Triggers.
User-defined functions.
46. Publications
K. Mehra, Y. Yan and D. Lemire. Automatic data migration to the cloud. In the Sixth International Workshop on Cloud Data Management (CloudDB 2014), submitted.
K. Mehra, Y. Yan and D. Lemire. Automatic data migration into the cloud. IEEE Services 2014, manuscript.