Automatic Data Migration into the Cloud
MS Thesis: Computer Science & Software Engineering

By: Kushal Mehra
Supervisors: Dr. Yuhong Yan, Dr. Daniel Lemire

Concordia University
MOTIVATION
• Well-known applications (Facebook, Google Blogger, Twitter) depend on NoSQL databases.
• Some advantages of NoSQL databases:
  • High scalability.
  • High read and write performance.
  • Availability at low cost.
• Suitable applications:
  • Big data.
  • Geographical data.
Agenda

•  Introduction
•  Review of Previous Studies
•  Proposed Model
• Experiment and Results
• Future work and Conclusion

Section 1: Introduction

INTRODUCTION AND PROBLEM

Relational Database vs. NoSQL Database
• Cloud database definition.
• Existing problem in relational databases:
  • Scale up
  • Scale out
INTRODUCTION AND PROBLEM

Relational Database vs. NoSQL Database

• Scale up
INTRODUCTION AND PROBLEM

Relational Database vs. NoSQL Database

• Scale out
INTRODUCTION AND PROBLEM

Data Migration
• Enterprises seek to migrate their massive relational databases to NoSQL databases.
• Data migration is the process of transferring data between storage types, formats, or computer systems.
INTRODUCTION AND PROBLEM

Importance of Data Migration

One survey estimated that the data migration market would reach $906 million by 2012.
REVIEW OF PREVIOUS STUDIES

Previous Studies
• A large body of work exists on data migration.
• Approaches include:
  • Schema conversion.
  • ETL (extract, transform, load).
  • Integrated model.
REVIEW OF PREVIOUS STUDIES

Previous Studies
• Thakar et al. and Chanchary et al. migrated a large relational database to a cloud database (2010).
• Calil et al. proposed SimpleSQL, a relational layer over Amazon SimpleDB (2012).
Limitations of Existing Work
• Existing migration methods are not sufficient for migrating data to the cloud. They lack:
  • A migration strategy.
  • Application adaptation.
  • Sharding.
• Existing methods migrate data from a legacy system to a relational database.
Amazon SimpleDB
• SimpleDB is a web service that provides structured data storage in the cloud.
• Multi-valued attributes.
Amazon SimpleDB

Relational Database    SimpleDB
Table                  Domain
Row                    Item
Column                 Attribute
Value                  Value(s)

Table 1: Relational database and SimpleDB equivalence
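The relational-to-SimpleDB equivalence can be sketched in Python. This is a minimal illustration, not the thesis implementation: the function name `row_to_item`, the use of the primary-key value as the item name, the omission of NULL columns, and the sample row are all assumptions. Values are kept as lists of strings because a SimpleDB attribute may hold several values and SimpleDB stores values as strings.

```python
from typing import Dict, List, Tuple


def row_to_item(row: Dict[str, object], key_column: str) -> Tuple[str, Dict[str, List[str]]]:
    """Map one relational row to an (item_name, attributes) pair.

    Column -> attribute, value -> list of string values.
    Using the primary-key value as the item name is an assumption of this sketch.
    """
    item_name = str(row[key_column])
    attributes = {
        column: [str(value)]
        for column, value in row.items()
        # Assumption: the key column becomes the item name and NULLs are omitted.
        if column != key_column and value is not None
    }
    return item_name, attributes


# Hypothetical row from a "book" table.
name, attrs = row_to_item(
    {"book_id": 7, "title": "Dune", "price": 9.5, "isbn": None}, "book_id"
)
print(name, attrs)
```

Because the item is schemaless, a row with a NULL column simply yields an item without that attribute.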
Characteristics of NoSQL Databases
• No normalization.
• No joins.
• Schemaless.
• Data type.
Characteristics of NoSQL Databases
• Some cloud databases that share the same data model and characteristics:
  • Amazon SimpleDB
  • MongoDB
  • CouchDB
  • Oracle NoSQL
Section 2: Proposed Model

PROPOSED MODEL

Data Migration Model

Relational-Cloud Mapping
PROPOSED MODEL

Migration Methods
• We propose four migration methods:
  • Type 1: complete relational database to one domain.
  • Type 2: multiple tables to one domain.
  • Type 3: a table to one domain.
  • Type 4: normalization to denormalization and tables to domain.
• Each method is independent of the others and is capable of migrating an entire relational database.
PROPOSED MODEL

Migration Methods

Mapping Strategies

PROPOSED MODEL

Mapping Strategy 1 (MS1)

PROPOSED MODEL

Mapping Strategy 2 (MS2)

PROPOSED MODEL

Mapping Strategy 3 (MS3)

PROPOSED MODEL

Type 1 Migration

• Uses Mapping Strategy 2 (MS2).
• Migrates the entire relational database.
• Only a single domain exists in the cloud database.
• Number of items = number of rows in the entire relational database.
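A minimal Python sketch of the Type 1 idea follows: every row of every table becomes one item in a single domain, so the item count equals the total row count. The details of Mapping Strategy 2 are not reproduced here; the function name `migrate_type1`, the "<table>_<key>" item-name convention, and the extra "table" attribute are assumptions made only so that rows from different tables stay distinguishable.

```python
from typing import Dict, List

Row = Dict[str, object]
Domain = Dict[str, Dict[str, List[str]]]


def migrate_type1(database: Dict[str, List[Row]], keys: Dict[str, str]) -> Domain:
    """Place every row of every table into one domain (a dict of items)."""
    domain: Domain = {}
    for table, rows in database.items():
        for row in rows:
            # Assumption: prefix the key with the table name to keep item names unique.
            item_name = f"{table}_{row[keys[table]]}"
            item = {col: [str(val)] for col, val in row.items() if val is not None}
            # Assumption: remember the source table as an ordinary attribute.
            item["table"] = [table]
            domain[item_name] = item
    return domain


# Hypothetical two-table database.
db = {
    "author": [{"id": 1, "name": "Herbert"}],
    "book": [
        {"id": 1, "title": "Dune", "author_id": 1},
        {"id": 2, "title": "Emma", "author_id": None},
    ],
}
single_domain = migrate_type1(db, {"author": "id", "book": "id"})
total_rows = sum(len(rows) for rows in db.values())
print(len(single_domain), total_rows)
```

Keeping everything in one domain means any query stays inside that domain, at the price of being bound by the size limit of a single domain.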
PROPOSED MODEL

Type 2 Migration

• Uses Mapping Strategy 1 (MS1) and Mapping Strategy 2 (MS2).
• Migrates multiple tables and their data to one domain.
• Migrates a single table to one domain.
PROPOSED MODEL

Type 3 Migration

• Uses Mapping Strategy 1 (MS1).
• Migrates a table to one domain in a cloud database.
• Implicit conversion.
PROPOSED MODEL

Type 4 Migration

• Uses Mapping Strategy 1 (MS1) and Mapping Strategy 3 (MS3).
• Migrates denormalized tables to one domain in a cloud database.
• Migrates a single table and its data to one domain.
• Explicit conversion of columns.
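The denormalization step behind Type 4 can be illustrated with a small Python sketch that folds the columns of a referenced (parent) row into each referencing (child) row before the rows are written as items. This is one plausible reading, not the thesis algorithm: the function name `denormalize`, the foreign-key parameters, and the column prefix are assumptions.

```python
from typing import Dict, List

Row = Dict[str, object]


def denormalize(children: List[Row], parents: List[Row],
                fk: str, pk: str, prefix: str) -> List[Row]:
    """Copy each parent's columns into the child rows that reference it."""
    parents_by_key = {parent[pk]: parent for parent in parents}
    result = []
    for child in children:
        merged = dict(child)
        parent = parents_by_key.get(child.get(fk))
        if parent is not None:
            for column, value in parent.items():
                if column != pk:  # the key is already present as the foreign key
                    merged[f"{prefix}{column}"] = value
        result.append(merged)
    return result


# Hypothetical tables from a bookstore schema.
books = [
    {"id": 1, "title": "Dune", "author_id": 10},
    {"id": 2, "title": "Emma", "author_id": 11},
]
authors = [{"id": 10, "name": "Herbert"}, {"id": 11, "name": "Austen"}]
flat_books = denormalize(books, authors, fk="author_id", pk="id", prefix="author_")
print(flat_books[0])
```

Each merged row can then be stored as one item, so a lookup that previously needed a join reads a single item; the trade-off is that parent data is repeated across child items.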
PROPOSED MODEL

Migration Method Usage
• Type 1
  • Data size is less than 10 GB.
• Type 2
  • Data size is more than 10 GB and joins are to be performed.
• Type 3
  • The same semantics as the relational database are needed and the database size is more than 10 GB.
• Type 4
  • Denormalization.
  • Data size is more than 10 GB and joins are to be performed.
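The usage guidance above can be written as a small decision function. This is only a sketch: the function name and its parameters are hypothetical, the order in which the criteria are checked is an assumption, and so are the handling of exactly 10 GB and the fallback when no requirement is stated.

```python
def choose_migration_type(size_gb: float,
                          needs_joins: bool = False,
                          needs_relational_semantics: bool = False,
                          denormalize: bool = False) -> int:
    """Return the migration type (1 to 4) suggested by the usage guidance."""
    if size_gb < 10:
        return 1  # small databases fit in a single domain
    if denormalize:
        return 4  # large database, joins resolved by denormalizing
    if needs_relational_semantics:
        return 3  # large database, one domain per table
    if needs_joins:
        return 2  # large database, related tables grouped in one domain
    # Assumption: with no stated requirement, fall back to one domain per table.
    return 3


print(choose_migration_type(5))
print(choose_migration_type(25, needs_joins=True))
print(choose_migration_type(25, needs_joins=True, denormalize=True))
```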
Sharding and Redundancy in Migration Methods
• Sharding: the process of storing data records across multiple domains.
  • Type 1 does not support sharding.
  • Type 2, Type 3, and Type 4 support sharding.
• Redundancy: data redundancy is the storage of superfluous copies of data.
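As an illustration of storing records across multiple domains, the Python sketch below assigns each item to a domain by hashing its name. Hash partitioning is an assumption chosen for simplicity; the slides do not state how the thesis distributes records. The function name `shard_domain` and the domain names are hypothetical.

```python
import zlib
from typing import List


def shard_domain(item_name: str, domains: List[str]) -> str:
    """Pick the domain for an item by hashing its name.

    zlib.crc32 is used because it gives the same result on every run,
    unlike Python's built-in hash() for strings.
    """
    index = zlib.crc32(item_name.encode("utf-8")) % len(domains)
    return domains[index]


# Hypothetical shard domains for a "book" table.
domains = ["book_0", "book_1", "book_2"]
placement = {
    name: shard_domain(name, domains)
    for name in ("book_1", "book_2", "book_3", "book_4")
}
print(placement)
```

The same function serves reads and writes: looking up an item by name recomputes the hash and goes straight to the domain that holds it.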
EXPERIMENTS

Implementation Details

• Source system: can be Oracle, MySQL, or Microsoft SQL Server.
• Destination system: a cloud database that supports key-value pairs.
• We use Microsoft .NET Framework 3.5, Microsoft IIS 7.0, and Microsoft SQL Server 2008 R2.
• The C# library for SimpleDB performs all actions necessary for migrating the data.
EXPERIMENTS

Experiment

• We migrated a relational database to Amazon SimpleDB.
• The relational database belongs to an "online bookstore" application.
• The sample database consists of thirteen tables and sample data.
EXPERIMENTS

Type 1 Migration

EXPERIMENTS

Type 2 Migration

EXPERIMENTS

Type 3 Migration

EXPERIMENTS

Type 4 Migration

Application Adaptation

Code Generation

• We propose an interface that assists the developer in generating code automatically.
• It covers the basic usage of:
  • Select
  • Insert
  • Delete
  • Update queries
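To give a feel for what generated query code might look like, the Python sketch below builds a SimpleDB select expression from a domain name, a list of attributes, and equality conditions. It is not the interface proposed in the thesis: the function names are hypothetical and only equality conditions joined by "and" are handled. The quoting follows the SimpleDB select syntax as understood here (names in backticks, values in single quotes, each escaped by doubling) and should be checked against the SimpleDB documentation.

```python
from typing import Dict, List


def quote_name(name: str) -> str:
    """Quote a domain or attribute name with backticks, doubling embedded ones."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value: str) -> str:
    """Quote an attribute value with single quotes, doubling embedded ones."""
    return "'" + value.replace("'", "''") + "'"


def generate_select(domain: str, attributes: List[str], conditions: Dict[str, str]) -> str:
    """Build a select expression with optional equality conditions."""
    output = ", ".join(quote_name(a) for a in attributes) if attributes else "*"
    expression = f"select {output} from {quote_name(domain)}"
    if conditions:
        where = " and ".join(
            f"{quote_name(attr)} = {quote_value(val)}" for attr, val in conditions.items()
        )
        expression += f" where {where}"
    return expression


# Hypothetical query against a "book" domain.
query = generate_select("book", ["title", "price"], {"author": "O'Brien"})
print(query)
```

Generating the expression from structured input keeps the quoting rules in one place, which is the kind of repetitive detail a code-generation interface can take off the developer's hands.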
EXPERIMENTS

Performance Analysis

• Performance model:
  • Computation time.
  • Storage cost.
EXPERIMENTS

Average Computation Time

EXPERIMENTS

Storage Cost of 10GB
Amazon SimpleDB 2013

EXPERIMENTS

Storage Cost of 25GB
Amazon SimpleDB 2013

Comparison of Migration Methods

Migration Methods    Type 1                         Type 2                         Type 3                         Type 4
<10GB                [check marks not recoverable]
>10GB                [check marks not recoverable]
Sharding             [check marks not recoverable]
Joins                Limited to one domain          Limited to one domain          Cross domain                   Limited to one domain
Denormalized Data    [check marks not recoverable]
Storage cost         Nearly same as Type 2, Type 3  Nearly same as Type 1, Type 3  Nearly same as Type 1, Type 2  Less than Type 1, Type 2, Type 3
Storage Space        [values not recoverable]
Computation Time     Smallest                       Larger than Type 1             Highest                        Larger than Type 2
Limitations
• Stored procedures.
• User-defined functions.
• Triggers.
CONCLUSION AND FUTURE WORK

Conclusion
• This thesis proposes four distinct methods to migrate relational databases to cloud databases.
• Each method is independent of the others.
• A relational database was successfully migrated to a NoSQL database.
• An interface for code generation is proposed.
CONCLUSION AND FUTURE WORK

Future Direction
• Migration of:
  • Stored procedures.
  • Triggers.
  • User-defined functions.
Publications
• K. Mehra, Y. Yan, and D. Lemire. Automatic data migration to the cloud. In the Sixth International Workshop on Cloud Data Management (CloudDB 2014), submitted.
• K. Mehra, Y. Yan, and D. Lemire. Automatic data migration into the cloud. IEEE Services 2014, manuscript.
