SlapOS (Simple Language for Accounting and Provisioning Operating System) aims
to hide the complexity of IT infrastructure software deployment from users.
Through a software as a service (SaaS) solution, users can automatically request and install
data movement and sharing tools such as Stork and BitDew without any intervention by a system administrator.
1. Designing and Implementing a
cloud-hosted SaaS for data
movement and Sharing with
SlapOS
Authors: Walid Saad, Heithem Abbes, Mohamed Jemni
and Christophe Cerin
Journal: International Journal of Big Data Intelligence
Online Date: Thursday, July 24, 2014
By: Arnob Saha (L20339084)
Hari Prasad Dhonju Shrestha (L20352046)
2. Outline
• Abstract
• Introduction
• Motivation and fundamental issues
• Related work
• SlapOS overview
• Design and implementation issues
• Experimental results
• Conclusion and future work
• Acknowledgements
• References
3. Abstract
• Tools and frameworks have been developed to manage and handle
the large amounts of data on grid platforms.
• These tools are rarely adopted because of the complexity of their
installation and configuration processes.
• SlapOS (Simple Language for Accounting and Provisioning
Operating System) emerged.
• Main aim -> to hide the complexity of IT infrastructure
software deployment from users.
• The paper proposes a cloud-hosted data grid using the SlapOS
cloud.
• Through a software as a service (SaaS) solution, users can
automatically request and install any data movement and
sharing tool, such as Stork and BitDew, without any intervention
by a system administrator.
4. Introduction
• Many real-world scientific and enterprise applications deal with
huge amounts of data. The emergence of data-intensive applications
has prompted scientists around the world to build data grids.
Examples: bioinformatics, medical imaging, high-energy physics,
coastal and environmental modelling, and geospatial analysis.
• In order to process large datasets, users need to access, process
and transfer large datasets stored in distributed repositories.
• The paper proposes a self-configurable desktop grid (DG) platform on
demand.
• The Simple Language for Accounting and Provisioning Operating
System (SlapOS) cloud presents an environment that is configurable in
terms of the OS and the software stack, without the need for
virtualisation techniques.
5. Introduction (contd…)
• We focus in this paper on a subset of the overall research on
interoperability between DGs and clouds, namely data tools as
hosted software as a service (SaaS) frameworks.
• We present the design and implementation of two software
as a service tools for data management. The first service
provides a means for users to transfer data from their sites to the
computation or simulation sites. The second service is used
to share data in widely distributed environments.
• The challenge is how to:
• design automatic data management tools that are able to mask the
installation and configuration difficulties of data management software
• deliver data management functionality as hosted services via web user
interfaces.
7. Motivations and fundamental issues
• e-Science applications require efficient data management and
transfer software in wide-area, distributed computing environments.
• To achieve data management on demand, users need a resilient
service that moves data transparently.
• No IT knowledge is required; no software
download/installation/configuration steps.
• Implementations are based on:
• Stork data scheduler: manages data movement over wide-area networks,
using intermediate data grid storage and different protocols
• BitDew: makes data accessible and shareable from other resources, including
end-user desktops and servers
• SlapOS: with only a ‘one-click’ process, instantiates and configures the data
managers (Stork + BitDew) and deploys them over the Internet
8. Related Works
• To manage the low-level data handling issues on grid systems.
• High-level tools for co-scheduling of data and computation in grid
environments.
• Research in data management using SaaS-based services.
• Data management and transfer in grid environments
• GridFTP is the most widely used tool, transferring through parallel streams.
• Representative examples of storage systems include SRMs, SRB,
IBP and NeST.
• The FreeLoader framework is designed to aggregate space and I/O
bandwidth contributions from volatile desktop storage.
• Farsite builds a secure file system using untrusted desktop
computers.
• Chirp is a user-level file system for collaboration across distributed
systems like clusters, clouds and grids.
9. Related Works (contd...)
• BitDew is an open-source data management framework for grid, DG and cloud
computing.
Higher-level tools for data scheduling
• Stork: a scheduler for data placement activities in a grid environment.
• Using Stork, input data will be queued, scheduled, monitored,
managed and even check-pointed.
• Stork provides solutions for data placement problems in both the
grid and DG environments, since it can interact with different data
transfer protocols such as FTP, GridFTP, HTTP and DiskRouter.
Data orchestration through SaaS technologies
• Globus Online (GO) is a project that delivers data management
functionality not as downloadable software but as hosted SaaS.
• It allows users to move, synchronise and share their data using a web
browser.
10. SlapOS overview
• An open-source distributed operating system.
• Provides an environment for automating the deployment of
applications.
• Based on the idea that ‘everything is a process’, SlapOS combines
grid computing, in particular concepts inherited from
BonjourGrid, with techniques inherited from the field of ERP in
order to manage, through the SlapGrid daemon, IaaS, PaaS and
SaaS cloud services.
• The SlapOS strengths are compatibility with any operating
system, in particular GNU/Linux, all software technologies, and
support for several infrastructures.
• More than 500 different recipes are available for consumer
applications such as Linux-Apache-MySQL-PHP (LAMP).
11. SlapOS key concepts
• The SlapOS architecture is composed of two types of
components: SlapOS master and SlapOS node.
• SlapOS master: acts as a centralized directory of all SlapOS nodes;
it knows where each software release is located and which
software is installed on each node.
• SlapOS node: can be a dedicated or a volunteer node. The master’s
role is to install applications and run processes on SlapOS nodes.
• In comparison with traditional clouds, SlapOS is based on
an opportunistic view.
• In normal utilisation, requests are serviced by the data
centre nodes. Whenever the number of requests reaches a
peak, SlapOS can redirect some of them to volunteer nodes.
12. SlapOS key concepts
• In doing so, the system gains on two points:
• it maintains a good response time when treating requests
• when the number of the cloud’s customers increases, there is a
good alternative for guaranteeing the SLAs without buying new
machines.
• A SlapOS node consists essentially of a basic Linux distribution, a
daemon named SlapGrid, a Buildout environment for bootstrapping
applications, and supervisord to control processes.
• A node can receive a request from the master to install software, or
ask the master to deploy an instance of a software.
• SlapOS software on a node is called a ‘Software Release’, and it
consists of all the binaries needed to run the software.
• ‘Software Instance’ -> multiple instances of the corresponding s/w.
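The master/node split and the opportunistic fallback to volunteer nodes described above can be sketched as a small simulation (class and method names are illustrative, not the real SlapOS API):

```python
# Minimal sketch: the master acts as a directory of nodes and redirects
# instance requests to volunteer nodes once data-centre nodes are full.
# Names are illustrative assumptions, not the actual SLAP protocol.

class Node:
    def __init__(self, name, capacity, volunteer=False):
        self.name = name
        self.capacity = capacity      # max software instances this node accepts
        self.volunteer = volunteer
        self.instances = []

    def has_room(self):
        return len(self.instances) < self.capacity


class Master:
    """Centralized directory: knows every node and its installed software."""

    def __init__(self, nodes):
        self.nodes = nodes

    def request_instance(self, software_release):
        # Prefer data-centre nodes; spill to volunteer nodes at peak load.
        for use_volunteers in (False, True):
            for node in self.nodes:
                if node.volunteer == use_volunteers and node.has_room():
                    node.instances.append(software_release)
                    return node.name
        raise RuntimeError("no capacity left in the cloud")


master = Master([Node("dc1", capacity=2), Node("vol1", capacity=2, volunteer=True)])
placements = [master.request_instance("stork-sr") for _ in range(3)]
print(placements)  # first two requests land on dc1, the third spills to vol1
```

This mirrors the SLA argument on the slide: extra demand is absorbed by volunteer nodes instead of new machines.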
13. How to join SlapOS?
• SlapOS is a voluntary cloud, which means that anyone can potentially
add their own server to the cloud.
• To participate in a BOINC and/or Condor project, one has to:
• register on a SlapOS master
• install SlapOS node on the server
• add a virtual server on the master and link it to the physical server by
configuring the node installed on the physical server
• select and install applications, from the list of applications available on the
master, that will be allowed to be deployed on the node.
• The number of instances that can be run on a node depends on the
capacity of the server and the configuration of SlapOS on it.
• To make applications available on the SlapOS master, it is necessary to
integrate them into SlapOS.
• The integration of an application into SlapOS goes through the writing of
Buildout profiles, consisting mainly of the file software.cfg, which then
references all other required files.
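A Buildout profile of the kind mentioned above might look like the following sketch (the paths, section name and recipe options are illustrative assumptions; the project's real software.cfg differs):

```ini
[buildout]
# Hypothetical software.cfg sketch: the entry point that pulls in
# component profiles via "extends" and lists the parts to build.
extends =
    ../../component/stork/buildout.cfg
parts =
    stork

[stork]
# A part is built by a Buildout recipe; slapos.cookbook ships many.
# (Recipe options are omitted in this sketch.)
recipe = slapos.cookbook:wrapper
```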
14. Design and Implementation Issues
• Implementation steps:
• SlapOS uses Buildout technologies to install software and deploy instances.
• In the Stork case, the software is divided into three profiles:
1. Component (slapos/component/stork/buildout.cfg): here we find all the
dependencies used by Stork. Buildout integrates the profile and its
dependencies through extends rules, in order to install mainly the Globus
client and the Globus GSI (grid security infrastructure).
2. Software Release (SR) profile: located on a remote git server and defined by its
URL (http://git-repository/slapos/software/stork/software.cfg). The SR describes
the installation of Stork and its dependencies, without configuration files or
disk image creation. When SlapOS installs a Stork SR, it launches a Buildout
command with the correct URL.
3. Software Instance: reuses an installed Software Release by
creating wrappers, configuration files and anything specific to an
instance. The whole process creates a Stork configuration file.
15. Design and Implementation Issues (contd..)
• Architecture overview: SlapOS is based on a master-slave
paradigm. The steps that allow a user to participate in the SlapOS
community and exploit the Stork services are as follows:
1. Slapos-connect(Login, Password)
2. Request-stork-software(Slave_Node_Name, Software_Release_Name)
3. Download-stork-software(Stork_Software_Release_URL)
4. Request-instance-parameters(Slap_Parameter_List)
5. Deploy-instance(Slap_Parameter_List)
6. Submit-data-job(submit_dap_file, stork_server)
7. Move-data(src_data_url, dest_data_url)
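The seven steps above can be sketched as a thin client-side workflow. Function names mirror the slide; the bodies only record the call order rather than talking to a real master (an illustrative stub, not the SLAP library):

```python
# Stub workflow mirroring the seven steps from the slide. Each function
# appends to a trace instead of contacting a real SlapOS master.

trace = []

def slapos_connect(login, password):            trace.append("connect")
def request_stork_software(node, release):      trace.append("request-software")
def download_stork_software(release_url):       trace.append("download-software")
def request_instance_parameters(slap_params):   trace.append("request-parameters")
def deploy_instance(slap_params):               trace.append("deploy-instance")
def submit_data_job(dap_file, stork_server):    trace.append("submit-job")
def move_data(src_url, dest_url):               trace.append("move-data")

slapos_connect("alice", "secret")
request_stork_software("node-1", "stork-sr")
download_stork_software("http://git-repository/slapos/software/stork/software.cfg")
params = {"server": "node-1"}
request_instance_parameters(params)
deploy_instance(params)
submit_data_job("job.dap", "node-1")
move_data("ftp://src/data", "gsiftp://dest/data")
print(trace)
```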
17. Security and authentication process
• Security in Stork is an important issue with many aspects to
consider. The most important is the way in which users want to run the
Stork daemons. Current Stork releases fall into three main
schemes:
1. Single host: Stork_Server and Stork_Client run on the same
machine.
2. Multiple hosts: Stork_Server in one location and Stork_Client in another
one.
3. Multiple hosts and third-party transfer: Stork_Server manages the
movement of data among two or more remote locations.
• Many authentication mechanisms are available, such as SSL, Kerberos,
PASSWORD and GSI.
• Stork_Server provides only GSI authentication to allow different
client machines to connect to it.
18. Security configuration
• Users can easily run 100+ Stork instances on a ‘small cluster’, each
of them with its own independent daemons and configuration.
• The security settings depend on the manner in which users want to
deploy their Stork instances:
1. Running Stork in the SlapOS cloud: after installation of the
SlapOS slave node, the user requests one instance which includes
two Stork components (server and client tools); both use the
same configuration file.
2. Submitting jobs to an external Stork server: an important
property of our approach is the ability to handle transfers using an
existing Stork_Server.
3. Remote GridFTP transfer: to use GSIFTP transfers with Stork,
users need a valid grid proxy and user credentials in
place.
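The first point, 100+ instances each with independent daemons and configuration, can be illustrated by deriving a distinct port and config path per instance (the port base and directory layout below are hypothetical, not SlapOS's actual partitioning scheme):

```python
# Sketch: generate an independent configuration per Stork instance so
# 100+ instances can coexist on a small cluster without colliding.
# base_port and the /srv/slapgrid layout are illustrative assumptions.

def make_instance_config(instance_id, base_port=34000):
    return {
        "name": f"stork-{instance_id}",
        "port": base_port + instance_id,  # one listening port per instance
        "config_file": f"/srv/slapgrid/slappart{instance_id}/etc/stork.cfg",
    }

configs = [make_instance_config(i) for i in range(100)]
ports = {c["port"] for c in configs}
print(len(configs), len(ports))  # 100 instances, 100 distinct ports
```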
20. Data sharing via SaaS
• Once data are placed on SlapOS, a second SaaS based on BitDew is
automatically launched to publish and distribute the data over the
SlapOS community.
• BitDew is a programmable framework for large-scale data
management and distribution for DG systems.
• BitDew offers two sets of nodes: server (service host) and
client (consumer).
• To share data with BitDew, end-users need to connect to SlapOS,
request the BitDew software and specify information for instance
deployment.
• The cloud-hosted approach divides the world into three sets of nodes:
• cloud-middleware node (SlapOS master), cloud-provider node (SlapOS
slave node), SaaS instances (BitDew server and client)
21. Data sharing via SaaS
• The SlapOS user must invoke the following steps:
• Request-instance-parameters: for client instances, the slap-parameters are
classified into two sets
a. Bitdew_Server: the user sets information about the remote server
hostname
b. data information parameters: the user must specify the protocol
used to get the remote data and the signature of the file.
• Deploy-instance
• Share-data(transfer_protocol, data-path, properties.json)
• Get-data(transfer_protocol, file_md5_ID)
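The file signature mentioned in (b), used as the file_md5_ID in Get-data, can be computed with a standard MD5 digest. This is a sketch; BitDew's exact signature format is not spelled out in the slides:

```python
import hashlib

def file_md5_id(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage sketch: write a small file and derive its identifier.
with open("sample.dat", "wb") as f:
    f.write(b"shared via BitDew")
print(file_md5_id("sample.dat"))
```

Hashing in chunks keeps memory flat even for archives of millions of sequences like the Genebase mentioned later.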
• BitDew Buildout profiles
• The integration of BitDew into SlapOS requires writing multiple Buildout
profiles. The Buildout profiles are divided into three types (Component, Software
Release, Software Instance), organised into several directories.
23. Experimental results
• Experiments were performed on the experimental grid computing
infrastructure Grid'5000. They were conducted on four
clusters of the Lyon site, using more than 50 machines. Two Debian
Linux distribution images of SlapOS were set up.
• Deployment steps of SlapOS on Grid'5000
• SlapOS is designed to work natively with IPv6. Several restrictions are
applied to limit access to and from outside the Grid'5000 infrastructure. To
overcome these restrictions, we prepared pre-compiled images containing all the
standard install files of SlapOS: the kernel and runtime daemons. These
images are also configured to run IPv6 at startup. The slapos-vifib image is
implemented and the slapos-image is used.
• Usage scenario
• To show the capacity of our cloud-hosted model to build a scalable
platform for managing data-intensive bag-of-tasks
applications.
24. Experimental results
• Two types of metrics:
• Scalability in terms of how many instance requests are supported: if
the master is overloaded, the time needed to respond to an
instance request may increase.
• The time required to create Stork and BitDew instances as a
function of the number of SlapOS nodes.
• In our experiments, we use the blastn program to search for human
DNA sequences in DNA databases. To run BLAST jobs we need the BLAST
application package, the DNA Genebase (a large compressed archive
containing millions of sequences), and the DNA sequence to compare
with the sequences in the Genebase.
• The scenario used in our experiments is shown in the algorithm. At
the end of the computation, each job creates a result file
containing all matched sequences.
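The bag-of-tasks structure of the experiment, where each job compares a query against the sequence database and emits a result file, can be sketched as a toy simulation (the real experiment runs NCBI blastn; here the matcher and the tiny database are illustrative stand-ins):

```python
# Toy sketch of the bag-of-tasks scenario: each job compares one query
# sequence against a (stubbed) database and collects matched sequences.
# blastn_stub is a stand-in for the real NCBI blastn program.

DATABASE = ["ATCG", "GGCA", "ATCC", "TTAA"]   # toy stand-in for the Genebase

def blastn_stub(query, database):
    """Toy matcher: a sequence 'matches' if it shares a 2-char prefix."""
    return [seq for seq in database if seq[:2] == query[:2]]

def run_job(query):
    # Each job produces a result record with all matched sequences,
    # mirroring the per-job result file from the slide.
    return {"query": query, "matches": blastn_stub(query, DATABASE)}

results = [run_job(q) for q in ["ATGG", "GGTT"]]
print(results)
```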
27. Experimental results
• Result analysis
• Data movement service completion time
• All instances are launched simultaneously and complete successfully; the
total completion time includes the times to:
• register the SlapOS nodes with the master
• deploy the Stork instances
• transfer the BLAST files from the NCBI FTP server to the SlapOS nodes.
• The completion time of the instances is proportional to:
• the number of nodes connected to the master
• the number of instances requested simultaneously.
• Data sharing service completion time includes the times for:
• deployment of the server instances
• deployment of the client instances
• BLAST execution.
28. Experimental results
• This figure illustrates the total completion time for two Stork
instances per node using 50 SlapOS nodes (a total of 100 instances). All
instances are launched simultaneously and complete successfully.
29. Conclusion and future work
• The emergence of data-intensive applications and cloud SaaS
technologies has brought the flexibility to introduce new data
management mechanisms that help scientists and grid users
to deploy their distributed platforms easily.
• This work focuses on data management as SaaS-based solutions,
with the purpose of masking the complexity of the installation and
configuration processes and the IT infrastructure requirements.
• Since the SaaS solutions are already in production in the SlapOS cloud
at Paris 13 University, our future research focuses more on self-configuration,
scalability and transfer security.
30. Acknowledgements
• In France, this work is funded by the FUI-12 Resilience project from
the Ministry of Industry. Experiments presented in this paper were
partly carried out using the Grid'5000 testbed, supported by a
scientific interest group hosted by Inria and including CNRS,
RENATER and several universities as well as other organisations
(see https://www.grid5000.fr). Some experiments were carried out
on the SlapOS cloud available at University of Paris 13 (see
https://slapos.cloud.univ-paris13.fr).
31. References
• http://pypi.python.org/pypi/slapos.cookbook/
• Abbes, H., Cerin, C. and Jemni, M. (2008) ‘BonjourGrid as a
decentralized scheduler’, IEEE APSCC, December.
• Foster, I. (2011) ‘Globus online: accelerating and democratizing
science through cloud-based services’, IEEE Internet Computing, Vol.
15, No. 3, pp.70–73.
Thank You!