1. Data-Intensive Research Workshop
Soaring through clouds with Meandre
Xavier Llorà and Bernie Ács
xllora@illinois.edu
bernie@ncsa.illinois.edu
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
3. An Ideological Metaphor & Definition
• Cloud Metaphor
• The term cloud is used as a metaphor for
the Internet, based on how it is depicted in
computer network diagrams and is an
abstraction for the complex infrastructure
it conceals
• Cloud Computing – Definition
• The first academic use of this term appears to define it as a computing
paradigm where the boundaries of computing will be determined
by economic rationale rather than technical limits.
• Cloud computing is a paradigm of computing in which dynamically
scalable and often virtualized resources are provided as a service over
the Internet. Users need not have knowledge of, expertise in, or control
over the technology infrastructure in the "cloud" that supports them
http://en.wikipedia.org/wiki/Cloud_computing
Imaginations unbound
10. Cloud Classification Types
• Public cloud or external cloud describes cloud
computing in the traditional mainstream sense, whereby
resources are dynamically provisioned on a fine-grained,
self-service basis over the Internet, via web applications/
web services, from an off-site third-party provider who
shares resources and bills on a fine-grained
utility computing basis
• Private cloud and internal cloud are neologisms that
describe configurations that emulate (public) cloud
computing on private networks
• Hybrid cloud consists of multiple internal and/or
external cloud deployments
http://en.wikipedia.org/wiki/Cloud_Computing
11. Cloud Computing Models
• Infrastructure as a Service (IaaS)
• the delivery of computer infrastructure (typically a
platform virtualization environment) as a service
• Rather than purchasing servers, software, data center space
or network equipment, clients instead buy those resources as
a fully outsourced service.
• The service is typically billed on a utility computing basis, and
the amount of resources consumed (and therefore the cost) will
typically reflect the level of activity.
• Supersedes the term Hardware as a Service (HaaS)
• It is an evolution of web hosting and virtual private server
offerings.
• Example: Amazon EC2/S3 services
http://en.wikipedia.org/wiki/Infrastructure_as_a_service
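As a rough illustration of utility-style billing, the sketch below totals a bill from metered usage, so cost tracks the level of activity. The rates and field names are made-up placeholders for this example, not actual Amazon EC2/S3 prices or API fields.

```python
from dataclasses import dataclass

# Hypothetical rates for illustration only; not real provider prices.
RATE_PER_INSTANCE_HOUR = 0.10   # currency units per instance-hour
RATE_PER_GB_MONTH = 0.15        # currency units per GB stored per month


@dataclass
class Usage:
    instance_hours: float      # hours summed over all running instances
    storage_gb_months: float   # GB stored, averaged over the month


def utility_bill(usage: Usage) -> float:
    """Cost scales with consumption: no usage, no charge."""
    compute = usage.instance_hours * RATE_PER_INSTANCE_HOUR
    storage = usage.storage_gb_months * RATE_PER_GB_MONTH
    return round(compute + storage, 2)


quiet_month = utility_bill(Usage(instance_hours=100, storage_gb_months=10))
busy_month = utility_bill(Usage(instance_hours=2000, storage_gb_months=10))
print(quiet_month, busy_month)
```

With the same stored data, the busy month costs more only because more instance-hours were consumed.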
12. Cloud Computing Models
• Platform as a Service (PaaS)
• delivery of a computing platform and solution stack as a service
• It facilitates deployment of applications without the cost and
complexity of buying and managing the underlying hardware
and software layers, providing all of the facilities required to
support the complete life cycle of building and delivering
web applications and services entirely available from the
Internet —with no software downloads or installation for
developers, IT managers or end-users
• Open Platform as a Service (OPaaS)
• another step in the Application Service Provider, SaaS, PaaS
evolution
• Example: Microsoft TechNet VLabs
http://en.wikipedia.org/wiki/Platform_as_a_service
13. Cloud Computing Models
• Software as a Service (SaaS)
• is a model of software deployment whereby a provider licenses
an application to customers for use as a service on demand
• vendors may host the application on their own web servers or
download the application to the consumer device, disabling it
after use or after the on-demand contract expires
• Examples:
• Google Apps (Maps, Docs, and Others)
• Adobe (Connect & Buzzword)
• Microsoft (Office Live Workspace)
http://en.wikipedia.org/wiki/Software_as_a_service
15. NCSA Uses Virtual Machine Technologies
• Virtual machine technology to support projects &
services using VMware, XenServer, & Others
• An Example Case: ICLCS & WebMO
• Institute for Chemistry Literacy Through Computational Science
(http://Iclcs.uiuc.edu/workshops & http://www.webmo.net/)
[Diagram labels: Internet Users; Active LB Node; Passive LB Node; Worker Nodes; Shared Network File System; Centralized Relational Database]
16. NCSA Enterprise Cloud
• Virtual Machine Infrastructure Expansion
• Dedicated Resources
• 176 Cores/18 Machines with 50TB Storage and 40Gb IB
• Dedicated Switches, Network services for VM & Cloud.
• Eucalyptus installation base
• “Amazon at home”
• EC2/S3/EBS
• Potential future support for
• dynamic load-balanced services & load-based procurement
• High degree of variability possible in configurations
• Account based virtual private enterprise
• Elastic IP, Elastic Block Storage, & Elastic Computing
• Empowers users versus Constrains users
• Cloud mechanics have a steep learning curve
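To make "load-based procurement" concrete, here is a minimal sketch of a scaling rule that picks a worker count from the current request rate. The function name, the per-worker capacity, and the bounds are assumptions made for this example; the slides do not describe how NCSA would implement this.

```python
import math


def desired_workers(requests_per_sec, capacity_per_worker,
                    min_workers=1, max_workers=18):
    """Return how many workers to run for the given load.

    All parameters are hypothetical tuning knobs for this sketch.
    """
    if capacity_per_worker <= 0:
        raise ValueError("capacity_per_worker must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_worker)
    # Clamp to the pool limits so we never scale to zero or past capacity.
    return max(min_workers, min(max_workers, needed))


for load in (0, 120, 10_000):
    print(load, "req/s ->", desired_workers(load, capacity_per_worker=50))
```

A real deployment would also smooth the load signal over time so that short spikes do not cause instances to be launched and terminated repeatedly.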
17. NCSA Enterprise Cloud User Tools
• Command Line Tools
• Amazon Web Services API compatible tools (euca-*)
• Customizations and Refinements
• ElasticFox (Version 1.6)
• Firefox plugin works well; it has required modification, and there is more to do.
List, Launch, & Manage Images
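For readers who prefer the command line, the sketch below builds (but does not run) argument lists for a few euca2ools commands covering the list, launch, and terminate steps. The image ID, key-pair name, and instance ID are placeholders, and the flags reflect my understanding of euca2ools; check each command's --help output before relying on them.

```python
import shlex


def describe_images():
    """List the images registered with the cloud."""
    return ["euca-describe-images"]


def run_instance(image_id, key_name, instance_type="m1.small", count=1):
    """Launch `count` instances of an image with the given SSH key pair."""
    return ["euca-run-instances", "-k", key_name, "-t", instance_type,
            "-n", str(count), image_id]


def terminate_instances(*instance_ids):
    """Shut down one or more running instances."""
    return ["euca-terminate-instances", *instance_ids]


# Placeholder identifiers, chosen only for this example.
commands = [
    describe_images(),
    run_instance("emi-12345678", "mykey"),
    terminate_instances("i-1234ABCD"),
]
for argv in commands:
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where euca2ools is installed and the cloud credentials are loaded in the environment.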
18. NCSA Enterprise Cloud User Tools
Enterprise Security Rules
19. NCSA Enterprise Cloud User Tools
SSH Key-Pair Management
20. NCSA Enterprise Cloud User Tools
Allocate, Assign, & Associate Elastic IP
21. NCSA Enterprise Cloud User Tools
Allocate, Assign, & Associate Elastic Block Storage
22. NCSA Enterprise Cloud User Tools
• Command Line Tools
• Amazon Web Services API compatible tools (euca-*)
• Customizations and Refinements
• AWS Manager
• Statically deployed Web-Application
23. NCSA Enterprise Cloud Conduits
• Private Cloud to Grid Conduit
• Dynamically Scalable Web Front-end & Middleware Layers
• Next Generation WebMO “Science Gateway”
• Batch Queue Proxy Integration, Metering, and Monitoring
• Private Cloud to Private Cloud Conduit
• Exploring Transparent Integration with Remote Sites
• UIUC Computer Science Hadoop Cluster
• Dynamic Integration with other Eucalyptus Sites
• Private Cloud to Public Cloud Conduit
• Exploring Transparent Integration with Amazon EC2 Service
• Roles of Virtual Private Network Services
• Dynamic Scalability and Data Localities
24. Part 2: Cloud Programming Paradigm
• How are Software Architecture and Design Impacted by
Virtual Machines & Cloud technologies?
• Natural Match for Multi-tier applications
• To best leverage cloud technology applications need to be more
modular and less monolithic
• Service-oriented architectures can benefit from JeOS (Just
Enough Operating System) platforms
• Can be easily configured to scale dynamically
• Meandre: Overview & Introduction
• Agile Infrastructure for Data Intensive Applications
• Semantic-Oriented, Component-Based Architecture
• Data Driven Execution Paradigm
• SEASR Application Examples
25. MONK Project – GSLIS
The SEASR project and its Meandre infrastructure
are sponsored by The Andrew W. Mellon Foundation
26. Feature Lens Blow up
27. Date Entities to Simile Timeline
28. Analyzing CSPAN Archives
29. NEMA – Son of Blinkie - GSLIS
30. NESTER – GSLIS
34. Evolution Highway – IGB
35. Fedora Commons Repository
[Diagram labels: Components & Flows; Interactive Web Application; Web Service]
36. Twitter For Research
38. Data-intensive Computing for the Cloud
• Meandre
• Integrates within Existing Applications
• May be a Free Standing Service
• Capitalize on elasticity
• Provide complex data computing as a service
• Collocating computation and data
• Natively access data in the cloud
• Hadoop Distributed File System (HDFS)
• Document stores
• Key-value stores
• Relational stores
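One way to picture collocating computation and data is a scheduler that prefers nodes already holding a task's data block. The following is a generic greedy sketch under that assumption; the function, node names, and block names are invented for illustration and are not part of Meandre or the HDFS API.

```python
def assign_tasks(block_locations, nodes):
    """Place one task per data block, preferring nodes that store the block.

    block_locations: {block_id: [nodes holding a replica]}
    nodes: nodes available for computation
    Returns {block_id: (chosen_node, is_local)}.
    """
    load = {node: 0 for node in nodes}
    placement = {}
    for block, holders in sorted(block_locations.items()):
        local = [n for n in holders if n in load]
        # Fall back to any node only when no replica holder can compute.
        candidates = local or list(nodes)
        chosen = min(candidates, key=lambda n: (load[n], n))
        load[chosen] += 1
        placement[block] = (chosen, chosen in holders)
    return placement


placement = assign_tasks(
    {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n9"]},
    nodes=["n1", "n2", "n3"],
)
print(placement)
```

Blocks b1 and b2 are processed where they are stored, while b3 must be read remotely because its only replica sits on a node outside the compute pool.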
39. Meandre: The Dataflow Component
• Data dictates component execution semantics
[Diagram: a component (P) with Inputs and Outputs, made up of a descriptor in RDF of its behavior and the component implementation]
40. Meandre: Flow (Complex Tasks)
• A flow is a collection of connected components
[Diagram: components Read, Get, Merge, Do, and Show, each paired with a P element, connected into a flow; caption: Dataflow execution]
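The idea that data dictates execution can be sketched in a few lines: a component fires only when every input port holds a data item, and its output is pushed to the ports it is connected to. This is a toy model, not Meandre's engine or API; the class, the port names, and the wiring between Read, Get, Merge, Do, and Show are assumptions made for the example.

```python
from collections import deque


class Component:
    """A processing unit that fires only when every input port holds data."""

    def __init__(self, name, input_ports, func):
        self.name = name
        self.func = func                       # called with one value per input port
        self.inbox = {port: deque() for port in input_ports}
        self.targets = []                      # (component, port) pairs fed by our output

    def connect(self, target, port):
        self.targets.append((target, port))

    def ready(self):
        # Data availability alone decides whether the component may fire.
        return bool(self.inbox) and all(self.inbox.values())

    def fire(self):
        args = {port: queue.popleft() for port, queue in self.inbox.items()}
        self.emit(self.func(**args))

    def emit(self, value):
        for target, port in self.targets:
            target.inbox[port].append(value)


def run(components):
    for comp in components:
        if not comp.inbox:                     # no inputs: fire once to start the flow
            comp.emit(comp.func())
    progressed = True
    while progressed:
        progressed = False
        for comp in components:
            if comp.ready():
                comp.fire()
                progressed = True


shown = []
read = Component("read", [], lambda: "hello")
get = Component("get", [], lambda: "world")
merge = Component("merge", ["left", "right"], lambda left, right: f"{left} {right}")
do = Component("do", ["text"], lambda text: text.upper())
show = Component("show", ["item"], lambda item: shown.append(item))

read.connect(merge, "left")
get.connect(merge, "right")
merge.connect(do, "text")
do.connect(show, "item")

run([read, get, merge, do, show])
print(shown)
```

Merge cannot run until both Read and Get have produced a value, which is the ordering constraint a dataflow engine derives from the connections rather than from explicit control flow.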
41. Meandre Connectors
• Flows are made up of “One or More” components with “None to Many” connectors that are described to the Meandre Server for management
• Flows may contain connectors that are cyclical over one or more components
• Flows must contain at minimum one component with NO Inputs to cause an Execute call to be made. *Outputs are Always Optional.
• Flow components may have multiple connectors assigned to any input data port
• Flows can have any number of components with “None to Many” Input data ports and “None to Many” Output data ports
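The structural rules above can be checked mechanically. The sketch below validates a flow described as plain dictionaries and tuples; this representation and the function name are hypothetical and do not reflect how Meandre stores flow descriptors.

```python
def validate_flow(components, connectors):
    """Check a flow description against the structural rules.

    components: {name: {"inputs": [...], "outputs": [...]}}
    connectors: [(source, output_port, target, input_port), ...]
    Returns a list of problems; an empty list means the structure is valid.
    """
    problems = []
    if not components:
        problems.append("a flow needs at least one component")
    elif not any(not spec["inputs"] for spec in components.values()):
        problems.append("no component without inputs, so nothing triggers execution")
    for src, out_port, dst, in_port in connectors:
        if src not in components or out_port not in components[src]["outputs"]:
            problems.append(f"unknown output {src}.{out_port}")
        if dst not in components or in_port not in components[dst]["inputs"]:
            problems.append(f"unknown input {dst}.{in_port}")
    return problems


components = {
    "push": {"inputs": [], "outputs": ["string"]},
    "pass": {"inputs": ["string"], "outputs": ["string"]},
    "print": {"inputs": ["object"], "outputs": []},
}
connectors = [
    ("push", "string", "pass", "string"),
    ("pass", "string", "pass", "string"),    # cycle, and a second connector on one input port
    ("pass", "string", "print", "object"),
]
print(validate_flow(components, connectors))
```

Cycles and multiple connectors on a single input port pass the check because the rules allow them; only a flow with no input-free component, or a connector to a port that does not exist, is reported.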
42. Meandre: ZigZag Script Language
• Automatic Parallelization
• Adding the operator [+4] would result in a directed graph

Script using [+4]:
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4]
print( object:pt.string )

Script using [+4!]:
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4!]
print( object:pt.string )
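As an analogy in Python, the sketch below fans a pass-through step out to four parallel workers, once returning results as they complete and once preserving input order. Reading [+4] as "four parallel copies" and the trailing "!" as "keep the original order" is my assumption; the slide itself does not spell out the semantics, and this code is not how Meandre implements them.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_through(item):
    """Stand-in for the pass component: forwards its input unchanged."""
    return item


def run_unordered(items, copies=4):
    # Results arrive in completion order, which may differ from input order.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(pass_through, item) for item in items]
        return [future.result() for future in as_completed(futures)]


def run_ordered(items, copies=4):
    # Executor.map yields results in the order the inputs were submitted.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(pass_through, items))


data = [f"item-{i}" for i in range(10)]
print(run_ordered(data))
print(sorted(run_unordered(data)))
```

Both variants process the same items; they differ only in whether downstream consumers can rely on the original ordering.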
43. Scaling Genetic Algorithms with Meandre
Intel 2.8 GHz quad-core, 4 GB RAM. Average of 20 runs.
44. And Beyond with Hadoop
60 dual quad-core Xeons with 8 GB RAM, Gigabit Ethernet
Resource exhaustion
45. Are Components Black-Box Wrappers?
• Component programming is multilingual
• Natively support: Java, Scala, Python, and Clojure
• Easily Wrap: R, C, and C++
• Components can also interact with the OS
• Leverage OS tools
• Orchestrate other programs
• The question:
• Can Meandre help orchestrate and facilitate interaction and
cooperation between cloud and grid assets?
47. Cloud Conduits to the Grid
• Cloud mechanics have a steep learning curve
• Can Meandre help simplify the process?
• Orchestrating clouds with Meandre
• Amazon/Eucalyptus model
• Components can be created to:
• List images
• List instances
• Launch instances
• Allocate Elastic IP and Elastic Block Storage
• Transfer Data or Programs to running instances
• Trigger process computation
• Monitor processes and/or executing persistent services
• Terminate instances
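The component list above amounts to a lifecycle: launch, compute, terminate. The sketch below strings those steps together against an in-memory stand-in for an Amazon/Eucalyptus-style service. `FakeCloud`, its methods, and the identifiers are hypothetical and exist only so the example runs without a real cloud; they are not Meandre components or a real client library.

```python
import itertools


class FakeCloud:
    """In-memory stand-in for an Amazon/Eucalyptus-style API (hypothetical)."""

    def __init__(self, images):
        self.images = list(images)
        self.instances = {}                  # instance id -> state
        self._ids = itertools.count(1)

    def list_images(self):
        return list(self.images)

    def list_instances(self):
        return dict(self.instances)

    def launch(self, image_id):
        if image_id not in self.images:
            raise ValueError(f"unknown image {image_id}")
        instance_id = f"i-{next(self._ids):04d}"
        self.instances[instance_id] = "running"
        return instance_id

    def terminate(self, instance_id):
        self.instances[instance_id] = "terminated"


def run_job(cloud, image_id, job):
    """Launch an instance, run the job on it, and always terminate it."""
    instance_id = cloud.launch(image_id)
    try:
        return job(instance_id)
    finally:
        # Terminate even if the job fails, so no instance is left running.
        cloud.terminate(instance_id)


cloud = FakeCloud(["emi-demo"])
result = run_job(cloud, "emi-demo", lambda instance_id: f"done on {instance_id}")
print(result, cloud.list_instances())
```

Wrapping each step as a component would let a flow express the same sequence, with the terminate step connected so that it runs whether or not the computation succeeds.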
49. Conclusions
• Next generation data-intensive applications will:
• Use cloud computing technologies and conduits
• Require adaptation of programming paradigms
• Leverage a flexible and modular architecture
• Promote processing and resources at scale.
• Meandre
• Data-intensive execution engine
• Component-based programming architecture
• Distributed data flow designs to allow processing to be
co-located with data sources and enable transparent scalability
• Orchestrate cloud deployments
• Leverage cloud conduits