Presentation of Meandre: Semantic-Driven Data-Intensive Flows in the Clouds at eScience 2008 by Bernie Acs
Data-intensive flow computing allows efficient processing of large volumes of data that would otherwise be unapproachable. This paper introduces a new semantic-driven data-intensive flow infrastructure which: (1) provides a robust and transparent scalable solution from a laptop to large-scale clusters, (2) creates a unified solution for batch and interactive tasks in high-performance computing environments, and (3) encourages reusing and sharing components. Banking on virtualization and cloud computing techniques, the Meandre infrastructure is able to create and dispose of Meandre clusters on demand, transparently to the end user. This paper also presents a prototype of such a clustered infrastructure and some results obtained using it.
SEASR eScience 2008
1. SEASR:
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
{xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation
2. SEASR: The Project
SEASR: Software Environment for the Advancement of Scholarly Research
• Funded by the Andrew W. Mellon Foundation to answer the humanities community’s call for a
research and development environment capable of powering leading edge digital humanities
initiatives.
• Fosters collaboration through empowering scholars to share data and research processes with
an infrastructure and framework designed to support reusable, repeatable, and scalable services
and processes.
• Designed to enable developers to rapidly design, build, and share software applications that
support research and collaboration using modular components that can be assembled to create
reusable data-flows.
• Project web site: http://seasr.org
3. SEASR: The High-Altitude Picture
9. SEASR: A Quick Overview
• Addresses:
– Challenges of transforming information into knowledge
– Constructs software bridges to migrate unstructured and semi-
structured data into structured data and/or metadata to enable
analysis and accessibility.
• Aims:
– Make digital collections more useful and flexible
– Provide access to analytic processes and visualizations
– Enable easy mash-up with other web-based services (SOA)
10. SEASR: Knowledge Discovery…
Predictable process
The Process
• Selection
• Preparation
• Transform
• Processing
• Interpret
11. SEASR: Knowledge Discovery…
Predictable process across domains.
Domains
• Literature
• History
• Music
• Art
• Science
12. SEASR: Knowledge Discovery…
Predictable process across domains and digital collections.
Collection Types
• Text
• Multimedia
• Data
13. SEASR: Design Goals
• Transparency
– From a single laptop to an HPC cluster
– Not bound to a particular computation fabric
– Allow heterogeneous development
• Intuitive programming paradigm
– Modular, reusable components and flows
– Foster collaboration and sharing
• Open Source
• Service-Oriented Architecture (SOA)
14. Meandre: Infrastructure
• SEASR/Meandre Infrastructure:
– Dataflow execution paradigm
– Semantic-web driven
– Web Oriented
– Supports publishing services
– Modular components
– Encapsulation and execution mechanism
– Promotes reuse, sharing, and collaboration
15. Meandre: Data Driven Execution
• Execution Paradigms
– Conventional programs perform computational tasks by
executing a sequence of instructions.
– Data driven execution revolves around the idea of
applying transformation operations to a flow or stream
of data when it is available.
• Dataflow Approach: a component
– May have zero to many inputs
– May have zero to many outputs
– Performs a logical operation when data is available
16. Meandre: Dataflow Example
• Dataflow Addition Example
– Logical Operation ‘+’
– Requires two inputs (Value1, Value2)
– Produces one output (Sum)
• When two inputs are available
– Logical operation can be performed
– Sum is output
• When output is produced
– Reset internal values
– Wait for two new input values to become available
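The firing rule above can be sketched in a few lines of Python. This is only an illustration of the data-driven idea, not Meandre's component API: the class name `AddComponent`, the port names `value1` and `value2`, and the `push` method are all hypothetical.

```python
class AddComponent:
    """Toy dataflow component (hypothetical): fires when both inputs are available."""

    PORTS = ("value1", "value2")

    def __init__(self):
        self._inputs = {}  # port name -> pending value

    def push(self, port, value):
        """Deliver a value to an input port; return the sum if the component fires."""
        if port not in self.PORTS:
            raise ValueError(f"unknown port: {port}")
        self._inputs[port] = value
        if len(self._inputs) < len(self.PORTS):
            return None  # still waiting for the other input
        total = self._inputs["value1"] + self._inputs["value2"]
        self._inputs.clear()  # reset internal values and wait for two new inputs
        return total


adder = AddComponent()
first = adder.push("value1", 2)   # only one input available, nothing fires
second = adder.push("value2", 3)  # both inputs available, the sum is output
third = adder.push("value2", 10)  # state was reset, so the component waits again
```

Nothing calls the addition explicitly: the arrival of data decides when the operation runs, which is the contrast with sequential instruction execution drawn on the previous slide.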
17. Meandre: The Dataflow Component
• Data dictates component execution semantics
[Figure: a component with inputs, outputs, and properties (P), annotated with "Descriptor in RDF of its behavior" and "The component implementation"]
18. Meandre: Component Metadata
• Describes a component
• Separates:
– Component semantics (black box)
– Component implementation
• Provides a unified framework:
– Basic building blocks or units (components)
– Complex tasks (flows)
– Standardized metadata
19. Meandre: Semantic Web Concepts
• Relies on the Resource Description Framework (RDF), which uses a simple notation to express graph relations, usually written as XML, and provides a set of conventions and a common means to exchange information
• Provides a common framework to share and reuse data
across application, enterprise, and community boundaries
• Focuses on common formats for integration and combination
of data drawn from diverse sources
• Pays special attention to the language used for recording how
the data relates to real world objects
• Allows navigation to sets of data resources that are
semantically connected.
20. Meandre: Metadata Ontologies
• Meandre's metadata relies on three ontologies:
– The RDF ontology serves as a base for defining
Meandre descriptors
– The Dublin Core Elements ontology provides basic
publishing and descriptive capabilities in the description
of Meandre descriptors
– The Meandre ontology describes a set of relationships
that model valid components, as understood by the
Meandre execution engine architecture
21. Meandre: Components in RDF
Callout on the prefix declarations: Existing Standards
@prefix meandre: <http://www.meandre.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <#> .
<http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations>
meandre:name "Limited iterations"^^xsd:string ;
rdf:type meandre:executable_component ;
dc:creator "Xavier Llora"^^xsd:string ;
dc:date "2007-11-17T00:32:35"^^xsd:date ;
dc:description "Allows only a limited number of iterations"^^xsd:string ;
dc:format "java/class"^^xsd:string ;
dc:rights "University of Illinois/NCSA Open Source License"^^xsd:string ;
meandre:execution_context
<http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/colt.jar> ,
<http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore.jar> ,
<http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations/implementation/> ,
<http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore-meandre.jar> ,
22. Meandre: Component Types
• Components are the basic building blocks of any computational task.
• There are two kinds of Meandre components:
– Executable components
• Perform computational tasks that require no human interaction during runtime
• Processes are initialized during flow startup and are fired in accordance with the policies defined for them.
– Control components
• Used to pause the dataflow during user interaction cycles
• The WebUI may be an HTML form, an applet, or another user interface
23. Meandre: Component Assemblies
• Defined by connecting outputs from one component to the
inputs of another.
– Cyclical connections are supported
– Components may have
• Zero to many inputs
• Zero to many outputs
• Properties that control runtime behavior
• Described using RDF
– Enables storage, reuse, and sharing, as with components
– Allows discovery and dynamic execution
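As a rough illustration of an assembly, the Python sketch below wires output ports to input ports and lets data availability drive execution. It is a toy model under stated assumptions, not the Meandre engine: `Component`, `run_flow`, the wiring dictionary, and the port names are hypothetical, and cyclical connections are not handled.

```python
from collections import deque


class Component:
    """Hypothetical component: fires once every input port holds a value."""

    def __init__(self, name, inputs, fn):
        self.name = name
        self.inputs = inputs  # list of input port names (may be empty)
        self.fn = fn          # returns a dict: output port -> value
        self.pending = {}

    def ready(self):
        return all(port in self.pending for port in self.inputs)

    def fire(self):
        args, self.pending = self.pending, {}
        return self.fn(**args)


def run_flow(components, connections):
    """connections maps (source name, output port) -> [(target name, input port)]."""
    by_name = {c.name: c for c in components}
    queue = deque(c for c in components if not c.inputs)  # sources fire first
    while queue:
        comp = queue.popleft()
        for port, value in comp.fire().items():
            for target, in_port in connections.get((comp.name, port), []):
                tgt = by_name[target]
                tgt.pending[in_port] = value
                if tgt.ready():
                    queue.append(tgt)


printed = []  # stands in for console output so the result can be inspected


def show(object):
    printed.append(object)
    return {}


flow = [
    Component("push_hello", [], lambda: {"string": "Hello "}),
    Component("push_world", [], lambda: {"string": "world!"}),
    Component("concat", ["string_one", "string_two"],
              lambda string_one, string_two: {"concatenated_string": string_one + string_two}),
    Component("print", ["object"], show),
]
wiring = {
    ("push_hello", "string"): [("concat", "string_one")],
    ("push_world", "string"): [("concat", "string_two")],
    ("concat", "concatenated_string"): [("print", "object")],
}
run_flow(flow, wiring)
```

The wiring is plain data, separate from the component implementations, which is what lets an assembly be stored, shared, and discovered independently of the code it connects.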
24. Meandre: Flow (Complex Tasks)
• A flow is a collection of connected components
[Figure: dataflow execution of an example flow made of Read, Get, Merge, Do, and Show components, each with properties (P)]
25. Meandre: Create, Publish, & Share
• “Components” and “Flows” have RDF descriptors
– Easily shared and reused
– Allow machines to read and interpret
– Independent of the implementations
– Combine different implementations and platforms
– Components: Java, Python, Lisp, Web Services
– Execution: on a laptop or a high-performance cluster
• A “Location” is an RDF descriptor of one or more components, one or more flows, and their implementations
26. Meandre: Repository & Locations
• Each location represents a set of components and flows
• Users can
– Combine different locations together
– Create components
– Assemble flows
– Share components and flows
• Repositories help
– Administer complex environments
– Organize components and flows
27. Meandre: Metadata Properties
• Components and Flows share properties such as
component name, creator, creation date, description, tags,
and rights.
• Component-specific metadata describes the component's behavior: its location, type of implementation, firing policy, runnable, format, resource location, and execution context.
• Flow-specific metadata describes the directed graph of components: component instances, connectors, connector instance data port source, connector instance data port target, connector instance source, connector instance target, and instance name.
28. Meandre: Programming Paradigm
• The programming paradigm creates complex tasks by linking together specialized components. Meandre's publishing mechanism allows components developed by third parties to be assembled in a new flow.
• There are two ways to develop flows:
– Meandre’s Workbench visual programming tool
– Meandre’s ZigZag scripting language
29. Meandre: Workbench Existing Flow
[Screenshot: the Workbench displaying an existing flow, with Components, Flows, and Locations panels]
30. Meandre: Workbench Create Flow
[Screenshot: the Workbench with Components, Flows, and Locations panels]
31. Meandre: Workbench Create Flow
Drag and drop the selected component into the workspace.
32. Meandre: Workbench Create Flow
Properties for the selected component are exposed.
33. Meandre: Workbench Create Flow
The description for the selected component is exposed.
34. Meandre: Workbench Create Flow
Drag and drop another component into the workspace.
35. Meandre: Workbench Create Flow
Connect the output of the first component to the input of the second.
Clicking the first port to connect highlights it with a color change (red).
36. Meandre: Workbench Create Flow
Connect the output of the first component to the input of the second.
Clicking the port to connect to causes a line to be displayed as a visual indicator.
37. Meandre: Workbench Create Flow
Repeat drag and drop to complete the assembly.
38. Meandre: ZigZag Script Language
• ZigZag is a simple language for describing data-intensive flows
– Modeled on Python for simplicity.
– ZigZag is a declarative language for expressing the directed graphs that describe flows.
• Command-line tools allow ZigZag files to be compiled and executed.
– A compiler is provided to transform a ZigZag program (.zz) into a Meandre archive unit (.mau).
– MAUs can then be executed by a Meandre engine.
39. Meandre: ZigZag Script Language
• Example flow diagram
– The flow below pushes two strings that get concatenated and printed to the console
[Figure: flow diagram]
40. Meandre: ZigZag Script Language
• ZigZag code that represents example flow:
#
# Imports the three required components and creates the component aliases
#
import <http://localhost:1714/public/services/demo_repository.rdf>
alias <http://test.org/component/push_string> as PUSH
alias <http://test.org/component/concatenate-strings> as CONCAT
alias <http://test.org/component/print-object> as PRINT
#
# Creates four instances for the flow
#
push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
#
# Sets up the properties of the instances
#
push_hello.message, push_world.message = "Hello ", "world!"
#
# Describes the data-intensive flow
#
@phres, @pwres = push_hello(), push_world()
@cres = concat( string_one: phres.string; string_two: pwres.string )
print( object: cres.concatenated_string )
#
Callout on the import line: Repository Location. Defines the logical repository location where the components in this flow can be found, similar to defining a location for the workbench, which would then display the available components located there.
41. Meandre: ZigZag Script Language
• ZigZag code that represents example flow (alias lines highlighted):
alias <http://test.org/component/push_string> as PUSH
alias <http://test.org/component/concatenate-strings> as CONCAT
alias <http://test.org/component/print-object> as PRINT
Callout: Alias. Assigns a logical name reference to each component, making subsequent program calls easier to read and write.
42. Meandre: ZigZag Script Language
• ZigZag code that represents example flow (instance creation highlighted):
push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
Callout: Implementation Instances. Create instances of the components using the “Alias” references, similar to dragging components onto the workbench canvas.
43. Meandre: ZigZag Script Language
• ZigZag code that represents example flow (property assignment highlighted):
push_hello.message, push_world.message = "Hello ", "world!"
Callout: Set the Property Values. Define the property values for components, which is similar to filling in values in the workbench’s properties panel.
44. Meandre: ZigZag Script Language
• ZigZag code that represents example flow (flow description highlighted):
@phres, @pwres = push_hello(), push_world()
@cres = concat( string_one: phres.string; string_two: pwres.string )
print( object: cres.concatenated_string )
Callout: Describe Connections. Define the connections or relationships between the components in this flow, which is similar to drawing connection lines on the workbench canvas.
45. Meandre: ZigZag Script Language
• Automatic Parallelization
– Multiple instances of a component could be run in parallel to boost
throughput.
– A specialized operator is available in ZigZag to cause multiple instances of a given component to be used
• Consider the simple flow example shown in the diagram
• The dataflow declaration would look like
#
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string )
print( object:pt.string )
46. Meandre: ZigZag Script Language
• Automatic Parallelization
– Adding the operator [+AUTO] to middle component
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+AUTO]
print( object:pt.string )
– [+AUTO] tells the ZigZag compiler to parallelize the “pass component instance” by the number of cores available on the system.
– [+AUTO] may also be written [+N], where N is a numeric value, for example [+10].
47. Meandre: ZigZag Script Language
• Automatic Parallelization
– Adding the operator [+4] would result in a directed graph
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4]
print( object:pt.string )
48. Meandre: ZigZag Script Language
• Automatic Parallelization
– ZigZag has created 4 parallel instances of the component.
• It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances.
• This is called unordered parallelization, since data may arrive at the print instance out of the original order in which it was generated by the push component instance.
49. Meandre: ZigZag Script Language
• Automatic Parallelization
– The operator [+AUTO] can be told to maintain data order with
“!”
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+AUTO!]
print( object:pt.string )
– [+AUTO!] tells the ZigZag compiler to parallelize the “pass component instance” by the number of cores available on the system and to maintain the order of the data passing through.
50. Meandre: ZigZag Script Language
• Automatic Parallelization
– ZigZag has created 4 parallel instances of the component.
• It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances.
• It has also introduced a reducer instance that is in charge of collecting the data from the parallel instances and restoring its original order
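The difference between unordered and ordered parallelization can be sketched in Python with a thread pool. This is an analogy under stated assumptions, not what the ZigZag compiler generates: `pass_through` and `run_parallel` are hypothetical names, and the sketch collects all results before reordering, whereas a real reducer would more likely stream them.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_through(item):
    """Stand-in for the parallelized 'pass' component; sleeps to vary timing."""
    time.sleep(random.uniform(0, 0.01))
    return item


def run_parallel(items, workers=4, ordered=False):
    """Map items over `workers` parallel instances; optionally restore input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Mapper role: tag each item with a sequence number and hand it to a worker.
        futures = {pool.submit(pass_through, item): seq
                   for seq, item in enumerate(items)}
        # Results are gathered in completion order, which may differ from input order.
        results = [(futures[f], f.result()) for f in as_completed(futures)]
    if ordered:
        # Reducer role: sort by sequence number to recover the original order.
        results.sort(key=lambda pair: pair[0])
    return [value for _, value in results]


data = list(range(20))
in_order = run_parallel(data, workers=4, ordered=True)
any_order = run_parallel(data, workers=4, ordered=False)
```

With `ordered=False` the output contains the same items but possibly shuffled; with `ordered=True` the sequence numbers let the final step put them back, at the cost of extra bookkeeping.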
51. Meandre: Flows to MAU
• Flows can be executed using their RDF
descriptors
• Flows can be compiled into MAU
• MAU is:
– Self-contained representation
– Ready for execution
– Portable
– The base of flow execution in grid environments
52. Meandre: The Architecture
• The design of the Meandre architecture follows
three directives:
– provide a robust and transparent scalable solution from
a laptop to large-scale clusters
– create a unified solution for batch and interactive tasks
– encourage reusing and sharing components
• To ensure such goals, the designed architecture
relies on four stacked layers and builds on top of
service-oriented architectures (SOA)
53. Meandre: Basic Single Server
54. Meandre MDX: Cloud Computing
• Servers can be
– instantiated on demand
– disposed when done or on demand
• A cluster is formed by at least one server
• The Meandre Distributed Exchange (MDX)
– Orchestrates operational integrity by managing cluster
configuration and membership using a shared database
resource.
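One way to picture membership managed through a shared database is the Python sketch below. It is purely illustrative and does not reflect the actual MDX schema or protocol: the `ClusterRegistry` class, the `servers` table, and the host names are hypothetical, and an in-memory SQLite database stands in for the shared database resource so the example is self-contained.

```python
import sqlite3


class ClusterRegistry:
    """Hypothetical membership table kept in a shared database."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS servers ("
            "host TEXT, port INTEGER, PRIMARY KEY (host, port))"
        )

    def register(self, host, port):
        """Called when a server is instantiated and joins the cluster."""
        self.conn.execute("INSERT OR IGNORE INTO servers VALUES (?, ?)", (host, port))
        self.conn.commit()

    def dispose(self, host, port):
        """Called when a server is disposed and leaves the cluster."""
        self.conn.execute(
            "DELETE FROM servers WHERE host = ? AND port = ?", (host, port)
        )
        self.conn.commit()

    def members(self):
        """Current cluster membership as a list of (host, port) tuples."""
        return self.conn.execute(
            "SELECT host, port FROM servers ORDER BY host, port"
        ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the shared database
registry = ClusterRegistry(conn)
registry.register("node-a", 1714)
registry.register("node-b", 1714)
registry.dispose("node-a", 1714)
remaining = registry.members()
```

Because every server reads and writes the same table, each one sees the same view of the cluster without talking to its peers directly.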
55. Meandre MDX: The Picture
MDX Backbone
56. Meandre MDX: The Architecture
• Virtualization infrastructure
– Provides uniform access to the underlying execution environment. It relies on virtualization of machines and the usage of Java for hardware abstraction.
• IO standardization
– A unified layer provides access to shared data stores, distributed file systems, specialized metadata stores, and other service-oriented architecture gateways.
• Data-intensive flow infrastructure
– Provides the basic Meandre execution engine for data-intensive flows, component repositories and discovery mechanisms, extensible plugins, and web user interfaces (webUIs).
• Interaction layer
– Can provide self-contained applications via webUIs, create plugins for third-party services, interact with the embedding application that relies on the Meandre engine, or provide services to the cloud.
57. Meandre MDX: The Experiment
• Experimental Prototype
– Designed and built to validate the viability of an MDX cluster
– Used VMware Server 2.0 on three identical hosts, each with
• Windows Server 2003
• Two quad-core 2.8 GHz Xeon processors
• 1600 MHz front-side bus
• 32 GB of RAM
• 4 TB of RAID 5 disk
58. Meandre MDX: The Experiment
• Experimental Prototype
– 8 virtual machine instances were created on each of the first two hosts with
• 32-bit Ubuntu 8.04 Linux
• 3 GB of RAM dedicated to each instance
• 1 physical processor core assigned to each VM
• A Meandre MDX server running on Sun's Java 1.5 JVM
– The third physical host supported 2 virtual machine instances with
• 32-bit Ubuntu 8.04 Linux
• 3 GB of RAM dedicated to each instance
• 1 physical processor core assigned to each VM
• A highly available MySQL database and HTTP load-balancing facility
59. Meandre MDX: The Experiment
• We conducted three different experiments
– All three were based on the same flow shown earlier in the ZigZag
example with a single change to make the single line of text into
250,000 lines of text for each iteration of the flow.
– The first test was designed to test the scalability of a single
Meandre server.
[Figure: concurrent flows running on a standalone engine, on a log/log scale; each iteration of the flow pushed 250,000 lines of text]
60. Meandre MDX: The Experiment
• We conducted three different experiments
– The second experiment was run against a virtual Meandre cluster consisting of 16 Meandre servers.
[Figure: concurrent flows running on a standalone engine, on a log/log scale; each iteration of the flow pushed 1 line of text]
61. Meandre MDX: The Experiment
• We conducted three different experiments
– The third experiment was run against a virtual Meandre cluster consisting of 16 Meandre servers.
[Figure: concurrent flows running on a standalone engine, on a log/log scale; each iteration of the flow pushed 250,000 lines of text]
62. Meandre MDX: The Experiment
• We conducted three different experiments
– The first test clearly shows
• The average time per flow increased linearly with the
number of concurrent flows
– The next experiments clearly show
• Cluster throughput grows linearly with the number of Meandre servers available
63. Upcoming Events
• SEASR 2009 workshop
– The workshop is organized to provide expanded
opportunities for learning, knowledge sharing, and
support and is intended to provide sufficient
introduction and support so that teams can implement
a study using SEASR.
– The workshop is intended for institutional teams of
scholars from the Humanities.
– The workshop will include communication and work
from a team’s home campus as well as face-to-face
meeting on the University of Illinois campus.
64. SEASR:
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
{xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu