Data Placement Scheduling
between Distributed Repositories
Stork 1.0 and beyond
Mehmet Balman
Louisiana State University
Baton Rouge, LA
Motivation

Scientific applications are becoming more data intensive
(dealing with petabytes of data)

We use geographically distributed resources to satisfy
immense computational requirements

The distributed nature of the resources makes data
movement a major bottleneck for end-to-end
application performance

Therefore, complex middleware is required to
orchestrate the use of these storage and network
resources between collaborating parties, and to manage
the end-to-end distribution of data.

Data Movement using Stork

Data Scheduling

Tuning Data Transfer Operations

Failure-Awareness

Job Aggregation

Future Directions
Agenda

Advanced data transfer protocols (e.g., GridFTP)

High throughput data transfer

Data Scheduler: Stork

Organizing data movement activities

Ordering data transfer requests
Moving Large Data Sets
A scientific application generates an immense amount
of simulation data using supercomputing resources
The generated data is stored in a temporary space and
needs to be moved to a data repository for further processing
or archiving
Another application may be waiting for this generated data as
its input before it can start execution
Delaying the data transfer operation, or completing the
transfer far later than the expected time, may create several
problems
– (other resources are waiting for this transfer operation
to complete)
Use Case

Stork: A batch scheduler for data placement
activities

Supports plug-in data transfer modules for
specific protocols/services

Throttling: deciding the number of concurrent
transfers

Keeps a log of data placement activities

Adds fault tolerance to data transfers

Tunes protocol transfer parameters (e.g., number of
parallel TCP streams)
Scheduling Data Movement Jobs
[ dest_url = "gsiftp://eric1.loni.org/scratch/user/";
arguments = "-p 4 dbg -vb";
src_url = "file:///home/user/test/";
dap_type = "transfer";
verify_checksum = true;
verify_filesize = true;
set_permission = "755" ;
recursive_copy = true;
network_check = true;
checkpoint_transfer = true;
output = "user.out";
err = "user.err";
log = "userjob.log";
]
Stork Job Submission
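The submit description above is a ClassAd-style list of key = value attributes. As an illustrative sketch (not part of Stork itself), a small helper can render such a description from a Python dict; the attribute names below simply mirror the example.

```python
def to_stork_ad(fields):
    """Render a dict as a ClassAd-style Stork submit description
    (illustrative sketch, not Stork's actual generator)."""
    def fmt(v):
        # Booleans are unquoted (true/false); everything else is quoted,
        # matching the style of the submit file shown above.
        if isinstance(v, bool):
            return "true" if v else "false"
        return f'"{v}"'
    body = "\n".join(f"  {k} = {fmt(v)};" for k, v in fields.items())
    return "[\n" + body + "\n]"

ad = to_stork_ad({
    "src_url": "file:///home/user/test/",
    "dest_url": "gsiftp://eric1.loni.org/scratch/user/",
    "dap_type": "transfer",
    "verify_checksum": True,
})
print(ad)
```

A generator like this keeps job descriptions consistent when submitting many similar transfers.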
End-to-end bulk data transfer (latency wall)

TCP-based solutions

FAST TCP, Scalable TCP, etc.

UDP-based solutions

RBUDP, UDT, etc.

Most of these solutions require kernel-level
changes

Not preferred by most domain scientists
Fast Data Transfer

Take an application-level transfer protocol (e.g.,
GridFTP) and tune it for better performance:

Using Multiple (Parallel) streams

Tuning Buffer size
(efficient utilization of available network capacity)
Level of Parallelism in End-to-end Data Transfer

number of parallel data streams connected to a data transfer
service for increasing the utilization of network bandwidth

number of concurrent data transfer operations that are
initiated at the same time for better utilization of system
resources.
Application-Level Tuning
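The two levels of parallelism above can be sketched as a toy illustration: `concurrency` transfer operations run at once (system utilization), and each would internally open `streams` parallel data streams to its transfer service (network utilization). The `do_transfer` body here is a stand-in, not a real protocol call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_transfers(requests, concurrency=4, streams=8):
    """Run `concurrency` transfer operations at a time; each operation
    would internally use `streams` parallel data streams."""
    def do_transfer(req):
        # Stand-in for the actual transfer-protocol call.
        return (req, streams)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() preserves input order even though work runs concurrently.
        return list(pool.map(do_transfer, requests))
```

Keeping the two knobs separate matters: stream count tunes one connection's bandwidth, while concurrency tunes how many transfers the system handles at once.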

Instead of a single connection at a time, multiple
TCP streams are opened to a single data transfer
service in the destination host.

We gain larger bandwidth in TCP, especially in a
network with a low packet loss rate; parallel connections
better utilize the TCP buffer available to the data
transfer, such that N connections might be N times
faster than a single connection

Multiple TCP streams introduce extra overhead in the system
Parallel TCP Streams
Average throughput using parallel streams over 1 Gbps
Experiments in the LONI (www.loni.org) environment – transferring a file to
QB from a Linux machine

Instead of predictive sampling, use data from
actual transfer

transfer data by chunks (partial transfers) and
also set control parameters on the fly.

measure throughput for every transferred data
chunk

gradually increase the number of parallel
streams until it reaches an equilibrium point
Adaptive Tuning

No need to probe the system and make
measurements with external profilers

Does not require any complex model for
parameter optimization

Adapts to changing environment

But, overhead in changing parallelism level

Fast start (exponentially increase the number
of parallel streams)
Adaptive Tuning

Start with single stream (n=1)

Measure instant throughput for every data chunk transferred
(fast start)

Increase the number of parallel streams (n=n*2),

transfer the data chunk

measure instant throughput

If current throughput value is better than previous one,
continue

Otherwise, set n to the old value and gradually increase
parallelism level (n=n+1)

If there is no throughput gain from increasing the number of
streams (the equilibrium point has been found)

Increase chunk size (delay measurement period)
Adaptive Tuning
Dynamic Tuning Algorithm
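The tuning steps above can be sketched in Python. `measure(n)` stands for transferring one data chunk with n parallel streams and returning the observed throughput; the chunk-size growth step is omitted for brevity, and the names are illustrative rather than Stork's actual code.

```python
def tune_streams(measure, max_streams=64):
    """Adaptive tuning sketch: exponential 'fast start' followed by a
    linear search, as described on the slides."""
    n, best = 1, measure(1)
    # Fast start: double the stream count while throughput keeps improving.
    while n * 2 <= max_streams:
        t = measure(n * 2)
        if t > best:
            n, best = n * 2, t
        else:
            break  # fall back to the previous (old) value of n
    # Gradual phase: increase one stream at a time until no further gain.
    while n + 1 <= max_streams:
        t = measure(n + 1)
        if t > best:
            n, best = n + 1, t
        else:
            break  # equilibrium point found
    return n
```

For example, with a link that saturates at four streams (`measure = lambda n: min(n, 4)`), the search settles at n = 4.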
• Dynamic Environment:
• data transfers are prone to frequent failures
• what went wrong during the data transfer?
• No access to the remote resources
• Messages get lost due to system malfunction
• Instead of waiting for a failure to happen
• Detect possible failures and malfunctioning services
• Search for another data server
• Alternate data transfer service
• Classify erroneous cases to make better decisions
Failure Awareness
• Use Network Exploration Techniques
– Check availability of the remote service
– Resolve the host and determine connectivity failures
– Detect available data transfer services
– should be fast and efficient so as not to burden system/network
resources
• Error while a transfer is in progress?
– Error_TRANSFER
• Retry or not?
• When to re-initiate the transfer?
• Use alternate options?
Error Detection
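A minimal version of such an availability probe — hypothetical function name, standard sockets only — resolves the host and attempts a TCP connection with a short timeout, so the check stays fast and does not burden system or network resources:

```python
import socket

def probe_service(host, port, timeout=2.0):
    """Return None if the service is reachable, or an error label a
    scheduler could act on ('RESOLVE_FAILED', 'CONNECT_FAILED')."""
    try:
        # Resolving separately distinguishes DNS problems from
        # connectivity problems.
        addr = socket.getaddrinfo(host, port)[0][4]
    except socket.gaierror:
        return "RESOLVE_FAILED"
    try:
        with socket.create_connection(addr[:2], timeout=timeout):
            return None  # service reachable
    except OSError:
        return "CONNECT_FAILED"
```

Distinguishing the two failure modes lets the scheduler decide between retrying later (connectivity) and switching to an alternate server (bad host).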
• Data transfer protocols do not always return appropriate error codes
• Use the error messages generated by the data transfer protocol
• A better logging facility and classification
• Recover from Failure
• Retry the failed operation
• Postpone scheduling of
a failed operation
• Early Error Detection
• Initiate the transfer when the
erroneous condition has
recovered
• Or use alternate
options
Error Classification
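One way to classify raw error messages into scheduling decisions is simple pattern matching. The patterns below are hypothetical; a real deployment would build them from the errors its transfer protocol actually emits.

```python
# Hypothetical message fragments, grouped by how the scheduler
# should react (assumptions, not an actual Stork error taxonomy).
TRANSIENT = ("timed out", "connection reset", "temporarily unavailable")
FATAL = ("no such file", "permission denied", "authentication failed")

def classify_error(message):
    """Map a raw transfer-protocol error message to a decision:
    retry transient failures, switch to an alternate option on fatal
    ones, and flag unknown messages for logging and inspection."""
    msg = message.lower()
    if any(p in msg for p in TRANSIENT):
        return "RETRY"
    if any(p in msg for p in FATAL):
        return "USE_ALTERNATE"
    return "UNCLASSIFIED"
```

Logging the `UNCLASSIFIED` cases over time is what lets the classification improve, as the slide suggests.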
Error Reporting
SCOOP data – Hurricane Gustav simulations
Hundreds of files (250 data transfer operations)
Small (100MB) and large files (1G, 2G)
Failure-Aware Scheduling
• Verify the successful completion of the operation
by checking the checksum and file size.
• For GridFTP, the Stork transfer module can recover
from a failed operation by restarting from the last
transmitted file. In case of a retry after a failure, the
scheduler informs the transfer module to recover
and restart the transfer using the information from
a rescue file created by the checkpoint-enabled
transfer module.
• An “intelligent” (dynamic tuning) alternative to
Globus RFT (Reliable File Transfer)
New Transfer Modules
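The rescue-file mechanism can be sketched as follows. The file format (one completed file name per line) and the function names are illustrative, not Stork's actual on-disk format; `copy_one(name)` stands for the per-file transfer call and raises on error.

```python
import os

def transfer_with_rescue(files, copy_one, rescue_path):
    """Restart-from-last-file recovery sketch: completed file names are
    appended to a rescue file, and a retry skips everything already
    listed there instead of re-sending the whole set."""
    done = set()
    if os.path.exists(rescue_path):
        with open(rescue_path) as f:
            done = set(line.strip() for line in f)
    with open(rescue_path, "a") as log:
        for name in files:
            if name in done:
                continue  # already transmitted before the failure
            copy_one(name)
            log.write(name + "\n")
            log.flush()  # persist progress before the next file
```

On a retry, the scheduler simply re-submits the same file list with the same rescue path and only the remainder is transferred.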
• Multiple data movement jobs are combined and
processed as a single transfer job
• Information about the aggregated job is stored in the
job queue and tied to a main job that actually
performs the transfer operation, so that it can be
queried and reported separately.
• Hence, aggregation is transparent to the user
• We have seen significant performance improvements,
especially with small data files
– decreasing the amount of protocol overhead
– reducing the number of independent network connections
Job Aggregation
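One way such aggregation might work — the grouping key and batch cap here are assumptions for illustration, not Stork internals — is to batch queued jobs that share the same endpoints and protocol, so many small files reuse one connection instead of opening one each:

```python
from collections import defaultdict

def aggregate(jobs, max_batch=32):
    """Group queued transfer jobs by (source host, destination host,
    protocol) and split each group into batches of at most `max_batch`
    jobs; each batch would be performed as a single transfer operation."""
    groups = defaultdict(list)
    for job in jobs:
        groups[(job["src_host"], job["dst_host"], job["proto"])].append(job)
    batches = []
    for key, members in groups.items():
        for i in range(0, len(members), max_batch):
            batches.append({"key": key, "jobs": members[i:i + max_batch]})
    return batches
```

Keeping the member jobs listed inside each batch is what lets them still be queried and reported individually, as the slide notes.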
Experiments on LONI (Louisiana Optical Network Initiative) :
1024 transfer jobs from Ducky to Queenbee (rtt avg 5.129 ms) - 5MB
data file per job
Job Aggregation
We need priority-based data transfer scheduling
with advance reservation and provisioning to allow
researchers to use data placement as a service,
where they can plan ahead and reserve a time
period for their data movement operations.
We need to orchestrate advance storage and network
allocation together for data movements (very little
progress in the literature)
Future Directions
Next-generation research networks such as ESnet
and Internet2
– provide high-speed on-demand data access
between collaborating institutions by delivering
network-as-a-service
On-Demand Secure Circuits and Advance
Reservation System (OSCARS)
• Guaranteed bandwidth (at certain time, for a
certain bandwidth and length of time)
Network Reservation
Research Concept
accept time constraints
allow users to plan ahead
orchestrate resource allocation
provide advance resource reservation
reserve the scheduler’s time for future
data movement operations
Methodology
two separate queues
Planning Phase
resource reservation and time allocation
− Preemption?
− Confirm submission of a request?
Execution Phase
re-organization, tuning, and ordering
Failure-awareness
Job Aggregation
Dynamic Adaptation in data transfers
Priority-based scheduling (earliest deadline?)
Methodology
Phase 1:
The scheduler checks the availability of resources in a
given time period and determines whether the requested
operation can be satisfied within the given time
constraints

The server and the network capacity are allocated
in advance for the future time period
Phase 2:
The scheduler considers other requests reserved for
future time windows and re-orders operations in the
current time period

Aggregation

Pre-processing
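The Phase 1 availability check can be illustrated with a simple bandwidth-reservation model (entirely illustrative; real advance reservation systems such as OSCARS also involve path computation): sweep the interval endpoints inside the requested window and verify that the committed bandwidth never exceeds capacity.

```python
def can_reserve(existing, start, end, rate, capacity):
    """Would adding a transfer needing `rate` bandwidth over [start, end)
    exceed the link `capacity`, given the already-reserved
    (start, end, rate) windows? Checks peak load at interval endpoints."""
    points = {start, end}
    for s, e, r in existing:
        # Load can only change at reservation boundaries, so it is enough
        # to test those boundary points inside the requested window.
        points.update(p for p in (s, e) if start <= p < end)
    for p in sorted(points):
        if p >= end:
            continue
        load = sum(r for s, e, r in existing if s <= p < e)
        if load + rate > capacity:
            return False
    return True
```

A request that overlaps a heavily committed window is rejected at planning time, so the user can pick a later window instead of failing at execution time.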
www.petashare.org
www.cybertools.loni.org
www.storkproject.org
www.cct.lsu.edu
Questions?
Mehmet Balman balman@cct.lsu.edu
Thank you
Data Movement between
Distributed Repositories for
Large Scale Collaborative
Science
Mehmet Balman
Louisiana State University
Baton Rouge, LA

 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Presentation southernstork 2009-nov-southernworkshop

  • 1. Data Placement Scheduling between Distributed Repositories Stork 1.0 and beyond Mehmet Balman Louisiana State University Baton Rouge, LA
  • 2. Motivation • Scientific applications are becoming more data intensive (dealing with petabytes of data) • We use geographically distributed resources to satisfy immense computational requirements • The distributed nature of these resources makes data movement a major bottleneck for end-to-end application performance • Therefore, complex middleware is required to orchestrate the use of these storage and network resources between collaborating parties, and to manage the end-to-end distribution of data.
  • 3. • Data Movement using Stork • Data Scheduling • Tuning Data Transfer Operations • Failure-Awareness • Job Aggregation • Future Directions Agenda
  • 4. • Advanced data transfer protocols (e.g., GridFTP) • High throughput data transfer • Data Scheduler: Stork • Organizing data movement activities • Ordering data transfer requests Moving Large Data Sets
  • 5. A scientific application generates an immense amount of simulation data using supercomputing resources. The generated data is stored in a temporary space and needs to be moved to a data repository for further processing or archiving. Another application may be waiting for this generated data as its input to start execution. Delaying the data transfer operation, or completing the transfer far later than the expected time, may create several problems – (other resources are waiting for this transfer operation to complete) Use case
  • 6. • Stork: A batch scheduler for Data Placement activities • Supports plug-in data transfer modules for specific protocols/services • Throttling: deciding the number of concurrent transfers • Keeps a log of data placement activities • Adds fault tolerance to data transfers • Tunes protocol transfer parameters (e.g., number of parallel TCP streams) Scheduling Data Movement Jobs
  • 7. [ dest_url = "gsiftp://eric1.loni.org/scratch/user/"; arguments = "-p 4 -dbg -vb"; src_url = "file:///home/user/test/"; dap_type = "transfer"; verify_checksum = true; verify_filesize = true; set_permission = "755" ; recursive_copy = true; network_check = true; checkpoint_transfer = true; output = "user.out"; err = "user.err"; log = "userjob.log"; ] Stork Job submission
  • 8. End-to-end bulk data transfer (latency wall) • TCP-based solutions • Fast TCP, Scalable TCP, etc. • UDP-based solutions • RBUDP, UDT, etc. • Most of these solutions require kernel-level changes • Not preferred by most domain scientists Fast Data Transfer
  • 9. • Take an application-level transfer protocol (e.g., GridFTP) and tune it up for better performance: • Using multiple (parallel) streams • Tuning buffer size (efficient utilization of available network capacity) • Level of parallelism in end-to-end data transfer: • the number of parallel data streams connected to a data transfer service, for increasing the utilization of network bandwidth • the number of concurrent data transfer operations that are initiated at the same time, for better utilization of system resources. Application Level Tuning
  • 10. • Instead of a single connection at a time, multiple TCP streams are opened to a single data transfer service in the destination host. • We gain larger bandwidth in TCP, especially in a network with a low packet loss rate; parallel connections better utilize the TCP buffer available to the data transfer, such that N connections might be N times faster than a single connection • Multiple TCP streams result in extra overhead in the system Parallel TCP Streams
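The multi-stream idea can be illustrated with a minimal, hypothetical sketch (local files stand in for the network endpoints; this is not Stork's actual GridFTP module): the transfer is split into byte ranges and each range is moved on its own stream.

```python
import threading

def copy_range(src, dst, offset, length):
    """Copy one byte range on its own 'stream' (here, a thread)."""
    with open(src, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    with open(dst, "r+b") as f:
        f.seek(offset)
        f.write(data)

def parallel_copy(src, dst, size, n_streams):
    """Split the transfer into n_streams byte ranges and copy them concurrently."""
    # Pre-allocate the destination so each range can be written in place.
    with open(dst, "wb") as f:
        f.truncate(size)
    chunk = -(-size // n_streams)  # ceiling division
    threads = [
        threading.Thread(target=copy_range,
                         args=(src, dst, i * chunk, min(chunk, size - i * chunk)))
        for i in range(n_streams) if i * chunk < size
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With real sockets, each thread would own one TCP connection; the byte ranges correspond to GridFTP's partial file transfers.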
  • 11. Average Throughput using parallel streams over 1Gbps – Experiments in LONI (www.loni.org) environment - transfer file to QB from Linux m/c
  • 12. • Instead of predictive sampling, use data from the actual transfer • transfer data by chunks (partial transfers) and also set control parameters on the fly • measure throughput for every transferred data chunk • gradually increase the number of parallel streams till it comes to an equilibrium point Adaptive Tuning
  • 13. • No need to probe the system and make measurements with external profilers • Does not require any complex model for parameter optimization • Adapts to the changing environment • But, there is overhead in changing the parallelism level • Fast start (exponentially increase the number of parallel streams) Adaptive Tuning
  • 14. • Start with a single stream (n=1) • Measure instant throughput for every data chunk transferred (fast start) • Increase the number of parallel streams (n=n*2), • transfer the data chunk • measure instant throughput • If the current throughput value is better than the previous one, continue • Otherwise, set n to the old value and gradually increase the parallelism level (n=n+1) • If there is no throughput gain by increasing the number of streams (found the equilibrium point) • Increase the chunk size (delay the measurement period) Adaptive Tuning
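The steps above can be sketched as follows; measure_throughput is a stand-in for "transfer one data chunk with n streams and report the observed rate", not a real Stork API:

```python
def adaptive_parallelism(measure_throughput, max_streams=64):
    """Find the equilibrium number of parallel streams.

    measure_throughput(n) transfers one data chunk using n streams and
    returns the observed throughput; it stands in for the real transfer
    module.
    """
    n = 1
    best = measure_throughput(n)
    # Fast start: double the stream count while throughput keeps improving.
    while n * 2 <= max_streams:
        t = measure_throughput(n * 2)
        if t > best:
            n, best = n * 2, t
        else:
            break
    # Gradual phase: from the last good value, add one stream at a time.
    while n + 1 <= max_streams:
        t = measure_throughput(n + 1)
        if t > best:
            n, best = n + 1, t
        else:
            break  # no further gain: equilibrium point found
    return n
```

In the real scheduler, the chunk size would also grow once the equilibrium point is reached, to delay the measurement period as the slide describes.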
  • 18. • Dynamic Environment: • data transfers are prone to frequent failures • what went wrong during the data transfer? • No access to the remote resources • Messages get lost due to system malfunction • Instead of waiting for a failure to happen • Detect possible failures and malfunctioning services • Search for another data server • Alternate data transfer service • Classify erroneous cases to make better decisions Failure Awareness
  • 19. • Use Network Exploration Techniques – Check availability of the remote service – Resolve the host and determine connectivity failures – Detect the available data transfer service – should be fast and efficient so as not to burden system/network resources • Error while the transfer is in progress? – Error_TRANSFER • Retry or not? • When to re-initiate the transfer • Use alternate options? Error Detection
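A fast connectivity probe of the kind described (resolve the host, then try the service port) might look like this sketch; the function name and the returned classification strings are illustrative, not Stork's:

```python
import socket

def probe_service(host, port, timeout=3.0):
    """Classify basic connectivity before (re)trying a transfer.

    Returns one of: "ok", "resolve_error", "connect_error".
    """
    try:
        addr = socket.getaddrinfo(host, port,
                                  socket.AF_INET, socket.SOCK_STREAM)[0][4]
    except socket.gaierror:
        return "resolve_error"   # host name cannot be resolved
    s = socket.socket()
    s.settimeout(timeout)
    try:
        s.connect(addr)
        return "ok"              # service endpoint is reachable
    except OSError:
        return "connect_error"   # host resolved, but the port is unreachable
    finally:
        s.close()
```

A scheduler could run such a probe against the GridFTP port before re-initiating a failed transfer, and switch to an alternate data server on "connect_error".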
  • 20. • Data transfer protocols do not always return appropriate error codes • Using error messages generated by the data transfer protocol • A better logging facility and classification • Recover from Failure • Retry the failed operation • Postpone scheduling of a failed operation • Early Error Detection • Initiate the transfer when the erroneous condition has recovered • Or use alternate options Error Classification
  • 22. SCOOP data - Hurricane Gustav simulations • Hundreds of files (250 data transfer operations) • Small (100MB) and large files (1G, 2G) Failure Aware Scheduling
  • 23. • Verify the successful completion of the operation by checking checksum and file size. • For GridFTP, the Stork transfer module can recover from a failed operation by restarting from the last transmitted file. In case of a retry after a failure, the scheduler informs the transfer module to recover and restart the transfer using the information from a rescue file created by the checkpoint-enabled transfer module. • An “intelligent” (dynamic tuning) alternative to Globus RFT (Reliable File Transfer) New Transfer Modules
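The rescue-file mechanism can be sketched generically; do_transfer stands in for the per-file transfer call, and the rescue-file format here (one completed file name per line) is an assumption for illustration:

```python
import os

def transfer_with_rescue(files, rescue_path, do_transfer):
    """Transfer a list of files, checkpointing progress in a rescue file.

    On a retry, files already listed in the rescue file are skipped, so the
    operation restarts from the last transmitted file. do_transfer may raise
    on failure; the rescue file then survives for the next attempt.
    """
    done = set()
    if os.path.exists(rescue_path):
        with open(rescue_path) as f:
            done = {line.strip() for line in f if line.strip()}
    with open(rescue_path, "a") as rescue:
        for name in files:
            if name in done:
                continue  # already transmitted in a previous attempt
            do_transfer(name)
            rescue.write(name + "\n")
            rescue.flush()  # persist the checkpoint immediately
    os.remove(rescue_path)  # all files done; clear the checkpoint
```
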
  • 24. • Multiple data movement jobs are combined and processed as a single transfer job • Information about the aggregated job is stored in the job queue and tied to a main job which actually performs the transfer operation, such that it can be queried and reported separately. • Hence, aggregation is transparent to the user • We have seen vast performance improvements, especially with small data files – decreasing the amount of protocol usage – reducing the number of independent network connections Job Aggregation
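A minimal sketch of the aggregation step, assuming jobs are dictionaries with src_url/dest_url fields as in the earlier submission example; the grouping key (source and destination host pair) is an illustrative choice, not necessarily Stork's exact policy:

```python
from collections import defaultdict
from urllib.parse import urlparse

def aggregate_jobs(jobs):
    """Combine transfer jobs that share the same source and destination
    hosts, so one connection can carry many small files."""
    groups = defaultdict(list)
    for job in jobs:
        key = (urlparse(job["src_url"]).netloc,
               urlparse(job["dest_url"]).netloc)
        groups[key].append(job)
    # The first job in each group becomes the "main" job performing the
    # transfer; the others stay tied to it in the queue so they can still
    # be queried and reported separately.
    return [{"main": g[0], "attached": g[1:]} for g in groups.values()]
```
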
  • 25. Experiments on LONI (Louisiana Optical Network Initiative): 1024 transfer jobs from Ducky to Queenbee (rtt avg 5.129 ms) - 5MB data file per job Job Aggregation
  • 26. We need priority-based data transfer scheduling with advance reservation and provisioning to allow researchers to use data placement as-a-service, where they can plan ahead and reserve a time period for their data movement operations. We need to orchestrate advance storage and network allocation together for data movements (very little progress in the literature) Future Directions
  • 27. Next generation research networks such as ESnet and Internet2 provide high-speed on-demand data access between collaborating institutions by delivering network-as-a-service • On-Demand Secure Circuits and Advance Reservation System (OSCARS) • Guaranteed bandwidth (at a certain time, for a certain bandwidth and length of time) Network Reservation
  • 29. Research Concept • accept time constraints • allow users to plan ahead • orchestrate resource allocation • provide advance resource reservation • reserve the scheduler’s time for future data movement operations
  • 30. Methodology • two separate queues • Planning Phase: resource reservation and time allocation − Preemption? − Confirm submission of a request? • Execution Phase: re-organization, tuning, and ordering • Failure-awareness • Job Aggregation • Dynamic Adaptation in data transfers • Priority-based scheduling (earliest deadline?)
  • 31. Methodology • Phase 1: The scheduler checks the availability of resources in a given time period and determines whether the requested operation can be satisfied within the given time constraints • The server and the network capacity are allocated for the future time period in advance • Phase 2: The scheduler considers other requests reserved for future time windows and re-orders operations in the current time period • Aggregation • Pre-processing
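The Phase 1 feasibility test can be sketched as a bandwidth-over-time check; the tuple representation of reservations and the single-capacity model are assumptions for illustration:

```python
def can_reserve(requests, capacity, start, end, bandwidth):
    """Phase-1 check: can `bandwidth` be allocated over [start, end)
    given existing reservations and the total link capacity?

    requests: list of (start, end, bandwidth) tuples already reserved.
    """
    # Total usage can only increase where some reservation begins, so it
    # suffices to check the window start plus every reservation start
    # that falls inside the window.
    points = {start} | {s for s, e, b in requests if start <= s < end}
    for t in points:
        used = sum(b for s, e, b in requests if s <= t < e)
        if used + bandwidth > capacity:
            return False  # requested window cannot be satisfied
    return True
```

If the check succeeds, the scheduler would record the new (start, end, bandwidth) tuple, allocating server and network capacity for the future time period in advance.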
  • 34. Data Movement between Distributed Repositories for Large Scale Collaborative Science Mehmet Balman Louisiana State University Baton Rouge, LA