Architecture of the Oasis Mobile Shared Virtual Memory System
Speaker: 呂宗螢
Adviser: Prof. 梁文耀
Date: 2008/3/4
Embedded and Parallel Systems Lab
Paper
1. W. H. Schroeder and B. D. Fleisch, "Architecture of the Oasis Mobile Shared Virtual Memory System," Department of Computer Science and Engineering, University of California, Riverside (UCR).
2. M. Ballette, A. Liotta, and S. M. Ramzy, "Execution time prediction in DSM-based mobile grids," in IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2005), 2005.
Outline (for paper 1)
Introduction
Oasis Collaborative Computing Application
DSM vs. SVM
Design
System Architecture
Site Architecture
Protocol 1 and Protocol 2
Disconnection
Reconnection Algorithm
Evaluation
Outline (for paper 2)
Related Work
Mobile Grid System Architecture
This Paper’s Idea
DSM vs. Grid
Introduction
Supports collaborative applications that run on mobile laptop computers and personal digital assistants (PDAs).
Requires peer-to-peer and client-server interactions under less-than-ideal network connectivity.
Has a highly consistent backbone that typically executes either tightly or loosely coupled distributed applications (disconnected operation is supported).
Introduction
A time-based coherency approach is used, built on a mechanism called a lease.
Distributed systems software remains difficult and expensive to develop.
DSM (Distributed Shared Memory).
Transparency
Oasis Collaborative Computing Application
POSIX DSM !?
Hospital Management System
Battlefield Tracking Application
Grocery Shopping Assistant
Distributed Appointment Scheduler
Distributed Grading System
DSM vs. SVM
Tightly coupled: DSM
Loosely coupled: SVM (Shared Virtual Memory)
[Figure: Node 0, Node 1, ..., Node N, each with its own CPU, memory, and mapping, all mapped onto a common shared memory]
Figure source: 張志標, "Design and Implementation of a User-Level Multi-threaded Distributed Shared Memory System."
DSM vs. SVM
Caches persistent store objects (e.g., files) or contains memory with very few hot spots.
SVM can operate while disconnected.
Design
1. Oasis is designed to explore adapting a
DSM paradigm for a system of
collaborating mobile applications using
wireless communication.
Mirage: X
TreadMarks: X
Quarks: O
Design
2. Oasis is designed to support continual operability.
Supports disconnected operation.
Disconnection of mobile computers in a distributed system can occur in three different ways: voluntary, involuntary, and intermittent.
Time-out based.
Design
3. Oasis provides referential transparency for programs that use POSIX shared memory and POSIX semaphores.
Oasis takes existing applications that adhere to the POSIX interface and executes them unchanged in a distributed system, whereas previously they may have functioned only on a uniprocessor.
Disconnection
Hoarding
Caches the shared memory pages required during disconnection.
Lease
Used in conjunction with hoarding to enforce the time-based coherence model.
During the fixed lease period, the mobile machine may use the resources exclusively.
If the lease period is exceeded, reconciliation mechanisms are used.
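To make the time-based coherence rule concrete, here is a minimal Python sketch of a lease, assuming a lease is just a holder, a set of page numbers, and a fixed duration checked against a monotonic clock. `Lease` and `action_on_reconnect` are hypothetical names for illustration, not part of Oasis.

```python
import time
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass
class Lease:
    """Hypothetical lease record: exclusive use of some pages for a fixed period."""
    holder: str                 # machine that disconnected with the lease
    pages: FrozenSet[int]       # hoarded pages covered by the lease
    duration_s: float           # fixed lease period agreed at disconnection
    granted_at: float = field(default_factory=time.monotonic)

    def expires_at(self) -> float:
        return self.granted_at + self.duration_s

    def is_valid(self, now: Optional[float] = None) -> bool:
        """True while the fixed period has not yet elapsed."""
        now = time.monotonic() if now is None else now
        return now < self.expires_at()


def action_on_reconnect(lease: Optional[Lease], now: float) -> str:
    """Within the period the leased pages can be copied back; afterwards, reconcile."""
    if lease is not None and lease.is_valid(now):
        return "replace backbone pages"
    return "reconcile"
```

For example, a 60-second lease granted at time 100.0 is still valid at 159.9 but not at 160.0.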
Disconnection
Three steps to disconnect
1. Applications query the user for information
to be used for disconnection.
2. Checks determine whether there are any conflicts in the requested pages.
3. A synchronous signal continues the disconnection process once the previous operations are complete.
Disconnection-step 1
The user is queried for information.
Estimated disconnection time and the
hoarding policy for the shared memory
pages.
Hoarding policy
All pages, most recently used pages, least
recently used pages, or most frequently
referenced pages
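The four hoarding policies can be sketched as ranking functions over per-page access statistics. This assumes the site tracks a last-access time and a reference count per page, and that a page budget is supplied alongside the policy; `PageStats` and `select_pages` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageStats:
    page_id: int
    last_access: float   # time of the most recent reference
    ref_count: int       # references observed so far


def select_pages(stats: List[PageStats], policy: str,
                 budget: Optional[int] = None) -> List[int]:
    """Return the page ids to hoard under the chosen policy."""
    if policy == "all":
        return [s.page_id for s in stats]
    if budget is None:
        raise ValueError("a page budget is required for this policy")
    if policy == "mru":      # most recently used first
        ranked = sorted(stats, key=lambda s: s.last_access, reverse=True)
    elif policy == "lru":    # least recently used first
        ranked = sorted(stats, key=lambda s: s.last_access)
    elif policy == "mfu":    # most frequently referenced first
        ranked = sorted(stats, key=lambda s: s.ref_count, reverse=True)
    else:
        raise ValueError(f"unknown hoarding policy: {policy}")
    return [s.page_id for s in ranked[:budget]]
```

Ranking by `ref_count` treats "most frequently referenced" as a raw count; a real system might age these counts, which the slides do not specify.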
Disconnection-step 2
Checks the requested pages for conflicts.
When leases are used, a conflict arises if the user requests a lease on a page that is already leased or owned by another disconnected machine.
The user is then given the opportunity to terminate the disconnection process or to hoard the desired pages without a lease.
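Step 2 can be sketched as a set check against the backbone's lease table, followed by the user's choice. The dictionary `leased_by` (page to holder) and the choice strings are assumptions; in particular, this sketch reads "hoard the desired pages without a lease" as dropping the lease for every requested page, not only the conflicting ones.

```python
from typing import Dict, Iterable, Optional, Set


def find_conflicts(requester: str, requested: Iterable[int],
                   leased_by: Dict[int, str]) -> Set[int]:
    """Requested pages already leased or owned by another disconnected machine."""
    return {p for p in requested if p in leased_by and leased_by[p] != requester}


def plan_hoard(requester: str, requested: Iterable[int],
               leased_by: Dict[int, str],
               on_conflict: str) -> Optional[Dict[str, Set[int]]]:
    """Return the pages to hoard and to lease, or None if the user terminates."""
    wanted = set(requested)
    if not find_conflicts(requester, wanted, leased_by):
        return {"hoard": wanted, "lease": set(wanted)}
    if on_conflict == "terminate":
        return None                                  # abandon the disconnection
    if on_conflict == "hoard_without_lease":
        return {"hoard": wanted, "lease": set()}     # hoard, but take no lease
    raise ValueError(f"unknown choice: {on_conflict}")
```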
Disconnection-step 3
The disconnection algorithm is initiated by an internally or externally generated signal, which indirectly activates the disconnection/reconnection thread.
The following operations occur during resource reorganization:
Acquire the global disconnection system lock.
Flush stored shared memory pages and locks to the backbone.
Hoard shared memory pages to the disconnecting site and optionally establish a lease for those pages.
Release the global disconnection system lock and initiate independent operation.
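The four reorganization operations map naturally onto a critical section. The sketch below models the backbone and the disconnecting site as in-memory objects, with a `threading.Lock` standing in for the global disconnection system lock; `Backbone`, `Site`, and `disconnect` are hypothetical names, and real flushing would travel over the network.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Backbone:
    global_lock: threading.Lock = field(default_factory=threading.Lock)
    pages: Dict[int, bytes] = field(default_factory=dict)
    lock_owners: Dict[str, str] = field(default_factory=dict)  # shared lock -> owning site
    leases: Dict[int, str] = field(default_factory=dict)       # page -> leaseholder


@dataclass
class Site:
    name: str
    local_pages: Dict[int, bytes] = field(default_factory=dict)
    held_locks: Set[str] = field(default_factory=set)
    disconnected: bool = False


def disconnect(site: Site, backbone: Backbone, hoard: Set[int], take_lease: bool) -> None:
    with backbone.global_lock:                    # acquire global disconnection lock
        backbone.pages.update(site.local_pages)   # flush locally stored pages
        site.local_pages.clear()
        for name in site.held_locks:              # flush shared locks to the backbone
            backbone.lock_owners.pop(name, None)
        site.held_locks.clear()
        for p in hoard:                           # hoard copies to the disconnecting site
            site.local_pages[p] = backbone.pages[p]
            if take_lease:                        # optionally lease the hoarded pages
                backbone.leases[p] = site.name
    # The lock is released on leaving the with-block; independent operation begins.
    site.disconnected = True
```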
Disconnected Operation
Machines that hold a valid lease at reconnection can simply replace the pages on the backbone, because the backbone copies are guaranteed not to have changed while the leased pages were being updated during disconnection.
Capturing and reconciliation permit Oasis to integrate changes that occurred on disconnected machines with possibly conflicting updates that may have occurred on the backbone.
Disconnected Operation
Capturing is used to record all writes to memory while the client operates disconnected.
Logging
Twinning
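The two capturing techniques can be contrasted on a single page. Twinning keeps a pristine copy (the twin) taken before writing and later computes a diff; logging records each write as it happens. Byte-granularity runs and the helper names (`make_twin`, `diff`, `apply_diff`, `LoggingPage`) are assumptions for illustration.

```python
from typing import List, Tuple


def make_twin(page: bytearray) -> bytes:
    """Twinning: snapshot the page before the first write while disconnected."""
    return bytes(page)


def diff(twin: bytes, page: bytearray) -> List[Tuple[int, bytes]]:
    """Return (offset, new bytes) runs where the page differs from its twin."""
    runs: List[Tuple[int, bytes]] = []
    i, n = 0, len(page)
    while i < n:
        if page[i] == twin[i]:
            i += 1
            continue
        start = i
        while i < n and page[i] != twin[i]:
            i += 1
        runs.append((start, bytes(page[start:i])))
    return runs


def apply_diff(target: bytearray, runs: List[Tuple[int, bytes]]) -> None:
    for offset, data in runs:
        target[offset:offset + len(data)] = data


class LoggingPage:
    """Logging: record every write as (offset, data) as it happens."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.log: List[Tuple[int, bytes]] = []

    def write(self, offset: int, data: bytes) -> None:
        self.data[offset:offset + len(data)] = data
        self.log.append((offset, bytes(data)))
```

Twinning pays one page copy up front and yields a compact diff; logging pays on every write but preserves write order.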
Reconnection Algorithm
The disconnection/reconnection thread is activated.
1. The backbone global system mutual exclusion lock is acquired.
2. The reconnecting machine must validate the lease, if one was used.
3. Depending on whether the lease is valid, one of two cases follows:
reconnection with a valid lease
reconnection using reconciliation.
Reconnection With A Valid Lease
The server holds the lease timer, and this
timer must be cancelled during
reconnection.
If the lease has timed out, the reconnecting machine must use reconciliation.
If not, the log or all twins can be discarded.
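The reconnection steps on the two slides above can be combined into one sketch. It assumes the server stores each site's lease as an absolute expiry time (standing in for the lease timer), that cancelling the timer reveals whether it had already fired, and that reconciliation is supplied as a callback; `LeaseServer` and `reconnect` are hypothetical.

```python
import threading
from typing import Any, Callable, Dict, List


class LeaseServer:
    """Hypothetical backbone server that keeps each site's lease expiry time."""

    def __init__(self) -> None:
        self.global_lock = threading.Lock()
        self.pages: Dict[int, bytes] = {}
        self.lease_expiry: Dict[str, float] = {}   # site -> absolute expiry time

    def cancel_lease(self, site: str, now: float) -> bool:
        """Cancel the site's lease timer; True only if it existed and had not expired."""
        expiry = self.lease_expiry.pop(site, None)
        return expiry is not None and now < expiry


def reconnect(server: LeaseServer, site: str, hoarded: Dict[int, bytes],
              log: List[Any], now: float,
              reconcile: Callable[[Dict[int, bytes], Dict[int, bytes], List[Any]], None]) -> str:
    with server.global_lock:                       # 1. global mutual exclusion lock
        if server.cancel_lease(site, now):         # 2. validate (and cancel) the lease
            server.pages.update(hoarded)           # 3a. backbone copies are unchanged
            log.clear()                            #     captured writes can be discarded
            return "valid-lease"
        reconcile(server.pages, hoarded, log)      # 3b. expired or no lease
        return "reconciled"
```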
Reconnection Using Reconciliation Methods
Ignore changes that occurred while disconnected (read-only).
A merge procedure: an application-specific, rule-based approach to integrating disconnected memory changes (similar to Bayou).
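Both reconciliation methods can be sketched against three views of each page: the copy hoarded at disconnection (`base`), the current backbone copy, and the disconnected copy. Pages are modeled as integer counters for brevity, and `merge_counter` is one hypothetical application-specific rule; Bayou's actual merge procedures are richer than this.

```python
from typing import Callable, Dict, Optional

# Hypothetical rule signature: (page_id, base, backbone_value, disconnected_value) -> merged
MergeRule = Callable[[int, int, int, int], int]


def reconcile(backbone: Dict[int, int], base: Dict[int, int],
              disconnected: Dict[int, int], strategy: str,
              merge_rule: Optional[MergeRule] = None) -> None:
    if strategy == "ignore":
        return                                  # disconnected writes are dropped
    if strategy != "merge":
        raise ValueError(f"unknown strategy: {strategy}")
    for page_id, mine in disconnected.items():
        original = base[page_id]                # value hoarded at disconnection
        if mine == original:
            continue                            # page not written while disconnected
        theirs = backbone[page_id]
        if theirs == original:
            backbone[page_id] = mine            # only the disconnected copy changed
        elif merge_rule is None:
            raise ValueError(f"conflict on page {page_id} needs a merge rule")
        else:
            backbone[page_id] = merge_rule(page_id, original, theirs, mine)


def merge_counter(page_id: int, base: int, theirs: int, mine: int) -> int:
    """Example rule for a page holding a counter: keep both sides' increments."""
    return theirs + (mine - base)
```

With base 10, backbone 12, and disconnected 15, the counter rule yields 17, preserving both increments.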
Evaluation
System Performance and Basic Costs
Event                            Time (ms)
Page fault                       17.35
32-byte message round-trip       1.95
1 KB message round-trip          11.71
Memory-to-memory copy (8 KB)     0.98
Component Cost of Disconnection Algorithm
1 Oasis server.
1 disconnecting site.
1 to 6 backbone sites.
Each site started a test program that allocated 6 shared memory pages.
1 shared lock.
Component Cost of Disconnection Algorithm
Case 1
The disconnecting site has all shared resources stored locally at the time of disconnection (worst case).
Case 2
All shared resources are located on backbone sites (best case).
Disconnection consists of 5 components:
Acquire Global System Lock
Flush Shared Memory Pages
Acquire and Sync Pages
Flush Shared Locks
Release Global System Lock
Conclusions
Proof of concept.
Mobile distributed systems.
Convenient programming methodology.
Future Work
Wireless network.
Incorporating multiple DSM consistency
protocols.
Experiments with different hoarding
policies to obtain pages during
disconnection.
Shared object model instead of a paged
region model.
Mobile Grid System Architecture
Terminals act as resource providers or consumers.
Resource providers periodically publish their capabilities (CPU, storage, memory, etc.) and their current status (resource utilization, Mobile Grid availability, connectivity, etc.) to the monitoring service.
Mobile Grid System Architecture
Based on the task descriptor and the resource-provider information passed on by the monitoring service, the broker:
Disseminates the different work units among the chosen terminals.
Collects the partial results.
Sends them back to the resource consumer.
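The provider, monitoring-service, and broker flow can be sketched in a few lines. The status fields, the utilization cutoff, the ranking by spare CPU capacity, and round-robin dispatch are all assumptions; the slides do not say how the broker chooses terminals. `run_on` is a hypothetical stand-in for sending a work unit to a terminal and waiting for its partial result.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ProviderStatus:
    name: str
    cpu_mhz: int          # published capability
    utilization: float    # published status, 0.0 to 1.0
    connected: bool       # Mobile Grid availability / connectivity


class MonitoringService:
    def __init__(self) -> None:
        self.status: Dict[str, ProviderStatus] = {}

    def publish(self, s: ProviderStatus) -> None:
        self.status[s.name] = s            # each periodic report replaces the previous one

    def available(self) -> List[ProviderStatus]:
        return [s for s in self.status.values() if s.connected]


def broker(work_units: List[Any], monitor: MonitoringService,
           run_on: Callable[[str, Any], Any], max_util: float = 0.8) -> List[Any]:
    # Choose connected terminals that are not too busy, best spare capacity first.
    chosen = sorted((s for s in monitor.available() if s.utilization <= max_util),
                    key=lambda s: s.cpu_mhz * (1.0 - s.utilization), reverse=True)
    if not chosen:
        raise RuntimeError("no suitable resource provider")
    partials = []
    for i, unit in enumerate(work_units):
        provider = chosen[i % len(chosen)]            # disseminate round-robin
        partials.append(run_on(provider.name, unit))  # collect the partial result
    return partials                                   # sent back to the resource consumer
```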
This Paper’s Idea
Task-duration prediction approach.
Fault recovery based on time-outs.
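A time-out derived from a duration prediction might look like the following. The predictor here (work divided by spare capacity) and the slack factor of 1.5 are hypothetical placeholders, not the prediction model from paper 2; the sketch only shows how a prediction can drive time-out-based recovery.

```python
from typing import Dict, List, Tuple


def predict_duration(work_ops: float, ops_per_s: float, utilization: float) -> float:
    """Hypothetical predictor: work divided by the provider's spare capacity."""
    spare = ops_per_s * (1.0 - utilization)
    if spare <= 0:
        raise ValueError("provider has no spare capacity")
    return work_ops / spare


def needs_recovery(started_at: float, now: float, predicted: float,
                   slack: float = 1.5) -> bool:
    """Treat a work unit as failed once it overruns its prediction by the slack factor."""
    return now - started_at > predicted * slack


def overdue(tasks: Dict[str, Tuple[float, float]], now: float,
            slack: float = 1.5) -> List[str]:
    """tasks maps unit id -> (start time, predicted duration); returns units to reassign."""
    return sorted(uid for uid, (start, pred) in tasks.items()
                  if needs_recovery(start, now, pred, slack))
```

For example, 1000 operations on a 100 ops/s terminal at 50% utilization predict 20 s, so the unit would be reassigned after 30 s.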
DSM vs. Grid
                       DSM                   Grid
Need lock?             Yes                   No
Programming paradigm   Acquire() (lock)      in(x)
                       Read(x)               out(x)
                       Write(x)              pop(x)
                       Release() (unlock)
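The contrast in the table can be sketched in Python. The DSM side brackets Read(x)/Write(x) with explicit Acquire()/Release(); the grid side uses a minimal Linda-style tuple space, assuming in(x) withdraws a matching tuple (blocking until one exists) and out(x) adds one, so withdrawing x already gives exclusive access. The space still uses a condition variable internally; the point is that the application code takes no lock. `TupleSpace`, `in_` (since `in` is a Python keyword), and the increment functions are hypothetical, and pop(x) is not modeled because the slide does not define it.

```python
import threading
from collections import defaultdict
from typing import Any, DefaultDict, List

# DSM style: the program brackets shared accesses with a lock.
lock = threading.Lock()
shared = {"x": 0}


def dsm_increment() -> None:
    lock.acquire()               # Acquire()
    try:
        value = shared["x"]      # Read(x)
        shared["x"] = value + 1  # Write(x)
    finally:
        lock.release()           # Release()


class TupleSpace:
    """Minimal Linda-style space (hypothetical): out() adds, in_() withdraws."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self._tuples: DefaultDict[str, List[Any]] = defaultdict(list)

    def out(self, key: str, value: Any) -> None:
        with self._cv:
            self._tuples[key].append(value)
            self._cv.notify_all()

    def in_(self, key: str) -> Any:
        with self._cv:
            while not self._tuples[key]:   # block until a matching tuple exists
                self._cv.wait()
            return self._tuples[key].pop(0)


# Grid style: no lock in application code; withdrawing x gives exclusive access.
def grid_increment(space: TupleSpace) -> None:
    value = space.in_("x")       # in(x)
    space.out("x", value + 1)    # out(x)
```

Running either increment from several threads gives the same total; the difference is where the mutual exclusion lives.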