This poster was presented at the INFORMS Annual Meeting 2015. It can be seen as a "snapshot" of my thesis research, omitting the more advanced developments.
Optimized scheduling of sequential resource allocation systems (poster)
5.2 Static random switches
Static random switches are defined only by the set of the enabled
untimed transitions and not by the state itself, i.e.,
Ξi = Ξj if the vanishing states vi and vj activate the same set
of untimed transitions
• The corresponding policy space contains all the “static-priority” policies
• Mathematically, the proposed restriction corresponds to
a state space aggregation
• Hence, we can refine the obtained solution through
(partial) disaggregation
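The aggregation behind static random switches can be sketched as a grouping of vanishing states by their set of enabled untimed transitions. The state names and the `enabled` mapping below are hypothetical illustration data, not taken from the example system.

```python
from collections import defaultdict


def group_static_switches(enabled):
    """Group vanishing states that enable the same set of untimed transitions.

    `enabled` maps a vanishing state to the set of untimed transitions it
    enables. All states in one group would share one static random switch.
    """
    groups = defaultdict(list)
    for state, transitions in enabled.items():
        groups[frozenset(transitions)].append(state)
    return dict(groups)


# Hypothetical data: four vanishing states, two distinct enabled sets.
enabled = {
    "v1": {"t2", "t6"},
    "v2": {"t3", "t5"},
    "v3": {"t6", "t2"},
    "v4": {"t3", "t5"},
}
groups = group_static_switches(enabled)
for transitions, states in groups.items():
    print(sorted(transitions), states)
```

In this toy case four state-dependent switches collapse into two shared ones, which is the kind of reduction the aggregation is meant to achieve.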
4. The methodological framework (demo with an example resource allocation system)
1. Background and motivation
Resource allocation in flexibly automated operations
Optimized Scheduling of Sequential Resource Allocation Systems
Ran Li (rli63@gatech.edu)
Spyros Reveliotis (spyros@isye.gatech.edu)
[Figure: layout of the example production cell, with workstations WS1 and WS2 and an I/O port. Process route: WS1 -> WS2 -> WS1.]
[Figure: state transition diagram of the underlying SMP, with states numbered 0 to 65 and branching probabilities μ1 / (μ1 + μ2), μ2 / (μ1 + μ2), μ2 / (μ2 + μ3), and μ3 / (μ2 + μ3).]
\[
\max_{\zeta} \; \eta(\zeta) = \pi(\zeta)^{T} r
\]
subject to
\[
\Xi_i^{T} \mathbf{1} = 1 \quad \text{for all } v_i,
\qquad
\varepsilon \le \zeta_{ij} \quad \text{for all } v_i \text{ and all } j \in \{1, \dots, k(i)\}
\]
where
Ξi = < ζij: j=1,…,k(i) > is the random switch for vanishing state vi
ζ = the vector collecting all ζij
ε = a minimal degree of randomization in each Ξi
π(ζ) = the steady-state distribution over the tangible states, determined by the values of the elements of ζ
r = the vector collecting the reward rates at the tangible states
4.1 The example system
A flexibly automated production cell
Objective
Maximize long-run time average throughput
Configuration
2 workstations (WS): each with 1 server, 2 buffer slots
Jobs in processing keep occupying their buffer slots
1 process type with 3 stages
Stage j takes an exponentially distributed processing time with rate µj
4.2 Generalized stochastic Petri net (GSPN)
Route t0 – p0 – t1 … p6 – t7: the process route
• Untimed transitions: their firing is immediate,
and models the allocation of resources
• Timed transitions: their firing has an
exponentially distributed delay time, has lower
priority than the firing of untimed transitions,
and models the processing of job instances
• Places: Model the different process stages
Places p7 - p10: Model resource availability
Place p11 and its arcs (the red subnet): Models the
applied DAP.
[Workflow labels: model as a discrete event system -> state space for the timed dynamics -> the underlying optimization problem]
4.3 State transition diagram for the underlying semi-Markov process (SMP) with reward
Tangible state: only timed transitions are enabled, and
their branching probabilities are determined by
exponential race
Tangible state with rewards: the timed transition that
models the output (i.e., transition t7) is enabled
Vanishing state: at least one untimed transition is
enabled
Vanishing state with a random switch: at least two
untimed transitions are enabled, and a decision of “which
fires first” is needed
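The branching probabilities at a tangible state follow from the exponential race: each enabled timed transition wins with probability equal to its rate divided by the sum of the enabled rates, which gives the labels of the form μ2 / (μ1 + μ2) in the diagram. A minimal sketch, with hypothetical rate values:

```python
def race_probabilities(rates):
    """Branching probabilities of an exponential race.

    `rates` maps each enabled timed transition to its firing rate.
    The transition with rate mu_a fires first with probability
    mu_a / (sum of all enabled rates).
    """
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


# Hypothetical rates: mu1 = 1.0 and mu2 = 3.0 are both enabled.
probs = race_probabilities({"mu1": 1.0, "mu2": 3.0})
print(probs["mu2"])  # mu2 / (mu1 + mu2) = 0.75
```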
[Figure: process types and their resource requirements. Process Type I: Stage I-1, Stage I-2. Process Type II: Stage II-1, Stage II-2a or Stage II-2b (choose one alternative), Stage II-3.]
[Figures: four example applications: flexibly automated production cell; automated guided vehicles (AGV); 2D traffic system of free-ranging mobile agents; multi-thread software.]
All these applications can be abstracted as
sequential resource allocation systems (RAS)
Sequential resource allocation systems
• A sequential resource allocation system consists
of several process types, and reusable but finite
resources of different types.
• A job instance of a process type can be
executed by going through a number of stages.
• Each stage requires a certain amount of certain
resource types and a random processing time.
• The job instances of different process types, or
the same process type but different stages, may
compete for the required resource.
2. Problem definition
Objective
• Maximize some time-related performance measure, while
• maintaining behavioral correctness (e.g., avoid deadlocks).
What can be regulated?
• Allocation of resources to the competing job instances
3. Method overview
The logical control problem has been well studied in the discrete event systems community. The performance control problem is in the domain of stochastic optimization.
This research defines a discrete event model as the framework for solving the performance control problem while integrating the existing logical control results, and develops the supporting methodology.
[Figure: control architecture. Blocks: RAS Domain, System State Model, Logical Control, Performance Control. Signals: Configuration Data, Event, Feasible Actions, Admissible Actions, Commanded Action.]
Deadlock
A pattern of “circular waiting”: all jobs in a given set cannot
advance to their next stage since they are waiting for
resources currently allocated to some other job in the set.
Optimal deadlock avoidance policy (DAP)
Forbid the actions that will unavoidably lead to deadlock
states.
[Figure: a deadlock state of the example system at workstations WS1 and WS2. Legend: stage 1 job instance, stage 2 job instance.]
No job instances can advance further, because all
buffers are full
Optimal DAP: do not load new jobs when the total number of job instances in stages 1 and 2 is three
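The rule stated above for the example system can be written as a one-line admissibility test. The function name and its `limit` parameter are hypothetical; only the threshold of three comes from the poster.

```python
def may_load_new_job(stage1_jobs, stage2_jobs, limit=3):
    """Return True if loading a new job is admissible under the stated rule.

    Loading is forbidden once the job instances in stages 1 and 2
    together reach `limit` (three in the example system).
    """
    return stage1_jobs + stage2_jobs < limit


print(may_load_new_job(1, 1))  # two jobs in stages 1 and 2: loading allowed
print(may_load_new_job(2, 1))  # three jobs in stages 1 and 2: loading forbidden
```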
[Figure caption: deadlock and deadlock avoidance in the example system.]
[Figure: GSPN model of the example system, with transitions t0 to t7 and places p0 to p11. Legend: untimed transitions; timed transitions with rates µ1, µ2, and µ3.]
5. Coping with the underlying complexity
t2 and t6 are enabled
at state 25, but firing
one transition does
not disable the other
5.1 Random switch refinement
Some random switches are not necessary since they do
not reflect “real conflicts” in resource allocation
Example:
We can replace {t2, t6} by the singleton {t2}, but not by {t6}: firing t6 first loses the possibility of reaching the tangible state 39
For each vanishing state, the
replacement can be
performed if it does not
impact the potential to reach
any tangible states.
Such a refinement maintains
the performance potential of
the policy space
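One way to read the replacement condition is as a reachability test: a switch may be restricted to a subset of its transitions if the restricted switch can still reach every tangible state that the full switch can reach through untimed firings. The sketch below follows that reading on a small hypothetical graph (the state names `v0`, `va`, `vb`, `vc`, `T1`, `T2` and the `succ` mapping are made up); it is an illustration of the idea, not necessarily the procedure used in the thesis.

```python
def reachable_tangible(state, succ, tangible):
    """Tangible states reachable from `state` by untimed firings only."""
    if state in tangible:
        return {state}
    found, seen, stack = set(), {state}, [state]
    while stack:
        current = stack.pop()
        for nxt in succ[current].values():
            if nxt in tangible:
                found.add(nxt)
            elif nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found


def can_replace(state, keep, succ, tangible):
    """True if restricting the switch at `state` to `keep` loses no tangible state."""
    full = reachable_tangible(state, succ, tangible)
    restricted = set()
    for transition in keep:
        restricted |= reachable_tangible(succ[state][transition], succ, tangible)
    return restricted == full


# Hypothetical vanishing-state graph: state -> {untimed transition: next state}.
succ = {
    "v0": {"t2": "va", "t6": "vb"},
    "va": {"t6": "vc", "t0": "T2"},  # after t2, t6 is still enabled
    "vb": {"t2": "vc"},              # after t6, T2 can no longer be reached
    "vc": {"t3": "T1"},
}
tangible = {"T1", "T2"}

print(can_replace("v0", {"t2"}, succ, tangible))  # True
print(can_replace("v0", {"t6"}, succ, tangible))  # False
```

The toy graph mirrors the pattern of the example: keeping only t2 preserves both tangible states, while keeping only t6 drops one of them.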
4.4 Mathematical programming formulation
Note that the vanishing states can be “collapsed” to tangible
states since they have zero sojourn times and zero rewards.
Then the SMP becomes a continuous time Markov chain
(CTMC)
The steady-state distribution π(ζ) can either be
(i) computed through the “balance equation”, or
(ii) estimated through steady-state simulation
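Option (i) can be sketched as a small linear solve: for a fixed ζ, the balance equations π Q = 0 together with the normalization Σ π = 1 determine π, and the objective is η = π^T r. The 3-state generator `Q` and reward vector `r` below are hypothetical, and the plain Gaussian elimination is only meant for tiny illustrative chains.

```python
def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Row j of A is the balance equation of state j: sum_i pi_i * Q[i][j] = 0.
    A = [[Q[i][j] for i in range(n)] for j in range(n)]
    A[-1] = [1.0] * n  # replace one redundant equation by the normalization
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(A[row][k] * pi[k] for k in range(row + 1, n))
        pi[row] = (b[row] - tail) / A[row][row]
    return pi


# Hypothetical generator over three tangible states; only state 2 earns reward.
Q = [[-2.0, 2.0, 0.0],
     [1.0, -3.0, 2.0],
     [3.0, 0.0, -3.0]]
r = [0.0, 0.0, 3.0]

pi = steady_state(Q)
eta = sum(p * reward for p, reward in zip(pi, r))
print(pi, eta)
```

For this toy chain the solution is π = (9/19, 6/19, 4/19) and η = 12/19. In the actual problem the entries of Q depend on ζ, so the solve has to be repeated for every candidate ζ.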
[Figure: the whole state space of the example system.]
The green and yellow nodes correspond to the two static random
switches that remain in the state space of the example RAS of Section
4, after refinement of the initial random switches.
4.5 Computational challenges
Increasing system size => explosion of the state space:
• Explosion of vi => explosion of ζij
• Explosion of π(ζ)
In the example system:
3 stages
2 single servers
2 buffers of capacity 2
19 tangible states
47 vanishing states
20 random switches
27 decision variables
5.3 Stochastic approximation: coping with the explosion of π(ζ)
A typical iteration of stochastic approximation is:
ζk+1 = ζk + γk Yk
ζk is the vector of decision variables at iteration k, γk is the
positive step size, and Yk is the improvement direction.
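The iteration can be sketched on a toy one-dimensional problem. Everything below is an assumption made for illustration: a single binary random switch with one free parameter ζ, a made-up concave objective with maximizer 0.7, a noisy gradient in place of the likelihood-ratio estimate, a step size γk = 0.5 / (k + 1), and a clipping step that keeps ζ in [ε, 1 - ε].

```python
import random


def stochastic_approximation(direction, zeta0, eps, iterations, seed=0):
    """Run zeta_{k+1} = zeta_k + gamma_k * Y_k, clipped to [eps, 1 - eps]."""
    rng = random.Random(seed)
    zeta = zeta0
    for k in range(iterations):
        gamma = 0.5 / (k + 1)           # assumed diminishing step size
        y = direction(zeta, rng)        # noisy improvement direction Y_k
        zeta = zeta + gamma * y
        zeta = min(max(zeta, eps), 1.0 - eps)  # keep eps <= zeta <= 1 - eps
    return zeta


def noisy_gradient(zeta, rng):
    """Gradient of the toy objective -(zeta - 0.7)**2 plus bounded noise."""
    return -2.0 * (zeta - 0.7) + rng.uniform(-1.0, 1.0)


zeta_final = stochastic_approximation(noisy_gradient, zeta0=0.5, eps=0.05,
                                      iterations=2000)
print(zeta_final)
```

With these choices the iterate settles close to 0.7 despite the noise. With several switches, the clipping would need to be replaced by a projection that also enforces the sum-to-one constraint of each Ξi.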
A typical choice of Yk for the average-reward problem of
irreducible Markov chains is the estimated gradient. In this work,
we adapt the Likelihood Ratio gradient estimator with a sample
size of 2N regenerative cycles at each iteration, then:
[Equation: the likelihood-ratio gradient estimate Ŷ, a sum over i = 1, …, N of terms built from the pairs of regenerative cycles (2i-1, 2i), combining the rewards r(m_k), the likelihood-ratio sums Λ_k, and the revisiting times u_{2i-2}, u_{2i-1}, u_{2i}.]
where
p is the transition probability
u is the revisiting time to the reference state
Λ is the sum of the likelihood ratios of p, i.e.
\[
\Lambda_k = \sum_{j=u+1}^{k} \frac{\nabla p(m_{j-1}, m_j)}{p(m_{j-1}, m_j)}
\]
6. Conclusion
An integrated framework for real-time management of sequential resource allocation systems based on
• the (formal) representational
power of GSPNs;
• a parsimonious representation
of the underlying conflicts;
• a pertinent specification of the
set of target scheduling
policies;
• results from sensitivity analysis
of Markov reward processes.
The table shows the effectiveness of the complexity control on 20 RAS configurations
(Config. 1 is the example system of Section 4)
R.S. = random switch(es)
D.V. = decision variable(s)
Config.   Original                After refinement      With static R.S.
          R.S.       D.V.         R.S.       D.V.       R.S.   D.V.
1         20         27           5          5          2      2
2         4          4            1          1          1      1
3         40         56           11         11         2      2
4         128        177          35         35         2      2
5         1,007      1,374        269        269        2      2
6         71         84           9          9          1      1
7         346        463          49         49         2      2
8         742        966          112        112        2      2
9         4,304      5,498        677        677        2      2
10        13,302     20,948       2,083      2,290      13     15
11        7,573      11,368       1,513      1,513      4      4
12        2,781      4,018        678        678        4      4
13        2,468      3,759        609        609        5      5
14        519        693          106        106        5      5
15        4,256      5,887        759        759        6      6
16        1,851      2,534        243        243        6      6
17        163,695    270,738      30,805     35,420     15     17
18        74,655     109,948      12,313     12,313     4      4
19        322,052    525,166      80,142     85,117     19     22
20        788,731    1,270,562    139,496    154,069    14     17
[Figure: state transition diagram of the example RAS after refinement of the random switches, with the same branching probabilities μ1 / (μ1 + μ2), μ2 / (μ1 + μ2), μ2 / (μ2 + μ3), and μ3 / (μ2 + μ3).]
[Figure: sub-diagrams for the random switch refinement example of Section 5.1, covering states 25 to 39 and the untimed transitions t0, t2, t3, t5, and t6.]