Evaluating Simulation Software Components with Player Rating Systems (SIMUTools 2013)

6. 3. 2013, SIMUTools 2013

Jonathan Wienß, Michael Stein, Roland Ewald


6. 3. 2013 © 2013 UNIVERSITÄT ROSTOCK | MODELING & SIMULATION RESEARCH GROUP

Component-Based Simulation Systems

• Simulator: combination of components

• Typical components:

• Event management
• Collision detection
• State saving
• Result storage
• Random number generation
• etc.

• Example: JAMES II
http://flickr.com/photos/jdhancock/7239958506, cc-by


Problem: Evaluating Individual Components

https://commons.wikimedia.org/wiki/File:Rowing_-_USA_Lwt_4_@_World_Champs_2003.jpg

• Only component combinations are comparable
• Dedicated performance studies are expensive & difficult

Solution: Player Rating Systems
Performance comparison (e.g., event queues A, B; simulators SC, SD, SE):

• SC: 15 s
• SD + A: 25 s
• SE + B: 17 s

Multiplayer team results:

1. SC
2. SE + B
3. SD + A

1. Component Combination = Team of Players
2. Record results (of multiple combinations)
3. Update global component rating
⇒ Component Rating Systems, e.g., to find good default components.
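The first two steps above can be sketched as follows: each component combination is treated as a team, and measured execution times are turned into a team ranking. This is a minimal sketch, not code from the authors' tool; the function name `rank_teams` and the data layout are assumptions made for illustration, and the times are the ones from the example on this slide.

```python
from typing import Dict, FrozenSet, List, Tuple


def rank_teams(times: Dict[FrozenSet[str], float]) -> List[Tuple[int, FrozenSet[str]]]:
    """Turn execution times into (rank, team) pairs; rank 1 is the fastest.

    Teams with identical times share the same rank.
    """
    ordered = sorted(times.items(), key=lambda item: item[1])
    ranked: List[Tuple[int, FrozenSet[str]]] = []
    previous_time = None
    rank = 0
    for position, (team, seconds) in enumerate(ordered, start=1):
        if seconds != previous_time:
            rank = position  # new time value, so the rank advances
        ranked.append((rank, team))
        previous_time = seconds
    return ranked


# Example from the slide: three component combinations and their run times.
times = {
    frozenset({"SC"}): 15.0,
    frozenset({"SD", "A"}): 25.0,
    frozenset({"SE", "B"}): 17.0,
}
ranking = rank_teams(times)
for rank, team in ranking:
    print(rank, sorted(team))
```

The resulting ranking is the input that a rating system would use in step 3 to update the rating of every component involved.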


Component Rating Systems

• What is required?

• How does it work?

• How well does it work?


Component Rating Systems: Requirements

• Re-usable (system-independent)

• Inexpensive (memory, execution time)

• Scalable (w.r.t. components / component combinations)

• Robust (w.r.t. ‘outlier problems’)

• Adaptive (component updates)


Microsoft’s TrueSkill™ Approach [1] (used for Xbox Live™)
• Input:
• Team defined by player indices, e.g., $A_i = \{4, 8, 125\}$
• Team assignment $A = \{A_1, \ldots, A_k\}$ (pairwise disjoint)
• Team ranking $r$ (game result)

• Output: player skill ratings $\mu_i$

• Assumptions:
• Player skill $s_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$
• Player performance $p_i \sim \mathcal{N}(s_i, \beta^2)$
• Team performance $t_j = \sum_{i \in A_j} p_i$

[1] Herbrich, Minka, and Graepel: TrueSkill™: A Bayesian Skill Rating System. Advances in Neural Information Processing Systems 19, 2007
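The three assumptions above describe a generative model, which can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names, the rating dictionary, and the numeric values (µ = 25, σ = 25/3, β = 25/6) are chosen for the example only. Under the independence assumption, the team performance is itself Gaussian with mean Σ µ_i and variance Σ (σ_i² + β²).

```python
import random
from typing import Dict, Iterable, Tuple

Rating = Tuple[float, float]  # (mu, sigma) of one player/component


def team_performance_params(ratings: Dict[str, Rating],
                            team: Iterable[str],
                            beta: float) -> Tuple[float, float]:
    """Mean and variance of the summed team performance t_j."""
    members = list(team)
    mean = sum(ratings[i][0] for i in members)
    variance = sum(ratings[i][1] ** 2 + beta ** 2 for i in members)
    return mean, variance


def sample_team_performance(ratings: Dict[str, Rating],
                            team: Iterable[str],
                            beta: float,
                            rng: random.Random) -> float:
    """Draw one team performance: skill ~ N(mu, sigma^2), performance ~ N(skill, beta^2)."""
    total = 0.0
    for i in team:
        mu, sigma = ratings[i]
        skill = rng.gauss(mu, sigma)
        total += rng.gauss(skill, beta)
    return total


# Illustrative values only (assumed, not taken from the slides).
ratings = {"SE": (25.0, 25.0 / 3.0), "B": (25.0, 25.0 / 3.0)}
beta = 25.0 / 6.0
mean, variance = team_performance_params(ratings, ["SE", "B"], beta)
print(mean, variance)
```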

Bayesian Inference in TrueSkill

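$$p(\mathbf{s} \mid r, A) = \frac{P(r \mid \mathbf{s}, A)\, p(\mathbf{s})}{P(r \mid A)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(\mathbf{s}, \mathbf{p}, \mathbf{t} \mid r, A)\, d\mathbf{p}\, d\mathbf{t}$$

• $r$: ranking
• $A$: team assignment
• $\mathbf{s}$: player skills
• $\mathbf{p}$: player performances
• $\mathbf{t}$: team performances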


Factor Graphs & Message Passing
Factor graph layers for the example ranking (1. SC, 2. SE + B, 3. SD + A):

• Skills: $s_{SC}, s_{SE}, s_B, s_{SD}, s_A$
• Performances: $p_{SC}, p_{SE}, p_B, p_{SD}, p_A$
• Team performances: $t_{SC}, t_{SE+B}, t_{SD+A}$
• Team performance differences: $d_1, d_2$

1. Pass messages downwards: $s \to p \to t$
2. Expectation propagation (≈): $t \leftrightarrow d(r)$
3. Pass messages upwards: $t \to p \to s$
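For the simplest case, two single-player teams and no draws, the message passing above collapses into the closed-form update given in the TrueSkill paper. The sketch below shows only that special case, not the general multi-team factor graph used on this slide; the function names and the example values are assumptions for illustration.

```python
import math
from typing import Tuple

Rating = Tuple[float, float]  # (mu, sigma)


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def v(t: float) -> float:
    """Mean correction of a Gaussian truncated to 'winner beats loser'."""
    return normal_pdf(t) / normal_cdf(t)  # may underflow for very negative t


def w(t: float) -> float:
    """Variance correction belonging to v(t)."""
    vt = v(t)
    return vt * (vt + t)


def update_two_players(winner: Rating, loser: Rating, beta: float) -> Tuple[Rating, Rating]:
    """One-vs-one rating update without draws (draw margin 0)."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    c2 = 2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    new_mu_w = mu_w + sigma_w ** 2 / c * v(t)
    new_mu_l = mu_l - sigma_l ** 2 / c * v(t)
    new_sigma_w = math.sqrt(sigma_w ** 2 * (1.0 - sigma_w ** 2 / c2 * w(t)))
    new_sigma_l = math.sqrt(sigma_l ** 2 * (1.0 - sigma_l ** 2 / c2 * w(t)))
    return (new_mu_w, new_sigma_w), (new_mu_l, new_sigma_l)


# Illustrative values only: two equally rated components, the first one wins.
new_winner, new_loser = update_two_players((25.0, 25.0 / 3.0), (25.0, 25.0 / 3.0), 25.0 / 6.0)
print(new_winner, new_loser)
```

After the update, the winner's mean rises, the loser's mean falls by the same amount (both started with equal uncertainty), and both standard deviations shrink.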


Limitations & Adaptations

• Strong assumptions that may not hold:
• Player performance independence
• Normally distributed performance

• Team performance is not additive → use the average of the player performances instead of the sum

• A player may play in more than one team
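As a worked step for the averaging adaptation: if the sum in the team performance is replaced by the mean, and the independence and normality assumptions are kept for the moment (the slide itself notes they may not hold), the team performance stays Gaussian with rescaled parameters. The notation follows the TrueSkill assumptions stated earlier; the derivation is a sketch, not taken from the slides.

```latex
t_j = \frac{1}{|A_j|} \sum_{i \in A_j} p_i
\quad\Longrightarrow\quad
t_j \sim \mathcal{N}\!\left(
  \frac{1}{|A_j|} \sum_{i \in A_j} \mu_i,\;
  \frac{1}{|A_j|^2} \sum_{i \in A_j} \left(\sigma_i^2 + \beta^2\right)
\right)
```

With averaging, a combination does not gain performance simply by containing more components, which is the reason additive team performance is a poor fit for component combinations.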


Ranking Event Queues in JAMES II: Reference Data
Event Queues / Models  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  Sum
MList                  4  4  5  2  3  5  7  3  5  4  4  4  4  4  4  5  1  1  1  6   76
LinkedList             1  1  1  5  1  3  3  5  3  7  6  6  6  6  6  2  5  5  5  3   80
TwoList                2  2  2  6  2  2  2  7  2  3  7  7  7  7  7  4  6  6  6  4   91
CalendarQueue          7  7  8  3  6  8  8  1  6  5  2  2  2  3  2  9  2  2  2  7   92
BucketsThreshold      10  9  9  1  8  1  4  6 10  8  1  1  1  1  1  8  4  4  4  2   93
MPLinkedList           3  3  3  7  4  6  5  4  4  1  9  9  9  9  9  3  7  7  7  5  114
CalendarReQueue        9 10 10  4  7  9  6 10  9  9  3  3  3  2  3  7  3  3  3  8  121
Heap                   5  5  4  8  5  4  1  9  8  6  8  8  8  8  8  6  9  9  9  1  129
Simple                 8  6  6  9  9  7  9  2  1 10  5  5  5  5  5  1 10 10 10  9  132
DynamicCalendarQueue   6  8  7 10 10  9 10  8  7  2 10 10 10 10 10 10  8  8  8 10  171
Column sum            55 55 55 55 55 54 55 55 55 55 55 55 55 55 55 55 55 55 55 55
• Five models for each formalism: SRS, stoch-π, PDEVS, SR
• Per formalism: (1 + 3 + 1 + 3 = 8 simulators) × 10 event queues
• 80 component combinations × 20 replications × 5 models = 8,000 runs
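The ordering of the reference table can be reproduced by summing each event queue's per-model ranks and sorting by that sum. The sketch below assumes that this rank sum is what defines the reference ranking, as the table's ordering suggests; the rank values are copied from the table, and the names `REFERENCE_RANKS`, `rank_sums`, and `reference_ranking` are made up for illustration.

```python
from typing import Dict, List

# Per-model ranks (models 1..20) copied from the reference table.
REFERENCE_RANKS: Dict[str, List[int]] = {
    "MList": [4, 4, 5, 2, 3, 5, 7, 3, 5, 4, 4, 4, 4, 4, 4, 5, 1, 1, 1, 6],
    "LinkedList": [1, 1, 1, 5, 1, 3, 3, 5, 3, 7, 6, 6, 6, 6, 6, 2, 5, 5, 5, 3],
    "TwoList": [2, 2, 2, 6, 2, 2, 2, 7, 2, 3, 7, 7, 7, 7, 7, 4, 6, 6, 6, 4],
    "CalendarQueue": [7, 7, 8, 3, 6, 8, 8, 1, 6, 5, 2, 2, 2, 3, 2, 9, 2, 2, 2, 7],
    "BucketsThreshold": [10, 9, 9, 1, 8, 1, 4, 6, 10, 8, 1, 1, 1, 1, 1, 8, 4, 4, 4, 2],
    "MPLinkedList": [3, 3, 3, 7, 4, 6, 5, 4, 4, 1, 9, 9, 9, 9, 9, 3, 7, 7, 7, 5],
    "CalendarReQueue": [9, 10, 10, 4, 7, 9, 6, 10, 9, 9, 3, 3, 3, 2, 3, 7, 3, 3, 3, 8],
    "Heap": [5, 5, 4, 8, 5, 4, 1, 9, 8, 6, 8, 8, 8, 8, 8, 6, 9, 9, 9, 1],
    "Simple": [8, 6, 6, 9, 9, 7, 9, 2, 1, 10, 5, 5, 5, 5, 5, 1, 10, 10, 10, 9],
    "DynamicCalendarQueue": [6, 8, 7, 10, 10, 9, 10, 8, 7, 2, 10, 10, 10, 10, 10, 10, 8, 8, 8, 10],
}


def rank_sums(ranks: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum of per-model ranks for every event queue (lower is better)."""
    return {name: sum(values) for name, values in ranks.items()}


def reference_ranking(ranks: Dict[str, List[int]]) -> List[str]:
    """Event queues ordered by rank sum; ties are broken by name."""
    sums = rank_sums(ranks)
    return sorted(ranks, key=lambda name: (sums[name], name))


TOTAL_RUNS = 80 * 20 * 5  # combinations x replications x models, as on the slide

for name in reference_ranking(REFERENCE_RANKS):
    print(name, rank_sums(REFERENCE_RANKS)[name])
```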


Experiment Setup

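[Figure: experiment setup. Simulation problems are run with eligible component combinations (e.g., event queues A, B and simulators SD, SE), which yields execution times. Step 1: the team assignment A and ranking r are passed to the component rating system. Step 2: the current event queue ranking (positions 1 to 10) is read from the rating system and its inversions are counted.]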
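The "count inversions" step compares two rankings of the same items: an inversion is a pair that appears in opposite order in the two rankings. A minimal sketch follows, assuming the current ranking is compared against a reference ranking; the function name and the sample rankings are illustrative, not taken from the authors' code.

```python
from itertools import combinations
from typing import List, Sequence


def count_inversions(reference: Sequence[str], current: Sequence[str]) -> int:
    """Number of item pairs ordered differently in `current` than in `reference`.

    Both sequences must contain the same items, best first.
    """
    if sorted(reference) != sorted(current):
        raise ValueError("rankings must contain the same items")
    position = {name: index for index, name in enumerate(current)}
    # combinations() yields pairs (a, b) with a ranked before b in the reference.
    return sum(1 for a, b in combinations(reference, 2) if position[a] > position[b])


reference: List[str] = ["MList", "LinkedList", "TwoList", "CalendarQueue"]
current: List[str] = ["LinkedList", "MList", "TwoList", "CalendarQueue"]
print(count_inversions(reference, current))
```

For 10 event queues the count ranges from 0 (identical rankings) to 45 (exactly reversed), since there are 10 · 9 / 2 pairs. The quadratic pairwise loop is fine at this size; a merge-sort based count would be preferable for long rankings.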


Evaluation: Ranking Event Queues
[Figure: average number of inversions (y-axis, 0 to 25) over the number of component combination comparisons (x-axis, 0 to 10,000). Legend: Default Setup; β = 833.3.]


Summary

Problem: How to evaluate individual components of a simulation system?
Solution: A scalable and robust component rating system.
Method: Bayesian inference (MS TrueSkill algorithm).

Outlook:
• Global component rankings
• Consider ‘margin of victory’
• Improve usage for experiment steering


http://bitbucket.org/alesia

License: Apache 2.0

Thank you.
Questions?


Operation Modes

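[Figure: two operation modes.
Passive mode: users submit problems to the simulation software and receive results; the performance results are passed as component comparisons to the component rating system, which provides component ranks.
Active mode: match selection & experiment control sends problems and component combinations to the simulation software; the results are passed as component comparisons to the component rating system, which provides component ranks etc.]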


Evaluation: Ranking Event Queues
[Figure: average number of inversions (y-axis, 0 to 30) over the number of component combination comparisons (x-axis, 0 to 10,000), comparing Passive Mode and Active Mode.]
