In component-based simulation systems, simulation runs are usually executed by combinations of distinct components, each solving a particular sub-task. If multiple components are available for a given sub-task (e.g., different event queue implementations), a simulation system may rely on an automatic selection mechanism, on a user decision, or, if neither is available, on a predefined default component. However, deciding upon a default component for each kind of sub-task is difficult: such a component should work well across various application domains and in various combinations with other components. Furthermore, the performance of individual components cannot be evaluated easily, since performance is typically measured for component combinations as a whole (e.g., the execution time of a simulation run). Finally, the selection of default components should be dynamic, as new and potentially superior components may be deployed to the system over time. We illustrate how player rating systems for team-based games can solve the above problems and evaluate our approach with an implementation of the TrueSkill(tm) rating system (Herbrich et al., 2007), applied in the context of the open-source modeling and simulation framework JAMES II. We also show how such systems can be used to steer performance analysis experiments for component ranking.
The paper can be found here: https://docs.google.com/file/d/0BxPrl7QoBqmoUDVXNmZUc29Nbmc/edit?usp=sharing
Evaluating Simulation Software Components with Player Rating Systems (SIMUTools 2013)
1. Evaluating Simulation Software Components with Player Rating Systems
6. 3. 2013, SIMUTools 2013
Jonathan Wienß, Michael Stein, Roland Ewald
6. 3. 2013 c 2013 UNIVERSITÄT ROSTOCK | MODELING & SIMULATION RESEARCH GROUP 1
2. Component-Based Simulation Systems
• Simulator: combination of components
• Typical components:
• Event management
• Collision detection
• State saving
• Result storage
• Random number generation
• etc.
• Example: JAMES II
http://flickr.com/photos/jdhancock/7239958506, cc-by
3. Problem: Evaluating Individual Components
https://commons.wikimedia.org/wiki/File:Rowing_-_USA_Lwt_4_@_World_Champs_2003.jpg
• Only component combinations are comparable
• Dedicated performance studies are expensive & difficult
4. Solution: Player Rating Systems
[Figure: a performance comparison viewed as a multiplayer team match. Components: event queues A and B; simulators SC, SD, and SE. Team results by execution time: 1. SC (15 s), 2. SE + B (17 s), 3. SD + A (25 s).]
1. Component Combination = Team of Players
2. Record results (of multiple combinations)
3. Update global component rating
⇒ Component Rating Systems, e.g. to find good default components.
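The three steps above start from turning measured execution times into a team ranking. A minimal sketch of that first step, using the numbers from the example figure; the function name `rank_teams` and the data layout are assumptions made for illustration, not part of JAMES II:

```python
from typing import Dict, FrozenSet, List, Tuple


def rank_teams(
    execution_times: Dict[FrozenSet[str], float]
) -> List[Tuple[int, FrozenSet[str], float]]:
    """Order component combinations ("teams") by execution time; rank 1 is fastest.

    Ties are not treated specially in this sketch.
    """
    ordered = sorted(execution_times.items(), key=lambda item: item[1])
    return [(rank, team, seconds) for rank, (team, seconds) in enumerate(ordered, start=1)]


# Example match from the slide: each key is one component combination.
match = {
    frozenset({"SC"}): 15.0,
    frozenset({"SD", "A"}): 25.0,
    frozenset({"SE", "B"}): 17.0,
}

ranking = rank_teams(match)
for rank, team, seconds in ranking:
    print(rank, sorted(team), seconds)
```

The resulting ranking is what a rating system would consume as the "game result" of one match.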
5. Component Rating Systems
• What is required?
• How does it work?
• How well does it work?
7. Microsoft’s TrueSkill™ Approach¹ (used for Xbox Live™)
• Input:
• Team defined by player indices, e.g., $A_i = \{4, 8, 125\}$
• Team assignment $A = \{A_1, \ldots, A_k\}$ (pairwise disjoint)
• Team ranking $r$ (game result)
• Output: player skill ratings $\mu_i$
• Assumptions:
• Player skill $s_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$
• Player performance $p_i \sim \mathcal{N}(s_i, \beta^2)$
• Team performance $t_j = \sum_{i \in A_j} p_i$
1: Herbrich, Minka, and Graepel: TrueSkill(tm): A Bayesian Skill Rating System, Adv. in Neural Information Processing Systems 19, 2007
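The three assumptions on this slide define a generative model that can be sampled directly. The following is a minimal sketch of one draw from that model; the function name, the dictionary layout, and the example ratings are hypothetical and chosen only for illustration:

```python
import random
from typing import Dict, List, Tuple


def sample_team_performances(
    ratings: Dict[int, Tuple[float, float]],  # player index -> (mu, sigma)
    teams: List[List[int]],                   # team assignment, pairwise disjoint
    beta: float,
    rng: random.Random,
) -> List[float]:
    """Draw one team performance t_j per team under the stated assumptions."""
    team_performances = []
    for team in teams:
        total = 0.0
        for i in team:
            mu, sigma = ratings[i]
            skill = rng.gauss(mu, sigma)     # s_i ~ N(mu_i, sigma_i^2)
            total += rng.gauss(skill, beta)  # p_i ~ N(s_i, beta^2)
        team_performances.append(total)      # t_j = sum of p_i over the team
    return team_performances


# Illustrative values only: four players, two teams.
example_ratings = {4: (25.0, 8.0), 8: (25.0, 8.0), 125: (25.0, 8.0), 7: (30.0, 2.0)}
example_teams = [[4, 8, 125], [7]]
print(sample_team_performances(example_ratings, example_teams, beta=4.0, rng=random.Random(1)))
```

Inference in TrueSkill runs in the opposite direction: given only the observed ranking, it updates each player's $\mu_i$ and $\sigma_i$.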
8. Bayesian Inference in TrueSkill
$$p(s \mid r, A) = \frac{P(r \mid s, A)\, p(s)}{P(r \mid A)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(s, p, t \mid r, A)\, \mathrm{d}p\, \mathrm{d}t$$
$r$: ranking, $t$: team performances, $s$: player skills, $A$: team assignment, $p$: player performances
9. Factor Graphs & Message Passing
[Figure: factor graph for the example match (1. SC, 2. SE + B, 3. SD + A). Layers from top to bottom: skills ($s_{SC}$, $s_{SE}$, $s_B$, $s_{SD}$, $s_A$), performances ($p_{SC}$, $p_{SE}$, $p_B$, $p_{SD}$, $p_A$), team performances ($t_{SC}$, $t_{SE+B}$, $t_{SD+A}$), and team performance differences ($d_1$, $d_2$).]
1. Pass messages downwards: s → p → t
2. Expectation propagation (≈): t ↔ d(r)
3. Pass messages upwards: t → p → s
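For the special case of exactly two teams and no draws, the message passing scheme reduces to a well-known closed-form update. The sketch below shows that special case with the original additive team performance; it is not the multi-team expectation propagation from the slide, and the function names are chosen for illustration only:

```python
import math
from typing import List, Tuple

Rating = Tuple[float, float]  # (mu, sigma)


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal distribution function (erfc keeps the lower tail accurate)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def update_two_teams(
    winners: List[Rating], losers: List[Rating], beta: float
) -> Tuple[List[Rating], List[Rating]]:
    """Closed-form update for one match between two teams, draws excluded."""
    c2 = sum(sigma * sigma + beta * beta for _, sigma in winners + losers)
    c = math.sqrt(c2)
    # Normalized difference of the expected (additive) team performances.
    t = (sum(mu for mu, _ in winners) - sum(mu for mu, _ in losers)) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift
    w = v * (v + t)         # relative variance reduction, between 0 and 1

    def shift(team: List[Rating], sign: float) -> List[Rating]:
        return [
            (mu + sign * (sigma * sigma / c) * v,
             sigma * math.sqrt(1.0 - (sigma * sigma / c2) * w))
            for mu, sigma in team
        ]

    return shift(winners, +1.0), shift(losers, -1.0)


# Two equally rated single-player teams with commonly used starting values.
new_winners, new_losers = update_two_teams([(25.0, 25.0 / 3)], [(25.0, 25.0 / 3)], beta=25.0 / 6)
print(new_winners, new_losers)
```

Winners gain and losers lose rating in proportion to their own uncertainty, and every participant's $\sigma$ shrinks after the match.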
10. Limitations & Adaptations
• Strong assumptions that may not hold:
• Player performance independence
• Normally distributed performance
• Team performance is not additive → use the average instead
• A player may play in more than one team
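To make the "average instead of sum" adaptation concrete, the sketch below compares the mean and variance of the team performance under both definitions. It still relies on the independence assumption listed above, and how the adaptation is realized inside the actual implementation is not shown on the slide; the function name is hypothetical:

```python
from typing import List, Tuple


def team_performance_moments(
    ratings: List[Tuple[float, float]],  # (mu, sigma) of each team member
    beta: float,
    average: bool = True,
) -> Tuple[float, float]:
    """Mean and variance of the team performance, assuming independent members."""
    n = len(ratings)
    mean = sum(mu for mu, _ in ratings)
    variance = sum(sigma ** 2 + beta ** 2 for _, sigma in ratings)
    if average:
        # Averaging scales the mean by 1/n and the variance by 1/n^2.
        return mean / n, variance / n ** 2
    return mean, variance


team = [(20.0, 3.0), (30.0, 4.0)]
print(team_performance_moments(team, beta=0.0, average=False))  # additive
print(team_performance_moments(team, beta=0.0, average=True))   # averaged
```

With averaging, a combination of three components is not favored over a combination of two merely because it has more members.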
12. Experiment Setup
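[Figure: experiment setup. Elements: simulation problems; eligible component combinations (event queues A, B; simulators SD, SE); execution times; the component rating system, which receives the team assignment A and ranking r; the current event queue ranking (positions 1 to 10); and a comparison step that counts inversions.]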
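The quality measure in this setup is the number of inversions between the current ranking and a reference ranking. A minimal sketch of that count, assuming a reference ranking is available (how it is obtained is not stated on the slide); the function name is chosen for illustration:

```python
from itertools import combinations
from typing import List


def count_inversions(ranking: List[str], reference: List[str]) -> int:
    """Count pairs of components that the two rankings order differently."""
    position = {component: index for index, component in enumerate(reference)}
    # combinations() yields (a, b) with a ranked before b in `ranking`;
    # the pair is inverted if the reference places a after b.
    return sum(1 for a, b in combinations(ranking, 2) if position[a] > position[b])


reference = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical event queue names
print(count_inversions(["Q2", "Q1", "Q3", "Q4"], reference))
```

Zero inversions means the two rankings agree; a fully reversed ranking of n components yields n(n-1)/2 inversions.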
13. Evaluation: Ranking Event Queues
[Figure: average number of inversions (y-axis, 0 to 25) versus component combination comparisons (x-axis, 0 to 10,000). Legend entries: Default Setup, β = 833.3.]
14. Summary
Problem: How to evaluate individual components of a simulation system?
Solution: A scalable and robust component rating system.
Method: Bayesian inference (MS TrueSkill algorithm).
Outlook:
• Global component rankings
• Consider ‘margin of victory’
• Improve usage for experiment steering
16. Thank you.
Questions?
17. Operation Modes
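[Figure: two operation modes of the component rating system.
Passive mode: users submit problems to the simulation software and receive results; component comparisons flow to the component rating system, which provides component ranks.
Active mode: a match selection and experiment control unit sends problems and component combinations to the simulation software and receives performance results; component comparisons flow to the component rating system, which provides component ranks etc.]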
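In active mode, some rule has to decide which components to compare next. The slides do not specify that rule; the sketch below uses a purely hypothetical heuristic (prefer the components whose rating is most uncertain, i.e., with the largest $\sigma$) only to illustrate where match selection plugs in:

```python
from typing import Dict, List, Tuple


def pick_most_uncertain(ratings: Dict[str, Tuple[float, float]], k: int = 2) -> List[str]:
    """Hypothetical match selection: return the k components with the largest sigma."""
    return sorted(ratings, key=lambda name: ratings[name][1], reverse=True)[:k]


# Illustrative ratings as (mu, sigma); names and values are made up.
current_ratings = {"A": (25.0, 2.0), "B": (25.0, 8.0), "C": (30.0, 5.0)}
print(pick_most_uncertain(current_ratings))
```

Other selection criteria, for example pairing components with similar $\mu$, would fit the same interface.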
18. Evaluation: Ranking Event Queues
[Figure: average number of inversions (y-axis, 0 to 30) versus component combination comparisons (x-axis, 0 to 10,000). Series: Passive Mode, Active Mode.]