Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1VAmi5s.
Natalia Chechina outlines features of actor and functional programming models, and the reason these models attract so much interest in parallel, concurrent, and scaling world. She talks about fault tolerance, its importance in large scale systems, and the approaches to implement it. Filmed at qconnewyork.com.
Natalia Chechina is a Research Fellow at the University of Glasgow. She received her PhD degree from Heriot-Watt University, UK in 2011. She then was a Research Associate at the University of Glasgow leading Scalable Distributed Erlang work package (WP3) in the RELEASE project.
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Scaling Distributed Systems
1. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Scaling Distributes Systems
Natalia Chechina
and RELEASE Team
June 11, 2015
N. Chechina, RELEASE team Scaling Distributes Systems
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/actor-functional-scale
3. Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
4. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Who am I?
2011: Received PhD degree in Computer Science from
Heriot-Watt University, UK
2011-2015: WP3 lead in the EU RELEASE Project at
Glasgow University, UK
March 2015: Research Fellow at Glasgow University, UK
Main research interest: Scaling distributed computations on
commodity hardware
N. Chechina, RELEASE team Scaling Distributes Systems
5. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Sources
Research findings
Experience from the RELEASE project
Funded by EU FP7 Framework
5 academic & 3 industrial partners
Aim: To scale the radical actor (concurrency-oriented)
paradigm to build reliable general-purpose software, such as
server-based systems, on massively parallel machines (105
cores)
Erlang programming language
Experience of other researches and developers
N. Chechina, RELEASE team Scaling Distributes Systems
6. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Scaling a Sysem
Scaling ALL aspects of computation
Application
Language
Virtual Machine
In-memory data structures
Persistent data structures
Tools (debugging, monitoring, etc)
N. Chechina, RELEASE team Scaling Distributes Systems
7. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Scaling a Sysem
Scaling ALL aspects of computation
Application
Language
Virtual Machine
In-memory data structures
Persistent data structures
Tools (debugging, monitoring, etc)
N. Chechina, RELEASE team Scaling Distributes Systems
8. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Scaling on language level
Actor model
Functional programming
N. Chechina, RELEASE team Scaling Distributes Systems
9. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Language – Actor Model
Built-in concurrency
Actors have own states and don’t share them
Communication between actor happens only via message
passing
Actors can spawn new actors
N. Chechina, RELEASE team Scaling Distributes Systems
10. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Language – Functional programming
Fundamental operation – application of functions to arguments
Higher-order functions – well-structured software
Modules – independent, reusable
Lazy evaluations
Variables given values only once
N. Chechina, RELEASE team Scaling Distributes Systems
11. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Fault Tolerance
105 cores – approx. failure of 1 core per hour
Non-defensive approach – Supervision & ”Let it crash”
N. Chechina, RELEASE team Scaling Distributes Systems
12. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Philosophy
Principles
Ideas
Core values
N. Chechina, RELEASE team Scaling Distributes Systems
13. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
RELEASE Aim
To scale the radical actor (concurrency-oriented) paradigm to build
reliable general-purpose software, such as server-based systems, on
massively parallel machines (105 cores).
N. Chechina, RELEASE team Scaling Distributes Systems
14. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
RELEASE Aim
To scale the radical actor (concurrency-oriented) paradigm to build
reliable general-purpose software, such as server-based systems, on
massively parallel machines (105 cores).
Erlang
N. Chechina, RELEASE team Scaling Distributes Systems
15. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
RELEASE Aim
To scale the radical actor (concurrency-oriented) paradigm to build
reliable general-purpose software, such as server-based systems, on
massively parallel machines (105 cores).
Erlang
VM aspects, e.g. synchronisation on internal data structures
Language aspects, e.g. maintaining a fully connected
network of nodes, explicit process placement
Tool support
N. Chechina, RELEASE team Scaling Distributes Systems
16. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
RELEASE Aim
To scale the radical actor (concurrency-oriented) paradigm to build
reliable general-purpose software, such as server-based systems, on
massively parallel machines (105 cores).
Erlang
VM aspects, e.g. synchronisation on internal data structures
Language aspects, e.g. maintaining a fully connected
network of nodes, explicit process placement
Tool support
N. Chechina, RELEASE team Scaling Distributes Systems
17. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Typical Target Architecture - 105
cores
Commodity hardware
Non-uniform communication
(Level0 – same host, Level1 – same cluster, etc)
N. Chechina, RELEASE team Scaling Distributes Systems
18. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Erlang Overview
Erlang
is a functional general purpose concurrent programming
language developed in 1986 at Ericsson
is dynamically typed
was designed for distributed, fault-tolerant, massively
concurrent, and soft-real time systems
follows let it crash and share nothing philosophy
The language primitives are processes.
Erlang concurrency is handled by the language and not by the
operating system [Arm10].
N. Chechina, RELEASE team Scaling Distributes Systems
19. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang
Distributed Erlang
N. Chechina, RELEASE team Scaling Distributes Systems
20. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang
Transitive connections
N. Chechina, RELEASE team Scaling Distributes Systems
21. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang
Transitive connections
Explicit Placement, i.e.
spawn(Node, Module, Function, Args) → pid()
N. Chechina, RELEASE team Scaling Distributes Systems
22. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang Scalability Limitations
Global operations
Global operations, i.e. registering names using global module
N. Chechina, RELEASE team Scaling Distributes Systems
23. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang Scalability Limitations
Global operations
Global operations, i.e. registering names using global module
Other global operations, e.g. using rpc:call to call multiple nodes
N. Chechina, RELEASE team Scaling Distributes Systems
24. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang Scalability Limitations
Global operations
Global operations, i.e. registering names using global module
Other global operations, e.g. using rpc:call to call multiple nodes
All-to-all transitive connections
N. Chechina, RELEASE team Scaling Distributes Systems
25. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang Scalability Limitations
Global operations
Global operations, i.e. registering names using global module
Other global operations, e.g. using rpc:call to call multiple nodes
All-to-all transitive connections
But... aren’t global operations and transitivity are optional in
distributed Erlang? Why use them if they are a bottleneck?
N. Chechina, RELEASE team Scaling Distributes Systems
26. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Distributed Erlang Scalability Limitations
Global operations
Global operations, i.e. registering names using global module
Other global operations, e.g. using rpc:call to call multiple nodes
All-to-all transitive connections
But... aren’t global operations and transitivity are optional in
distributed Erlang? Why use them if they are a bottleneck?
Reliability and fault tolerance – when a process or a node fail, the
remaining nodes know about that. The same holds for the recovery
It’s already there – no extra effort to connect nodes and distribute
information
Easy to scale – a new node knows about running nodes, and vice
versa
N. Chechina, RELEASE team Scaling Distributes Systems
27. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Scalable Distributed (SD) Erlang
SD Erlang is a small conservative extension of Distributed Erlang
Network Scalability
All-to-all connections are not scalable onto 1000s of nodes
Aim: Reduce connectivity
Semi-explicit Placement
Becomes not feasible for a programmer to be aware of all nodes
Aim: Automatic process placement in groups of nodes
N. Chechina, RELEASE team Scaling Distributes Systems
28. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Free Node Connections vs. S group Node Connections
N. Chechina, RELEASE team Scaling Distributes Systems
29. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Free Node Connections vs. S group Node Connections
N. Chechina, RELEASE team Scaling Distributes Systems
30. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Free Node Connections vs. S group Node Connections
N. Chechina, RELEASE team Scaling Distributes Systems
31. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Free Node Connections vs. S group Node Connections
N. Chechina, RELEASE team Scaling Distributes Systems
32. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Free Node Connections vs. S group Node Connections
N. Chechina, RELEASE team Scaling Distributes Systems
33. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Connections between Different Types of Nodes
N. Chechina, RELEASE team Scaling Distributes Systems
34. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Connections between Different Types of Nodes
N. Chechina, RELEASE team Scaling Distributes Systems
35. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Connections between Different Types of Nodes
N. Chechina, RELEASE team Scaling Distributes Systems
36. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Connections between Different Types of Nodes
N. Chechina, RELEASE team Scaling Distributes Systems
37. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Connections between Different Types of Nodes
N. Chechina, RELEASE team Scaling Distributes Systems
38. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Connections between Different Types of Nodes
N. Chechina, RELEASE team Scaling Distributes Systems
39. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Why S groups?
Preserve Erlang phylosophy & transitivity and scale
Considered approaches
Grouping nodes according to their hash values
A hierarchical approach
Overlapping s groups
Other approaches
Distributed Erlang global groups
Spapi Router (SpilGames)
Custom routing on non-transtively connected normal or
hidden nodes
N. Chechina, RELEASE team Scaling Distributes Systems
40. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Hierarchical Grouping
N. Chechina, RELEASE team Scaling Distributes Systems
41. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Free Nodes and S groups
N. Chechina, RELEASE team Scaling Distributes Systems
42. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Embedded Grouping
N. Chechina, RELEASE team Scaling Distributes Systems
43. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
SD Erlang Improves Scalability
N. Chechina, RELEASE team Scaling Distributes Systems
44. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Speed Up of Distributed Erlang Orbit & SD Erlang Orbit
N. Chechina, RELEASE team Scaling Distributes Systems
45. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Speed Up of Distributed Erlang ACO & SD Erlang ACO
N. Chechina, RELEASE team Scaling Distributes Systems
46. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Semi-Explicit Placement
Communication latencies between nodes may vary according
to their relative positions
In terms of communication time nodes may be “nearby” or
“far away”
We may wish some tasks to be close together because they’re
communicating with each other a lot
N. Chechina, RELEASE team Scaling Distributes Systems
47. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Example
System structure
N. Chechina, RELEASE team Scaling Distributes Systems
48. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Example: system structure
Racks
N. Chechina, RELEASE team Scaling Distributes Systems
49. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Example: system structure
Clusters
N. Chechina, RELEASE team Scaling Distributes Systems
50. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Example: system structure
Cloud
N. Chechina, RELEASE team Scaling Distributes Systems
52. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Measuring communication distance
We can define a distance function d on the set V of Erlang VMs in
a distributed system by
d(x, y) =
0 if x = y
2− (x,y) if x = y.
where (x, y) is the length of the longest path which is shared by
the paths from the root to x and y.
N. Chechina, RELEASE team Scaling Distributes Systems
53. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
Distances
(b, c) = 2
d(b, c) = 2−2 = 1/4
N. Chechina, RELEASE team Scaling Distributes Systems
56. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
SD Erlang
Network Scalability
Validation
Semi-Explicit Placement
choose nodes/1
Every node may have a list of attributes
choose nodes/1 function returns a list of nodes that satisfy
given restrictions
s_group: choose_nodes ([Parameter]) -> [Node]
where
Parameter = {s_group , SGroupName} | {attribute , AttributeName }
| {nearer , 0.4} | {between , 0.5, 0.7}
SGroupName = group_name ()
AttributeName = term ()
N. Chechina, RELEASE team Scaling Distributes Systems
57. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
S group Operational Semantics
Validation of SD Erlang Semantics and Implementation
Operational Semantics
(state, command, ni) −→ (state , value)
Executing command on node ni in state returns value and
transitions to state .
N. Chechina, RELEASE team Scaling Distributes Systems
58. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
S group Operational Semantics
Validation of SD Erlang Semantics and Implementation
Validation of Semantics and Implementation
Validate the consistency between the formal semantics and
the SD Erlang implementation
Use Erlang QuickCheck tool developed by QuviQ
Behaviour is specified by properties expressed in a logical form
eqc statem is a finite state machine in QuickCheck
Figure: Testing SD Erlang Using QuickCheck eqc statem
N. Chechina, RELEASE team Scaling Distributes Systems
59. Distributed Erlang
Scalable Distributed (SD) Erlang
Operational Semantics
Plans
Sources
Ongoing and Future Work
S groups
Introduce more patterns, for example, routing for a tree
structure
Analysis of fault tolerance strategies and features in SD
Erlang applications
Semi-explicit Placement
Discovering system structure at runtime
Robustness – dynamically adjusting a view of the system if
new nodes join it, or if existing ones fail
N. Chechina, RELEASE team Scaling Distributes Systems