As systems and user bases grow, a once-abundant resource can become scarce. While scaling out PlayStation services to millions of users at over 100,000 requests/second, network throughput became a precious resource to optimize for. Alex and Dustin talk about how the microservices that power PlayStation achieved low-latency interactions while conserving network bandwidth. These services, powered by Amazon Elastic Load Balancing and Amazon DynamoDB, benefited from soft-state optimizations, a pattern used in complex interactions such as searching through a user’s social graph in under 100 ms, or a user’s game library in 7 ms. As a developer using Amazon Web Services, you will discover patterns and implementations that make better use of your network, instances, and load balancers to deliver personalized experiences to millions of users while saving costs.
3. Who is talking
Alexander Filipchik (PSN: LaserToy)
Principal Software Engineer
at Sony Interactive Entertainment
Dustin Pham
Principal Software Engineer
at Sony Interactive Entertainment
4. Agenda
• Quick PSN overview
• Standard Stateless architecture
• Soft State overview
• How we applied it
5. The Rise of PlayStation 4
PlayStation Network is big and growing.
– Over 65 million monthly active users
– Hundreds of millions of users
– More than 47M PS4s
– A lot of offerings
10. Stateless Design
• The go-to design
• Created a distinction between applications and scalable databases
• Pure stateless systems are extremely rare; a web calculator is a good example
• Sometimes the design is taken to an extreme
11. Pros
• Easy to scale horizontally
• Easy to program
• Works in multiregional deployments
– As long as your underlying tech works
• One step away from serverless
12. Cons
• You rely on someone else's code to deal with state
• And hope it scales
• Complex use cases require a lot of network communication (social networks)
• Memory and disk are not utilized to their full capacity
• And…
13. A lot of the time, systems look like
[Diagram: a Client calls several instances of "Your code"; the state lives in "Scalable Black Magic"]
14. Any cool alternatives?
• Lambda architecture for services
• Soft state
– Accept the fact that state exists and use it to your advantage
15. Soft State
In computer science, soft state is state which is useful for efficiency, but not essential, as it can be regenerated or replaced if needed.
16. Highlights
• A simple state that is kept in memory for performance
• Harder to program, but can save a lot of money
• Some systems are not feasible without it
23. Our Social Graph
100s of millions of users
Growing number of connections
Rich networking features
24. New Feature/new Journey
• We want users to be able to find other users on the platform
• We should respect privacy settings
• We want to recommend new friends to users (You May Know)
• When a user searches, we want to display results in the following order:
– Direct friends
– Friends of Friends 0_o
– Everyone else
• Do it fast with a small team of ninjas (small means 2)
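The ordering requirement above can be sketched in a few lines. This is only an illustration, not the code from the talk: the `friends` adjacency map, the function names, and the numeric distance values are all assumptions, and privacy filtering is left out.

```python
from typing import Dict, List, Set


def social_distance(me: str, other: str, friends: Dict[str, Set[str]]) -> int:
    """Return 1 for a direct friend, 2 for a friend of a friend, 3 for everyone else."""
    mine = friends.get(me, set())
    if other in mine:
        return 1
    if any(other in friends.get(f, set()) for f in mine):
        return 2
    return 3


def order_results(me: str, matches: List[str], friends: Dict[str, Set[str]]) -> List[str]:
    """Order search matches: direct friends, then friends of friends, then the rest."""
    # sorted() is stable, so matches at the same distance keep their original order.
    return sorted(
        (m for m in matches if m != me),
        key=lambda m: social_distance(me, m, friends),
    )
```

For example, with `a` friends with `b`, and `b` friends with `c`, a search by `a` that matches `d`, `c`, and `b` comes back as `b`, `c`, `d`.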
26. So, we figured out
• We can use Solr to index everyone, so we can do platform-wide search
• And try to use it for indexing relations between users, so we can
– Sort by distance (direct, friend of a friend)
– Sort by other user-related fields (who do I play with often, Facebook friends, and so on)
– "You May Know" is another search: give me 10 friends of friends sorted by number of common friends
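The "You May Know" query can be illustrated outside of Solr as a plain in-memory computation. This is a minimal sketch under assumptions: the `friends` map and the `you_may_know` name are hypothetical, and the tie-break by account name is added only to make the output deterministic.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def you_may_know(me: str, friends: Dict[str, Set[str]], limit: int = 10) -> List[Tuple[str, int]]:
    """Return up to `limit` friends of friends, most common friends first."""
    mine = friends.get(me, set())
    common = Counter()
    for friend in mine:
        for candidate in friends.get(friend, set()):
            # Skip myself and people I am already friends with.
            if candidate != me and candidate not in mine:
                common[candidate] += 1
    # Sort by number of common friends (descending), then by name for stable output.
    ranked = sorted(common.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Each candidate's count is the number of my direct friends who are also friends with that candidate, which is exactly the "number of common friends" sort key.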
27. But
• Data has both high- and low-cardinality properties
• We will need to somehow index relations between users, and it is not obvious how
• And it will not be very fast because Solr is optimized for a completely different use case
28. Options
• Tried graph databases and found a lot of reasons not to use them
• Dump everything into a scalable database in a denormalized format and see what happens
– We can store the friends graph there (the State) and write a stateless service to deal with requests
29. So, we came up with The Schema
Account 1 Friend 1 Friend 2 …. Friend n
Now it scales horizontally as long as NoSQL scales
Now we need to figure out how to support flexible queries
{
"firstName":"Dustin",
"lastName":"Pham",
"gamerName":"gamer 0"
}
30. Going deeper
• What does the client want:
– Search, sort, filter
• What can we do:
– Use some kind of NoSQL secondary index (Cassandra, Couchbase, …) powered by magic
– Fetch everything in memory and process
– How about…
31. Apply CS 101
• We can index ourselves, and writing an indexer sounds like a lot of fun
• Wait, someone already had the fun and made:
32. Schema v2
Account 1 Friend 1 Friend 2 …. Friend n
Account 1 Friend Friend n Version
Now we can search on anything inside the row that represents the user
The index is small and it is fast to pull it from NoSQL
But we will be pulling all these bytes (indexes) all the time (stateless design again!!!)
And what if 2 servers modify the same row?
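One common answer to the "2 servers modify the same row" question is a conditional write keyed on the Version column. The slides do not say this is what was done; the sketch below is an assumption, and `RowStore` is a hypothetical in-memory stand-in for the NoSQL table (real stores expose this idea as conditional writes or compare-and-set operations).

```python
class VersionConflict(Exception):
    """Raised when a writer's view of the row is older than what is stored."""


class RowStore:
    """Hypothetical in-memory stand-in for the NoSQL table; not thread-safe."""

    def __init__(self):
        self._rows = {}  # account_id -> (version, payload)

    def get(self, account_id):
        # Unknown accounts are reported as version 0 with no payload.
        return self._rows.get(account_id, (0, None))

    def put_if_version(self, account_id, expected_version, payload):
        """Write only if the stored version still matches what the writer read."""
        current_version, _ = self.get(account_id)
        if current_version != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._rows[account_id] = (new_version, payload)
        return new_version
```

A writer that loses the race gets `VersionConflict`, re-reads the row, and retries on top of the newer version instead of silently overwriting it.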
33. Distributed Cache?
• It is nice to keep things as close to our microservice as possible
• So we can have a beefy Memcached/Redis/Aerospike/…
• And still pay the network penalty and think about scaling them
34. Soft State?
• Cache lives inside the microservice, so no network penalty
• Requests for the same user are processed on the same instance, so we can save a network round trip and also apply some optimizations (read/write lock, etc.)
• Changes to state are also replicated to the storage and are identified with some version number
• We will need to check the index version before doing a search to make sure the index is not stale
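The version check described above can be sketched as a small in-process cache. This is an illustration under assumptions, not the talk's implementation: `read_version` and `load_index` are hypothetical callables standing in for a cheap version lookup and an expensive full-row fetch from the storage.

```python
class SoftStateIndexCache:
    """Keeps per-account indexes in process memory and reloads them only when stale."""

    def __init__(self, read_version, load_index):
        self._read_version = read_version  # account_id -> stored version (cheap call)
        self._load_index = load_index      # account_id -> (version, index) (expensive call)
        self._local = {}                   # account_id -> (version, index)
        self.loads = 0                     # number of expensive reloads, for illustration

    def get_index(self, account_id):
        stored_version = self._read_version(account_id)
        cached = self._local.get(account_id)
        if cached is not None and cached[0] == stored_version:
            return cached[1]  # soft state is fresh: no index bytes cross the network
        # Soft state is missing or stale: regenerate it from the durable copy.
        version, index = self._load_index(account_id)
        self._local[account_id] = (version, index)
        self.loads += 1
        return index
```

Losing the instance only costs one reload on whichever instance serves the account next, which is what makes the state "soft".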
35. Or in Other Words
[Diagram: Instance 1, Instance 2, and Instance 3 each keep some accounts in memory together with their versions (Account 1 through Account 6). NoSql holds one row per account (Account 1 through Account n) with its jsons and Version.]
36. Routing
• How to find where to route?
– Lookup table
– Gossiping algorithm
– Routing master
• How to maintain?
– Change in capacity
– New deployment
37. We followed KISS
• We are on AWS, so we just used ELB stickiness with a twist
• It works only with cookies, so you will need to store them somehow
• The client library is smart and writes accountId->AWSStickyCookie to a shared cache (or Dynamo)
• Before sending a request through ELB, we pull the sticky cookie from the shared cache and attach it to the request
38. Server-side AWS stickiness
[Flowchart: Call → Smart Client: does the user have a sticky session in the cache? Yes: attach it to the request and send to ELB. No: send the request to ELB and store the returned sticky cookie in the cache. → Return]
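The smart-client flow can be sketched as follows. Everything here is a stand-in: `StickyClient`, the `send` callable, and `fake_elb` are hypothetical names, no real AWS API is called, and refreshing the cached cookie whenever a different one comes back is an assumption beyond what the slide shows.

```python
class StickyClient:
    """Hypothetical smart client that remembers which sticky cookie belongs to which account."""

    def __init__(self, send, cookie_cache):
        # send(request, cookie) -> (response, returned_cookie_or_None);
        # stands in for an HTTP call that goes through the load balancer.
        self._send = send
        # Any dict-like shared cache mapping accountId -> sticky cookie.
        self._cookie_cache = cookie_cache

    def call(self, account_id, request):
        cookie = self._cookie_cache.get(account_id)  # None if no sticky session is known
        response, returned_cookie = self._send(request, cookie)
        if returned_cookie is not None and returned_cookie != cookie:
            # First call for this account (or the balancer re-pinned it): remember the cookie.
            self._cookie_cache[account_id] = returned_cookie
        return response


def fake_elb(request, cookie):
    """Toy load balancer: pins to instance-1 when no cookie is presented."""
    if cookie is None:
        return "instance-1 handled " + request, "sticky-instance-1"
    return cookie.replace("sticky-", "") + " handled " + request, None
```

Because the cookie is stored per account rather than per browser session, any caller that shares the cache routes a given account to the same instance.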
39. The whole system looked like:
[Diagram components: Solr Cloud, Social Network, Friends Finder, Cassandra, Queue, Personalized Search Microservice, Indexer. Change events shown: new user, privacy update, etc.; friendship changed, name change, etc.]
40. How did it do?
• Solr was fine
• The personalized part, not so fine
• Each change in friendship required reindexing a lot of users
• Same goes for privacy changes
• Our NoSQL (Cassandra) uses SSTables, so space is not released right after an update
• Data size was growing much faster than we expected
44. Crazy idea
• What if we use ephemeral indexes?
• They can live in memory for the duration of the user’s search session and then get discarded
• We can use the same code; we just need to change it slightly
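A rough sketch of the ephemeral-index idea, assuming a "search session" means "the last search happened less than a fixed time ago". The `EphemeralIndexes` class, the `build_index` callable, and the TTL-based expiry are all assumptions for illustration; the slides do not specify how sessions are detected.

```python
import time


class EphemeralIndexes:
    """Builds a per-user index on demand and forgets it once the session goes idle."""

    def __init__(self, build_index, ttl_seconds, clock=time.monotonic):
        self._build_index = build_index  # account_id -> index (hypothetical builder)
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so the behaviour can be tested
        self._entries = {}               # account_id -> (expires_at, index)

    def get(self, account_id):
        now = self._clock()
        entry = self._entries.get(account_id)
        if entry is None or entry[0] <= now:
            index = self._build_index(account_id)  # new session: build from scratch
        else:
            index = entry[1]                       # same session: reuse the index
        # Every search extends the session.
        self._entries[account_id] = (now + self._ttl, index)
        return index

    def evict_expired(self):
        """Drop indexes whose session has ended; returns how many were discarded."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

Nothing is written back to storage, so friendship or privacy changes never trigger a mass reindex; the next session simply builds a fresh index.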
46. Is it fixed yet?
• Not really
• Now we need to make indexing really fast
• And significant time and resources are spent on pulling user-related data from the Social Network
• Wait, we’ve just talked about Soft State. What if?
47. Let’s do the math
• Number of users: hundreds of millions, but the number of active users is lower
• Each user has some searchable metadata; let’s say it is 200 bytes
• How much memory will we need to cache all the active ones?
• 100,000,000 * 200 / (1024 * 1024 * 1024) ≈ 18.6 GB
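The estimate can be reproduced directly. The constants come from the slide; the function name is made up for illustration, and the figure covers raw payload bytes only (per-object overhead in a real runtime would add to it).

```python
ACTIVE_USERS = 100_000_000   # active users assumed on the slide
BYTES_PER_USER = 200         # searchable metadata per user, as assumed on the slide
GIB = 1024 ** 3              # the slide divides by 1024 * 1024 * 1024


def cache_size_gib(users: int, bytes_per_user: int) -> float:
    """Raw payload size in GiB, ignoring any per-object overhead."""
    return users * bytes_per_user / GIB


size = cache_size_gib(ACTIVE_USERS, BYTES_PER_USER)
print(f"{size:.1f} GiB")  # prints "18.6 GiB"
```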
48. We can organize it like
App Memory
Java Heap (8 GB)
Off-heap Ehcache (40 GB): accounts info (20 GB), Lucene indexes (20 GB)
SSD if we need to spill over
49. Will it work?
• On AWS we can have up to 244 GB of RAM (r3.8xlarge), and instances have SSDs which usually do nothing
• Actually, with the new X1 family we can have up to 1.9 TB
• So, it sounds like it can work