3. SRE mission at Mercari
● Ensure a reliable service that is enjoyable to use at any time
● Take care of all engineering apart from new service development
○ Performance improvement, automation, security, etc.
10. Challenges
● Code base is too large and complex to understand
● Team is too large to work efficiently on a shared code base
● Communication overhead is too large
● Velocity (development cycle) is stalled...
12. Microservices?
● Architectural and organizational approach to software development
○ To speed up deployment cycles
○ Foster innovation and ownership
○ Improve maintainability and scalability
14. Microservices
● Do one thing well
○ Unix philosophy
○ One function in one service, not multiple functions in one service
● Decentralized Governance
○ Each team owns its own service
● Independent
○ Each service can be changed, upgraded, or replaced independently
● Polyglot
○ Right framework and tool for each domain
15. Goal
● Software Engineer
○ Keep velocity from stalling; make feature improvement iterations faster
○ -> Provide great features to customers faster
● SRE
○ Provide an automated platform for microservices
○ Hand some responsibilities (e.g., deployment, debugging) over to software engineers
○ -> Focus more on SRE-related software engineering tasks
24. Container
● Resource isolation
● Resource limitation
● Fast boot (vs. VM)
Docker
● Easy to build container images
● Easy to distribute via a registry
25. Why Docker?
● Software engineers control more
○ They can include what they want (e.g., runtime, library)
● Environmental parity
○ What runs in local development (or the QA env) is exactly the same as what runs in production (easy to debug)
○ No more “it works on my environment but not in production!”
● Easy to deploy
○ Docker image ≒ Single statically linked binary
○ You already know its benefit if you use Go
26. Kubernetes (GKE)
● Container orchestration
● Derived from Google's internal systems, Borg and Omega
● Inspired and informed by Google’s experiences and internal systems
27. Why Kubernetes?
● Best way to maximize container benefit
○ Resource isolation/limitation lets us improve compute resource utilization. But how?
■ K8s schedules containers onto appropriate instances
○ How to communicate between dynamically scheduled containers?
■ K8s provides service discovery
● Reduce operation costs
○ Self healing & auto scaling
● Infrastructure of infrastructure
○ Industry standard https://githubengineering.com/kubernetes-at-github
○ More tools/software will be built on top of k8s in the future (I guess)
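To make the scheduling point concrete, here is a minimal sketch of placing containers by resource request. It is a toy first-fit placement, not the Kubernetes scheduler; the `Node` class, the `schedule` function, and the node and pod names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free: float  # free CPU cores
    mem_free: int    # free memory in MiB
    pods: list = field(default_factory=list)


def schedule(pod_name, cpu_req, mem_req, nodes):
    """Place the pod on the first node with enough free CPU and memory (first fit)."""
    for node in nodes:
        if node.cpu_free >= cpu_req and node.mem_free >= mem_req:
            node.cpu_free -= cpu_req
            node.mem_free -= mem_req
            node.pods.append(pod_name)
            return node.name
    return None  # nothing fits; a real cluster would keep the pod pending


# Hypothetical cluster with one small and one large instance.
nodes = [
    Node("node-a", cpu_free=1.0, mem_free=1024),
    Node("node-b", cpu_free=4.0, mem_free=8192),
]
placements = [
    schedule("search", 0.5, 512, nodes),
    schedule("offer", 2.0, 2048, nodes),
    schedule("personalization", 0.5, 512, nodes),
    schedule("batch", 8.0, 1024, nodes),
]
```

Because every container declares what it needs, small services can be packed onto the same instance while larger ones go elsewhere, which is where the utilization gain comes from.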
28. gRPC
● gRPC Remote Procedure Call
● High-performance, general-purpose, open source, standards-based RPC framework
● Open source version of Stubby, the RPC system used inside Google
29. gRPC
● Simple service definition
○ By default, gRPC uses protocol buffers as the Interface Definition Language (IDL) for describing both the service interface and the structure of the payload messages.
● Works across languages and platforms
○ Write a Go server and a Python client
○ Utilize polyglot microservices
30. Why not REST?
● Who can implement REST correctly?
○ High cost to design (Path? Parameters? hah?)
○ Eventually it’s just HTTP endpoints
● No more writing HTTP clients by hand...
33. Deployment
● Deployment is key in a microservices platform
○ “Keep velocity from stalling; make iteration speed faster”
● We need an easy and safe automated deployment system
○ We started with chatbot-style deployment, but it did not scale
34. Spinnaker
● Continuous Delivery platform
● Developed at Netflix
○ Worked with Google and open sourced in 2015
● Supports multiple clouds
○ Kubernetes!, GCE, AWS
37. Why Spinnaker?
● Kubernetes support
● Built-in deployment best practices from Netflix and Google
○ Immutable infrastructure
○ Blue/Green deployment, Canary deployment
○ Manual judgement (by manager) phase
○ Run integration tests
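The canary step above boils down to a comparison between the new version and the current one. The sketch below shows that decision only; it is not Spinnaker's implementation, and the function name, thresholds, and return values are assumptions chosen for illustration.

```python
def judge_canary(baseline_errors, baseline_total, canary_errors, canary_total,
                 max_ratio=1.5, min_requests=100):
    """Return 'promote', 'rollback', or 'wait' by comparing error rates."""
    if canary_total < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total
    # Tolerate a small absolute error rate when the baseline is nearly error-free.
    allowed = max(baseline_rate * max_ratio, 0.001)
    return "promote" if canary_rate <= allowed else "rollback"


too_early = judge_canary(10, 10000, 1, 50)
healthy = judge_canary(10, 10000, 1, 1000)
broken = judge_canary(10, 10000, 20, 1000)
```

A manual judgement phase plays the same role with a person in the loop: the pipeline pauses until someone approves or rejects the rollout.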
38. Spinnaker at Mercari
● Currently only for container deployment to Kubernetes
● Each team uses Spinnaker to deploy its own services
● One Spinnaker instance handles all microservices in all regions
45. Request ID in log
● Which service caused the problem in a given request?
46. Request ID in log
[Diagram: Gateway API, Mercari API, and the search, personalization, and offer services, connected over HTTP and gRPC]
① Generate a unique ID
② Annotate logs with the ID within the same request
● The ID is passed along via HTTP header / gRPC metadata
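The two steps in the diagram can be sketched with the Python standard library. This is only an illustration under assumptions: the header name `X-Request-ID`, the helper names, and the log format are not from the slides.

```python
import contextvars
import logging
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name, not from the slides
_request_id = contextvars.ContextVar("request_id", default="-")


class RequestIDFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = _request_id.get()
        return True


def make_logger(name):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(name)s request_id=%(request_id)s %(message)s"))
    handler.addFilter(RequestIDFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def ensure_request_id(headers):
    """Step 1: reuse the incoming ID, or generate a new one at the edge."""
    rid = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    _request_id.set(rid)
    return rid


def outgoing_headers():
    """Attach the ID to downstream calls.

    For gRPC the same value would travel as metadata, where keys are lowercase.
    """
    return {REQUEST_ID_HEADER: _request_id.get()}


# The gateway receives a request without an ID, logs, and calls a service.
gateway_log = make_logger("gateway")
rid = ensure_request_id({})
gateway_log.info("received request")  # step 2: log line carries the ID
downstream = outgoing_headers()
```

Every service that calls `ensure_request_id` on the incoming headers logs with the same ID, so one search returns the whole request path.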
47. Request ID in log
[Screenshot: search by request ID, showing the log from the gateway and the log from service X]
50. Metrics
Selection of the metrics service/software is still under discussion and trial
● First-class support for containers and Kubernetes
● Integration with the Kubernetes ecosystem
○ Spinnaker, Istio, and so on
● Service dependency visualization
55. State of microservices in JP
JP has just started
● Some services (machine learning products) have started to be containerized and deployed on GKE
● On-going discussion about the best architecture
56. Conclusion
● Why we started microservices
● Current state of US microservices and challenges
60. Testing
Is testing in microservices hard?
● Focus on unit tests as usual
○ Because each service is supposed to be independent
○ Each microservice must measure test coverage
● Integration tests?
○ Use mocks instead of working hard to prepare a local env
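As a sketch of the mock approach, the test below replaces a dependency with `unittest.mock.Mock` so no other service needs to run locally. `OfferService`, its `search_client`, and the item fields are hypothetical names used only for illustration.

```python
from unittest import mock


class OfferService:
    """Hypothetical service that depends on a search client."""

    def __init__(self, search_client):
        self.search_client = search_client

    def offers_for(self, keyword):
        items = self.search_client.search(keyword)
        # Only items that are still on sale can receive offers.
        return [item["id"] for item in items if item["on_sale"]]


def test_offers_for_skips_sold_items():
    # The mock stands in for the real search service.
    search = mock.Mock()
    search.search.return_value = [
        {"id": "m1", "on_sale": True},
        {"id": "m2", "on_sale": False},
    ]
    service = OfferService(search)
    assert service.offers_for("camera") == ["m1"]
    search.search.assert_called_once_with("camera")


test_offers_for_skips_sold_items()
```

The test exercises only this service's logic; the contract with the search service still has to be checked somewhere else.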
62. QA environment
How do we test an in-development feature from a QA device?
● Pull request (PR) based pod creation
63. PR based pod creation
[Diagram: a proxy in front of API gateway (master), API gateway (PR 313), and API gateway (PR 314)]
● Proxy by PR number
● Set PR number
● Container is deployed via CI
65. PR based pod creation
[Diagram: a proxy in front of API gateway (master), API gateway (PR 313), and API gateway (PR 314), with Service A (master) and Service A (PR 21) behind them]
● Proxy by PR number
● Set PR number
● Container is deployed via CI
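Proxying by PR number can be sketched as a lookup with a fallback to master. The slides do not say how the PR number is carried or how pods are named, so the `X-PR-Number` header, the pod naming scheme, and the `deployed` table below are all assumptions.

```python
PR_HEADER = "X-PR-Number"  # hypothetical header set on requests from the QA device


def pick_upstream(service, headers, deployed_prs):
    """Return the pod to route to: the PR build if one exists, else master."""
    pr = headers.get(PR_HEADER)
    if pr is not None and pr in deployed_prs.get(service, set()):
        return f"{service}-pr-{pr}"
    return f"{service}-master"


# PR builds that CI has deployed so far (numbers taken from the diagram).
deployed = {"api-gateway": {"313", "314"}, "service-a": {"21"}}

gateway_pr = pick_upstream("api-gateway", {PR_HEADER: "313"}, deployed)
gateway_default = pick_upstream("api-gateway", {}, deployed)
service_a_other_pr = pick_upstream("service-a", {PR_HEADER: "313"}, deployed)
service_a_pr = pick_upstream("service-a", {PR_HEADER: "21"}, deployed)
```

Falling back to master means a QA device only needs to name the PR under test; every service without a matching PR build keeps serving the stable version.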
67. Service mesh
Don’t trust each other!
● Traffic management
○ API rate limit, circuit breaker
● Policy enforcement
○ Ensure access policies (which service can access which service?)
We should achieve the above without modifying client/server code!
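For reference, the circuit breaker mentioned above follows roughly the logic below. In a service mesh this logic lives in the proxy next to each service rather than in application code; the class name, thresholds, and injectable clock are assumptions made for this sketch.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

While the circuit is open, callers fail immediately instead of piling more requests onto a service that is already struggling.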
70. Chaos engineering
● Real world is hard …
○ Machines crash, networks are unstable (especially in distributed systems)
● A service you depend on can fail at any time
71. Chaos engineering
● Services must be fault tolerant whenever something goes wrong
● Emulate real-world problems
○ We need to identify weaknesses
■ Improper fallback settings when a service is unavailable
○ Software engineers should be aware of these weaknesses
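A minimal sketch of emulating a failing dependency and checking the fallback, assuming failures surface as `ConnectionError`. The wrapper, the function names, and the item lists are hypothetical; real chaos experiments inject faults at the infrastructure level rather than in a unit of code like this.

```python
import random


def with_chaos(func, failure_rate, rng=random.random):
    """Wrap a dependency call so it fails with probability `failure_rate`."""
    def wrapped(*args, **kwargs):
        if rng() < failure_rate:
            raise ConnectionError("injected failure")
        return func(*args, **kwargs)
    return wrapped


def recommended_items(fetch_personalized, fallback):
    """Serve the fallback list when the personalization call fails."""
    try:
        return fetch_personalized()
    except ConnectionError:
        return fallback


# A dependency that always fails and one that never fails.
always_down = with_chaos(lambda: ["p1", "p2"], failure_rate=1.0)
never_down = with_chaos(lambda: ["p1", "p2"], failure_rate=0.0)

degraded = recommended_items(always_down, ["popular"])
normal = recommended_items(never_down, ["popular"])
```

If the fallback were missing or misconfigured, the injected failure would surface as an error to the user, which is exactly the weakness the experiment is meant to expose.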