15. Why?
“There are only two hard problems in distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery”
● Mathias Verraes
23. Org Objectives
Distributed System Objectives
● Performance
● Security
● Stability
● Cost
These are cross-cutting objectives that are crucial in an SOA.
24. Org Objectives
No problem …
● Performance is important!
● Security is important!
● Stability is important!
● Cost is important!
… all done right?
30. Service level objectives
● Contracts are cool
● Performance is cool
● Uptime is cool
● Keeping cost low is cool
… So write these things down in a Contract and page
owners when we violate them?
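One way to make "write these things down" concrete is a machine-readable SLO contract per service. This is a minimal sketch of what such a contract could look like; the format, field names, and thresholds are all hypothetical, not Yelp's actual tooling:

```yaml
# Hypothetical per-service SLO contract (illustrative only).
service: example_service
owner_team: example-team
objectives:
  latency_p99_ms: 250       # performance
  uptime_percent: 99.9      # stability
  monthly_cost_usd: 5000    # cost
paging:
  notify: example-team-oncall   # who gets paged
  page_on_violation: true       # page owners when an objective is breached
```

Keeping the contract in config rather than prose means monitoring can alert on it automatically, and ownership of each objective is unambiguous.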
37. Ditching Libraries
● Libraries can be pretty terrible
– Break tests
– Deploy 20 versions
– Function calls work only in ${language}
– Bugs can take weeks to fix
– Tests can take a long time
40. Ditching Libraries
● Libraries can be pretty awesome
– Break tests, not websites
– Deploy 20 versions
– Function calls are wicked fast
– Have weeks to fix a bug
– Unit tests are fast
42. Everybody does Ops?
● Not all Devs can (want to) do Ops
● Not all Ops can (want to) do Dev
● What about?
○ DBAs
○ Security Engineers
○ Designers
43. What to Aim For Instead?
1. Encourage cooperation
2. Acknowledge your engineers have varied skills: {Ops, Dev, Security, Databases, Design, Frontend, API design, Performance, etc.}
3. Try to build teams that have a wide range of skills
46. Image Citations
● Deep dive: https://en.wikipedia.org/wiki/Deep_diving#/media/File:Trevor_Jackson_returns_from_SS_Kyogle.jpg
● Map: https://commons.wikimedia.org/wiki/File:Carta_Marina_AB_stitched.jpg
● AWS Total Cost of Ownership: https://aws.amazon.com/blogs/aws/the-new-aws-tco-calculator/
● Sharing milkshake: https://commons.wikimedia.org/wiki/File:Children_sharing_a_milkshake.jpg
● Field of flowers: TODO
Editor's notes
Approx. 89 million UMVs via mobile
More than 90 million reviews contributed since inception
Approx. 71% of all searches on Yelp came from mobile (mobile web & app)
Yelp is present across 32 countries
Here’s a cool picture showing exponential increase in complexity over time
What does this have to do with services, you might ask?
In the beginning, there were zero lines of code in yelp-main
In 2016, there are about three million
The problem is, it’s hard to scale up our release process as we keep adding code and developers
What is our release process?
Once your branch is code reviewed, you submit your branch as a push request
Three times a day, a push master grabs around 20 branches and pushes that code out to production
So at most around 60 branches get released per day
We needed an alternative approach...
This was our first production service
It didn’t do very much :)
But it was a very useful testing ground for service technologies, as well as deployment, monitoring etc.
We generalized it to become v1 of our service template
Which then begot PaaSTA, our Platform as a Service
In five years we saw an explosion of over 150 services
Maybe we overshot the mark a little? :)
Joey is going to talk more about this in a bit
In order to get good at deploying services we’ve had to make lots of changes to the org
It used to take several weeks to deploy a service, now it takes an hour or two
We spread out operations responsibilities to minimize queuing
This is a specific case of a more general one of distributing knowledge
Programming the monolith is hard
Programming a service oriented architecture is very hard
A few weeks ago we had an issue in our task queues due to a kafka issue
This caused massive duplication of some tasks e.g. 50x for some
These duplicate tasks caused duplicate photos to appear in timelines :(
Great example of why knowing about idempotency is important
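The incident above can be sketched in a few lines: if the consumer tracks which task ids it has already applied, redelivered tasks become no-ops instead of duplicate photos. This is a minimal in-memory sketch; function and variable names are hypothetical, and real systems would keep the seen-id set in durable storage.

```python
# Sketch of an idempotent task consumer (hypothetical names).
# If the queue redelivers a task (e.g. 50x after a Kafka issue),
# reprocessing it must not duplicate side effects such as adding
# the same photo to a timeline twice.

processed = set()   # in production this would live in durable storage
timeline = []

def add_photo(task_id, photo):
    """Apply the task only if its id has not been seen before."""
    if task_id in processed:
        return False          # duplicate delivery: no-op
    timeline.append(photo)
    processed.add(task_id)
    return True

# Simulate massive duplication of the same task:
for _ in range(50):
    add_photo("task-123", "photo.jpg")

# Despite 50 deliveries, the photo appears in the timeline once.
```

The key design choice is keying deduplication on a stable task id assigned by the producer, so retries and redeliveries all carry the same id.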
Service principles document
Outlines what we think are the important things wrt design and operations
Technology agnostic
Service tutorial
We use a cool program called dexy to script incremental service creation and display the output
“Here’s the diff, here’s the output of the service when you apply the diff”
Deputy programs
There are some processes where you can cause a lot of damage if you do them wrong
e.g. Making puppet changes, setting up new services
So we really don’t want to hand the keys to new developers
Solution: take one or two more senior engineers from each team and train them to do these things
Every week we hold office hours
Anyone from across the org can drop in and ask questions about services
Deep dives
Every Monday we have an engineering all-hands meeting
As part of this, we have a deep dive where an engineer discusses something they’ve been working on
Periodically use this to talk about some aspect of services
Service Creation Form (SCF) documents the basics of your service
Reviewed by a small group of more experienced engineers
It’s a balancing act wrt process (goldilocks)
In general, we’ve tried to disperse knowledge across the organisation instead
Examples of areas covered by SCF: Load balancing, failure modes, caching
Review process?
In the monolith, you usually have just one language, one ORM, one database technology, one caching technology
When we first went to services, everybody did their own thing: Clojure, Redis, Thrift, CouchDB
Person-SPOF: a single person becomes a single point of failure for a technology
This is today’s map of the world
Yours will probably look different
Common set of ‘safe’, well-supported technologies
You don’t *have* to use these, but if you don’t then you’re on your own...
One thing that we have standardized on is HTTP/JSON
Interface definition!
Many (not all) services are using Swagger to define their interfaces
Here’s an example of a partial swagger definition
Especially successful for our internalapi service
Previously: anything goes anywhere
Now: Swagger spec for every new endpoint, all spec changes go out to reviewboard group
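The notes refer to a partial Swagger definition without reproducing it. As a stand-in, here is a minimal Swagger 2.0 fragment for a single HTTP/JSON endpoint; the path, names, and fields are hypothetical and only illustrate the shape of such a spec:

```yaml
# Hypothetical partial Swagger 2.0 definition (illustrative only).
swagger: "2.0"
info:
  title: example_service
  version: "1.0"
paths:
  /business/{business_id}:
    get:
      summary: Fetch one business by id
      parameters:
        - name: business_id
          in: path
          required: true
          type: string
      responses:
        200:
          description: The business record
```

Because the spec is a reviewable text artifact, changes to it can be routed through a review group before any endpoint change ships.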
Every single service has a per-service endpoint
Not just the website
This is a service’s uptime + reliability, not the website’s
Each team owns their own services
Why? It’s a lot easier to assign responsibility if ownership is clear
e.g. upgrade this library
Ideally >= 2 people know about a service on a team
Some services do effectively become unowned
We use a JIRA project to track ongoing incidents
Once resolved, enters into the postmortem status
All postmortems go to all developers
I like postmortems, but they do take quite a lot of work.
Luckily Yelp is very supportive of these efforts
Initially some of this was a bit of a struggle for teams not used to operations
So we had to spread some of the operations best practices across the org
Oncall: not everyone wants to be oncall
Teams need Ops
No dedicated DevOps teams, rather empower your existing developers to become a DevOps or a SecOps or a DevSec, but don’t expect them to be everything