Building a Culture Where Software Projects Get Done
Greg Brockman
CTO at Stripe
@thegdb
We don’t know how to build software
[Figure: timeline marked Today, EXPECTED, LIKELY, Disappointing, Insane]

Engineering timelines will slip
System complexity never decreases
Rewrites will always fail (that doesn’t stop people from trying, though)
You are not special
Choose wisely how you’re spending your time
Roll your own solution to your hardest problems, not your easiest ones
Balance creation versus maintenance
5.times { print "Automate" }
Once a bug is triggered, it will keep biting you on a short timeline, no matter how unlikely it seems
Invest in technology to support your rate of change
Tests aren’t for your benefit
Create a technology monoculture
You will have technical debt, and that’s good

Image Credit: Philippe Kruchten
Pick a few standards
Have checks and balances against yourself
Minimize distance to the first production use
Time to shard everything: 3 months (projected)
Time to shard internal collection: 1 week
Have assumption questioners
Bus factor: not just for bus accidents
Use forcing functions (cautiously)
Have a good launch process in place
Have good post-hoc processes in place
Make collaboration great
Find communication sidechannels
Documentation should not be a primary source
Meetings: useful but costly
Have design dictators
Have lots of remotes or no remotes
Greg Brockman
gdb@stripe.com
@thegdb

Speaker notes

  1. Today I’m going to talk about the one aspect of software engineering that has remained basically stagnant for the past thirty years: how to get software projects done. Most of you probably know the classic cautionary tales in this space, like The Mythical Man-Month and the second-system effect.
  2. But even though these stories have been around for a long time, the same issues still plague us today. And they’re growing ever more important as software continues to eat the world [1]. Just take a look at the recent healthcare.gov fiasco — the difficulties of building software are now making national headlines. [1] http://online.wsj.com/news/articles/SB10001424053111903480904576512250915629460
  3. The first step to recovery is admitting we have a problem. I think that we as a community need to acknowledge something: we still don’t know how to build software. Bugs, schedule delays, and feature creep are all staples of software engineering. Think about how reliable traditional engineering is, and compare that to what we see in software (cf. [1]). You don’t cross a bridge worrying that it’s about to fall down, but when you use software, you see it crash all the time. There’s certainly debate about why this is. Is software engineering just young, so that we need more time to figure it out? I think it’s deeper than that: software engineering is inherently more complex than any other engineering discipline. Modules have a much broader interaction surface, and more modules interact with each other, so the resulting systems are orders of magnitude more complex than anything else we build. [1] http://www.codinghorror.com/blog/2005/05/bridges-software-engineering-and-god.html
  4. Software timelines are one thing we just don’t know how to deliver on yet. My usual rule of thumb is to take your expected timeline and keep tripling it until it feels like it would be insane to take that long; the real timeline usually lands between “disappointing” and “insane”. (It’s remarkable how well this works. I think the problem is that most estimates are wildly optimistic and fail to account for the number of random interruptions that will happen along the way.)
  5. When was the last time you saw one of these? If you can’t see the slide, it’s a negative diffstat. It probably doesn’t look very familiar anyway, because almost nobody actually produces one. Most engineering cultures measure productivity as writing new code to do new things, and view improving old code as mere overhead; it’s very rare for anyone to feel good about taking the time to make old code do old things better. That’s surprising, because one thing we do know is that building software successfully is all about constraining complexity: the simpler you can make your code, the more you can get done in the future. But it’s also easy to approach the drive for simplicity from the wrong direction, which is basically the second-system effect. Let me give you an example. Back in college, I was tasked with running technology for the Harvard-MIT Math Tournament. I’d inherited a lot of code: a Java app for entering results, a Perl script to generate rankings, a Python script to turn those rankings into results emails, and a website that ran on server-side includes. Looking at this, I declared all existing code legacy and resolved to rewrite everything as a single unifying Ruby on Rails project. I started writing, and whenever I thought of some potential piece of functionality, I’d go ahead and add it. Support for multiple tournament years in the application? Of course. A CMS for all the static pages? How could I resist? Over time, I noticed that the application was becoming very complex. There was so much functionality, grown out of a soup of modules without well-defined abstraction boundaries, that it was becoming hard to trace the origin of any one behavior. And in response I did something surprising: I rejoiced! Surely this meant I was getting a lot done. The line count from sloccount kept going up, and what value can a project have if it’s not complex?
Needless to say, when I presented my application to my co-maintainers, no one could figure out what was going on. With the old system, even though there were a lot of tools, you could figure out how to change any one component by looking at that component in isolation. With my Monorail, any change required understanding the entire system in all of its complexity. Even though the Monorail basically worked, and from the outside it perhaps looked like a simpler system since it unified all those tools, we had to throw it out: the increased complexity was not worth the corresponding gains in functionality. You should look at code as shackles: every additional feature you pack in is something you’ll have to maintain, something you’ll have to reimplement if you ever decide to switch stacks, and something that will potentially interact with any new features you add. Worse than the features you meant to add are the emergent behaviors you’ll probably end up with, accidents of how you happened to wire your system together. New code will start relying on those behaviors, making it impossible to understand just part of the system in order to make future changes. Given how bad we are at writing software, we need to take every opportunity we can to constrain complexity. But tempting as it is, you can’t solve complexity by throwing more complexity at it; you need to figure out how to incrementally improve your existing solution. And we as a community have yet to figure out how to set up our cultures so that this happens.
  6. You can look at that episode as simply the folly of a novice programmer. But the crazy thing is that there are many stories just like it from industry, even from some of the best programmers out there. You’ve probably heard the classics, like how the massive Netscape rewrite basically killed the company, but what’s surprising is that this still happens today. To illustrate: a friend’s company had its product written in a scripting language, and over time it increasingly felt like they couldn’t get enough performance out of it. After a few attempts to improve performance, they decided to take the nuclear option: it was time to rewrite in Scala. The plan was to have a few people spend two months rewriting all of the core abstractions in Scala. After that, there’d be a feature freeze, and everyone would spend the next two weeks porting all the application code to Scala, at which point they could switch over entirely. I bet you all see where this is heading. The first sign of danger was that porting the abstractions took longer than expected, but after six months they were ready to go. The delay was an accumulation of little things, ranging from getting the toolchain running in their stack to building tooling to have Scala serve “shadow” web requests in order to check its correctness. Then the feature freeze began. For the next three months, everyone in the company worked full-time on porting code over from the scripting language. It turned out that there was a lot of complexity in their application logic, and porting and testing each page took a lot longer than the expected two weeks. At some point, they realized that they couldn’t afford to leave their existing site stagnant, so they put some people back to work on the old site. But now they had to deal with diverging code, which slowed the port further.
Finally, realizing that the rewrite was doomed, they redoubled their efforts to scale their existing codebase in the original language. Ultimately, they did find a solution: by cleverly breaking out parallel rendering, they were able to get the performance they needed. What I find fascinating about this story is that it has nothing to do with the company having bad engineers; in fact, they employed some of the best engineers I know. And this isn’t a newly discovered failure mode: Joel on Software has an article from 2000 talking about the dangers of rewrites [1]. So why do we keep doing monolithic rewrites? I have a lot of hypotheses, but to some extent, the underlying reasoning doesn’t matter. What’s important is to make sure every member of your engineering team knows that this is something everyone tries and everyone gets wrong, and that if you try to rewrite your site from scratch you will fail, and possibly kill the company along with you. Engineering problems usually have multiple solutions, but once we’ve found one we usually give up searching for more. Note that once they’d constrained their solution space to “things which don’t involve a massive rewrite”, it wasn’t actually hard for them to find a solution. [1] http://www.joelonsoftware.com/articles/fog0000000069.html
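The “shadow” requests mentioned in this story are a general technique for checking a rewrite against the live system: every request is served by the existing implementation, also run through the new one, and any disagreement is recorded without the new result ever reaching users. Below is a minimal Python sketch, assuming hypothetical `legacy_handler` and `new_handler` functions that take a request dict and return a response string; it is not the tooling that company actually built. In a real deployment the shadow call would usually run asynchronously, off the request path, so it adds no latency; here it runs inline for simplicity.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Handler = Callable[[Request], str]


class ShadowRunner:
    """Serve from the legacy handler; shadow-run the candidate and record disagreements."""

    def __init__(self, legacy: Handler, candidate: Handler) -> None:
        self.legacy = legacy
        self.candidate = candidate
        self.mismatches: List[Request] = []
        self.errors: List[Request] = []

    def handle(self, request: Request) -> str:
        # The user always receives the legacy response.
        response = self.legacy(request)
        try:
            shadow = self.candidate(request)
        except Exception:
            # A crash in the new code must never affect the live response.
            self.errors.append(request)
        else:
            if shadow != response:
                self.mismatches.append(request)
        return response


# Hypothetical handlers: the rewrite forgets the fallback for empty names.
def legacy_handler(req: Request) -> str:
    return "Hello, " + (req.get("name") or "stranger")


def new_handler(req: Request) -> str:
    return "Hello, " + req["name"]


runner = ShadowRunner(legacy_handler, new_handler)
print(runner.handle({"name": "Ada"}))  # Hello, Ada
print(runner.handle({"name": ""}))     # Hello, stranger
print(len(runner.mismatches))          # 1
```

The mismatch and error lists are what tell you the rewrite is not yet a drop-in replacement, long before any user sees its output.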
  7. Perhaps the most important point to instill in your culture is the realization that you are not special. You’re not immune to any of the failure modes that people run into, and the stories I’m telling today will probably mirror the ones you’ll tell in the future. If you think you’re immune, that probably just means you haven’t been around long enough to see how things will break down and bite you. There are people just as smart as you who have been thinking about the same problems for a very long time. The best way to get things done is to approach every project cognizant of how these things usually go wrong, and to constantly look for the warning signs that you’re about to mess it up. In practice, you’ll still make mistakes, but you’ll make fewer of them.
  8. The most common reason that people fail to get things done is that they spend their time working on the wrong things. Shaping your culture so that people work on what’s important is tough — it’s so easy to get sidetracked. But it’s also the only way to be successful.
  9. Whenever your company has a new requirement, you have a choice: do you integrate an off-the-shelf solution, or do you build one yourself? As an engineer, it’s always tempting to roll your own; NIH, or not invented here (http://en.wikipedia.org/wiki/Not_invented_here), is the name given to this proclivity. As a manager, it’s tempting to just use someone else’s solution. So how do you decide when to build and when to buy? For most internal tools, the danger with building isn’t that the problem is hard. It’s easy enough to get an MVP of, say, an applicant tracking system up and running. What always goes wrong is the maintenance, and the laundry list of a hundred features (individual user accounts, email integration, daily summaries, reminders, etc.) that would each make the product just a little bit better. At the margin, your engineering effort will almost always be better spent on other problems, so these features just won’t get done. The main thing you pay for when you buy is someone to work on that long tail of features, not the core functionality. Most cultures lose sight of this tradeoff and NIH all of their easy problems, which consigns them to maintaining all of these applications. In contrast, many people’s first instinct is to outsource their hardest problems, because, after all, they are hard, and it seems better to let someone else deal with them. One case where we ran into this tension was sharding. For a long time, we’d scaled our databases incrementally: whenever a database cluster became overloaded, we’d split collections out into new physical databases. We knew this would eventually break down, and we wanted to stay ahead of the curve, so we decided it was time to implement a sharding layer. Sharding is a very hard problem (you’re combining distributed systems with your production-critical data), and we would have loved to outsource it to someone else.
We use a lot of MongoDB (we chose it primarily for its automated failover capabilities), which comes with a sharding scheme. We started out by deploying its sharding against our log archive cluster. Over the next few months, we had a number of operational issues. When those came up, we would either “read the source” or “go to MongoDB support and ask them for help” (or both). Support’s turnaround was quite good, but it added a lot of latency to issues that would have been critical had they involved production data. We also soon realized that outsourcing the code writing hadn’t actually let us outsource the code understanding; mostly it meant we didn’t control the sharding layer and couldn’t patch it if things went wrong. So we concluded that sharding was something we needed to write ourselves. It was sufficiently core that we couldn’t trust someone else to manage it, and we could take advantage of application-level invariants to tune much of the behavior for our use case. That being said, there are many hard problems which you should not try to solve yourself. Only pick the problems that are core to your business, and where there’s some reason to believe that the solution for you is significantly easier than the general case. For something like scaling your log pipeline, everyone has exactly the same problems as you: don’t write your own; use something off the shelf, even if it’s missing that one little feature you want. (As an aside, there’s a great paper on this subject, End-to-End Arguments in System Design [1], which I highly recommend to anyone trying to decide what functionality to implement themselves.) [1] http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
  10. Perhaps one of the hardest things about building a rapidly growing company is balancing building new things against maintaining old ones. Fixing existing things is always the most urgent: if your system is down, or a user can’t log in, or your database has been corrupted, you really do need to drop what you’re doing and deal with it. Being good at maintenance is how you avoid losing. However, the really important stuff, the things that will make you win, are the new and innovative things in your development pipeline. Nothing stalls a project more than an engineer being pulled into a bunch of firefighting, or answering customer questions, or helping diagnose some weird behavior. One structure we’ve adopted for handling these issues is what we call the build/run rotation. Each week, one person on each engineering team is on “run”. Their job is to serve as a buffer for all operational concerns: when urgent things come up requiring that team’s attention, it’s the runner’s job to intercept them and handle them appropriately. Sometimes they’ll have no choice but to escalate, but those times should be few and far between. With their free cycles, the runner should work on polish: small, quick wins that are easily interrupted. This leaves the builders free to focus on the most important projects for moving the team forward. One nice property of this structure is that everyone on the team understands the main operational issues affecting them, which helps focus their build time on what’s important.
  11. Along these lines, you should bake into your culture a desire to squash manual tasks. Otherwise, those tasks will grow in number until they’re all that you do. Some manual tasks won’t be worth automating, but those should be few and far between. Invest in a framework for task automation so that it’s easy to add and maintain automations; once it’s in place, you should see people start using it without further prompting. At Stripe, our systems team has a project called Golem, which they use to automate processes, such as database cluster rebuilds, that used to require a lot of human time. In all of this, your goal should be to maximize your people’s efficiency and leverage.
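One way to make adding an automation cheap is a registry where a new task is a single decorated function and every run is recorded in one place. The Python sketch below assumes a hypothetical decorator-based design; the names `automation`, `run`, and `rebuild_db_cluster` are illustrative only and do not describe Golem’s actual interface.

```python
from typing import Callable, Dict, List

# Hypothetical registry: task name -> function. Not Golem's real API.
REGISTRY: Dict[str, Callable[..., str]] = {}
RUN_LOG: List[str] = []


def automation(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Register a function as a named, runnable automation."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        if name in REGISTRY:
            raise ValueError(f"duplicate automation: {name}")
        REGISTRY[name] = fn
        return fn
    return register


def run(name: str, **kwargs: str) -> str:
    """Run a registered automation by name and record the result in the run log."""
    if name not in REGISTRY:
        raise KeyError(f"unknown automation: {name}")
    result = REGISTRY[name](**kwargs)
    RUN_LOG.append(f"{name}: {result}")
    return result


@automation("rebuild-db-cluster")
def rebuild_db_cluster(cluster: str) -> str:
    # Placeholder for the real steps (provision, restore, resync, verify).
    return f"rebuilt {cluster}"


print(run("rebuild-db-cluster", cluster="mongo-logs-1"))  # rebuilt mongo-logs-1
```

The design choice being illustrated is that the framework, not each script, owns naming, discovery, and logging, so the marginal cost of automating one more manual task stays small.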
  12. Imagine you have a service outage due to a memory leak. You track the leak down to a very rare corner case in your code. Do you fix it immediately, or do you put it on your list of things to deal with in the future? It’s very tempting to say that since it’s a rare condition, you can ignore it. However, I’ve noticed empirically that if you’ve hit an issue once, you’ll almost always keep hitting it again and again on a surprisingly short timescale. I think what’s going on is that having triggered it once is actually a pretty strong indicator that you’ve reached a point where it’s likely to be triggered again. Sometimes there’s a second factor that you don’t understand; for example, perhaps some other constraint in your systems ends up making the corner case far more probable, or maybe a new customer started sending you a bunch of data in an unexpected form. Sometimes you’ve just crossed some volume threshold without noticing. Either way, I’d strongly recommend a policy of fixing production bugs as soon as they are discovered; anything else will cause a surprising amount of repeat breakage.
  13. Writing code is hard, but writing code that you’re sure actually works is much harder. The single best thing you can do to make it possible to get projects done in your codebase is provide good ways to be sure that changes are correct. No code should be considered complete or shippable until it has a good way of ensuring it’ll be modifiable in the future.
  14. The primary mechanism to accomplish this is tests. One common misconception is that tests exist to ensure that what is currently written works. That’s not really true: you can probably convince yourself of correctness by playing around with the software by hand, and that’s usually a lot faster than writing a test. Instead, tests are a statement to future maintainers (including yourself six months from now) about what contracts your code needs to maintain. If it’s not tested, then it’s undefined behavior, and you should assume future refactors will either break it or just won’t happen, because you’ve written a block of gnarly, untested code that everyone is afraid to touch. Choosing what tests to write is a bit of an art. Tests should be explicit about what they’re testing, and if they break, it should be obvious what no longer works. Importantly, they should be statements about the functionality you care about, rather than about implementation details. I’ve seen TDD adherents iterate by writing an empty function, asserting that it returns an array, changing the function to return an empty array, adding a new test that it returns an array of length two, and going from there. Most of the resulting tests aren’t useful: all you really care about is that your function, say, splits a name into first and last, and you should keep your tests to that high-level behavior.
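Here is the name-splitting example written with Python’s standard `unittest`, assuming one plausible contract: the first word is the first name and everything after it is the last name. The function and its edge cases are hypothetical; the point is that each test names a behavior a future maintainer must preserve, and none of them depends on how the split is implemented.

```python
import unittest
from typing import Tuple


def split_name(full_name: str) -> Tuple[str, str]:
    """Split a full name into (first, last); everything after the first word counts as last."""
    parts = full_name.strip().split(None, 1)
    if not parts:
        return ("", "")
    if len(parts) == 1:
        return (parts[0], "")
    return (parts[0], parts[1])


class SplitNameTest(unittest.TestCase):
    # Each test states a contract, not an implementation detail.
    def test_first_and_last(self) -> None:
        self.assertEqual(split_name("Ada Lovelace"), ("Ada", "Lovelace"))

    def test_multiword_last_name(self) -> None:
        self.assertEqual(split_name("Ludwig van Beethoven"), ("Ludwig", "van Beethoven"))

    def test_single_word_has_empty_last_name(self) -> None:
        self.assertEqual(split_name("Cher"), ("Cher", ""))

    def test_ignores_surrounding_whitespace(self) -> None:
        self.assertEqual(split_name("  Grace Hopper "), ("Grace", "Hopper"))


if __name__ == "__main__":
    # exit=False keeps the interpreter running; argv avoids parsing the caller's sys.argv.
    unittest.main(argv=["split_name_tests"], exit=False)
```

If someone later rewrites `split_name` with a regular expression, these four tests still apply unchanged, which is exactly what makes them useful to the next maintainer.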
  15. A less obvious investment to make is staying on a unified stack. As your company grows, it can be tempting to introduce new technologies (and you’ll probably notice people pushing for them). There are certainly times you should let this happen; it’s unlikely that your main stack will be able to solve all problems. However, keeping as much of a technology monoculture as possible means that when you fix a performance problem in one service, all of your other services get the benefit of that fix. It can be painful, because it often means you shouldn’t go seeking the right tool for the job; instead, you should look for the right way of solving problems within your existing technology stack, no matter how kludgy. For example, from the beginning of Stripe’s history, we wrote everything in Ruby. I wrote a scary amount of Ruby systems code: there wasn’t really library support for some of the things I was doing, but sticking to Ruby meant that I could reuse our deploy, test, and dependency management code. As we built more and more infrastructure, we spun out our own in-house framework, which now underlies all of our services. That means we can make a single code change to, say, our logging, and have all of our services reflect it. There have been many times when we’ve been tempted to break from the monoculture, and some times when we have. When someone wants to introduce something like Redis, and we can find a way to hack the functionality we want out of our existing systems, we’ll stick to the monoculture. But when someone wants to introduce Hadoop, where building even a mildly plausible Ruby alternative is clearly infeasible, we’ll introduce the new technology. In general, if the functionality is important to the business, then you really have no choice but to accept whatever stack it comes with. Just keep in mind that it’ll be a burden you’ll bear forever.
  16. Technical debt always comes up in these sorts of talks. The best explanation of technical debt I’ve seen is this image, a quadrant from Philippe Kruchten. Nice, visible things are what we call features. Nice, invisible things are architecture. Bad but visible things are bugs. The bad and invisible things are what we call technical debt. It bogs us down and slows the rate of change. Most people feel bogged down in technical debt and start asking “how can I change my culture so we stop adding technical debt?” That’s really the wrong question, though. Technical debt, if managed properly, is actually a good thing. It’s just like real debt: it lets you move more quickly in the short term, but you’ll have to pay it back in the future. If you can’t pay it back, you’ve lost. But by spreading out the load of polishing your system, you can get way more done. So the real question is how to change your culture to better manage technical debt, and make sure you’re paying it down at a good rate. Unfortunately, there’s no silver bullet here, but a good rule of thumb, applicable in probably 75% of cases, is “don’t do work you’ll later have to undo”. In any case, you should make your new debt explicit. You probably won’t do anything about it, but at least you’ll know it’s there, rather than having it discovered the next time someone tries to make a change. Once you’ve identified what your debt is, whether it’s bolting a new function onto an existing class, partially integrating some external system, or whatever it may be, you can be more strategic about how it accumulates.
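The quadrant reduces to two yes/no questions about a piece of work: is it visible to users, and does it add or subtract value? One hedged way to act on the advice to make new debt explicit is to tag backlog items with those two answers and pull the debt out into its own register. The data structure and the example items below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (visible to users?, positive value?) -> category, following the quadrant on the slide.
QUADRANT: Dict[Tuple[bool, bool], str] = {
    (True, True): "feature",
    (False, True): "architecture",
    (True, False): "bug",
    (False, False): "technical debt",
}


@dataclass
class WorkItem:
    title: str
    visible: bool
    positive: bool

    @property
    def category(self) -> str:
        return QUADRANT[(self.visible, self.positive)]


def debt_register(items: List[WorkItem]) -> List[str]:
    """Return titles of invisible, negative-value items: the debt worth tracking explicitly."""
    return [item.title for item in items if item.category == "technical debt"]


# Hypothetical backlog entries, for illustration only.
backlog = [
    WorkItem("Subscription coupons", visible=True, positive=True),
    WorkItem("Unified service framework", visible=False, positive=True),
    WorkItem("Login fails on long emails", visible=True, positive=False),
    WorkItem("New function bolted onto an overloaded class", visible=False, positive=False),
]
print(debt_register(backlog))  # ['New function bolted onto an overloaded class']
```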
  17. Given all of this, you need to decide what properties all of your software must absolutely have. You should assume that anything on that list will slow down your immediate iteration cycle. But you should also assume that anything not on that list will be permanently sacrificed in at least one project. I think good testing should always be on that list; it’s the one lifeline you have for pulling yourself out of the technical debt mire in the future. Security is an important standard to pick. Do you let people embed secrets in your code? What kinds of third-party services do you allow, and what data are you ok with giving them? These are not easy questions, and the right answers vary from culture to culture. Quality standards are also important. You should assume the things you write will be there forever. My first project at Stripe was something called password-vault, a system for storing shared passwords, such as logins for third-party services. It was pretty horrendous code, but I figured I just needed to get something out and I’d get around to fixing it up in the next few weeks. Three years later, that code remains in use. So you need to decide up front what level of quality is acceptable, and not let anything ship below the bar. There are a number of other standards you might choose. Monitoring coverage? Do all services need to be nicely packaged, or is it ok to manually configure the servers they run on? The choices you make here say a lot about who you really are as a company and culture. One dissatisfying thing about standards is that you only get a few of them: have too many, and you won’t be able to get anything done. But if you don’t have them, your codebase and systems will spiral out of control and become completely unmanageable.
  18. Because humans are so bad at building software, it’s important to not lean on just your intuitions and assumptions. You should make sure you’re using techniques that will help you get things done while continually reevaluating your priors and adapting to the situation at hand.
  19. In general, you should try to get something, anything, into production as early and incrementally as possible. Trade off features, but don’t trade off implementation quality. Once it’s in production, you don’t have to worry about your branch growing stale: people making changes will also have to account for your system. It also helps guide which problems you actually need to solve (perhaps the thing you thought would be the bottleneck is actually fine, and maybe the lack of a web interface is a bigger problem than you expected). And once your system is out, the problems all become nicely incremental. In many ways, your ability to get projects done really is just a reflection of the time it takes to get something from zero into production, with some adjustment for iteration cycle length. Most people think about the latter, but don’t take the former into account at all. Iterating is way easier than building something from scratch: it’s a lot clearer what problems you actually need to solve. Our sharding project is a good example of this. We’d sat down and come up with a design, and implemented the core code in a few weeks. However, we figured it’d take at least 3 months to get comfortable enough to roll it to production. There were, after all, a lot of moving parts, including a shard splitting tool we hadn’t started on, and getting things wrong could result in missing data or incorrect queries. And of course, since we thought it’d take 3 months, it’d probably take more like 9. This was starting to sound like a massive project. We thought for a while about whether there was a better approach. Finally, someone suggested: what if we just roll it out to some non-critical internal collection? In that case, we could punt on all the fanciness. We rolled out sharding for that one collection later that week.
Suddenly we had a clear set of priorities and a list of things that needed tuning, and our worries shifted from abstract to concrete, which meant we could fix them. This changed sharding from vaporware into something that was used in production, somewhere.
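Rolling sharding out to a single non-critical collection amounts to gating the new code path per collection, so that everything else keeps using the existing database untouched. The Python sketch below assumes a hypothetical allowlist, shard count, and hash-based placement; it is not Stripe’s sharding layer. It uses a stable hash from `hashlib` rather than Python’s built-in `hash()`, which is randomized per process for strings and would send the same key to different shards on different machines.

```python
import hashlib
from typing import FrozenSet

LEGACY_DB = "legacy-primary"
NUM_SHARDS = 4

# Hypothetical allowlist: start with one non-critical internal collection.
SHARDED_COLLECTIONS: FrozenSet[str] = frozenset({"internal_audit_events"})


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(collection: str, key: str) -> str:
    """Send allowlisted collections through the sharding layer; leave everything else alone."""
    if collection in SHARDED_COLLECTIONS:
        return f"shard-{shard_for(key)}"
    return LEGACY_DB


print(route("charges", "ch_123"))              # legacy-primary
print(route("internal_audit_events", "evt_1"))  # one of shard-0 .. shard-3
```

Widening the rollout is then a one-line change to the allowlist, which is what keeps each step small.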
  20. Getting sharding fully into production ended up hinging on conversations between the engineers driving the project and other people on the team who were less closely involved but had the right background. The next milestone for sharding was rolling it out to critical data. Our databases were increasingly loaded, and if we didn’t have sharding soon, we’d have to come up with a drastic stopgap. By this point, we’d become pretty comfortable with our sharding implementation. The main blocker was building out a shard splitter, and there was no way to make the shard splitter itself more incremental. At one point, an engineer working on sharding walked a counterpart who was removed from the project through all the details. The counterpart asked a bunch of questions about each component, just trying to fully understand what was going on. Eventually, he asked “So the hard problem here is splitting shards, right? But why can’t we just start putting new users onto a new shard?” And then the solution was clear: we could punt on the shard splitter altogether. Sharding just new data would leave us no worse off, and would stop the load on our existing databases from getting any worse. So suddenly we had a plan that let us reap immediate benefit from a project that would otherwise have taken a very long time to complete, and we didn’t have to implement a stopgap. It’s interesting to step back and ask: what actually happened here? It’s not that the counterpart was a better engineer. Instead, I think the issue is that when you’re building a system, you need to make thousands of tiny decisions and judgement calls, and probabilistically, some percentage of them will be wrong. So sitting down with someone who isn’t steeped in the details, but has enough background to question your assumptions, will invariably be useful and help you discover something you wouldn’t otherwise. If you’re familiar with rubber ducking, I think of this as basically rubber ducking++.
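The counterpart’s question amounts to choosing placement at creation time instead of re-partitioning existing data: existing users stay where they are, only newly created users land on the new shard, and nothing ever has to move, so no splitter is needed. A minimal Python sketch, assuming a hypothetical in-memory directory from user ID to shard name (a real system would persist this mapping):

```python
from typing import Dict


class UserShardDirectory:
    """Records which shard each user lives on; new users go to the current 'open' shard."""

    def __init__(self, legacy_shard: str) -> None:
        self.open_shard = legacy_shard
        self.placement: Dict[str, str] = {}

    def create_user(self, user_id: str) -> str:
        # Placement is decided once, at creation, and never changes afterwards.
        if user_id in self.placement:
            raise ValueError(f"user already exists: {user_id}")
        self.placement[user_id] = self.open_shard
        return self.open_shard

    def open_new_shard(self, shard: str) -> None:
        # Only future users are affected; no existing data is copied or split.
        self.open_shard = shard

    def shard_of(self, user_id: str) -> str:
        return self.placement[user_id]


directory = UserShardDirectory("db-main")
directory.create_user("alice")
directory.open_new_shard("db-shard-2")
directory.create_user("bob")
print(directory.shard_of("alice"), directory.shard_of("bob"))  # db-main db-shard-2
```

The tradeoff is that the old database keeps its current load; it simply stops growing, which is the “no worse off” property described above.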
  21. Many people talk about the idea of a bus factor: the minimum number of people who would have to be removed from a project (graphically portrayed as being hit by a bus) before no one is familiar with the code. Usually the focus of bus factor is redundancy: you want to make sure that if your main programmer leaves the company, or wants to work on something else, the project doesn’t suddenly stagnate because no one knows how it works. In reality, though, there’s a more important benefit to maintaining a high bus factor: it just leads to better decisions and code. For some systems, it’s not enough to sit down and talk design every so often; the assumption questioner really needs to be writing code alongside the primary author. One example here is Monster, a system for durable event processing I wrote early in Stripe’s history. It’s the backbone of our systems, and has grown from thousands to tens of millions of events per day. The way we build systems at Stripe is to roll them out with the simplest possible implementation and improve from there. We’d started out by running all consumers in a single process, which round-robined among them. Over time, we noticed that low-priority but high-volume consumers could starve out high-priority consumers, and ended up splitting consumers into groups according to their priority. We continued doing this sort of incremental scaling for quite some time. Eventually, we decided the time had come to figure out how to scale Monster for the next order of magnitude of growth. All of the consumer scheduling had been baked into Monster as an afterthought, and it seemed clear that we should find a piece of software that already did that rather than roll our own. We were very cognizant of the second-system effect (that is, the tendency for system redesigns to end up with feature bloat), and chose to add only the bare minimum of required new features, such as a sharding scheme to parallelize individual consumers.
After looking around, we settled on Storm as the closest thing to what we wanted. This would require rewriting Monster in Java. We didn’t even entertain the notion of rewriting our consumers; instead, we immediately jumped to writing a multilang connector so that all of our consumers would remain written in Ruby. The design was carefully incremental, and allowed us to switch over just one event type to get something into production as early as possible. It seemed like we’d successfully applied all of our principles, and the project should be safe from massive delay. We kicked off the project about this time last year, believing it would be fully done by the beginning of 2013. As with sharding, one engineer went off and implemented the new design. However, we were hit with a bunch of implementation delays: our initial design for the new queuing layer turned out not to be performant; some of our consumers turned out to rely on being run in a certain order and had to be updated; and there were a bunch of other small things we hadn’t accounted for. This meant the first events didn’t run through the new system until two months later than planned, almost doubling the intended project length. Even though there was another engineer closely following the design, we realized that they weren’t able to effectively question assumptions: since they weren’t familiar enough with the code, it was very hard for them to grok the actual implementation issues, or to help identify which problems could be worked around. That meant we didn’t get any of the usual benefits of an assumption questioner. Had there been someone else writing code, we would have been able to have much better conversations about it, ultimately ending up with a much better project and probably getting it done much sooner.
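The priority-grouping fix described for Monster can be sketched as giving each priority level its own round-robin scheduler with its own processing budget, so a backlog in a low-priority group cannot consume capacity reserved for a high-priority one. This Python sketch is not Monster’s implementation: consumers and events are plain strings, the consumer names are hypothetical, and a per-tick `budget` stands in for a worker process’s throughput.

```python
from collections import deque
from typing import Deque, Dict, List


class ConsumerGroup:
    """Round-robins among its consumers, one event per turn, within a fixed per-tick budget."""

    def __init__(self, consumers: List[str], budget: int) -> None:
        self.budget = budget
        self.queues: Dict[str, Deque[str]] = {c: deque() for c in consumers}
        self.turns: Deque[str] = deque(consumers)

    def enqueue(self, consumer: str, event: str) -> None:
        self.queues[consumer].append(event)

    def tick(self) -> List[str]:
        processed: List[str] = []
        idle = 0  # consecutive consumers found with nothing to do
        while len(processed) < self.budget and idle < len(self.turns):
            consumer = self.turns[0]
            self.turns.rotate(-1)  # move this consumer to the back of the line
            queue = self.queues[consumer]
            if queue:
                processed.append(f"{consumer}:{queue.popleft()}")
                idle = 0
            else:
                idle += 1
        return processed


# Hypothetical consumers: each priority level gets its own group (its own process, in practice).
groups = {
    "high": ConsumerGroup(["charge_webhooks"], budget=10),
    "low": ConsumerGroup(["analytics_rollup", "search_indexer"], budget=10),
}
for i in range(1000):
    groups["low"].enqueue("analytics_rollup", f"evt{i}")  # high-volume, low-priority backlog
groups["high"].enqueue("charge_webhooks", "evt_urgent")

print(groups["high"].tick())      # ['charge_webhooks:evt_urgent']
print(len(groups["low"].tick()))  # 10
```

However deep the low-priority backlog grows, the high-priority group’s budget is spent only on its own consumers.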
  22. For Stripe’s second Capture the Flag, a security competition we ran last September, I really wanted a launch that didn’t involve us working right up to the deadline, in contrast to the first. So we internally agreed on a “soft” launch deadline and a “hard” deadline a week later, and we left what we thought was plenty of time to launch by the soft deadline. But at the soft deadline, we found ourselves with a bunch of work left to do. We redoubled our efforts and finished with just a few minutes to spare before the hard deadline. I find this soft/hard technique works well: you’ll probably never make the soft deadline, but it at least gives you a checkpoint at which to decide what you need to do to make the hard one. If you’re not familiar with Parkinson’s Law [1], it states that work expands to fill the time allotted for its completion; applied to software, it’s surprisingly true in practice. If you’re not careful, your project will expand indefinitely in timeline and scope, and you need to put in stopgaps to counteract it. The only successful way I’ve seen of fighting Parkinson’s Law is a forcing function. Maybe that’s setting yourself a hard deadline and making sure you have some incentive to hit it, or that you can’t easily push it back; or maybe it’s hiring someone you don’t quite have the infrastructure to support, so that you are forced to invest in building out the right tooling for them. For what it’s worth, I used to not believe in forcing functions. To some extent, using a forcing function is an admission that you were unable to prioritize properly on your own, and so you should just get better at prioritizing rather than shelling out to an external agent. In reality, though, I think prioritization is a 10x harder problem than anyone gives it credit for, and setting an external forcing function is really just a hack for shifting the prioritization burden to the universe. Now, you do have to be careful.
Whatever forcing function you set for yourself, make sure you don’t feel boxed into shipping an inferior product. Setting external-facing deadlines is generally a bad idea, whether with customers or with the media. It can be painful, since you’d love to tell a user “this feature will ship by the end of the month”, but what do you do in the 50% of cases where the feature’s been delayed by the fact that we don’t really know how to build software? Perhaps the worst position an engineer can be in is feeling forced into a deadline that he or she didn’t choose or agree to. At Stripe, all deadlines, together with how we expose them externally, are set by the engineers working on a project. [1] http://en.wikipedia.org/wiki/Parkinson's_law
  23. OK, so you’ve finally gotten your project ready to ship. How do you actually get it out there? At Stripe, PM is a verb, not a person [1]: the primary engineer on a project PMs the launch. It’s their responsibility to make sure that all of the concerns of getting the thing shipped are taken care of. They don’t necessarily have to do everything themselves, but they should make sure it all gets done. As your company grows, there will be more and more to think about around a launch. Is anyone thinking about monitoring? Tracking? Performance? How this will affect existing users? It can be hard for any one person to think of and address all the possible concerns. So you need a clear and predictable process for how things get launched, which everyone knows about and can participate in. The way we do this is that, a few days to a few weeks before launch (depending on the project), whoever’s PMing sends out a pre-shipped email. It contains the relevant details of the launch: what’s going out, what the goal is, and any other needed context. This is everyone else’s chance to ask questions, or to batten down the hatches and prepare the systems they own for launch. You know you’ve failed if someone ever finds out about a launch at the same time as the public. It should also be very clear what approval is needed in order to launch something. Depending on your organization, the answer might be “none; just go ahead and launch anything”. You probably want a small set of people who are trusted arbiters of product quality, and who are the one source of approval needed to get something out. These days, we have a product-signoff list; you just need the approval of one of the people on that list to complete your launch. [1] http://www.quora.com/Stripe-company/Does-Stripe-have-product-managers-or-do-engineers-manage-the-products-themselves
  24. Sometimes, things will go wrong. Whether you misjudged how people would react to a new feature, or the site went down during routine maintenance, or you forgot to monitor some service and it silently fell over in the middle of the night, operational breakages are an expected part of the business of software. How you react to them is a key part of your culture. First of all, you should have a good postmortem culture in place. When things go wrong, it’s an opportunity to build up expertise in how to do them right in the future. If you’re a rapidly growing company, the cost of this one mistake pales in comparison to the cost of repeating it 6-12 months from now, so writing a postmortem is well worth the effort. Our postmortems are pretty simple: we describe the effects, the root cause, and how we’ll avoid the issue in the future. When writing postmortems, two things are very important. First, postmortems shouldn’t be about finger pointing [1]. Even your best engineers will make mistakes (and honestly, your best engineers probably make the most mistakes, since they are getting the most done). Postmortems should be about figuring out what actually happened, and how to make sure it doesn’t happen again. Second, it’s important to avoid platitudes or things you won’t actually change. It’s very easy to say “we should improve the code here” or “we should have better test coverage”, but without specific, actionable recommendations nothing is going to change. [1] http://codeascraft.com/2012/05/22/blameless-postmortems/
  25. If there’s one thing your engineering culture needs to do well, it’s collaboration. If you’re doing it right, your organization is a collection of individually-capable nodes, and then the biggest challenge you have is coordination among those nodes. If collaboration is broken, then everything else I just talked about doesn’t matter, and you’re pretty much doomed.
  26. At Stripe, we look for low-effort ways to make information accessible within the company. Usually we take communication that is already happening and, when it makes sense, shift it to a standardized public forum. That makes it way easier for others to stay in the loop, without requiring much overhead from the people generating the communication. One example is what we call email transparency [1]. The idea is that you should CC a mailing list on all emails you send, down to the “person-to-person emails that I don’t think anyone else will be interested in”. With the right list infrastructure, this allows people to passively subscribe to the feed of everything going on in the company, while requiring only marginal effort from the people sending the email. (Of course, you have to be careful about how far you take this, as emails that are personal or personnel-related should in fact be kept private. We leave that judgement call to the discretion of the author.) Another primitive we use internally is the SRFC, or Stripe Request for Comments. These are effectively just design documents, but the interesting bit is that they live in a standardized place where anyone can comment inline. We write SRFCs for everything from new systems to conference room naming schemes to hiring strategies. All of these documents would likely get written at any other company, but simply by providing a well-known forum for them we end up with a lot of collaboration we wouldn’t otherwise have. We sometimes do more active things, such as status emails, but these other techniques make the active ones much less burdensome.
  27. Documentation is an interesting communication channel. Everyone thinks of it as a must-have, but I think most people write the wrong kind of documentation, or view it in the wrong light. In-depth code documentation has an overwhelming tendency to go stale. There are no tests for it: nothing breaks when you change the code but not the docs, so documentation naturally drifts out of date with the code. The best documentation serves as a pointer: it gives someone new to the system enough high-level context and concepts to know where to start, and it’s very unlikely to go stale. Like it or not, understanding what’s actually going on will almost always require reading the code, and you should write your documentation with that in mind.
  28. Meetings are one communication channel that gets a bad rap. I don’t think that’s because meetings are inherently flawed, but because people tend to use them poorly. There are two kinds of useful meetings. The first is “let’s kick around a bunch of ideas that we’ll later go off and think about”. The second is “let’s take these concrete proposals we’ve been discussing elsewhere and make a decision”. In both cases, it’s useful to get a lock on everyone’s time and have everyone focus simultaneously. But if you try to mix the two modes, you’ll notice that nothing gets done. You should also be cognizant that meetings have a high fixed cost: you basically can’t write any code in the 30 minutes leading up to one, since you know you’re about to be interrupted, and you’ll spend the 30 minutes after it ends ramping back up into the zone. So use meetings as a primitive in your culture, but make sure you’re using them properly.
  29. One of the biggest challenges with having lots of great people is figuring out what to do when there’s a disagreement. Especially when it comes to architecture, there are often several plausible alternatives, and people tend to come down vigorously on one side or the other. At some point, it’s better to just make a decision than to keep debating, and you need to make sure there’s a clear way that decisions get made in your organization. The way we do this at Stripe evolved from an early debate around our API’s design. Stripe’s first API was effectively JSON-RPC: we didn’t use any of the features of HTTP, such as status codes, URLs, or headers. When we were 8 people, one of our recent hires spoke up about this, saying that it was a bad idea and that we should switch to a much more RESTful interface. This kicked off three weeks of debate, with half the company on each side of the argument. To make matters worse, within each side every person had their own sub-opinion, so we ended up debating the sub-opinions as much as the REST vs. non-REST question, which was clearly a poor use of time. Finally, the engineer who had started the debate went and implemented his proposal on a branch. We took a look, and everyone agreed it felt at least as good as what we already had. We realized we’d just lost 3 weeks on this conversation, and since the change was a net win, we took the plunge and rolled it out to production. We learned a few things from this. First of all, you just can’t have a technical debate with more than 4 people. Usually there are two or maybe three major opinions: make sure a representative of each is in the room and let them hash it out, because adding incremental people is far too inefficient. But perhaps more fundamentally, we realized that you need someone who can make the final call on what we’re doing. You want it to be someone who feels a lot of ownership over the domain and who has great judgment that people respect, but beyond that it doesn’t really matter who it is. We split our projects into components and assigned an owner to each of them. The owner has the final say on everything affecting their domain, and is correspondingly responsible for its overall quality. It’s not always a glamorous, call-the-shots role: if there’s ugly maintenance to do, the owner doesn’t necessarily need to do it themselves, but they do need to make sure it gets done. We made that engineer the owner of the API, and we never again sat around paralyzed for 3 weeks over a change.
  30. One question many organizations face is “Should we hire remote engineers?” It’s certainly very tempting: there are many great engineers who don’t happen to live within commuting distance of your office, and hiring them seems like a great way to expand your team. But getting things done as a remote engineer is largely an exercise in gathering information, and you start from way behind your local counterparts. Every time there’s an in-person conversation, local engineers have some chance of walking by and overhearing it. As a remote engineer, you’re simply excluded. The only way you can know what’s going on, and consequently what’s important to work on, is through what ends up in email, code, or IRC. These channels are often less convenient than talking in person. (That’s the cost you incur for being able to hire great engineers outside commuting range.) As a result, people won’t shift their communication without a forcing function: one remote person complaining about being left out of the loop just won’t be enough. The only way to ensure the shift happens is to have a real team of remote engineers. So if you’re thinking of going down the remote engineer route, it can work, but make sure the teams they work on are distributed enough that the pains of being remote actually get addressed.