Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1SUVxKa.
Roy Rapoport demonstrates the power of alignment (or lack thereof) using real-world examples from his previous and current employers, with specific emphasis on his experience introducing Python to production use within Netflix, the organizational structures he interacted with through that process, and the way they tie into Netflix's formal culture. Filmed at qconsf.com.
Roy Rapoport manages the Insight Engineering group at Netflix, responsible for building Netflix's Operational Insight platforms, including cloud telemetry, alerting, and real-time analytics. Roy has been in tech for about 20 years with positions in IT engineering and operations, software development, and software quality engineering, but his passion remains with operations and automation.
Culture and the Games People Play
1. Culture and the Games People Play
Roy Rapoport
rsr@netflix.com @royrapoport
November 18, 2015
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/culture-games
Presented at QCon San Francisco
www.qconsf.com
11. A Word About Netflix …
• Clear Priorities
1. Innovation
2. Availability
3. Cost
• Hire smart, experienced people
• Get out of the way
• Anti-process bias
Culture
13. Dozens of SSL Certificates
Decentralized
Kept Expiring
Hilarity would ensue
Amazon Resources
“No Preset Limit”
You know when you hit it
Hilarity would ensue
The Before Time
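The expiring-certificate problem on this slide lends itself to a small automated check. What follows is a minimal sketch in Python, assuming a hand-maintained inventory that maps each certificate name to its expiry time. The function name `expiring_soon`, the 30-day warning window, and the example hostnames are hypothetical and are not taken from the talk; the slides do not describe how the actual tooling worked.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(certs, now, warn_days=30):
    """Return (name, days_left) pairs for certificates that expire
    within warn_days of now, soonest first. Already-expired
    certificates show up with a negative days_left."""
    cutoff = now + timedelta(days=warn_days)
    flagged = [
        (name, (not_after - now).days)
        for name, not_after in certs.items()
        if not_after <= cutoff
    ]
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical inventory; a real one would be collected from each team.
now = datetime(2015, 11, 18, tzinfo=timezone.utc)
inventory = {
    "api.example.com": datetime(2015, 12, 1, tzinfo=timezone.utc),
    "www.example.com": datetime(2016, 6, 1, tzinfo=timezone.utc),
    "old.example.com": datetime(2015, 11, 10, tzinfo=timezone.utc),
}

report = expiring_soon(inventory, now)
for name, days_left in report:
    print(f"{name}: {days_left} days left")
```

Collecting the inventory in one place is the part that addresses the "Decentralized" bullet; the date arithmetic itself is trivial.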
14. Well-developed Developer Ecosystem
Service Discovery
DB Client
Credentials Management
Memory Object Cache
Server Infrastructure
Telemetry
You wanted that for Java, right?
The Before Time
15. Just moved from IT/Ops
Formally tasked with SSL cert issue as quarterly goal
Limits issue “tacked” on
“Effective” in Python
Didn’t know Java
Presenter Selfie
The Before Time
16. Ported necessary libraries to Python
Boss was dubious. Really dubious.
Ran into security problem
Introducing Jay
No Problem!
18. Conceived by Reliability Engineer
Remote Telemetry Network
Teams involved:
Reliability Engineering
Insight Engineering
Performance Engineering
Some others …
Surprise!
“Proof-of-concept work on Ansible configuration management for Gulo and Hammerhead.”
19. Avoid Zero-Sum Games
Stack ranking
Fixed bonus / raise pools
No ranking/quantifying
Reviews != raises
Decentralize collaboration
Align goals
I want:
Collaboration and Selflessness
27. The Override Bar
A Bold Proposal
Totally duplicates functionality
Customized fit
Failed the override bar:
Am I sure this is the wrong thing?
If I’m right, will this be very expensive for us?
28. The Override Bar
Accomplished predicted results
Massively simplified operational processes
Improved resiliency and velocity
Unpredictable results
Used by other teams
Inspiration
Will retire
35. Literally* no downsides!
* For very non-literal definitions of the word “literally”
Predictability tradeoffs
Locality optimization
Duplication
Duplication
36. Agility vs Predictability
Neither is bad
Probably need some of both
Do you know how much you want?
Do you have it?
Agility Predictability
37. Agility vs Predictability
Optimize for agility
Constrain predictability
Some things are important to predict
Public KPIs
Big product plans
Fewer are important than you may think
Agility Predictability
38. If a Thing can be built anywhere
Not always in the best place
Extra work
Locality Optimization
Or lack thereof
42. Scryer Architecture, v1
Real-Time Telemetry System
2 weeks of data
Telemetry Extractor
Telemetry Persistence
4 weeks of data
Predictor
Signal Predictions
Today
Product
Value-add
Process
Waste of Time
Pain in the [REDACTED]
43. The Thing Is …
Real-Time Telemetry System
2 weeks of data
Cloud Storage
All telemetry, forever
ETL
44. Scryer Architecture, v2
Real-Time Telemetry System
2 weeks of data
Predicted Signal Today
Predictor
Product
Value-add
Process
Cloud Storage
All telemetry, forever
ETL
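The v2 layout above drops the dedicated extractor and persistence tier: the Predictor reads history from the long-term store and produces a predicted signal for today. The data flow can be sketched as follows, with loud caveats. The slides do not describe Scryer's actual prediction algorithm, so this sketch swaps in a deliberately naive one (averaging the same hour of the week over the previous weeks). The dictionary standing in for cloud storage, the function names, and the four-week window are all assumptions made for illustration.

```python
from statistics import mean

HOURS_PER_WEEK = 7 * 24


def predict(history, hour, weeks=4):
    """Naive stand-in predictor (NOT Scryer's algorithm): average the
    values seen at the same hour-of-week in the previous `weeks` weeks.
    `history` maps an hour index to a measured value. Returns None when
    no matching samples exist."""
    samples = [
        history[hour - k * HOURS_PER_WEEK]
        for k in range(1, weeks + 1)
        if (hour - k * HOURS_PER_WEEK) in history
    ]
    return mean(samples) if samples else None


def predict_day(history, first_hour, weeks=4):
    """Predicted signal for the 24 hours starting at first_hour."""
    return {h: predict(history, h, weeks) for h in range(first_hour, first_hour + 24)}


# Hypothetical long-term store: four weeks of hourly data whose value
# depends only on the hour of the week.
long_term_store = {h: float(100 + (h % HOURS_PER_WEEK)) for h in range(4 * HOURS_PER_WEEK)}

today = predict_day(long_term_store, first_hour=4 * HOURS_PER_WEEK)
print(today[4 * HOURS_PER_WEEK + 5])
```

In this arrangement the prediction would be published back as an ordinary signal, so the only custom component left is the Predictor itself.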
47. “I only want to ride the wind and walk the waves, slay the big whales of the Eastern sea, clean up frontiers, and save the people from drowning. Why should I imitate others, bow my head, stoop over and be a slave?” - Lady Triệu