Story of How LinkedIn Used RUM & PoPs to Make the Site Faster
Ritesh Maheshwari
Performance Engineer
LinkedIn
@ritesh
[Timeline slide: 2013 to 2015. Story: PoP, RUM]
Story: In a nutshell
1. We built PoPs
2. …used RUM to assign users to Optimal PoPs
3. …found DNS-based assignment is suboptimal
4. …evaluated Anycast as a solution using RUM
5. …now using Anycast to assign users to PoPs
6. …chose new PoP locations using RUM
What is RUM?
JavaScript (client code) to measure performance
• Navigation Timing API
• LinkedIn uses https://github.com/lognormal/boomerang/
Metrics like:
• DNS Time
• Connection Time
• First Byte Time
• Download Time
• Page Load Time
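As a rough illustration of how metrics like these can be derived, here is a minimal sketch in Python. It assumes the metrics are differences between Navigation Timing attributes (domainLookupStart, connectEnd, and so on, which are real attribute names of that API). The mapping from each slide metric to a specific pair of attributes is an assumption, not something the deck states, and boomerang may compute them differently. The sample timestamps are made up.

```python
def rum_metrics(timing):
    """Derive slide-style metrics (in ms) from Navigation Timing timestamps.

    `timing` is a dict keyed by Navigation Timing attribute names.
    The metric definitions below are assumptions for illustration.
    """
    return {
        "dns_time": timing["domainLookupEnd"] - timing["domainLookupStart"],
        "connection_time": timing["connectEnd"] - timing["connectStart"],
        "first_byte_time": timing["responseStart"] - timing["requestStart"],
        "download_time": timing["responseEnd"] - timing["responseStart"],
        "page_load_time": timing["loadEventEnd"] - timing["navigationStart"],
    }


# Hypothetical timestamps in milliseconds, relative to navigationStart.
sample_timing = {
    "navigationStart": 0,
    "domainLookupStart": 5,
    "domainLookupEnd": 45,
    "connectStart": 45,
    "connectEnd": 145,
    "requestStart": 146,
    "responseStart": 400,
    "responseEnd": 650,
    "loadEventEnd": 1800,
}

metrics = rum_metrics(sample_timing)
```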
What are PoPs?
Point of Presence / PoP
• Small-scale data centers
– Cheap, Many
• Proxy servers at LinkedIn (ATS, Apache Traffic Server)
World Without PoPs
[Diagram: Browser and Data Center, 250ms apart]
• Connection time: 250ms
• Server compute time: 500ms
• First byte time + page download time: 5 RTTs = 5 x 250ms = 1250ms
• Total = 2000ms
With PoPs
[Diagram: Browser, PoP, and Data Center, labeled 100ms and 250ms]
• Connection time: 100ms
• Server compute time: 500ms
• First byte time + page download time: 5 RTTs = 5 x 100ms = 500ms
• Total = 1100ms
• 900 ms gain!
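The arithmetic on these two slides can be checked with a small sketch. It uses the deck's simplified model: one round trip to connect, a fixed server compute time, and five round trips for first byte plus page download. The function and parameter names are illustrative, not from the deck.

```python
def page_time_ms(rtt_ms, server_compute_ms=500, download_rtts=5):
    """Simplified page time: connect (1 RTT) + server compute + N RTTs."""
    connection_ms = rtt_ms
    download_ms = download_rtts * rtt_ms
    return connection_ms + server_compute_ms + download_ms


without_pop_ms = page_time_ms(rtt_ms=250)  # browser talks to the data center
with_pop_ms = page_time_ms(rtt_ms=100)     # browser talks to a nearby PoP
gain_ms = without_pop_ms - with_pop_ms
```

In this model the gain grows with the number of round trips, because every round trip is shortened by the same 150ms.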
How are users assigned to PoPs?
Through 3rd party DNS services
IP handed out based on the location of the user’s DNS resolver
# Spain
$ dig @109.69.8.51 +short www.linkedin.com
91.225.248.80
# California
$ dig +short www.linkedin.com
216.52.242.80
Should India connect to Singapore or Dublin?
How do we ensure optimal PoP assignment?
• Geographical Distance
• Knowledge about Internet Connectivity
Let's Make Data-Based Decisions
Synthetic measurements?
No realistic agent distribution across networks
No realistic agent connectivity distribution
Can our users become measurement agents?
• RUM!
RUM beacons
Fetch a tiny object from each candidate PoP
For each pop_name,
1. Start timer
2. Fetch {pop_name}.perf.linkedin.com/pop/admin
3. Stop timer
Send data back to our servers
• Millions of agents!
• Analyze data to find “optimal” PoP per country
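The beacon loop above can be sketched as follows. The real beacon runs as JavaScript in the browser; this Python version only mirrors the three steps. The `fetch` and `clock` parameters, the `https` scheme, and the PoP labels in the usage lines are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Iterable


def measure_pops(
    pop_names: Iterable[str],
    fetch: Callable[[str], object],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Return the fetch time in milliseconds for each candidate PoP."""
    timings_ms = {}
    for pop_name in pop_names:
        # URL pattern from the slide; the https scheme is an assumption.
        url = f"https://{pop_name}.perf.linkedin.com/pop/admin"
        start = clock()                                    # 1. Start timer
        fetch(url)                                         # 2. Fetch the tiny object
        timings_ms[pop_name] = (clock() - start) * 1000.0  # 3. Stop timer
    return timings_ms


# Stand-in fetch so the sketch runs without network access (hypothetical).
requested_urls = []
beacon = measure_pops(["singapore", "dublin"], fetch=requested_urls.append)
```

Passing the fetch function in keeps the timing logic separate from the transport, so the same loop can be exercised with a stub.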
We can assign countries to new PoPs!
Country   PoP         Median Beacon Time
China     Hong Kong   434
China     Dublin      1216
China     Singapore   515
India     Hong Kong   1368
India     Dublin      1042
India     Singapore   898
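Picking the “optimal” PoP per country from a table like this is a minimum over median beacon times. A minimal sketch using the figures from the slide (the function name is illustrative):

```python
# Median beacon times from the slide, keyed by (country, PoP).
MEDIAN_BEACON_TIME = {
    ("China", "Hong Kong"): 434,
    ("China", "Dublin"): 1216,
    ("China", "Singapore"): 515,
    ("India", "Hong Kong"): 1368,
    ("India", "Dublin"): 1042,
    ("India", "Singapore"): 898,
}


def optimal_pop_per_country(medians):
    """Return {country: PoP with the lowest median beacon time}."""
    best = {}
    for (country, pop), median in medians.items():
        if country not in best or median < best[country][1]:
            best[country] = (pop, median)
    return {country: pop for country, (pop, _) in best.items()}


optimal = optimal_pop_per_country(MEDIAN_BEACON_TIME)
```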
We can audit current assignment!
Country       Is PoP optimal?   Current PoP     Optimal PoP
India         TRUE              Singapore       Singapore
Pakistan      FALSE             Singapore       Dublin
Spain         TRUE              Dublin          Dublin
Brazil        FALSE             US West Coast   US East Coast
Netherlands   TRUE              Dublin          Dublin
UAE           FALSE             US West Coast   Dublin
Italy         TRUE              Dublin          Dublin
Mexico        TRUE              US West Coast   US West Coast
Russia        FALSE             US West Coast   Dublin
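The audit itself is a comparison of the current and optimal PoP per country. A small sketch over the rows from the slide (the data layout and function name are illustrative):

```python
# (current PoP, optimal PoP) per country, as listed on the slide.
ASSIGNMENT = {
    "India": ("Singapore", "Singapore"),
    "Pakistan": ("Singapore", "Dublin"),
    "Spain": ("Dublin", "Dublin"),
    "Brazil": ("US West Coast", "US East Coast"),
    "Netherlands": ("Dublin", "Dublin"),
    "UAE": ("US West Coast", "Dublin"),
    "Italy": ("Dublin", "Dublin"),
    "Mexico": ("US West Coast", "US West Coast"),
    "Russia": ("US West Coast", "Dublin"),
}


def suboptimal_countries(assignment):
    """Return the countries whose current PoP differs from the optimal one."""
    return {
        country: {"current": current, "optimal": best}
        for country, (current, best) in assignment.items()
        if current != best
    }


to_reassign = suboptimal_countries(ASSIGNMENT)
```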
[Chart: LinkedIn Homepage Download Time Improvement. Y-axis: percentage improvement (0% to 30%). Series: median improvement and 90th percentile improvement. Countries: India, Pakistan, Singapore, Russia, Brazil]
Trust, but Verify
Is 100% of India connecting to Singapore PoP?
Can RUM identify which PoP served this page view?
1. RUM is JavaScript
2. PoP selected through DNS
3. JavaScript can’t see DNS answers
• But JavaScript can read response headers
Can RUM identify which PoP served this page view?
After page load:
1. Fetch www.linkedin.com/pop/admin
2. Read X-Li-Pop header
3. Add that data to RUM and send it back
Offline analysis:
1. What % of traffic in India is going to the Singapore PoP?
2. A/B testing PoP assignment.
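The first offline question can be answered by counting RUM records per country and serving PoP. A minimal sketch follows. The record layout (`country`, `pop`) and the sample records are hypothetical; the `pop` value stands for whatever was read from the X-Li-Pop header.

```python
from collections import Counter, defaultdict


def pop_share_by_country(records):
    """Return {country: {pop: percent of that country's page views}}."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["country"]][record["pop"]] += 1
    shares = {}
    for country, per_pop in counts.items():
        total = sum(per_pop.values())
        shares[country] = {pop: 100.0 * n / total for pop, n in per_pop.items()}
    return shares


# Hypothetical RUM records, one per page view.
sample_records = [
    {"country": "India", "pop": "Singapore"},
    {"country": "India", "pop": "Singapore"},
    {"country": "India", "pop": "Singapore"},
    {"country": "India", "pop": "Dublin"},
]

shares = pop_share_by_country(sample_records)
```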
Plot Twist: Assignment far from optimal
• About 31% of US traffic gets assigned to a suboptimal PoP.
– 45% of East Coast
• About 10% of traffic globally gets assigned to a suboptimal PoP.
DNS PoP assignment is suboptimal
• Assignment based on Resolver IP, not Client IP
[Diagram: DNS resolver, PoP US East, and PoP US West; locations New York and California]
• Inaccurate IP-to-Geo databases
– Resolver really in NY, but database says CA
Story so far
1. We built PoPs
2. …used RUM to assign users to Optimal PoPs
3. …found DNS-based assignment is suboptimal
Accurate PoP assignment Problem
• Bug our DNS providers (31% -> 27%)
• Run our own DNS
How about Anycast?
Anycast – One IP, Multiple Servers
[Diagram: Bob and PoPs A, B, and C, each announcing 1.1.1.1]
• Unicast – one IP per server
• Seamless support at routing layer
• Packets routed to closest server by # of hops
• Client IP, not Resolver IP used!
• No Geo-IP Databases
How does Anycast compare to DNS?
Will anycast send more users to the optimal PoP?
Let's test it!
RUM to the rescue
For each PoP:
1. Announce same anycast IP (108.174.13.10)
2. Configure a domain ac.perf.linkedin.com to point to 108.174.13.10
RUM to the rescue
For each page view:
1. RUM downloads a tiny object: ac.perf.linkedin.com/pop/admin
2. Read X-Li-Pop response header to record which PoP served the object
3. Send this back to LinkedIn with RUM data
Data:
1. For each user, the anycast PoP
2. For each user, the optimal PoP (from pop beacons)
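With both data points per user, the share of optimally assigned users per region can be computed for each method. A minimal sketch; the record fields (`region`, `dns_pop`, `anycast_pop`, `optimal_pop`) and the sample rows are hypothetical, not the deck's actual schema.

```python
from collections import defaultdict


def percent_optimal_by_region(records):
    """Return {region: {"dns": %, "anycast": %}} of optimally assigned users."""
    totals = defaultdict(int)
    dns_hits = defaultdict(int)
    anycast_hits = defaultdict(int)
    for record in records:
        region = record["region"]
        totals[region] += 1
        dns_hits[region] += record["dns_pop"] == record["optimal_pop"]
        anycast_hits[region] += record["anycast_pop"] == record["optimal_pop"]
    return {
        region: {
            "dns": 100.0 * dns_hits[region] / n,
            "anycast": 100.0 * anycast_hits[region] / n,
        }
        for region, n in totals.items()
    }


# Hypothetical per-user records.
sample = [
    {"region": "Illinois", "dns_pop": "East", "anycast_pop": "East", "optimal_pop": "East"},
    {"region": "Illinois", "dns_pop": "West", "anycast_pop": "East", "optimal_pop": "East"},
    {"region": "Illinois", "dns_pop": "East", "anycast_pop": "East", "optimal_pop": "East"},
    {"region": "Illinois", "dns_pop": "West", "anycast_pop": "West", "optimal_pop": "East"},
]

percent_optimal = percent_optimal_by_region(sample)
```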
Results
Region or Country   DNS % Optimal Assignment   Anycast % Optimal Assignment
Illinois            70                         90
Florida             73                         95
Georgia             75                         93
Pennsylvania        85                         95
Results
Region or Country   DNS % Optimal Assignment   Anycast % Optimal Assignment
Arizona             60                         39
Brazil              88                         33
New York            77                         74
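Putting the two results slides side by side, the change from DNS to anycast per region is a simple difference. A short sketch over the slide figures (names are illustrative):

```python
# (DNS % optimal, anycast % optimal) per region, from the two results slides.
RESULTS = {
    "Illinois": (70, 90),
    "Florida": (73, 95),
    "Georgia": (75, 93),
    "Pennsylvania": (85, 95),
    "Arizona": (60, 39),
    "Brazil": (88, 33),
    "New York": (77, 74),
}


def anycast_delta(results):
    """Return {region: anycast % optimal minus DNS % optimal}."""
    return {region: anycast - dns for region, (dns, anycast) in results.items()}


deltas = anycast_delta(RESULTS)
worse_with_anycast = sorted(region for region, d in deltas.items() if d < 0)
```

Regions with a negative delta are the ones where plain global anycast sent fewer users to the optimal PoP than DNS did.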
Fewer hops != Lower Latency
• Carriers prefer to haul packets within their own network
• Peering can create inter-continental shortcuts
[Diagram: Alice and PoPs X, Y, and Z, each announcing 1.1.1.1]
Maybe DNS wasn’t so bad
• Continent-level identification
• City / State-level identification
“Regional” Anycast
• 1 anycast IP per continent
• Ran a RUM experiment, all was fine
[Diagram: Alice and PoPs X, Y, and Z; anycast IPs shown are 2.2.2.2, 1.1.1.1, and 1.1.1.1]
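The two-stage idea can be sketched as a toy model: DNS picks the anycast IP for the user's continent, then routing picks, among the PoPs announcing that IP, the one with the fewest hops. Everything here is hypothetical: the PoP-to-continent layout, the documentation-range IP addresses, and the hop counts. Real anycast selection is done by BGP routing, not application code.

```python
# Hypothetical PoPs: which continent they sit in and which anycast IP they announce.
POPS = {
    "X": {"continent": "NA", "anycast_ip": "192.0.2.1"},
    "Y": {"continent": "NA", "anycast_ip": "192.0.2.1"},
    "Z": {"continent": "EU", "anycast_ip": "198.51.100.1"},
}

# Stage 1 (DNS): one anycast IP per continent.
ANYCAST_IP_BY_CONTINENT = {"NA": "192.0.2.1", "EU": "198.51.100.1"}


def assign_pop(user_continent, hops_to_pop):
    """Pick the PoP a user lands on under regional anycast (toy model)."""
    anycast_ip = ANYCAST_IP_BY_CONTINENT[user_continent]
    # Stage 2 (routing): only PoPs announcing this IP are reachable through it.
    candidates = [name for name, pop in POPS.items() if pop["anycast_ip"] == anycast_ip]
    return min(candidates, key=lambda name: hops_to_pop[name])


# Alice is in NA; Z has the fewest hops but is on another continent.
alice_pop = assign_pop("NA", {"X": 6, "Y": 4, "Z": 3})
```

Restricting the candidates by continent is what prevents a low-hop, high-latency inter-continental path from being chosen.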
[Chart: USA Ramp Results. Y-axis: % traffic going to optimal PoP (50 to 100). X-axis: date (2014-12-06 to 2014-12-18). Series: Illinois, Florida, North Carolina, Indiana, New York, New Jersey, Virginia, West Virginia, Louisiana]
Ramp outside USA: in progress
Story so far
1. We built PoPs
2. …used RUM to assign users to Optimal PoPs
3. …found DNS-based assignment is suboptimal
4. …evaluated Anycast as a solution using RUM
5. …now using Anycast to assign users to PoPs
Next play:
• Build more PoPs!
Build More PoPs: But where?
• Previously, based on
– Intuition
– Knowledge
• Can we use RUM data?
– Get a list of candidate PoP locations
– Build a machine learning model
– Rank locations based on most performance benefit
Plot Twist: Miami vs Sao Paulo
• Can we use RUM again?
– Fetch a tiny object and measure latency
– But we have no servers there
• Our CDNs do!
1. Get IP address of CDN servers
2. Configure a domain to point to the IP
3. RUM to fetch a tiny object ( /cdo/rum/id )
4. Measure time and report back
Miami vs Sao Paulo - Results
• Sao Paulo is better than US East Coast
• Miami is better than Sao Paulo for many in South America
            Median beacon time
Country     Miami   Sao Paulo   US East Coast
Colombia    648     970         778
Argentina   918     970         1213
Brazil      868     549         1074
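One simple way to read this table is to pick, per country, the faster of the two candidate locations and compare it with the existing US East Coast PoP. The sketch below does only that; it is not the machine learning model mentioned earlier in the deck, and the function name and defaults are illustrative.

```python
# Median beacon times from the slide, per country and location.
MEDIAN_BEACON_TIME = {
    "Colombia": {"Miami": 648, "Sao Paulo": 970, "US East Coast": 778},
    "Argentina": {"Miami": 918, "Sao Paulo": 970, "US East Coast": 1213},
    "Brazil": {"Miami": 868, "Sao Paulo": 549, "US East Coast": 1074},
}


def best_candidate(times, candidates=("Miami", "Sao Paulo"), baseline="US East Coast"):
    """Return (fastest candidate, its gain over the baseline PoP)."""
    best = min(candidates, key=lambda location: times[location])
    return best, times[baseline] - times[best]


ranking = {country: best_candidate(times) for country, times in MEDIAN_BEACON_TIME.items()}
```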
Story: Recap
1. We built PoPs
2. …used RUM to assign users to Optimal PoPs
3. …found DNS-based assignment is suboptimal
4. …evaluated Anycast as a solution using RUM
5. …now using Anycast to assign users to PoPs
6. …chose new PoP locations using RUM
Callouts: 10-25% gain; 30% suboptimal in US; “Regional” Anycast; 30% -> 10%
Story: The End
• Clients are your measurement agents
• Make data-driven decisions
• Trust, but verify
• Collaboration leads to bigger impact
Questions?
in/RiteshMaheshwari, @ritesh
©2014 LinkedIn Corporation. All Rights Reserved.
What about Mobile?
• Packet Gateways are likely to be in the same part of the country as users
• Wired measurements should be applicable for Mobile
– India wired users are closer to Singapore, Mobile should be similar
Is Synthetic Monitoring dead?
No, not yet
• Much more powerful than RUM
• Not everything can be done in production
Next Play
• Keep evaluating Anycast
• Keep building new PoPs
• Dynamic RUM-based PoP assignment

How LinkedIn used RUM to drive optimizations and make the site faster


Editor's notes

  1. Here's a slide summarizing the high-level story of how it went. There were ups and downs, successes and failures, and interesting plot twists along the way. Overall it was a bumpy ride.
  2. Simplified example
  3. That’s a 900ms gain! This is huge! Just by adding a proxy server closer to the user, we have managed to gain 900ms in download time.
  4. So now we know that PoPs are great. Let's get into the mechanics of how a user gets assigned to a PoP. At LinkedIn, our primary method for doing this is DNS.
  5. There are two obvious ways to do this:….. But we know that network distance is not equal to geographical distance
  6. Median time to download the tiny object
  7. This is a slide that makes me very happy. This is the .....
  8. Not so fast...... I always like to verify major decisions with data. In this case, we wanted to verify whether the changes we made were effective.
  9. So, what did we find?
  10. We found that ...
  11. Digging deeper
  12. 15 mins more
  13. And how do we test it? Let's use RUM
  14. But of course, there was another plot twist. Can you add graph instead?
  15. So why was it happening?
  16. So we came up with what we call the “regional” anycast concept. Basically, we still use DNS, but only to decide which continent the user is in. Within a continent, we use anycast.
  17. 5 min more
  18. Pulled some strings.