SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
Wikimedia Architecture
Doing More With Less
Asher Feldman <asher@wikimedia.org>
Ryan Lane <ryan@wikimedia.org>
Wikimedia Foundation Inc.
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Overview
• Intro
• Scale at WMF
• How We Work
• Architecture Dive
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Company Users Revenue Employees Server count
1+ billion $37.9B 32,000+ 1,000,000+
905 million $69B 93,000 50,000+
714 million $3.8B 3,000+ 60,000+
689 million $6B 1,200 50,000+
490 million $30m 75 700
Top Five Worldwide Sites (Q411)
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
What We're Working
With
• > 490 Million unique visitors per month
• 1.6 Billion ArticleViews per day
• 10 Billion HTTP Requests Served per day
• Mostly served by around 130 Apache/PHP/memcached,
50 MySQL, 100 Squid/Varnish, 24 Lucene, and 20 swift +
image processing servers
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Open Source, Open
Culture
• Around 25 Full Time Software and Operations Engineers,
but a global community of Open Source developers
helping to develop and extend MediaWiki
• Even Operations follows the model – our puppet cluster
confguration code is public, as are our monitoring
services. Volunteers can model changes in our virtualized
Labs cluster
• Everything we use is Open Source. We often modify apps
(such as squid) to meet our needs and scale, but we
always release the changes
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Basic Architecture
Users
GeoDNS
LVS
(DR)
Media
Varnish
API
Squid
Text
Squid
Bits
Varnish
(js, css, etc)
Mobile
Varnish
Swift
ResourceLoader
Apache
PHP
App Servers
Apache
PHP
MediaWiki
LVS
(DR)
API Servers
Apache
PHP
MediaWiki
Image
Scalers
Memcached
Parser
Cache
Core
MySQL
Revision
MySQLLucene
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
The Wiki software
• All Wikimedia projects run on a MediaWiki platform
• Designed primarily for Wikimedia sites
• Very scalable (mostly), very good localization
• Open Source PHP software (GPL)
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
MediaWiki optimization
• We try to optimize by...
• not doing anything stupid (this is hard!)
• caching expensive operations
• focusing on the hot spots in the code (profling!)
• If a MediaWiki feature is too expensive, it doesn’t get
enabled on Wikipedia
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
MediaWiki caching
• Objects + queries cached in memcached
• Article html (parsed wikitext) stored in Parser Cache –
combined memcached + MySQL blob storage, to scale
far beyond available ram
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Wikimedia architecture
MediaWiki profling
http://gdash.wikimedia.org/
(and others)
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Core databases
• Mysql 5.1 with the Facebook patch set
• One master per shard, many replicated slaves
• Reads are load balanced amongst unlagged slaves, writes
to master
• Separate big, popular wikis from smaller wikis (shard by
wiki, but we need to do better to keep scaling)
• Profle with mk-query-digest and the percona-toolkit
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Database profling
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Squid caching
• Caching reverse HTTP proxy
• Serves our article and api traffc, other clusters have
moved toVarnish
• Uses CARP to consistently route requests to backend
caches
• Hit rate is > 85%
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Wikimedia architecture
CARP
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Varnish Caching
• Used for delivering static content such as js and css fles
(bits), media (uploads), and our mobile site
• 2-3 times more effcient than Squid
• We've served >40k requests/second from a single server
• URI hashing between tiered caches replaces CARP (we
are adding consistent hashing toVarnish)
• Will eventually replace squid entirely in our
infrastructure, but some development is still needed
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Cache invalidation
• Wiki pages are edited at an unpredictable rate
• Users should always see current revision
• Invalidation through expiry times not acceptable
• Purge implemented using multicast UDP based HTCP
protocol
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Media storage
• Images inline in articles (thumbnails) are stored in Swift,
recently replacing an unscalable NFS based solution
• Part of OpenStack, Swift is like an Open Source
implementation of S3 (even supports the S3 API)
• Soon originals and video will be stored in and served
from swift as well
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Content Distribution
Network (CDN)
• 3 clusters on 2 different continents:
• Redundant application and cache clusters in Tampa,
Florida and Ashburn,Virginia. Currently Tampa is our
primary application cluster but cached traffc is served
from Ashburn
• Europe served by a caching-only cluster in Amsterdam
(uncached content is proxied to Tampa)
• Soon building a new caching cluster in California which
will lower latency to Asia
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Geographic Load
Balancing
• Most users use DNS resolver close to them
• Map IP address of resolver to a country code
• Deliver CNAME of close datacenter entry based on
country
• Using PowerDNS with a Geobackend
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Wikimedia architecture
Load Balancing: LVS-DR
• Linux Virtual Server
• Direct Routing mode
• All real servers share the
same IP address
• The load balancer divides
incoming traffc over the
real servers
• Return traffc goes
directly!
Closing notes
• We rely heavily on open source
• Still some big scaling issues to tackle around parsing and editing
• Always looking for effciencies
• Looking for more effcient management tools
• Looking for more contributors
Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc.
Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc.
Questions, comments?
• E-mail: Asher Feldman <afeldman@wikimedia.org>

Contenu connexe

Tendances

Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users mauerbac
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...rhatr
 
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013Amazon Web Services
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of MillionsErik Onnen
 
[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders
[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders
[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReadersRakuten Group, Inc.
 
"How to optimize the architecture of your platform" by Julien Simon
"How to optimize the architecture of your platform" by Julien Simon"How to optimize the architecture of your platform" by Julien Simon
"How to optimize the architecture of your platform" by Julien SimonTheFamily
 
AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS
AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS
AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS Amazon Web Services
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Amazon Web Services
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling PinterestC4Media
 
Website design & developemet
Website design & developemetWebsite design & developemet
Website design & developemetApurva Tripathi
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
Aplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuariosAplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuariosAmazon Web Services
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBAmar Das
 
Escalando hasta sus primeros 10 millones de usuarios
Escalando hasta sus primeros 10 millones de usuariosEscalando hasta sus primeros 10 millones de usuarios
Escalando hasta sus primeros 10 millones de usuariosAmazon Web Services LATAM
 
Modern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaGeorge Wilson
 
WKS404 7 Things You Must Know to Build Better Alexa Skills
WKS404 7 Things You Must Know to Build Better Alexa SkillsWKS404 7 Things You Must Know to Build Better Alexa Skills
WKS404 7 Things You Must Know to Build Better Alexa SkillsAmazon Web Services
 

Tendances (20)

Apache Jackrabbit
Apache JackrabbitApache Jackrabbit
Apache Jackrabbit
 
Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
 
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of Millions
 
[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders
[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders
[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders
 
"How to optimize the architecture of your platform" by Julien Simon
"How to optimize the architecture of your platform" by Julien Simon"How to optimize the architecture of your platform" by Julien Simon
"How to optimize the architecture of your platform" by Julien Simon
 
AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS
AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS
AWS Public Sector Symposium 2014 Canberra | Black Belt Tips on AWS
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling Pinterest
 
Website design & developemet
Website design & developemetWebsite design & developemet
Website design & developemet
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
Aplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuariosAplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuarios
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDB
 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
 
Java to scala
Java to scalaJava to scala
Java to scala
 
Escalando hasta sus primeros 10 millones de usuarios
Escalando hasta sus primeros 10 millones de usuariosEscalando hasta sus primeros 10 millones de usuarios
Escalando hasta sus primeros 10 millones de usuarios
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
 
Modern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and Joomla
 
WKS404 7 Things You Must Know to Build Better Alexa Skills
WKS404 7 Things You Must Know to Build Better Alexa SkillsWKS404 7 Things You Must Know to Build Better Alexa Skills
WKS404 7 Things You Must Know to Build Better Alexa Skills
 

En vedette

Aldo Rossi and The Architecture of the City
Aldo Rossi and The Architecture of the CityAldo Rossi and The Architecture of the City
Aldo Rossi and The Architecture of the Cityhollan12
 
Post modernism powerpoint
Post modernism powerpointPost modernism powerpoint
Post modernism powerpointjweber0205
 
Post modern architecture
Post modern architecturePost modern architecture
Post modern architectureAlex Waktola
 
Postmodernism lesson 2 ppt
Postmodernism lesson 2 pptPostmodernism lesson 2 ppt
Postmodernism lesson 2 pptsandraoddy2
 
Post-Modern Architecture and the architects involoved in it.
Post-Modern Architecture and the architects involoved in it.Post-Modern Architecture and the architects involoved in it.
Post-Modern Architecture and the architects involoved in it.Rohit Arora
 
Powerpoint(post modernism)
Powerpoint(post modernism)Powerpoint(post modernism)
Powerpoint(post modernism)Anthony Nguyen
 
Modernism & postmodernism in architecture
Modernism & postmodernism in architectureModernism & postmodernism in architecture
Modernism & postmodernism in architectureHarshita Singh
 

En vedette (10)

Post modernism
Post modernismPost modernism
Post modernism
 
Ludwig Mies van der rohe
Ludwig Mies van der roheLudwig Mies van der rohe
Ludwig Mies van der rohe
 
Aldo Rossi and The Architecture of the City
Aldo Rossi and The Architecture of the CityAldo Rossi and The Architecture of the City
Aldo Rossi and The Architecture of the City
 
works of Aldo Rossi
works of Aldo Rossiworks of Aldo Rossi
works of Aldo Rossi
 
Post modernism powerpoint
Post modernism powerpointPost modernism powerpoint
Post modernism powerpoint
 
Post modern architecture
Post modern architecturePost modern architecture
Post modern architecture
 
Postmodernism lesson 2 ppt
Postmodernism lesson 2 pptPostmodernism lesson 2 ppt
Postmodernism lesson 2 ppt
 
Post-Modern Architecture and the architects involoved in it.
Post-Modern Architecture and the architects involoved in it.Post-Modern Architecture and the architects involoved in it.
Post-Modern Architecture and the architects involoved in it.
 
Powerpoint(post modernism)
Powerpoint(post modernism)Powerpoint(post modernism)
Powerpoint(post modernism)
 
Modernism & postmodernism in architecture
Modernism & postmodernism in architectureModernism & postmodernism in architecture
Modernism & postmodernism in architecture
 

Similaire à Wikimedia-Architecture-More-With-Less

HBaseCon 2013: General Session
HBaseCon 2013: General SessionHBaseCon 2013: General Session
HBaseCon 2013: General SessionCloudera, Inc.
 
Evaluating Drupal for the Enterprise
Evaluating Drupal for the EnterpriseEvaluating Drupal for the Enterprise
Evaluating Drupal for the Enterpriseultimike
 
Apache Content Technologies
Apache Content TechnologiesApache Content Technologies
Apache Content Technologiesgagravarr
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Project RedDwarf - Database Services in the Cloud.pptx
Project RedDwarf - Database Services in the Cloud.pptxProject RedDwarf - Database Services in the Cloud.pptx
Project RedDwarf - Database Services in the Cloud.pptxOpenStack Foundation
 
Scaling Social Games
Scaling Social GamesScaling Social Games
Scaling Social GamesPaolo Negri
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaSGleicon Moraes
 
"Python web development combines the simplicity of the language with powerful...
"Python web development combines the simplicity of the language with powerful..."Python web development combines the simplicity of the language with powerful...
"Python web development combines the simplicity of the language with powerful...softwaretrainer2elys
 
7 Apache Process Cloudstack Developer Day
7 Apache Process Cloudstack Developer Day7 Apache Process Cloudstack Developer Day
7 Apache Process Cloudstack Developer DayKimihiko Kitase
 
End to-end W3C - JS.everywhere(2012) Europe
End to-end W3C - JS.everywhere(2012) EuropeEnd to-end W3C - JS.everywhere(2012) Europe
End to-end W3C - JS.everywhere(2012) EuropeAlexandre Morgaut
 
Implementing Samvera Open Source Technology at WGBH and the American Archive ...
Implementing Samvera Open Source Technology at WGBH and the American Archive ...Implementing Samvera Open Source Technology at WGBH and the American Archive ...
Implementing Samvera Open Source Technology at WGBH and the American Archive ...WGBH Media Library and Archives
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h basehdhappy001
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Databases in the hosted cloud
Databases in the hosted cloud Databases in the hosted cloud
Databases in the hosted cloud Colin Charles
 
Static Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource ConditionStatic Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource ConditionIWMW
 

Similaire à Wikimedia-Architecture-More-With-Less (20)

HBaseCon 2013: General Session
HBaseCon 2013: General SessionHBaseCon 2013: General Session
HBaseCon 2013: General Session
 
Evaluating Drupal for the Enterprise
Evaluating Drupal for the EnterpriseEvaluating Drupal for the Enterprise
Evaluating Drupal for the Enterprise
 
Apache Content Technologies
Apache Content TechnologiesApache Content Technologies
Apache Content Technologies
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Project RedDwarf - Database Services in the Cloud.pptx
Project RedDwarf - Database Services in the Cloud.pptxProject RedDwarf - Database Services in the Cloud.pptx
Project RedDwarf - Database Services in the Cloud.pptx
 
Scaling Social Games
Scaling Social GamesScaling Social Games
Scaling Social Games
 
Data Science
Data ScienceData Science
Data Science
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
 
"Python web development combines the simplicity of the language with powerful...
"Python web development combines the simplicity of the language with powerful..."Python web development combines the simplicity of the language with powerful...
"Python web development combines the simplicity of the language with powerful...
 
7 Apache Process Cloudstack Developer Day
7 Apache Process Cloudstack Developer Day7 Apache Process Cloudstack Developer Day
7 Apache Process Cloudstack Developer Day
 
End to-end W3C - JS.everywhere(2012) Europe
End to-end W3C - JS.everywhere(2012) EuropeEnd to-end W3C - JS.everywhere(2012) Europe
End to-end W3C - JS.everywhere(2012) Europe
 
Implementing Samvera Open Source Technology at WGBH and the American Archive ...
Implementing Samvera Open Source Technology at WGBH and the American Archive ...Implementing Samvera Open Source Technology at WGBH and the American Archive ...
Implementing Samvera Open Source Technology at WGBH and the American Archive ...
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Apereo OAE - Bootcamp
Apereo OAE - BootcampApereo OAE - Bootcamp
Apereo OAE - Bootcamp
 
Databases in the hosted cloud
Databases in the hosted cloud Databases in the hosted cloud
Databases in the hosted cloud
 
Create cloud service on AWS
Create cloud service on AWSCreate cloud service on AWS
Create cloud service on AWS
 
Migrating to aws
Migrating to awsMigrating to aws
Migrating to aws
 
Static Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource ConditionStatic Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource Condition
 
Openstack Xen and XCP
Openstack Xen and XCPOpenstack Xen and XCP
Openstack Xen and XCP
 

Wikimedia-Architecture-More-With-Less

  • 1. Wikimedia Architecture Doing More With Less Asher Feldman <asher@wikimedia.org> Ryan Lane <ryan@wikimedia.org> Wikimedia Foundation Inc.
  • 2. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Overview • Intro • Scale at WMF • How We Work • Architecture Dive
  • 3. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Company Users Revenue Employees Server count 1+ billion $37.9B 32,000+ 1,000,000+ 905 million $69B 93,000 50,000+ 714 million $3.8B 3,000+ 60,000+ 689 million $6B 1,200 50,000+ 490 million $30m 75 700 Top Five Worldwide Sites (Q411)
  • 4. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. What We're Working With • > 490 Million unique visitors per month • 1.6 Billion ArticleViews per day • 10 Billion HTTP Requests Served per day • Mostly served by around 130 Apache/PHP/memcached, 50 MySQL, 100 Squid/Varnish, 24 Lucene, and 20 swift + image processing servers
  • 5. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Open Source, Open Culture • Around 25 Full Time Software and Operations Engineers, but a global community of Open Source developers helping to develop and extend MediaWiki • Even Operations follows the model – our puppet cluster confguration code is public, as are our monitoring services. Volunteers can model changes in our virtualized Labs cluster • Everything we use is Open Source. We often modify apps (such as squid) to meet our needs and scale, but we always release the changes
  • 6. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Basic Architecture Users GeoDNS LVS (DR) Media Varnish API Squid Text Squid Bits Varnish (js, css, etc) Mobile Varnish Swift ResourceLoader Apache PHP App Servers Apache PHP MediaWiki LVS (DR) API Servers Apache PHP MediaWiki Image Scalers Memcached Parser Cache Core MySQL Revision MySQLLucene
  • 7. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. The Wiki software • All Wikimedia projects run on a MediaWiki platform • Designed primarily for Wikimedia sites • Very scalable (mostly), very good localization • Open Source PHP software (GPL)
  • 8. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. MediaWiki optimization • We try to optimize by... • not doing anything stupid (this is hard!) • caching expensive operations • focusing on the hot spots in the code (profling!) • If a MediaWiki feature is too expensive, it doesn’t get enabled on Wikipedia
  • 9. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. MediaWiki caching • Objects + queries cached in memcached • Article html (parsed wikitext) stored in Parser Cache – combined memcached + MySQL blob storage, to scale far beyond available ram
  • 10. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Wikimedia architecture MediaWiki profling http://gdash.wikimedia.org/ (and others)
  • 11. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Core databases • Mysql 5.1 with the Facebook patch set • One master per shard, many replicated slaves • Reads are load balanced amongst unlagged slaves, writes to master • Separate big, popular wikis from smaller wikis (shard by wiki, but we need to do better to keep scaling) • Profle with mk-query-digest and the percona-toolkit
  • 12. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Database profling
  • 13. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Squid caching • Caching reverse HTTP proxy • Serves our article and api traffc, other clusters have moved toVarnish • Uses CARP to consistently route requests to backend caches • Hit rate is > 85%
  • 14. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Wikimedia architecture CARP
  • 15. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Varnish Caching • Used for delivering static content such as js and css fles (bits), media (uploads), and our mobile site • 2-3 times more effcient than Squid • We've served >40k requests/second from a single server • URI hashing between tiered caches replaces CARP (we are adding consistent hashing toVarnish) • Will eventually replace squid entirely in our infrastructure, but some development is still needed
  • 16. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Cache invalidation • Wiki pages are edited at an unpredictable rate • Users should always see current revision • Invalidation through expiry times not acceptable • Purge implemented using multicast UDP based HTCP protocol
  • 17. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Media storage • Images inline in articles (thumbnails) are stored in Swift, recently replacing an unscalable NFS based solution • Part of OpenStack, Swift is like an Open Source implementation of S3 (even supports the S3 API) • Soon originals and video will be stored in and served from swift as well
  • 18. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Content Distribution Network (CDN) • 3 clusters on 2 different continents: • Redundant application and cache clusters in Tampa, Florida and Ashburn,Virginia. Currently Tampa is our primary application cluster but cached traffc is served from Ashburn • Europe served by a caching-only cluster in Amsterdam (uncached content is proxied to Tampa) • Soon building a new caching cluster in California which will lower latency to Asia
  • 19. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Geographic Load Balancing • Most users use DNS resolver close to them • Map IP address of resolver to a country code • Deliver CNAME of close datacenter entry based on country • Using PowerDNS with a Geobackend
  • 20. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Wikimedia architecture Load Balancing: LVS-DR • Linux Virtual Server • Direct Routing mode • All real servers share the same IP address • The load balancer divides incoming traffc over the real servers • Return traffc goes directly!
  • 21. Closing notes • We rely heavily on open source • Still some big scaling issues to tackle around parsing and editing • Always looking for effciencies • Looking for more effcient management tools • Looking for more contributors
  • 22. Asher Feldman, asher@wikimedia.org,Wikimedia Foundation Inc. Ryan Lane, ryan@wikimedia.org,Wikimedia Foundation Inc. Questions, comments? • E-mail: Asher Feldman <afeldman@wikimedia.org>