
The Open Source... Behind the Tweets


  1. #twitterflight
  2. The Open Source… Behind the Tweets. October 22, 2014. #twitterflight
  3. Open source is everywhere! On your phone, in your car… and within Twitter!
     http://www4.mercedes-benz.com/manual-cars/ba/foss/content/en/assets/FOSS_licences.pdf
     iOS: General->About->Legal->Legal Notices
     Vine: General->About->Legal
  4. Chris Aniszczyk, Head of Open Source, @cra
  5. Twitter runs on Open Source
  6. Life of a Tweet
     What open source technology do we use behind the scenes when we tweet?
     Stages: tweet, write, fanout, search, batch, fin
  7. Tweet!
  8. Life of a Tweet
     https://dev.twitter.com/rest/reference/post/statuses/update
     Your first stop as a tweet: Twitter Front End (TFE)
     A fancy reverse proxy for HTTP traffic built on the JVM
     Handles authentication, rate limits and more!
     Powered by the open source project Netty: http://netty.io
  9. Netty at Twitter
     Netty is an open source Java NIO framework
     Used heavily at Twitter
     Healthy adopter community: http://netty.io/wiki/adopters.html
     Cloudhopper sends billions of SMS messages per month using Netty
     https://github.com/twitter/cloudhopper-smpp
     We contributed SPDY support to Netty: http://netty.io/news/2012/02/04/3-3-1-spdy.html
     * https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead
  10. Life of a Tweet
      Twitter backend architecture is *service-oriented (on the JVM)
      Core services are built on top of Finagle (using an API framework)
      Finagle is written in Scala and built on top of Netty
      https://github.com/twitter/finagle
      * http://www.slideshare.net/InfoQ/decomposing-twitter-adventures-in-serviceoriented-architecture
  11. Finagle at Twitter
      Why Scala?
      Scala enables succinct expression (vs Java)
      Less typing is less reading; brevity enhances clarity
      Two open source Scala/Finagle guides from Twitter:
      https://twitter.github.io/effectivescala/
      https://twitter.github.io/scala_school/
      Finagle is our fault-tolerant, protocol-agnostic RPC framework built on Netty
      Emphasizes service modularity via async futures
      Handles failover semantics, metrics, logging etc…
      * https://blog.twitter.com/2014/netty-at-twitter-with-finagle
  12. Finagle Service Example
      import com.twitter.finagle.{Filter, Http, Service, Thrift}
      import com.twitter.util.Future

      // #1 Create a client for each service
      val timelineSvc = Thrift.newIface[TimelineService](...)
      val tweetSvc = Thrift.newIface[TweetService](...)
      val authSvc = Thrift.newIface[AuthService](...)

      // #2 Create a new Filter to authenticate incoming requests
      // (Filter type parameters are ordered [ReqIn, RepOut, ReqOut, RepIn])
      val authFilter = Filter.mk[Req, Res, AuthReq, Res] { (req, svc) =>
        authSvc.authenticate(req) flatMap svc(_)
      }

      // #3 Create a service to convert an authenticated timeline request to a JSON response
      val apiService = Service.mk[AuthReq, Res] { req =>
        timelineSvc(req.userId) flatMap { tl =>
          val tweets = tl map tweetSvc.getById(_)
          Future.collect(tweets) map tweetsToJson(_)
        }
      }

      // #4 Start a new HTTP server on port 80 using the authenticating filter and our service
      Http.serve(":80", authFilter andThen apiService)
  13. Life of a Tweet
  14. Life of a Tweet
      Tweets need to be stored somewhere (via a Finagle-based core service)
      TBird: persistent storage for tweets
      Built originally on Gizzard: https://github.com/twitter/gizzard
      Tweets stored in sharded and replicated MySQL
      TFlock: track relations between users and tweets
      Built originally on FlockDB: https://github.com/twitter/flockdb
  15. MySQL at Twitter
      Maintain a public fork of v5.5/v5.6
      Goal is to “work” with upstream
      https://github.com/twitter/mysql
      Co-founded the WebScaleSQL.org effort
  16. Life of a Tweet
  17. Life of a Tweet
      When a tweet is generated, it needs to be written to all relevant timelines
      Timelines are essentially lists of tweet ids (heavily cached)
      Fanout is the process by which tweets are delivered to timelines
      For caching we rely on the open source project Redis
      https://github.com/antirez/redis
  18. Redis at Twitter
      Redis is used for caching timelines and more!
      Added custom logging, data structures
      We are working to upstream some changes…
      @thinkingfish gave a fantastic talk on this:
      https://www.youtube.com/watch?v=rP9EKvWt0zo
      Open Source Proxy for Redis: Twemproxy
      https://github.com/twitter/twemproxy
      Used by Vine, Pinterest, Wikimedia, Snapchat etc…
  19. Life of a Tweet
  20. Life of a Tweet
      Everyone searches for tweets: https://dev.twitter.com/rest/public/search
      In fact, one of the most heavily trafficked search engines in the world
      Back in the day, Twitter search was built on MySQL
      Today, Twitter search is an optimized real-time search/indexing technology
      Powered by Apache Lucene: http://lucene.apache.org
  21. Lucene (earlybird) at Twitter
      Earlybird* is Twitter’s real-time search engine built on top of Apache Lucene
      We optimized Lucene (cut corners) to handle tweets only, since that’s all we do
      e.g., less space: 140 characters only need 8 bits
      Read about Blender, our search front-end
      https://blog.twitter.com/2011/twitter-search-now-3x-faster
      * http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
  22. Life of a Tweet
  23. Life of a Tweet
      Hadoop is used for many things at Twitter, like counting words :)
      Scribe logs, batch processing, recommendations, trends, user modeling and more!
      10,000+ Hadoop servers, 100,000+ daily Hadoop jobs, 10M+ daily Hadoop tasks
      Parquet is a columnar storage format for Hadoop
      https://parquet.incubator.apache.org
      Scalding is our Scala DSL for writing Hadoop jobs
      https://github.com/twitter/scalding
  24. Parquet/Scalding at Twitter
      Parquet* is a columnar storage format
      Initially a collaboration between Twitter/Cloudera
      Inspired by Google Dremel paper**
      Now at Apache: http://parquet.incubator.apache.org/
      Scalding is built on top of Scala and Cascading
      https://github.com/Cascading/cascading
      Makes it easier* to write Hadoop jobs (using Scala)
      * https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
  25. Scalding Example
      import com.twitter.scalding._

      // can’t have a Hadoop example without word count!
      class WordCountJob(args: Args) extends Job(args) {
        TextLine(args("input"))
          // split each line on runs of whitespace
          .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
          .groupBy('word) { _.size }
          .write(Tsv(args("output")))
      }
      https://github.com/twitter/scalding/wiki/Rosetta-Code
  26. Life of a Tweet
  27. Sharing is caring, contribute!
      Let’s all make Twitter better!
      opensource.twitter.com
      https://github.com/twitter
  28. New Open Source API Samples
      Hack on the samples and improve them!
      https://github.com/twitterdev (t.co/code)
      Also, later today, check out the lightning talk by Andrew Noonan about “Twitter’s developer toolbox”
  29. Thank You
  30. Q&A
      The Open Source Behind the Tweets
      http://opensource.twitter.com
      Hope you learned something new!
      Come see us at the @TwitterOSS Booth!
      Chris Aniszczyk (@cra)
  31. Resources
      https://opensource.twitter.com
      https://github.com/twitter/finagle
      https://github.com/twitter/zipkin
      https://github.com/twitter/scalding
      https://github.com/twitter/mysql
      https://github.com/twitter/twemproxy
      https://twitter.github.io/scala_school
      http://webscalesql.org
      http://mesos.apache.org
      http://parquet.incubator.apache.org
  32. Backup Slides. October 22, 2014. #twitterflight
  33. Where does it all run?
      Main concept: Datacenter as a computer
      Aggregation and not virtualization
      mesos.apache.org
      aurora.incubator.apache.org
      [Diagram labels: framework; masters; four offers, each "hostname, 4 CPUs, 4 GB RAM"]
  34. [Diagram labels: Profiles; Search / S&R; Trends / S&R; Home timeline / TLS; PTw / Ads; Compose; Contact import / Growth; DMs / Social; Discover / S&R; WtF / S&R]
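
Slide 17 describes fanout: a new tweet's id is written to every relevant timeline, and timelines are cached lists of tweet ids. A minimal in-memory sketch of that idea follows. It is only an illustration under stated assumptions, not Twitter's implementation: the names `FanoutSketch`, `State`, `fanout`, and `MaxTimelineLength` are hypothetical, a plain `Map` stands in for the Redis cache, and "relevant timelines" is assumed to mean the author's followers.

```scala
// Hypothetical in-memory sketch of fanout-on-write; a Map stands in for Redis.
object FanoutSketch {
  type UserId = Long
  type TweetId = Long

  // followers(author) = users whose timelines should receive the author's tweets
  // (assumed meaning of "relevant timelines").
  final case class State(
      followers: Map[UserId, Set[UserId]],
      timelines: Map[UserId, Vector[TweetId]]
  )

  // Cap on cached timeline length (an arbitrary value chosen for illustration).
  val MaxTimelineLength = 3

  // Deliver one tweet id to each follower's timeline, newest first,
  // trimming each timeline to the cap.
  def fanout(state: State, author: UserId, tweetId: TweetId): State = {
    val recipients = state.followers.getOrElse(author, Set.empty[UserId])
    val updated = recipients.foldLeft(state.timelines) { (tls, user) =>
      val current = tls.getOrElse(user, Vector.empty[TweetId])
      tls.updated(user, (tweetId +: current).take(MaxTimelineLength))
    }
    state.copy(timelines = updated)
  }

  def main(args: Array[String]): Unit = {
    val start = State(followers = Map(1L -> Set(2L, 3L)), timelines = Map.empty)
    val afterTwo = fanout(fanout(start, 1L, 100L), 1L, 101L)
    println(afterTwo.timelines(2L)) // Vector(101, 100)
  }
}
```

Storing only ids keeps each timeline entry small, which is one plausible reason the slide calls timelines "essentially a list of tweet ids".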

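Slide 14 says tweets are stored in sharded and replicated MySQL. The routing idea can be sketched as below, assuming the simplest possible scheme: modulo on the tweet id picks a shard, and a write goes to every replica of that shard. The names `ShardingSketch`, `Shard`, `shardFor`, and `writeTargets` and the host names are hypothetical, and the deck does not say how TBird or Gizzard actually map keys to shards.

```scala
// Hypothetical sketch of routing a tweet id to a shard and its replicas.
object ShardingSketch {
  // One logical partition, served by several replica hosts (made-up names).
  final case class Shard(id: Int, replicas: Vector[String])

  // Pick a shard by plain modulo on the tweet id (an assumption for illustration).
  // floorMod keeps the index non-negative even for negative ids.
  def shardFor(tweetId: Long, shards: Vector[Shard]): Shard = {
    require(shards.nonEmpty, "at least one shard is required")
    val index = java.lang.Math.floorMod(tweetId, shards.length.toLong).toInt
    shards(index)
  }

  // A write is sent to every replica of the owning shard.
  def writeTargets(tweetId: Long, shards: Vector[Shard]): Vector[String] =
    shardFor(tweetId, shards).replicas

  def main(args: Array[String]): Unit = {
    val shards = Vector(
      Shard(0, Vector("db-0a", "db-0b")),
      Shard(1, Vector("db-1a", "db-1b"))
    )
    println(writeTargets(7L, shards)) // Vector(db-1a, db-1b)
  }
}
```

Plain modulo reshuffles most keys when the shard count changes, so a production system would likely use a more flexible mapping; the sketch only shows the shard-then-replicas lookup.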