MongoFr : MongoDB as a log Collector

MongoDB can be used simply as a log collector, for example with a capped collection. Fotopedia runs such a system, which it uses for quick introspection and realtime analysis.

Talk given on 23 March 2011 at the MongoFR days in Paris (La Cantine) by Pierre Baillet and Mathieu Poumeyrol.

  1. 1. MONGODB AS A LOG COLLECTOR; photo by Jean-Michel BAUD; Pierre Baillet & Mathieu Poumeyrol; oct & kali @ fotopedia.com
  2. 2. DB.SLIDES.FIND({‘TYPE’:‘TITLE’}) Fotopedia: who we are, what we do, how we do it; MongoDB at Fotopedia: current state of our art; Logging: the answer to life, the universe and everything; How we fulfilled this need; Log usage on a daily basis; Future work
  3. 3. FOTOPEDIA «Family photos»
  4. 4. FOTOPEDIA WHO ARE WE? Company created in 2006; located in Paris, near the Opéra; 17 people, including 8 regular MongoDB users (aka developers); we’re hiring
  5. 5. FOTOPEDIA WHAT DO WE DO? Images for Humanity; open to anyone, amateur or professional; Creative Commons aware; Beautiful Wikipedia (http://www.fotopedia.com); iPad tablebooks (iPhone too): Heritage, National Parks and Memory of Color
  6. 6. INFRASTRUCTURE Based on Amazon Web Services; around 20 servers located in the US datacenters; centralized deployment procedure (Chef); deploy at least once a week with no downtime
  7. 7. KEY TECHNOLOGIES Ruby on Rails (with REE); Lackr (in-house Java proxy); Unicorn; Sinatra; Varnish; Redis and Resque; HAProxy; MySQL; NGinx; MongoDB
  8. 8. MONGODB AT FOTOPEDIA «C:\Utilisateurs\fotopedia\Mes Documents»
  9. 9. CURRENT STATE OF OUR ART Last year's talk covered our MongoDB-powered metacache; it stores complete Wikipedia data in more than 10 languages; since spring 2010, all new database-centric features have been developed with MongoDB; our goal: slowly migrate all DB features to MongoDB whenever possible
  10. 10. MYSQL MIGRATIONS [Chart: "Alter table" series by quarter, 08/Q3 through 2011; y-axis from 0 to 30]
  11. 11. OUR SETUP 4 clusters (business data, log and reporting, wikipedia, and one more); 3 EC2 XL virtual machines hosting 5 replica sets; at the current time, one machine is master on all replica sets; each of the 5 replica sets is allocated to one of the clusters; every instance holds the 4 mongos
  12. 12. SOME FIGURES In production since September 2009; wikipedia data: wikipedia/en: 5GB, 8M documents (and about 10 other languages), batch load: 17k inserts/s; webcache: 2GB, 11M records, avg 60 op/s, peak 300 op/s; overall, average 250 op/s
  13. 13. jm3 LOGGING «the eye of the lynx»
  14. 14. ORIGINAL PHILOSOPHY Log everything, don’t delete; collected by Scribe; comprehensive daily log stored in AWS S3; Hadoop jobs to generate statistics; grep and its merry friends for investigating issues; quite efficient, but cumbersome and slow
  15. 15. WHY IMPROVE Issue analysis in realtime (debugging); realtime activity analysis; traffic spikes; misbehaving crawlers and other suspicious activity
  16. 16. ORIGINAL STACK LAYOUT
  17. 17. Stefano Constanzo HOW WE SOLVED THIS ISSUE «demons and wonders»
  18. 18. NORMALIZED LOG FORMAT { "_id" : ObjectId("4d7e11cc7ea68d34fb01f2ac2"), "facility" : "varnish", "instance" : "a01", "date" : NumberLong("1300107724534"), "http_host" : "www.fotopedia.com", "method" : "GET", "http_version" : "HTTP/1.1", "path" : "/albums/fotopedia-fr-Cath%C3%A9drale_m%C3%A9tropolitaine_de_Buenos_Aires", "status" : "404", "size" : 13, "elapsed" : 0.00007748600182821974 }
  19. 19. LOG COLLECTING File logging daemons (NGinx, HAProxy): Ruby tailer script; memory logging daemons (Varnish): dedicated binary that streams the Varnish SHM into MongoDB; other daemons (Lackr, Picor): extended logging system to store data in MongoDB; Ruby exceptions are also logged into MongoDB
  20. 20. MONGO SHARDING All servers host the «logs» mongos on port 27002; all daemons push their logs to «localhost:27002»; the actual storage is a capped collection in a non-sharded database.
  21. 21. CURRENT STACK LAYOUT
  22. 22. Jesús García Ferrer LOG USAGE ON A DAILY BASIS «the needle in the fir haystack»
  23. 23. SAPIN: EXCEPTION LOGGING View Latest Errors
  24. 24. SAPIN: EXCEPTION LOGGING Useful information: source URL and parameters; date and time; browser identifiers (IP, cookie values, User-Agent); full stack dump; full headers dump; full user model dump
  25. 25. SAPIN: EXCEPTION LOGGING Searching in Exceptions
  26. 26. RAMPLR: SAMPLING ANALYSIS Sample analysis
  27. 27. SAPIN: REALTIME LOGGING jQuery-ui based interface; Sinatra backed; filter by facility; searchable criteria: IP address; follow operation ID; display HTTP execution timeline
  28. 28. SAPIN: REALTIME LOGGING Facility Filtering
  29. 29. SAPIN: REALTIME LOGGING Url Filtering
  30. 30. SAPIN: REALTIME LOGGING IP Address Filtering
  31. 31. SAPIN: REALTIME LOGGING Operation ID Filtering
  32. 32. SAPIN: REALTIME LOGGING Timeline display
  33. 33. ISSUE WITH MONGODB Scalability of using a capped collection; official doc says no indices; size limit vs index efficiency (400 000 lines for < 2 hours of log); our plan is to have 2 days' worth of logs.
  34. 34. The Library of Congress FUTURE WORK «to infinity and beyond»
  35. 35. FUTURE WORK Leaner interface: the current one is ugly and jquery-ui based, should switch to the Sencha framework; keep more logs: abandon capped collections, keep logs longer, one collection per day(?)
  36. 36. Great Beyond QUESTIONS? «I say to you: goodbye.»
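
Slides 18, 20 and 27 describe a normalized log document, a capped collection as storage, and filtering by facility. The Python sketch below illustrates that flow without a database; it is not Fotopedia's code. The `normalize` function and the `CappedLog` class are hypothetical names. `CappedLog` is an in-memory stand-in bounded by document count, whereas a real MongoDB capped collection is bounded by a size in bytes.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalize(facility, instance, when, request_line, status, size, elapsed,
              http_host):
    """Build a log document shaped like the one on slide 18 (hypothetical helper)."""
    # An HTTP request line has the form "METHOD PATH VERSION".
    method, path, http_version = request_line.split(" ", 2)
    return {
        "facility": facility,
        "instance": instance,
        # Milliseconds since the Unix epoch, computed with exact integer division.
        "date": (when - EPOCH) // timedelta(milliseconds=1),
        "http_host": http_host,
        "method": method,
        "http_version": http_version,
        "path": path,
        "status": status,  # kept as a string, as in the slide
        "size": size,
        "elapsed": elapsed,
    }


class CappedLog:
    """In-memory stand-in for a capped collection (bounded by count, not bytes)."""

    def __init__(self, max_docs):
        # A deque with maxlen silently drops the oldest entry when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self, **criteria):
        """Return documents in insertion order whose fields equal every criterion."""
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in criteria.items())]
```

With a `CappedLog(max_docs=2)`, a third `insert` pushes out the oldest document, and `find(facility="varnish")` returns only the Varnish entries still held, oldest first.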

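Slide 33 gives 400 000 lines for less than 2 hours of log against a target of 2 days, and slide 35 floats one collection per day. A rough sketch of both ideas follows, assuming traffic stays constant over the retention window. The function names and the `logs_` prefix are hypothetical. Because the observed window is shorter than 2 hours, the computed figure is a lower bound.

```python
import math
from datetime import datetime, timezone


def min_docs_for(retention_hours, observed_docs=400_000, observed_hours=2):
    """Lower bound on documents to keep, scaling the observed rate linearly."""
    return math.ceil(observed_docs * retention_hours / observed_hours)


def daily_collection_name(date_ms, prefix="logs_"):
    """Per-day collection name (UTC) for a millisecond timestamp."""
    day = datetime.fromtimestamp(date_ms / 1000, tz=timezone.utc)
    return prefix + day.strftime("%Y%m%d")


if __name__ == "__main__":
    # Two days of retention at the observed rate.
    print(min_docs_for(48))
    # The "date" value from the slide 18 sample document.
    print(daily_collection_name(1300107724534))
```

Keeping 2 days at that rate means at least 9.6 million documents, 24 times the 400 000 observed. With per-day collections, dropping an old day is a single collection drop rather than a bounded overwrite.
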
Notes

  • Pierre Baillet, server architect; Mathieu Poumeyrol, director of cloud engineering
  • next slide is what we do
  • next slide is about how we do it
  • next slide is about key technologies
  • Last slide of the section
  • Last slide of the section
  • next slide is why we should improve
  • next slide shows the original logging layout
  • Last slide of the section
  • Last slide of the section
  • details on next slide
  • searching in exceptions on next slide
  • next slide is about sampling and ramplr
  • next slide is about technologies used in sapin
  • next slide is about facility filtering
  • describe sapin facility: column selection, reloading, list of facilities; next slide is about url filtering
  • next slide is about url filtering
  • next slide details an op-id session
  • next slide shows a timeline
  • next slide is about current issues
  • Last slide of the section
  • Last slide of the presentation and of the section before the questions