2. We power the Discovery, Delivery and Display of Digital Entertainment
4 © 2012 Rovi Corporation. Company confidential.
3. Global Reach
• 137M+ viewers use our guide technologies through service provider offerings
• 47M+ storefronts with entertainment services powered by Rovi Entertainment Store
• 266M+ consumer electronic (CE) devices have our CE guide technologies
• 40M+ households reached globally by Rovi Advertising Network
• 600M+ devices certified for high-quality DivX video playback
• Data coverage:
– 4.5M+ TV shows, movies, sports and celebrities
– 3.3M+ album releases and 32M music tracks
– 500K+ movie titles
8. ETL/Cache Loading Data Takes Too Long
[Architecture diagram: the DSG database feeds an Extract Database on the WSP ETL Server; a Transform step and CI Cache/CI Table Loading processes populate a CI Database, two MemcacheDB clusters (Node 1 and Node 2 DB servers, each with backup & restore), and MemcacheD scratch server(s).]
14. Challenges
• Transition existing Windows/.NET team to Linux/Java
– Environment setup, technology framework choices
– Coding differences
– Cultural differences
– Platform differences
– Easier than expected to transition team from .NET to Java – No religious battles
• Backwards compatibility of CXF web services to Microsoft .NET web services
• Managing new releases of Hadoop
• BCP (bulk copy) extracts took too long
– Converted to base tables and used Pig to join the data
• Writes to Mongo are very fast; updates are slower and saturated the disks
– Implemented a diff process (MD5 hash comparison) so Hadoop does the comparison work and writes to Mongo are minimized
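The diff idea above can be sketched in plain Java (names are hypothetical; a real job would keep the previous load's hashes in HDFS alongside the data rather than in memory):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Sketch of the diff process: hash each record and only emit records whose
// hash changed since the last load, so MongoDB sees far fewer writes.
public class DiffFilter {
    private final Map<String, String> previousHashes = new HashMap<>();

    static String md5(String record) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(record.getBytes(StandardCharsets.UTF_8));
        return String.format("%032x", new BigInteger(1, digest));
    }

    /** Returns true if the record is new or changed and should be written. */
    public boolean shouldWrite(String key, String record) throws Exception {
        String hash = md5(record);
        String old = previousHashes.put(key, hash); // remember for next call
        return !hash.equals(old);                   // old == null for new keys
    }

    public static void main(String[] args) throws Exception {
        DiffFilter filter = new DiffFilter();
        System.out.println(filter.shouldWrite("movie:1", "{title: 'Alien'}"));  // new record
        System.out.println(filter.shouldWrite("movie:1", "{title: 'Alien'}"));  // unchanged, skip
        System.out.println(filter.shouldWrite("movie:1", "{title: 'Aliens'}")); // changed, write
    }
}
```

Doing the comparison on the Hadoop side means Mongo only ever receives the minority of documents that actually changed.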
16. Lessons Learned
• General
– Current versions of Hadoop CDH4 and MongoDB 2.0 are actually very stable products
• We purchased enterprise support agreements from both Cloudera and 10gen
– Create a developers VM image
– Deploy early and often even if not ready for real customers
– Use the same setup in test and production environments
• Sharding, in particular, caused differences when the setups diverged
• SQL
– Get raw tables without any transformation or joins
• Let Hadoop do the processing for you
• Hadoop
– Do as much work as you can in Hadoop
– Take the time to create small datasets to iterate fast
– Take the time to learn and use Pig
• It is very fast and provides tons of functionality that you don’t need to code in Java
– Don’t create Runners - Use Oozie workflows
– Measure, benchmark and track performance – Use Hadoop counters
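The "raw tables + Pig" advice above can be sketched roughly like this (paths and field names are hypothetical); the point is to extract untransformed base tables and let the cluster do the join:

```pig
-- Sketch: join two raw base tables in Pig instead of joining in SQL
-- during extraction. Table paths and schemas are illustrative only.
titles  = LOAD '/raw/titles'  USING PigStorage('\t') AS (title_id:long, name:chararray);
credits = LOAD '/raw/credits' USING PigStorage('\t') AS (title_id:long, person:chararray);
joined  = JOIN titles BY title_id, credits BY title_id;
STORE joined INTO '/staging/title_credits' USING PigStorage('\t');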
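Replacing hand-rolled Runner classes with Oozie means each step becomes an action in a workflow definition. A minimal sketch (workflow, action, and script names are hypothetical):

```xml
<!-- Minimal Oozie workflow sketch: one Pig action with explicit
     success/failure transitions instead of a custom Java runner. -->
<workflow-app name="title-load" xmlns="uri:oozie:workflow:0.2">
  <start to="join-titles"/>
  <action name="join-titles">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>join_titles.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Join failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Oozie then owns scheduling, retries, and chaining, which the Runner code would otherwise have to reimplement.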
17. Lessons Learned - 2
• MongoDB
– RAM, RAM, RAM!!!
– Many writes from Hadoop can easily overwhelm MongoDB
• Single database lock
• Drive bandwidth saturation – Can be expanded through sharding
• Do as much as possible to minimize writes
• Measure where your application is blocking and optimize
– Don’t shard unless you have to – if you do shard, preconfigure your shard key
• You need a good shard key
– Use replica sets. They are easy to set up and work well.
• Make sure the oplog is large enough.
– Use MongoDB Monitoring Service (MMS) – It’s free
– Mongo queries are fast!
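The replica-set and shard-key points above might look roughly like this in the mongo shell (hostnames, database, and key are hypothetical; this is a sketch, not the deck's actual configuration):

```js
// Sketch: a three-member replica set, then pre-configuring the shard key
// before the collection grows. All names here are illustrative.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
});
sh.enableSharding("entertainment");
sh.shardCollection("entertainment.titles", { titleId: 1 });
```

Choosing the key up front matters because it determines how evenly the bulk writes from Hadoop spread across shards.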
18. Mongo Query – returns 90 rows from a database of 9 million documents in 44 ms
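A query like that is only fast because it is indexed: the server walks the matching B-tree keys instead of scanning 9 million documents. A hedged sketch using the 2.0-era shell syntax (collection and field names are hypothetical):

```js
// Sketch: index the queried field, then confirm the query uses it.
db.titles.ensureIndex({ sourceId: 1 });
db.titles.find({ sourceId: 12345 }).explain(); // expect a BtreeCursor, not BasicCursor
```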
20. Follow-up Information
• Email: robert.vandehey@rovicorp.com
• LinkedIn: http://www.linkedin.com/in/bvandehey
• Twitter: @bvandehey
• Rovi Cloud Services: http://developer.rovicorp.com/
Editor's notes
• This is the new Data Load Process. It makes it look easy… the reality is that it is quite complex. This is just one of our workflows. The orange/tan boxes are Java map/reduce processes, the pink boxes are Pig processes, the white boxes are BCP processes, and the green boxes are MongoDB collections.
• Here is our sharding scheme. We actually have six more servers than shown because we decided to have multiple replicas at each remote site.