MongoSF, 4/30/2010. From MySQL to MongoDB: Migrating a Live Application. Tony Tam
What is Wordnik? Project to track language; like GPS for English. Dictionary is a roadblock to the language. Roughly 200 new words created daily. Language is not static. Capture information about all words. Meaning is often undefined in the traditional sense. Machines can determine meaning through analysis. Needs LOTS of data.
Why should you care? Every developer can use a robust language API! Wordnik migrated to MongoDB: > 5 billion documents, > 1.2 TB, zero application downtime. Learn from our experience.
Wordnik: Not just a website! (But we have one.) Launched Wordnik entirely on MySQL. Hit road bumps with insert speed: ~4B rows on MyISAM tables; tables locked for tens of seconds during inserts. But we need more data! Created elaborate update schemes to work around it. Lost lots of sleep babysitting servers while researching a long-term solution.
Wordnik + MongoDB: What are our storage needs? Database vs. application logic: no PK/FK constraints; no stored procedures; consistency? Lots of R&D. Tried almost all NoSQL solutions.
Migrating Storage Engines: Many parts to this effort: setup & administration, software design, optimization. Many types of data at Wordnik: (1) corpus, (2) structured hierarchical data, (3) user data. Migrated #1 & #2.
Server Infrastructure: Wordnik is heavily read-only. Master/slave deployment. Looking at replica pairs. MongoDB loves system resources. Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out). Memory + disk = happy Mongo. Many times the disk space of MySQL. Easy pill to swallow until…
Server Infrastructure: Physical hardware: 2 x 4-core CPU, 32 GB RAM, FC SAN. Had bad luck on VMs (you might not). Disk speed => performance.
Software Design: Two distinct use cases for MongoDB. Identical structure, different storage engine: same underlying objects, same storage fidelity (largely key/value). Hierarchical data structure: same underlying objects, document-oriented storage.
Software Design: Created BasicDBObjects from POJOs and used collection methods: BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence()).append("rating", s.getRating()).append(...); ID generation to manage unique _id values: analogous to MySQL AUTO_INCREMENT behavior; compatible with MySQL IDs (more later): dbo.append("_id", getId()); collection.save(dbo); Implemented all CRUD methods in DAO. Swappable between MongoDB and MySQL at runtime.
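The ID generation mentioned on this slide could be sketched roughly as follows. This is a minimal illustration, not the generator Wordnik used: the class name `IdGenerator` and the idea of seeding it with the highest ID already issued by MySQL are assumptions made for the example, and it only covers a single process.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: hands out strictly increasing IDs, similar in spirit
// to MySQL AUTO_INCREMENT. Not taken from the talk.
final class IdGenerator {
    private final AtomicLong last;

    // lastIssuedId: highest ID already present in the old store (assumed known).
    IdGenerator(long lastIssuedId) {
        this.last = new AtomicLong(lastIssuedId);
    }

    // Thread-safe: every caller receives a distinct, larger value.
    long nextId() {
        return last.incrementAndGet();
    }
}
```

Seeding from the old store's maximum is one way to keep new IDs from colliding with rows that were already migrated.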
Software Design: Key-value storage use case. Easy as implementing new DAOs: SentenceHandler h = new MongoDbSentenceHandler(); Save methods construct a BasicDBObject and call save() on the collection. Implement the same interface: same methods against the DAO between MySQL and MongoDB versions. Data Abstraction 101.
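The DAO abstraction described on this slide might look something like the sketch below. Only the name `SentenceHandler` comes from the slides; the `Sentence` fields, the `save`/`find` signatures, `MapBackedSentenceHandler`, and the `SentenceHandlers` factory are hypothetical, and an in-memory map stands in for both the MySQL and MongoDB back ends so the example runs without either driver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain object; the slides only show getSentence() and getRating().
final class Sentence {
    final long id;
    final String text;
    final int rating;

    Sentence(long id, String text, int rating) {
        this.id = id;
        this.text = text;
        this.rating = rating;
    }
}

// The one interface both storage back ends implement.
interface SentenceHandler {
    void save(Sentence s);
    Sentence find(long id);
}

// Stand-in for a real MySQL- or MongoDB-backed DAO: stores rows in a map.
final class MapBackedSentenceHandler implements SentenceHandler {
    private final String backend;
    private final Map<Long, Sentence> store = new HashMap<>();

    MapBackedSentenceHandler(String backend) { this.backend = backend; }

    String backend() { return backend; }

    @Override public void save(Sentence s) { store.put(s.id, s); }

    // Returns null when no sentence with that ID was saved.
    @Override public Sentence find(long id) { return store.get(id); }
}

// Callers ask the factory for a handler and never name a concrete class.
final class SentenceHandlers {
    private SentenceHandlers() {}

    static SentenceHandler create(boolean useMongoDb) {
        return new MapBackedSentenceHandler(useMongoDb ? "mongodb" : "mysql");
    }
}
```

Because callers depend only on the interface, flipping the boolean is the whole switch between back ends.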
Software Design: What about bulk inserts? FAF (fire-and-forget) queued approach: add objects to queue, return to caller. Every X seconds, process the queue. All objects from the same collection are appended to a single List<DBObject>. Call collection.insert(…) before 2M characters. Reduces network overhead. Very fast inserts.
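A rough sketch of the queued approach follows. It is an illustration under assumptions, not the talk's implementation: documents are plain JSON strings, a `Consumer<List<String>>` stands in for `collection.insert(…)`, the character budget is a constructor parameter, and the timer that would call `flush()` every X seconds is left out. The class name `BatchingWriter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical sketch of a fire-and-forget write queue with size-capped batches.
final class BatchingWriter {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final int maxCharsPerBatch;
    private final Consumer<List<String>> sink; // stand-in for a bulk insert call

    BatchingWriter(int maxCharsPerBatch, Consumer<List<String>> sink) {
        this.maxCharsPerBatch = maxCharsPerBatch;
        this.sink = sink;
    }

    // Fire and forget: enqueue and return to the caller immediately.
    void asyncWrite(String json) {
        queue.add(json);
    }

    // Drains the queue into batches and returns how many batches were sent.
    int flush() {
        int batches = 0;
        List<String> batch = new ArrayList<>();
        int chars = 0;
        String doc;
        while ((doc = queue.poll()) != null) {
            // Send the current batch before it would exceed the budget.
            if (!batch.isEmpty() && chars + doc.length() > maxCharsPerBatch) {
                sink.accept(batch);
                batches++;
                batch = new ArrayList<>();
                chars = 0;
            }
            batch.add(doc);
            chars += doc.length();
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

A document larger than the budget still goes out, alone in its own batch; a real writer would need to decide how to handle that case.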
Software Design: Hierarchical data done more elegantly. Wordnik dictionary model: Java POJOs already had JAXB annotations; part of public REST API. Used MySQL: 12+ tables, 13 DAOs, 2500 lines of code, 50 requests/second uncached, Memcache needed to maintain reasonable speed.
Software Design TMGO
Software Design: MongoDB's document storage let us… Turn the objects into JSON via Jackson Mapper (fasterxml.com). Call save. Support all fetch types, enhanced filters. 1000 requests/second. No explicit caching. No less scary code.
Software Design: Saving a complex object: String rawJSON = getMapper().writeValueAsString(veryComplexObject); collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON))); Fetching a complex object: DBObject dbo = cursor.next(); ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class); No joins, 20x faster.
Migrating Data: Migrating => existing data logic. Use logic to select DAOs appropriately. Read from old, write with new. Great system test for MongoDB. SentenceHandler mysqlSh = new MySQLSentenceHandler(); SentenceHandler mongoSh = new MongoDbSentenceHandler(); while (hasMoreData) { mongoSh.asyncWrite(mysqlSh.next()); ... }
Migrating Data: Wordnik moved 5 billion rows from MySQL. Sustained 100,000 inserts/second. Migration tool was CPU bound: ID generation logic, among other things. Wordnik reads MongoDB fast: read + create Java objects @ 250k/second (!)
Going Live to Production: Choose your use case carefully if migrating incrementally. Scary no matter what. Test your perf monitoring system first! Use your DAOs from migration. Turn on MongoDB on one server, monitor, tune (rollback, repeat). Full switchover when comfortable.
Going Live to Production: Really? SentenceHandler h = null; if (useMongoDb) { h = new MongoDbSentenceHandler(); } else { h = new MySQLSentenceHandler(); } return h.find(...);
Optimizing Performance: Home-grown connection pooling. Master only: ConnectionManager.getReadWriteConnection(). Slave only: ConnectionManager.getReadOnlyConnection(). Round-robin all servers, bias on slaves: ConnectionManager.getConnection().
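The three access modes named on this slide could be sketched as below. Only the three method names come from the slide; everything else is an assumption for illustration: host names stand in for pooled connections, the methods are instance methods rather than static, and the bias toward slaves is modeled with a hypothetical `slaveWeight` that repeats each slave in the rotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: routes callers to a master or slave host name.
final class ConnectionManager {
    private final String master;
    private final List<String> slaves;
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger readCursor = new AtomicInteger();
    private final AtomicInteger anyCursor = new AtomicInteger();

    // slaveWeight: how often each slave appears per single master entry (assumed knob).
    ConnectionManager(String master, List<String> slaves, int slaveWeight) {
        if (slaves.isEmpty() || slaveWeight < 1) {
            throw new IllegalArgumentException("need at least one slave and a weight >= 1");
        }
        this.master = master;
        this.slaves = new ArrayList<>(slaves);
        rotation.add(master);
        for (String s : slaves) {
            for (int i = 0; i < slaveWeight; i++) {
                rotation.add(s);
            }
        }
    }

    // Master only: all writes go here.
    String getReadWriteConnection() {
        return master;
    }

    // Slave only: plain round-robin over the slaves.
    String getReadOnlyConnection() {
        return slaves.get(Math.floorMod(readCursor.getAndIncrement(), slaves.size()));
    }

    // All servers, biased toward slaves by the weighted rotation.
    String getConnection() {
        return rotation.get(Math.floorMod(anyCursor.getAndIncrement(), rotation.size()));
    }
}
```

`Math.floorMod` keeps the index non-negative even after the counters overflow.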
Optimizing Performance: Caching: had complex logic to handle cache invalidation; out-of-process caches are not free; MongoDB loves your RAM, so let it do your LRU cache (it will anyway). Hardware: do not skimp on your disk or RAM. Indexes: schema-less design; even if no values in any document, needs to read document schema to check.
Optimizing Performance: Disk space. Schemaless => schema per document (row). Choose your mappings wisely: ({veryLongAttributeName:true}) => more disk space than ({vlan:true}).
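Because field names are stored inside every document, the cost of a long name scales with the document count. The back-of-envelope sketch below assumes the attribute appears once in each of the 5 billion documents quoted earlier and counts only the key bytes, ignoring indexes, padding, and any other storage overhead. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: estimates extra bytes spent on a longer field name.
final class KeyOverhead {
    private KeyOverhead() {}

    // Difference in key length (UTF-8 bytes) multiplied by the number of documents.
    static long extraBytes(String longKey, String shortKey, long documentCount) {
        int perDocument = longKey.getBytes(StandardCharsets.UTF_8).length
                - shortKey.getBytes(StandardCharsets.UTF_8).length;
        return perDocument * documentCount;
    }
}
```

Under those assumptions, `KeyOverhead.extraBytes("veryLongAttributeName", "vlan", 5_000_000_000L)` comes to 17 bytes per document, or 85,000,000,000 bytes (about 85 GB) in total.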
Optimizing Performance: A typical day at the office for MongoDB. API call rate: 47.7 calls/sec.
Other Tips: Data types: use caution when changing: DBObject obj = cur.next(); long id = (Long) obj.get("IWasAnIntOnce"); Attribute names: don't change without migrating existing data! WTFDMDG????
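The cast on this slide fails with a `ClassCastException` whenever the stored value is still an `Integer`. One defensive pattern, sketched here with a plain `Map<String, Object>` standing in for the driver's document type, is to read through `Number` instead. The helper name `NumericFields.readLong` is hypothetical.

```java
import java.util.Map;

// Hypothetical helper: reads a numeric field whether it was stored as int or long.
final class NumericFields {
    private NumericFields() {}

    static long readLong(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(field + " is missing or not numeric");
        }
        // Integer, Long, and Double all extend Number, so no cast to Long is needed.
        return ((Number) value).longValue();
    }
}
```

This tolerates old documents written before the type change, at the cost of silently truncating any fractional values.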
What's next? GridFS: store audio files on disk; requires clustered file system for shared access. Capped collections (rolling out this week). UGC from MySQL => MongoDB. Beg/bribe 10gen for some features.
Questions?

The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

From MySQL to MongoDB at Wordnik (Tony Tam)

  • 1. MongoSF 4/30/2010. From MySQL to MongoDB: Migrating a Live Application. Tony Tam
  • 2. What is Wordnik Project to track language like GPS for English Dictionary is a road block to the language Roughly 200 new words created daily Language is not static Capture information about all words Meaning is often undefined in traditional sense Machines can determine meaning through analysis Needs LOTS of data
  • 3. Why should You care Every Developer can use a Robust Language API! Wordnik migrated to MongoDB > 5 Billion documents > 1.2 TB Zero application downtime Learn from our Experience
  • 4. Wordnik Not just a website! But we have one Launched Wordnik entirely on MySQL Hit road bumps with insert speed ~4B rows on MyISAM tables Tables locked for tens of seconds during inserts But we need more data! Created elaborate update schemes to work around it Lost lots of sleep babysitting servers while researching a long-term solution
  • 5. Wordnik + MongoDB What are our storage needs? Database vs. Application Logic No PK/FK constraints No Stored Procedures Consistency? Lots of R&D Tried almost all NoSQL solutions
  • 6. Migrating Storage Engines Many parts to this effort Setup & Administration Software Design Optimization Many types of data at Wordnik 1. Corpus 2. Structured Hierarchical Data 3. User Data Migrated #1 & #2
  • 7. Server Infrastructure Wordnik is Heavily Read-only Master / Slave deployment Looking at replica pairs MongoDB loves system resources Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out) Memory + Disk = Happy Mongo Many X the disk space of MySQL Easy pill to swallow until…
  • 8. Server Infrastructure Physical Hardware 2 x 4 core CPU, 32 GB RAM, FC SAN Had bad luck on VMs (you might not) Disk speed => performance
  • 9. Software Design Two distinct use cases for MongoDB Identical structure, different storage engine Same underlying objects, same storage fidelity (largely key/value) Hierarchical data structure Same underlying objects, document-oriented storage
  • 10. Software Design Create BasicDBObjects from POJOs and used collection methods BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence()).append("rating", s.getRating()).append(...); ID Generation to manage unique _id values Analogous to MySQL AutoIncrement behavior Compatible with MySQL Ids (more later) dbo.append("_id", getId()); collection.save(dbo); Implemented all CRUD methods in DAO Swappable between MongoDB and MySQL at runtime
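The slide above mentions ID generation that behaves like MySQL auto-increment and stays compatible with existing MySQL IDs, but it does not show how this was done. A minimal sketch of one way to get that behavior, assuming a single generator process seeded with the highest ID already in use; `SequentialIdGenerator` and its methods are hypothetical names, not Wordnik's code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical auto-increment style generator; the slides do not show the real one.
class SequentialIdGenerator {
    private final AtomicLong last;

    // lastUsedId: the highest ID already assigned, e.g. the current MySQL maximum (assumption).
    SequentialIdGenerator(long lastUsedId) {
        this.last = new AtomicLong(lastUsedId);
    }

    // Returns the next unused ID; safe to call from several threads.
    long nextId() {
        return last.incrementAndGet();
    }

    public static void main(String[] args) {
        SequentialIdGenerator ids = new SequentialIdGenerator(4_000_000_000L);
        System.out.println(ids.nextId());
        System.out.println(ids.nextId());
    }
}
```

Seeding from the existing maximum keeps new IDs from colliding with migrated rows, which is one plausible reading of "compatible with MySQL Ids".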
  • 11. Software Design Key-Value storage use case Easy as implementing new DAOs SentenceHandler h = new MongoDBSentenceHandler(); Save methods construct BasicDBObject and call save() on collection Implement same interface Same methods against DAO between MySQL and MongoDB versions Data Abstraction 101
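The "same interface, different storage engine" idea from the slide above can be sketched without any database driver. In this illustration both backends are in-memory stand-ins, and `SentenceStore`, `MySqlLikeSentenceStore`, and `MongoLikeSentenceStore` are hypothetical names chosen for the example rather than the handler classes named in the slides:

```java
import java.util.HashMap;
import java.util.Map;

class DaoSwapDemo {
    // Chooses the backend from a flag, in the spirit of a runtime switch.
    static SentenceStore storeFor(boolean useMongoDb) {
        return useMongoDb ? new MongoLikeSentenceStore() : new MySqlLikeSentenceStore();
    }

    public static void main(String[] args) {
        for (boolean useMongoDb : new boolean[] {false, true}) {
            SentenceStore store = storeFor(useMongoDb);
            store.save(1L, "Language is not static.");
            System.out.println(store.getClass().getSimpleName() + ": " + store.find(1L));
        }
    }
}

// Storage-neutral contract that callers depend on.
interface SentenceStore {
    void save(long id, String sentence);
    String find(long id);
}

// Stand-in for a relational backend: one "row" per ID.
class MySqlLikeSentenceStore implements SentenceStore {
    private final Map<Long, String> rows = new HashMap<>();

    public void save(long id, String sentence) { rows.put(id, sentence); }

    public String find(long id) { return rows.get(id); }
}

// Stand-in for a document backend: one map ("document") per ID.
class MongoLikeSentenceStore implements SentenceStore {
    private final Map<Long, Map<String, Object>> docs = new HashMap<>();

    public void save(long id, String sentence) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", id);
        doc.put("sentence", sentence);
        docs.put(id, doc);
    }

    public String find(long id) {
        Map<String, Object> doc = docs.get(id);
        return doc == null ? null : (String) doc.get("sentence");
    }
}
```

Because callers only see the interface, the same code path can be exercised against either backend, which is what makes an incremental cut-over testable.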
  • 12. Software Design What about bulk inserts? FAF Queued approach Add objects to queue, return to caller Every X seconds, process queue All objects from same collection are appended to a single List<DBObject> Call collection.insert(…) before 2M characters Reduces network overhead Very fast inserts
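The queued approach described in the slide above (enqueue and return, drain periodically, group by collection, cap each batch by size) can be sketched as follows. This is an illustration under assumptions: documents are plain strings, the `bulkInsert` callback stands in for a driver's bulk insert call, and `QueuedBulkWriter` is a hypothetical name. The periodic trigger is left to the caller, for example a `ScheduledExecutorService` invoking `flush()` every few seconds:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BiConsumer;

class QueuedBulkWriter {
    private static final class Pending {
        final String collection;
        final String document;

        Pending(String collection, String document) {
            this.collection = collection;
            this.document = document;
        }
    }

    private final ConcurrentLinkedQueue<Pending> queue = new ConcurrentLinkedQueue<>();
    private final int maxBatchChars;
    private final BiConsumer<String, List<String>> bulkInsert;

    // bulkInsert stands in for the real bulk insert call (assumption).
    QueuedBulkWriter(int maxBatchChars, BiConsumer<String, List<String>> bulkInsert) {
        this.maxBatchChars = maxBatchChars;
        this.bulkInsert = bulkInsert;
    }

    // Adds the document to the queue and returns to the caller immediately.
    void enqueue(String collection, String document) {
        queue.add(new Pending(collection, document));
    }

    // Drains the queue, groups documents per collection, and sends size-capped batches.
    void flush() {
        Map<String, List<String>> byCollection = new LinkedHashMap<>();
        Pending p;
        while ((p = queue.poll()) != null) {
            byCollection.computeIfAbsent(p.collection, k -> new ArrayList<>()).add(p.document);
        }
        for (Map.Entry<String, List<String>> e : byCollection.entrySet()) {
            List<String> batch = new ArrayList<>();
            int chars = 0;
            for (String doc : e.getValue()) {
                // Send the current batch before it would exceed the character budget.
                if (!batch.isEmpty() && chars + doc.length() > maxBatchChars) {
                    bulkInsert.accept(e.getKey(), batch);
                    batch = new ArrayList<>();
                    chars = 0;
                }
                batch.add(doc);
                chars += doc.length();
            }
            if (!batch.isEmpty()) {
                bulkInsert.accept(e.getKey(), batch);
            }
        }
    }

    public static void main(String[] args) {
        QueuedBulkWriter writer = new QueuedBulkWriter(2_000_000,
                (collection, docs) -> System.out.println(collection + " <- " + docs.size() + " docs"));
        writer.enqueue("sentences", "{\"sentence\":\"one\"}");
        writer.enqueue("sentences", "{\"sentence\":\"two\"}");
        writer.enqueue("words", "{\"word\":\"three\"}");
        writer.flush();
    }
}
```

Batching trades a short delay before data is persisted for far fewer network round trips, so it suits writes where the caller does not need an immediate acknowledgement.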
  • 13. Software Design Hierarchical Data done more elegantly Wordnik Dictionary Model Java POJOs already had JAXB annotations Part of public REST API Used MySQL 12+ tables 13 DAOs 2500 lines of code 50 requests/second uncached Memcache needed to maintain reasonable speed
  • 15. Software Design MongoDB’s Document Storage let us… Turn the Objects into JSON via Jackson Mapper (fasterxml.com) Call save Support all fetch types, enhanced filters 1000 requests / second No explicit caching No less scary code
  • 16. Software Design Saving a complex object String rawJSON = getMapper().writeValueAsString(veryComplexObject); collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON))); Fetching complex object BasicDBObject dbo = (BasicDBObject) cursor.next(); ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class); No joins, 20x faster
  • 17. Migrating Data Migrating => existing data logic Use logic to select DAOs appropriately Read from old, write with new Great system test for MongoDB SentenceHandler mysqlSh = new MySQLSentenceHandler(); SentenceHandler mongoSh = new MongoDbSentenceHandler(); while (hasMoreData) { mongoSh.asyncWrite(mysqlSh.next()); ... }
  • 18. Migrating Data Wordnik moved 5 billion rows from MySQL Sustained 100,000 inserts/second Migration tool was CPU bound ID generation logic, among other things Wordnik reads MongoDB fast Read + create Java objects @ 250k/second (!)
  • 19. Going live to Production Choose your use case carefully if migrating incrementally Scary no matter what Test your perf monitoring system first! Use your DAOs from migration Turn on MongoDB on one server, monitor, tune (rollback, repeat) Full switch over when comfortable
  • 20. Going live to Production Really? SentenceHandler h = null; if (useMongoDb) { h = new MongoDbSentenceHandler(); } else { h = new MySQLDbSentenceHandler(); } return h.find(...);
  • 21. Optimizing Performance Home-grown connection pooling Master only ConnectionManager.getReadWriteConnection() Slave only ConnectionManager.getReadOnlyConnection() Round-robin all servers, bias on slaves ConnectionManager.getConnection()
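The slide above lists three ways to obtain a connection: master only, slave only, and round-robin over all servers with a bias toward slaves. The slides do not show how the bias was implemented. One simple way to get that effect, shown here as a sketch, is a rotation list in which each slave appears several times and the master once. `BiasedRoundRobin`, its method names, and the `slaveWeight` parameter are assumptions, and the methods return host names rather than real connections:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class BiasedRoundRobin {
    private final String master;
    private final List<String> slaves;
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger readCursor = new AtomicInteger();
    private final AtomicInteger anyCursor = new AtomicInteger();

    // slaveWeight: how many times each slave appears per master slot (hypothetical knob).
    BiasedRoundRobin(String master, List<String> slaves, int slaveWeight) {
        if (slaves.isEmpty() || slaveWeight < 1) {
            throw new IllegalArgumentException("need at least one slave and a positive weight");
        }
        this.master = master;
        this.slaves = new ArrayList<>(slaves);
        rotation.add(master);
        for (int i = 0; i < slaveWeight; i++) {
            rotation.addAll(slaves);
        }
    }

    // Writes always go to the master.
    String readWrite() {
        return master;
    }

    // Read-only traffic rotates over the slaves.
    String readOnly() {
        return slaves.get(Math.floorMod(readCursor.getAndIncrement(), slaves.size()));
    }

    // General traffic rotates over all servers, visiting slaves more often than the master.
    String any() {
        return rotation.get(Math.floorMod(anyCursor.getAndIncrement(), rotation.size()));
    }

    public static void main(String[] args) {
        BiasedRoundRobin pool = new BiasedRoundRobin("master", List.of("slave1", "slave2"), 2);
        for (int i = 0; i < 5; i++) {
            System.out.println(pool.any());
        }
    }
}
```

`Math.floorMod` keeps the index valid even after the counter overflows, so the rotation stays correct on long-running servers.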
  • 22. Optimizing Performance Caching Had complex logic to handle cache invalidation Out-of-process caches are not free MongoDB loves your RAM Let it do your LRU cache (it will anyway) Hardware Do not skimp on your disk or RAM Indexes Schema-less design Even if no values in any document, needs to read document schema to check
  • 23. Optimizing Performance Disk space Schemaless => schema per document (row) Choose your mappings wisely ({veryLongAttributeName:true}) => more disk space than ({vlan:true})
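The cost of long attribute names can be estimated from the BSON layout, since every document carries its own field names. A rough sketch for a document holding a single boolean field: 4 bytes of length prefix, 1 type byte, the key as a NUL-terminated string, 1 byte for the value, and 1 terminator byte. The calculation ignores any padding or index overhead the storage engine adds on disk, and `BsonKeyCost` is a hypothetical helper:

```java
import java.nio.charset.StandardCharsets;

class BsonKeyCost {
    // Encoded size in bytes of a BSON document with exactly one boolean field.
    static int singleBooleanDocBytes(String key) {
        int keyBytes = key.getBytes(StandardCharsets.UTF_8).length + 1; // key + NUL terminator
        return 4      // int32 document length
             + 1      // element type byte
             + keyBytes
             + 1      // boolean value
             + 1;     // document terminator
    }

    public static void main(String[] args) {
        int longDoc = singleBooleanDocBytes("veryLongAttributeName");
        int shortDoc = singleBooleanDocBytes("vlan");
        long documents = 5_000_000_000L; // document count quoted earlier in the talk
        System.out.println(longDoc + " vs " + shortDoc + " bytes per document");
        System.out.println("extra bytes in total: " + (longDoc - shortDoc) * documents);
    }
}
```

With these two keys the encoded sizes are 29 and 12 bytes, so the longer name costs 17 extra bytes per document, or about 85 GB of raw BSON across 5 billion documents.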
  • 24. Optimizing Performance A Typical Day at the Office for MongoDB API call rate: 47.7 calls/sec
  • 25. Other Tips Data Types Use caution when changing DBObject obj = cur.next(); long id = (Long) obj.get("IWasAnIntOnce"); Attribute names Don’t change w/o migrating existing data! WTFDMDG????
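The data type warning in the slide above comes down to Java boxing: a value stored as an `Integer` cannot be cast to `Long`, so the cast fails at runtime once older documents hold the old type. A small sketch of the failure and a tolerant read, using a plain `Map` as a stand-in for a fetched document; `NumericFieldReader` is a hypothetical helper:

```java
import java.util.HashMap;
import java.util.Map;

class NumericFieldReader {
    // Reads a numeric field as long whether it was stored as Integer or Long.
    static long readLong(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (!(value instanceof Number)) {
            throw new IllegalStateException(field + " is missing or not numeric: " + value);
        }
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("IWasAnIntOnce", 42); // boxed as Integer, like an older document

        try {
            long id = (Long) doc.get("IWasAnIntOnce"); // Integer cannot be cast to Long
            System.out.println(id);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed");
        }

        System.out.println(readLong(doc, "IWasAnIntOnce"));
    }
}
```

Reading through `Number` lets old and new documents coexist while existing data is being migrated to the new type.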
  • 26. What’s next? GridFS Store audio files on disk Requires clustered file system for shared access Capped Collections (rolling out this week) UGC from MySQL => MongoDB Beg/Bribe 10gen for some Features