2. What this Talk is About
Wordnik left the cloud and came back
• What?!?
• Why we left
• Decisions
• Why we came back (and what we did differently)
3. Who is Wordnik?
• World’s fastest updating English dictionary
• Based on input of text at ~8k words/second
• Word Graph as the basis of our analysis
• Synchronous & asynchronous processing
• Tens of billions of documents in NR storage
• Concept & Meaning Discovery Engine
• > 20M daily REST API calls, billions served
4. So Why the Detour?
• Architectural Choices
• Business Choices
• Feedback, tooling, infrastructure
• Learning
• Changes in use case
• Progress!
5. Architecture History
• EC2-based LAMP Stack
• POC (and seed funding)
• A manageable corpus < 1M records
• REST API
• Web + public
• MySQL in master/slave
• ~1B documents
• Operational nightmare
6. Architecture History
• MongoDB
• First-order MySQL issues solved
• But it got slow…
• Real Servers to the rescue!
• Faster, bigger disks
• MongoDB for Corpus, Structured Data
• Faster Reads + Writes!
• More metal (72GB RAM)
• More cores
• “cold” query from 400ms to < 100ms
7. Why Change?
Easy!
• Can’t beat metal…except
• Quick expansion
• Batch jobs/experiments
• Add a datacenter
• Full cluster migration
• The bill for unused capacity
8. Architectural Mindshift
1. Anything can die, anytime
2. Centralized, redundant state (see point 1)
3. Server performance is *different*
• CPU, I/O, Memory—choose one
• Smart design makes it work!
10. Your Infrastructure Cloud
Hero
• Deploying Servers
• Going to need a lot!
• Configuration
• Updates to your software
What about data?
11. Let’s make this Work!
• MySQL Master Slave
• Take a snapshot (yes, this will block)
• Keep your binlogs!
change master to MASTER_HOST='app1',
MASTER_USER='XXXX', MASTER_PASSWORD='XXXX',
MASTER_LOG_FILE='app1-relay.0038774',
MASTER_LOG_POS=6754205951;
12. Let’s make this Work!
But…
• Your master is down!
• Quick, promote a slave!
• Point the other slaves to the new master
• As for the clients…
“Well, we never really tried that…”
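The failover steps above (promote a slave, repoint the rest) can be sketched as a small planning function. This is an illustration only, not Wordnik's procedure: the host names, the `logFile`/`logPos` fields, and `planFailover` itself are hypothetical, and it assumes zero-padded binlog file names so that string comparison matches log order. It only picks the most caught-up replica; computing the new replication coordinates and redirecting clients are left out.

```javascript
// Sketch only: replica descriptions are made-up illustration data.
// Each replica reports how far it has read the old master's binlog.
function planFailover(replicas) {
  if (replicas.length === 0) throw new Error("no replicas to promote");
  // Most caught-up first: compare log file name, then position in the file.
  const sorted = [...replicas].sort((a, b) =>
    a.logFile === b.logFile
      ? b.logPos - a.logPos
      : b.logFile.localeCompare(a.logFile)
  );
  const [newMaster, ...others] = sorted;
  return {
    promote: newMaster.host,             // becomes the new master
    repoint: others.map((r) => r.host),  // must be pointed at the new master
  };
}

const plan = planFailover([
  { host: "db2", logFile: "bin.000012", logPos: 100 },
  { host: "db3", logFile: "bin.000012", logPos: 250 },
  { host: "db4", logFile: "bin.000011", logPos: 900 },
]);
console.log(plan);
```

Rehearsing a plan like this before the master actually fails is the point of the slide: the client side is usually the untested part.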
13. Better with Mongo
• Easy up, easy down!
• Startup: Sync your data, and announce to clients when ready for business
• Shutdown: Announce your departure and leave
• Replica sets
rs.add("db4.wordnik.com:27017");
rs.remove("db1.wordnik.com:27017");
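The "announce when ready, announce when leaving" idea can be modelled in a few lines. The sketch below is a hypothetical in-memory registry, not MongoDB's replica set protocol or any driver API; `NodeRegistry` and its method names are invented for illustration.

```javascript
// Hypothetical registry: tracks which hosts have announced themselves.
class NodeRegistry {
  constructor() {
    this.ready = new Set();
  }
  // A node calls this only after its initial data sync has finished.
  announceReady(host) {
    this.ready.add(host);
  }
  // A node calls this before it shuts down.
  announceDeparture(host) {
    this.ready.delete(host);
  }
  // Clients route only to hosts that are currently announced.
  available() {
    return [...this.ready].sort();
  }
}

const registry = new NodeRegistry();
registry.announceReady("db1.wordnik.com:27017");
registry.announceReady("db4.wordnik.com:27017");
registry.announceDeparture("db1.wordnik.com:27017");
console.log(registry.available());
```

With replica sets, the membership changes made by `rs.add` and `rs.remove` play this role, so the application does not need its own registry.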
17. But what about Performance?
• “It’s the database!”
• What is it?
• Mapping layer
• MySQL (12+ joins) => 50 records/sec
• Mongo JSON POJO => 1000 records/sec
• Mongo DBO POJO => 35,000 records/sec
• How do you know?
Profile it!
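A minimal way to "profile it" is to time the mapping step on its own, away from the database. The sketch below assumes Node.js; the two mappers are stand-ins invented for illustration (the figures on the slide come from a Java mapping layer, not from this code), and the measured rates depend entirely on the machine.

```javascript
// Measure how many records per second a mapping function handles.
function recordsPerSecond(mapper, records, rounds = 5) {
  const start = process.hrtime.bigint();
  let mapped = 0;
  for (let r = 0; r < rounds; r++) {
    for (const rec of records) {
      mapper(rec);
      mapped++;
    }
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  return mapped / (elapsedNs / 1e9);
}

// Hypothetical mappers: one round-trips through a JSON string,
// the other copies the needed fields directly.
const viaJson = (doc) => JSON.parse(JSON.stringify(doc));
const direct = (doc) => ({ id: doc.id, word: doc.word, count: doc.count });

const sample = Array.from({ length: 10000 }, (_, i) => ({
  id: i,
  word: "w" + i,
  count: i % 7,
}));

console.log("via JSON:", Math.round(recordsPerSecond(viaJson, sample)));
console.log("direct:  ", Math.round(recordsPerSecond(direct, sample)));
```

Measuring each layer separately like this shows whether the time goes to the database or to the code that sits in front of it.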
19. It’s Still Slow!
• Balance your B-Tree
• Can't always keep the index in RAM. MMF (memory-mapped files) "does its thing"
• Right-balanced b-tree keeps necessary index hot
• If you hit indexes on disk, mute your pager
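Why a right-balanced index stays hot can be shown with a toy model. The sketch below is not how MongoDB stores indexes: it treats the index as a sorted array cut into fixed-size "pages" (`pagesTouched` and `pageSize` are invented for illustration) and counts how many distinct pages a batch of inserts lands on.

```javascript
// Toy index: a sorted array of keys, split into fixed-size "pages".
// Returns the number of distinct pages touched by the inserts.
function pagesTouched(existingKeys, newKeys, pageSize = 100) {
  const keys = [...existingKeys].sort((a, b) => a - b);
  const touched = new Set();
  for (const k of newKeys) {
    // Binary search for the insertion point.
    let lo = 0;
    let hi = keys.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (keys[mid] < k) lo = mid + 1;
      else hi = mid;
    }
    touched.add(Math.floor(lo / pageSize));
    keys.splice(lo, 0, k);
  }
  return touched.size;
}

const existing = Array.from({ length: 10000 }, (_, i) => i * 10);
// Ever-increasing keys (for example, time-ordered ids).
const increasing = Array.from({ length: 50 }, (_, i) => 100000 + i);
// Keys scattered across the whole existing range.
const scattered = Array.from({ length: 50 }, (_, i) => i * 1999 + 5);

console.log("increasing keys:", pagesTouched(existing, increasing));
console.log("scattered keys: ", pagesTouched(existing, scattered));
```

In this model the increasing keys all land on the last page, while the scattered keys spread across many pages. Only the former keeps the working part of the index small enough to stay in RAM.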
20. But it’s Still Slow!
• Look at your Schema design
• Design to limit index size/number
• _id is your friend—make it meaningful
• Record size consistency
• Hierarchical Data beware!
• Split documents even in same collection!
db.posts.find({_id:/^tony_posts_/})
{_id:"tony_posts_1", posts:[...]}
{_id:"tony_posts_2", posts:[...]}
{_id:"tony_posts_3", posts:[...]}
YOUR app knows best
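The document-splitting pattern above can be sketched without a database. Both helpers below are hypothetical: `bucketPosts` builds documents whose `_id` carries the user name and a bucket number, and `findByIdPrefix` is an in-memory stand-in for the `_id` prefix query shown on the slide.

```javascript
// Split one user's posts across several documents that share an _id prefix.
function bucketPosts(user, posts, perDoc) {
  const docs = [];
  for (let i = 0; i < posts.length; i += perDoc) {
    docs.push({
      _id: `${user}_posts_${docs.length + 1}`,
      posts: posts.slice(i, i + perDoc),
    });
  }
  return docs;
}

// In-memory stand-in for db.posts.find({_id: /^tony_posts_/}).
function findByIdPrefix(collection, prefix) {
  return collection.filter((doc) => doc._id.startsWith(prefix));
}

const collection = [
  ...bucketPosts("tony", ["p1", "p2", "p3", "p4", "p5"], 2),
  ...bucketPosts("amy", ["q1"], 2),
];
console.log(findByIdPrefix(collection, "tony_posts_").map((d) => d._id));
```

Because the prefix is anchored at the start of `_id`, the real query can use the `_id` index, and each document stays at a bounded, predictable size.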
21. Really, it’s STILL slow!
• Your monolithic app/DB won’t scale the same on VMs
• Specialize!
• Wordnik uses SOA
• Swagger-powered API: swagger.wordnik.com
• Data tiers follow service types
• Smaller *everything*
22. Really, it’s STILL slow!
• Your monolithic app/DB won’t scale the same on VMs
• Specialize!
• Wordnik uses SOA
• Swagger-powered API: swagger.wordnik.com
• A contract for your clients
• Data tiers follow service types
• Smaller *everything*
23. Be the Boss of your Data
• Your app *should* be smarter than your DB
• Lots of users?
• Lots of blog posts?
• Lots of images?
• Shard? On what?
• Data dimensionality
• Keep active data hot
• Don’t try to boil the ocean
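One way the app can be "smarter than the DB" is to decide for itself where a record lives. The sketch below is an assumption, not Wordnik's scheme: it routes by a simple string hash of a key the application already knows (for example a user name), and `shardFor` is a name made up for this example.

```javascript
// Deterministically map a key to one of `shardCount` shards.
function shardFor(key, shardCount) {
  let h = 0;
  for (const ch of String(key)) {
    // Multiply-and-add hash, kept as an unsigned 32-bit integer.
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h % shardCount;
}

for (const user of ["tony", "erin", "zeke"]) {
  console.log(user, "-> shard", shardFor(user, 4));
}
```

Hashing spreads keys evenly but ignores which data is active. If recency is the dimension that matters, routing by date range instead would keep the hot data together.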
24. Cloud Computing + Mongo
• It can work extremely well
• No “Save as Cloud!” menu item
• Shifting constraints
• Optimize for RAM on VM
• Virtual disk => virtual performance
• Be “Deployable”
• Mongo Replica Sets are made for this
25. Cloud Computing + Mongo
• System Durability
• Design your software for abuse
• Your old design doesn’t apply
• Add APM hooks, now!
• Dissect your app
• Build to microservices with dedicated MongoDB clusters
• Deployment Infrastructure
• Don’t wait until it’s too late
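An APM hook can start as a wrapper that records call counts and timings. The sketch below assumes Node.js and synchronous functions; `metrics` and `timed` are hypothetical names, and a real setup would ship the numbers to a monitoring system rather than keep them in memory.

```javascript
// Hypothetical metrics sink: name -> { calls, totalMs }.
const metrics = new Map();

// Wrap a synchronous function so every call is counted and timed,
// including calls that throw.
function timed(name, fn) {
  return function (...args) {
    const start = process.hrtime.bigint();
    try {
      return fn.apply(this, args);
    } finally {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      const m = metrics.get(name) || { calls: 0, totalMs: 0 };
      m.calls += 1;
      m.totalMs += ms;
      metrics.set(name, m);
    }
  };
}

const lookup = timed("lookup", (word) => word.toUpperCase());
lookup("cloud");
lookup("metal");
console.log(metrics.get("lookup"));
```

Adding the hook early means there is a baseline to compare against when the app moves to different hardware.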
26. See More
• See more about Wordnik APIs
http://developer.wordnik.com
• Migrating from MySQL to MongoDB
http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
• Maintaining your MongoDB Installation
http://www.slideshare.net/fehguy/mongo-sv-tony-tam
• Swagger API Framework
http://swagger.wordnik.com
• Mapping Benchmark
https://github.com/fehguy/mongodb-benchmark-tools
• Wordnik OSS Tools
https://github.com/wordnik/wordnik-oss