2. Who is Wordnik?
• Founded in 2008 by Erin McKean
• "Understand meaning of words automatically"
• Patented "Free-Range Definition" technology
• Constructed largest (known) English Word Graph
We do Discovery
4. Data?
• Word Graph is built by data
• Runtime answers needed fast
80 S reads!
50M+ Nodes!
80M+ Edges!
7. What we do with Data
• Update the Graph constantly
• Augment our NLP pipeline
• "Reality-based Annotation" with current, real-world data
Language is NOT static
Next??? Twitter? Tumblr? WordPress
9. How we do it
• Amazon EC2-based deployment
• Efficiency through constraint-based architecture
• Small is Big!
• Horizontal scaling by adding servers!
• Yea, we can always go vertical
• Blah, blah, more details!
10. Micro Services
• Services are stand-alone building blocks
• Increase capacity through a "more like this" button
11. Micro Services
• Big application => micro services
Monolithic application
"Isn't this just SOA?"
15. Not PO-SOA
• This is different
• No proprietary message bus
• Decoupled objects
• Dedicated storage***
• Speak REST
• Develop your services in…
• Java
• Scala
• Ruby
• PHP
17. Speak REST?
• Sounds good but…
• REST semantics vary wildly
• HATEOAS vs. practical REST?
/api/pet.json/1?delete (GET)
/api/pet.json/1 (DELETE)
/api/pet.json/1 (POST empty)
All valid!
So…
Peer Review!
Better Docs!
API Council!
API Styleguide!
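The three request shapes above all mean "delete pet 1", which is why a styleguide is needed. A minimal sketch, assuming a hypothetical `is_delete` helper (not part of any Wordnik API), of what a server would have to do to accept all three conventions:

```python
from urllib.parse import urlsplit, parse_qs


def is_delete(method: str, url: str, body: bytes = b"") -> bool:
    """Return True if the request matches any of the three delete styles
    listed on the slide. Hypothetical helper, for illustration only."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    method = method.upper()
    if method == "DELETE":                      # /api/pet.json/1 (DELETE)
        return True
    if method == "GET" and "delete" in query:   # /api/pet.json/1?delete (GET)
        return True
    if method == "POST" and not body:           # /api/pet.json/1 (POST empty)
        return True
    return False
```

Picking one of the three in a styleguide removes the need for this kind of guessing on the provider side.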
18. SOA makes new Challenges
• It's communication (not easy)
• Need a consumer & provider contract
• Driving force to create Swagger
19. What is Swagger?
• Swagger is…
• Spec for declaring and documenting an API
• A framework for auto-generating the spec
• A library for client library generation
• A JSON-based test framework
• It's open source!
• http://swagger.wordnik.com
20. How?
• Swagger Codegen
• Creates a client based on your Swagger Spec
scala src/main/scala/Codegen.scala ${swagger-spec-url}
Scala
Ruby
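To illustrate the idea of generating a client from a machine-readable API description, here is a toy sketch. It is not Swagger Codegen, and the `SPEC` layout is a simplified, hypothetical stand-in rather than the real Swagger spec schema; the generated methods return the request they would make instead of doing any I/O.

```python
# Simplified, hypothetical API description; NOT the real Swagger spec schema.
SPEC = {
    "basePath": "http://example.invalid/api",
    "operations": [
        {"name": "get_pet", "method": "GET", "path": "/pet.json/{id}"},
        {"name": "delete_pet", "method": "DELETE", "path": "/pet.json/{id}"},
    ],
}


def generate_client(spec: dict) -> str:
    """Emit Python source for a tiny client class, one method per operation."""
    lines = ["class Client:"]
    for op in spec["operations"]:
        url = spec["basePath"] + op["path"]
        lines.append(f"    def {op['name']}(self, **params):")
        lines.append("        # Returns (method, url) instead of sending a request.")
        lines.append(f"        return ({op['method']!r}, {url!r}.format(**params))")
    return "\n".join(lines) + "\n"


source = generate_client(SPEC)
namespace = {}
exec(source, namespace)          # compile the generated client source
client = namespace["Client"]()
```

Calling `client.get_pet(id=1)` on the generated class returns the method and URL the operation maps to.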
21. In the Wordnik Workflow
• Jenkins will…
• Build a service library
• Build a stand-alone application distro
• Build an installable image (RPM)
• Build a compatible client library
• Consumers will…
• Declare dependency on a service version
• Use a client for that version
• Be given a list of compatible services, by cluster, version
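The consumer side of this workflow can be sketched as a lookup by service name, cluster, and version. Everything here is an assumption for illustration: the `ServiceInstance` record, the in-memory registry, and the compatibility rule (same major version, equal or newer minor.patch) are hypothetical, not Wordnik's actual scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    name: str
    version: str   # "major.minor.patch"
    cluster: str
    url: str


def compatible(required: str, offered: str) -> bool:
    """Assumed rule: same major version, offered minor.patch >= required."""
    req = tuple(int(p) for p in required.split("."))
    off = tuple(int(p) for p in offered.split("."))
    return off[0] == req[0] and off[1:] >= req[1:]


def find_services(registry, name, required_version, cluster):
    """Return the instances a consumer pinned to `required_version` may use."""
    return [
        s for s in registry
        if s.name == name
        and s.cluster == cluster
        and compatible(required_version, s.version)
    ]


registry = [
    ServiceInstance("word-api", "1.2.0", "us-east", "http://10.0.0.1:8000"),
    ServiceInstance("word-api", "1.4.1", "us-east", "http://10.0.0.2:8000"),
    ServiceInstance("word-api", "2.0.0", "us-east", "http://10.0.0.3:8000"),
    ServiceInstance("word-api", "1.4.1", "us-west", "http://10.1.0.1:8000"),
]
matches = find_services(registry, "word-api", "1.3.0", "us-east")
```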
22. Back to Data
• Micro services have small(ish) databases
• Share nothing across services
• YES to replica sets
• Deployed to ephemeral storage
• (more in a bit)
• Small by design
• How to keep them small?
23. Keeping Databases Small
• Some easy tricks
• Schema-less => "schema per document"
• Keep field names short!
db.foo.save({user_name:"Tony"})
db.foo.save({un:"Tony"})
Repeat 10e9 times!
• Indexes
• They can get *huge*
• Make _id matter!
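Because every document stores its own field names, a shorter name saves bytes on each document. The sketch below computes the BSON size of string-only documents by hand, following the BSON layout (int32 length, typed elements with a null-terminated name, trailing null byte). It ignores the `_id` field and any storage-engine overhead, so treat the total as a rough estimate.

```python
def bson_string_doc_size(doc: dict) -> int:
    """Size in bytes of a BSON document whose values are all strings."""
    size = 4 + 1  # int32 document length + trailing 0x00
    for key, value in doc.items():
        size += 1                                    # element type byte (string)
        size += len(key.encode("utf-8")) + 1         # field name as a cstring
        size += 4 + len(value.encode("utf-8")) + 1   # int32 length + bytes + 0x00
    return size


long_doc = bson_string_doc_size({"user_name": "Tony"})
short_doc = bson_string_doc_size({"un": "Tony"})
saved_per_doc = long_doc - short_doc           # bytes saved by the shorter name
total_saved_gb = saved_per_doc * 10e9 / 1e9    # over the slide's 10e9 documents
```

The per-document saving is exactly the difference in field-name length, which adds up quickly at this document count.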
25. Keeping Databases Small
• Don't make _id just an "auto increment"
You're stuck with it! Be smart
• User collection? Try _id: username
• Email collection? Try _id: email
• Date-driven collection? How about _id: "20120502"
• db.logins.find({_id:/^201205/})
Be lazy until you can't anymore!
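A date-shaped `_id` pays off because an anchored prefix pattern such as `/^201205/` can be answered as a range over the sorted `_id` index rather than a scan of every document. The sketch below shows the same idea on a sorted Python list standing in for the index; the `prefix_scan` helper and the sample ids are hypothetical.

```python
import bisect


def prefix_scan(sorted_ids, prefix):
    """Return the ids starting with `prefix` using two binary searches."""
    lo = bisect.bisect_left(sorted_ids, prefix)
    # "\uffff" sorts after any character used in these ids, so this bound
    # lands just past the last id that shares the prefix.
    hi = bisect.bisect_left(sorted_ids, prefix + "\uffff")
    return sorted_ids[lo:hi]


ids = sorted(["20120430", "20120501", "20120502", "20120531", "20120601"])
may_logins = prefix_scan(ids, "201205")
```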
26. Keeping Databases Small
• DAO or die!
• Fancy index scheme => control access to collections
NO!!!!
Yes
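One way to read "DAO or die": if `_id` values follow a scheme, only one class should know how to build and query them. A minimal sketch, assuming a hypothetical `LoginDao` with a dict standing in for the collection and an invented `day:username` key format:

```python
class LoginDao:
    """Hypothetical DAO: the only place that knows how _id values are built."""

    def __init__(self, store=None):
        # A plain dict stands in for the database collection.
        self._store = {} if store is None else store

    @staticmethod
    def _make_id(day: str, username: str) -> str:
        # Assumed key scheme: date first, so ids sort and filter by day.
        return f"{day}:{username}"

    def record(self, day: str, username: str) -> dict:
        doc = {"_id": self._make_id(day, username), "un": username}
        self._store[doc["_id"]] = doc
        return doc

    def find_by_day_prefix(self, prefix: str) -> list:
        hits = (d for k, d in self._store.items() if k.startswith(prefix))
        return sorted(hits, key=lambda d: d["_id"])


dao = LoginDao()
dao.record("20120502", "tony")
dao.record("20120601", "erin")
may_ids = [d["_id"] for d in dao.find_by_day_prefix("201205")]
```

Callers never assemble an `_id` themselves, so the scheme can change in one place.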
29. Keeping Databases Small
• Again, why keep them small?
• Starting a new replica
• Initial sync
• Index rebuilding
• Backups
• Index Compaction
• Speed
• TCO
Everything is easier
This can take DAYS
31. Ephemeral Storage?
• Every EC2 instance type has some (except micro)
• Only available via EC2 API
• Less prone to issues than EBS
• Faster ***
• Included in cost of server
But dies on host reboot!
34. Which Zone? Which Region?
Arbiter handles external connectivity issue detection
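A replica set can only elect a primary while a strict majority of its voting members can reach each other, and an arbiter counts as a voter. The sketch below checks that rule for an assumed layout (one voter per zone, with the arbiter in the third zone); the layout and the helper names are hypothetical, not the deployment described in the talk.

```python
def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """A primary needs a strict majority of all voting members."""
    return reachable > voting_members // 2


# Assumed layout: one data node in each of two zones, arbiter in a third.
ZONES = {"zone-a": 1, "zone-b": 1, "zone-c": 1}  # zone-c holds the arbiter


def survives_zone_loss(zones: dict, lost_zone: str) -> bool:
    """Can the remaining zones still elect a primary after losing one zone?"""
    total = sum(zones.values())
    reachable = total - zones[lost_zone]
    return can_elect_primary(total, reachable)


results = {zone: survives_zone_loss(ZONES, zone) for zone in ZONES}
```

With an even number of voters and no arbiter, a clean split leaves neither side with a majority, which is the case the third voter avoids.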
35. How does this really stack up?
• Tuned indexes & access, split with services
• Was: 3 DAS Devices w/18 TB disk
• Now: 21 M1.large + M1.xlarge instances
• 3 Zones, 2 regions
• The Gory Details
blog.wordnik.com/with-software-small-is-the-new-big
36. As for Services
• ~1,000 requests/sec via Swagger-enabled micro services
• Direct to Consumer via SwaggerSocket
37. What's Next
• Migrating all services to SwaggerSocket
• OSS WebSocket subprotocol
https://github.com/wordnik/swaggersocket
• 25%-100% speed increase (sync & async)
• Discovery via Wordnik