MongoDB at Visibiz
1. MongoDB at Visibiz
Why and How We’re Using MongoDB In Our Application
Mike Brocious
Tech Lead
Visibiz, Inc.
2. Why We’re Here
• Discuss why we chose MongoDB at Visibiz
• Show how we’re using it
• Mistakes made and lessons learned along the way
• Collections
• Schema
3. About Us
• Startup
• Founded April, 2010
• 8 employees
• Located just outside of Philadelphia, PA
• Social CRM
• ‘know your network...sell better’
• Currently in limited beta
• Sign up at www.visibiz.com
9. Extensible Objects
• We provide core objects
• person, company, prospect, relationship, etc.
• Customer can add their own attributes
• including relationships with other objects
• A ‘person’ object for customer #1 may not be defined like a
‘person’ object for customer #2
10. Customer #1 vs. Customer #2
• Customer #1: Person
• Core attributes: name, address, date of birth
• Customer-defined attributes: employment history
• employment history is a list of: company, begin date, end date, job title
• Customer #2: Person
• Core attributes: name, address, date of birth
• Customer-defined attributes: gender, hobbies
• hobbies is a list of: name
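The two person shapes above can be written as JSON-style documents. This is a minimal sketch in plain JavaScript, not the actual Visibiz schema: the field names (`customerId`, `dateOfBirth`, `employmentHistory`, `hobbies`), the `CORE_ATTRIBUTES` list, the helper function, and the gender and hobby values are all assumptions made for illustration.

```javascript
// Core attributes every customer's person object shares (taken from the slide).
const CORE_ATTRIBUTES = ["name", "address", "dateOfBirth"];

// Customer #1 extends person with an employment history (hypothetical field names).
const personForCustomer1 = {
  customerId: 1,
  name: "Mike Brocious",
  address: "Malvern, PA",
  dateOfBirth: "6/10/1984",
  employmentHistory: [
    { company: "Visibiz", beginDate: "4/1/2010", endDate: null, jobTitle: "Tech Lead" },
  ],
};

// Customer #2 extends person with gender and hobbies (sample values are invented).
const personForCustomer2 = {
  customerId: 2,
  name: "Doug Smith",
  address: "Philadelphia, PA",
  dateOfBirth: "2/24/1980",
  gender: "male",
  hobbies: [{ name: "cycling" }],
};

// Hypothetical helper: list the attributes a customer added on top of the core set.
function customerDefinedAttributes(person) {
  return Object.keys(person).filter(
    (key) => key !== "customerId" && !CORE_ATTRIBUTES.includes(key)
  );
}

console.log(customerDefinedAttributes(personForCustomer1));
console.log(customerDefinedAttributes(personForCustomer2));
```

Both documents could sit side by side in the same collection, because nothing forces them to share a shape beyond the core attributes.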
11. Customer Definable Objects
• Give customers ability to define their own objects
• collection of attribute names, types
• relationships with other objects
12. Scalability
• Will eventually have large amount of data
• social networks, blogs, articles, etc.
• email
• Scaling should be (relatively) easy
17. Relational Database Example

Person Id | Name          | Address          | Date of Birth
1         | Mike Brocious | Malvern, PA      | 6/10/1984
2         | Doug Smith    | Philadelphia, PA | 2/24/1980

Person Id | User Defined 1 | User Defined 2 | User Defined 3 | User Defined 4 | ...
1         | Good Burger    | 3/20/1998      | 3/31/2010      | Flipper        |
1         | Visibiz        | 4/1/2010       | <null>         | Tech Lead      |
2         | 10gen          | 7/9/2006       | 3/18/2009      | Engineer       |

Column         | Name       | Type
User Defined 1 | Company    | String
User Defined 2 | Begin Date | Date
User Defined 3 | End Date   | Date
User Defined 4 | Job Title  | String
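For comparison, the same rows can be folded into one document per person. The sketch below is plain JavaScript over in-memory arrays and only illustrates the idea: the short column keys (`ud1` to `ud4`), the `employmentHistory` field name, and the `toDocument` helper are assumptions, not part of the original design.

```javascript
// Rows from the slide's person table.
const personRows = [
  { personId: 1, name: "Mike Brocious", address: "Malvern, PA", dateOfBirth: "6/10/1984" },
  { personId: 2, name: "Doug Smith", address: "Philadelphia, PA", dateOfBirth: "2/24/1980" },
];

// Rows from the generic "User Defined N" table (ud1..ud4 are hypothetical keys).
const userDefinedRows = [
  { personId: 1, ud1: "Good Burger", ud2: "3/20/1998", ud3: "3/31/2010", ud4: "Flipper" },
  { personId: 1, ud1: "Visibiz", ud2: "4/1/2010", ud3: null, ud4: "Tech Lead" },
  { personId: 2, ud1: "10gen", ud2: "7/9/2006", ud3: "3/18/2009", ud4: "Engineer" },
];

// The metadata table that gives the generic columns their meaning.
const columnNames = { ud1: "company", ud2: "beginDate", ud3: "endDate", ud4: "jobTitle" };

// Join a person with its user-defined rows and rename the generic columns.
function toDocument(person) {
  const employmentHistory = userDefinedRows
    .filter((row) => row.personId === person.personId)
    .map((row) => {
      const entry = {};
      for (const [column, attribute] of Object.entries(columnNames)) {
        entry[attribute] = row[column];
      }
      return entry;
    });
  return { ...person, employmentHistory };
}

const documents = personRows.map(toDocument);
console.log(JSON.stringify(documents[0], null, 2));
```

In the relational layout every read needs the join and the metadata lookup shown in `toDocument`; in the document layout that work is done once, when the document is written.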
22. More MongoDB Goodness
• Scalability
• replication - easy to setup
• sharding
• Uses JSON
• “the new XML”
• easy to build, parse and read
• great support from tools, languages and services
24. JSON-eze
• Every layer of the application understands common format
• Not forced into transforming data
• DB result sets <--> DO/DTO
• DO/DTO <--> view
• view <--> presentation
• Enables rapid development
[Diagram: JSON runs through every layer: Presentation, View, Controller, Services, Data Access, DB]
25. What Else We Liked About MongoDB
• Active product development
• Community support
• plugins, drivers, viewers
• 10gen support
• developers very active on forum, JIRA
• MongoDB conferences, webcasts
• Deep list of sites already using MongoDB
31. Things
• Our main collection: “things”
• Domain objects (person, company, note, event, etc.)
• Customer-defined objects
• Takes advantage of ‘schema-free’ nature of MongoDB
• able to easily query across all types
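As a rough illustration of querying across all types, the sketch below keeps differently shaped things in one in-memory array and runs a single search over them. The `type` field, the sample documents, and the `findByName` helper are assumptions; against a real collection the equivalent shell query would be along the lines of `db.things.find({ name: /visibiz/i })`.

```javascript
// One collection, several shapes (sample documents invented for illustration).
const things = [
  { _id: 1, type: "person", name: "Mike Brocious", jobTitle: "Tech Lead" },
  { _id: 2, type: "company", name: "Visibiz, Inc.", address: "Philadelphia area, PA" },
  { _id: 3, type: "note", name: "Visibiz beta feedback", text: "sample note" },
];

// Hypothetical helper: a single query that ignores the type of each thing.
function findByName(collection, pattern) {
  return collection.filter((thing) => pattern.test(thing.name));
}

const matches = findByName(things, /visibiz/i);
console.log(matches.map((thing) => thing.type));
```

With one table per object type, the same search would have to be repeated per table and the results merged by hand.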
36. Relationships - Act One
• Connections between things
• employment history
• co-workers, friends
• First cut: separate documents in the things collection
• essentially a mapping/join table
38. Relationships - Act One
[Diagram: things A and B linked by a separate “rel” document]
• Single atomic insert (good! no transaction concerns)
• Made retrieval inefficient (no JOINs, nested SELECTs)
39. Relationships - Act One
Find all things I’m related to
me = ObjectId("4d10f60e39fe153b20316478")
// Get all the relationships I’m involved in
related = db.things.find({$or: [{"left.id": me}, {"right.id": me}]})
// Build up a list of ids I’m related to
relatedIds = []
related.forEach(function (relationship) {
  // ObjectIds are objects, so compare with equals() rather than ==
  if (relationship.left.id.equals(me)) {
    relatedIds.push(relationship.right.id)
  } else {
    relatedIds.push(relationship.left.id)
  }
})
// Now get the things
relatedThings = db.things.find({_id: {$in: relatedIds}})
40. Relationships - Act Two
• Moved relationships to nested document inside thing
• natural approach for document-based datastores
[Diagram: things A and B, each holding the relationship as a nested document]
42. Relationships - Act Two
• Queries easy and fast - awesome!
// Get all the things I’m related to
relatedThings = db.things.find({$or: [{"rels.left.id": me}, {"rels.right.id": me}]})
• Two updates required to insert new relationship
• relationship stored in both things
• bad! - transaction concerns
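The transaction concern can be made concrete with a small simulation. This is a sketch in plain JavaScript with a `Map` standing in for the things collection; the function name, the `label` field, and the `failAfterFirstWrite` switch are hypothetical and exist only to show what happens when the second write never lands.

```javascript
// In-memory stand-in for the things collection.
const things = new Map([
  ["A", { _id: "A", rels: [] }],
  ["B", { _id: "B", rels: [] }],
]);

// Act Two style insert: the relationship is written into both things.
function addNestedRelationship(leftId, rightId, label, failAfterFirstWrite = false) {
  const rel = { left: { id: leftId }, right: { id: rightId }, label };
  things.get(leftId).rels.push(rel); // write #1
  if (failAfterFirstWrite) {
    throw new Error("simulated failure between the two writes");
  }
  things.get(rightId).rels.push(rel); // write #2
}

try {
  addNestedRelationship("A", "B", "co-worker", true);
} catch (error) {
  console.log(error.message);
}

// A now lists the relationship, B does not: the two copies disagree.
console.log(things.get("A").rels.length, things.get("B").rels.length);
```

Without a transaction spanning both writes, nothing rolls back the first one, which is the inconsistency the slide warns about.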
46. Relationships - Act Three (final?)
• Best of both worlds
• Separate “rels” collection
• master source of relationship details
• atomic insert (good!)
• unique index (good!)
• Nested “rels” documents in things
• easy, fast queries (good!)
[Diagram: inserts and deletes go to the “rels” collection (A rel B) and are then pushed to the nested documents in A and B]
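A minimal sketch of the hybrid approach, again in plain JavaScript with `Map`s standing in for the two collections. The key format that emulates the unique index, the `label` field, and both function names are assumptions; the deck does not say which fields the real index covers.

```javascript
// Stand-in for the "rels" collection; the Map key emulates a unique index.
const rels = new Map();

// Stand-in for the things collection, with nested copies of each relationship.
const things = new Map([
  ["A", { _id: "A", rels: [] }],
  ["B", { _id: "B", rels: [] }],
]);

function insertRelationship(leftId, rightId, label) {
  const key = `${leftId}|${label}|${rightId}`;
  if (rels.has(key)) {
    throw new Error("duplicate relationship rejected by unique index");
  }
  const rel = { left: { id: leftId }, right: { id: rightId }, label };
  rels.set(key, rel); // step 1: single insert into the master collection
  for (const id of [leftId, rightId]) {
    things.get(id).rels.push(rel); // step 2: push nested copies for fast queries
  }
}

// Recovery path: regenerate a thing's nested copies from the master collection.
function rebuildNestedRels(thingId) {
  things.get(thingId).rels = [...rels.values()].filter(
    (rel) => rel.left.id === thingId || rel.right.id === thingId
  );
}

insertRelationship("A", "B", "co-worker");
things.get("B").rels = []; // pretend step 2 never reached B
rebuildNestedRels("B");
console.log(things.get("B").rels.length);
```

Because the master collection is written first, a failure in step 2 leaves the nested copies stale but recoverable, which is the trade-off the design accepts in exchange for fast reads.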
48. Other MongoDB Collections
• Workflow status log
• Query log
• Staging area for imported data
• Users
• Duplicates scan log
50. REST
• RESTful interface on top of services
• Expose services for internal and
external development (API)
• MongoDB doesn’t enforce datatypes or object shape
• Need a way to validate data to prevent “garbage in”
• JSON schema
• http://json-schema.org/
52. Thing Schemas
• Separate “schemas” MongoDB collection
• Every “thing” passes through validation before being stored
• Uses:
• validate incoming data
• track customer-specific schema extensions
• generate UI to display things
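To show the kind of check a schema makes possible, here is a hand-rolled validator covering only a tiny subset of JSON Schema (`type`, `required` in its array form, and per-property `type`). It is a sketch, not a real JSON Schema implementation, and the `personSchema` contents and function names are assumptions rather than the schemas Visibiz stores.

```javascript
// Hypothetical schema for a person thing, using a small subset of JSON Schema keywords.
const personSchema = {
  type: "object",
  required: ["name"],
  properties: {
    name: { type: "string" },
    address: { type: "string" },
    employmentHistory: { type: "array" },
  },
};

// Map a JavaScript value to a JSON Schema style type name.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Return a list of problems; an empty list means the thing passed validation.
function validate(thing, schema) {
  if (typeOf(thing) !== schema.type) {
    return [`expected ${schema.type}, got ${typeOf(thing)}`];
  }
  const errors = [];
  if (schema.type !== "object") return errors;
  for (const key of schema.required || []) {
    if (!(key in thing)) errors.push(`missing required attribute "${key}"`);
  }
  for (const [key, rule] of Object.entries(schema.properties || {})) {
    if (key in thing && typeOf(thing[key]) !== rule.type) {
      errors.push(`attribute "${key}" should be of type ${rule.type}`);
    }
  }
  return errors;
}

console.log(validate({ name: "Mike Brocious", hobbies: [] }, personSchema));
console.log(validate({ address: 42 }, personSchema));
```

Attributes that the schema does not mention (such as `hobbies` above) are allowed through, which leaves room for customer-specific extensions while still catching wrong types on known attributes.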
53. Why MongoDB Works For Us
• Schema-free
• Document-oriented
• JSON
• Scalable
• Active product
• Free
“MongoDB...The Database Chuck Norris Uses”
thanks for joining us today, i am mike brocious, tech lead here at visibiz, and
i’m gonna take you through a brief discussion of why we chose mongodb and how it fits into our application

Turn off all message notifications
- tweet deck
- gmail
- skype

why we chose mongo as primary datastore

Not a sales pitch - Can’t tell you whether it’s right for you

How we’ve fit it into our architecture

startup
been around about a year

our application is in the social CRM market space - take traditional CRM features, throw in social networks, email and some intelligence...we help you understand your network of contacts and prospects to help you during the sales cycle
a little more context about the application: we’re developing it in the Grails framework from SpringSource
so it’s a grails application hosted inside of tomcat, front-ended by apache, with mongo; we recently added elastic search as the search engine
all hosted in the amazon ec2 cloud

a lot of factors went into the decision, but there are two worth noting here
- we want our customers to be able to extend the core domain objects
  we also want them to be able to define their own domain objects and plug them into our application
- the other point is scalability....important to us that our datastore be prepared to scale up and scale out as necessary

so let’s talk a little more about extensibility
we provide core objects with a set of attributes
to help customers fit our application into their own workflow, we want customers to be able to extend those objects with their own attributes
so what this means is that a person object ........

C#1 is interested in tracking the employment history of people that are in the network
C#2 needs to know whether they are male or female and what they do in their spare time, what their interests are, what their hobbies are

those attributes add meaning for those customers based on their workflow, their marketspace, their targets
obviously a trivial example, but it’s meant to give you a feeling for what we want to be able to offer in our application
no real business logic, just the ability to store, associate and retrieve their own objects

Second concern that i mentioned earlier
Wanted to make sure that the datastore that we chose was designed with scalability in mind
I mentioned that our application is dealing with social networks. as most of you are probably a member of one or more networks, be it twitter, facebook, linkedin, there’s a ton of data out there related to social networks. but even beyond the social networks, we’re going to be getting into email, blogs, articles, and whatever else is on the web that can provide information about your network....so we’re going to be housing a lot of data...
we want scaling to be easy, shouldn’t be rocket science....want to focus on developing the application, and not administering/supporting db clusters
So...given those criteria as a backdrop, we looked at our options: relational, couple of NoSQL / schema-free datastores
not an exhaustive evaluation, but enough to feel comfortable....obviously we chose mongo, or we wouldn’t be having this conversation

makes it that much easier to support extensible objects...lots of other challenges, but at least the database is not one of them
as a developer, you deal with your domain data on a daily basis....it’s so nice to see the data presented in a way that’s easy for humans to understand
Had a design roughed out in relational model
- team member had implemented something similar in previous life
of course the name ‘mongo’ comes from humongous so that’s a pretty good indication that scalability is built-in

for us, JSON is the new XML, it’s everywhere

so that’s one thing that’s good about JSON, the other thing is....

(drill into this a little)
Take a look at the major components of our application

Had already chosen Grails as app framework

what this really means....

+ Can move really fast, fewer moving parts to develop/maintain
+ Good to get application going, plan on refactoring

10gen JIRA list is active

Clear from mongo forums, etc. that mongo is a very active product.
- not quite as sure with some of the other projects

- not alone

of course, nothing is perfect, life is a bunch of trade-offs

you don’t have transactions or really complex queries....JOINs, nested SELECTs

lack of transactions was the thing I was most nervous about. have to go into it with your eyes wide open, have to understand how to minimize the risk, and reduce the impact.
the name 'things' represents that this collection will hold different object types
having all Things in one collection allows us to query across all Things
good use of Mongo's 'schema free' document store
show example thing (JSON)
first cut: id of thing on left, thing on right and label
we were still thinking in relational terms

will be reading relationships much more often than creating new ones, so slow queries are definitely a bad thing
hybrid solution - gives us best of both worlds
- insertion integrity
- query performance
downside:
now inserting/updating a rel is a two-step process: update “rels” collection, then push the changes to the related things
if that second update fails, we can always rebuild those relationships from the rels collection.
the last major thing i want to talk about came about because we’re supporting a RESTful interface and mongo doesn’t enforce datatypes.

mongo is not going to stop you from storing any kind of data in any field - it’s ‘schema-free’.
enter JSON schema - provides a way to describe an object using JSON syntax

analogous to DTD or XML schema
one last point....the ease with which you can create collections, and the fact that you can store different looking objects in the same collection opens up possibilities