SlideShare a Scribd company logo
1 of 23
Download to read offline
x.ai a personal assistant
who schedules meetings for you
MONGO DAY 2015 SV VISIT X.AI TO JOIN THE WAITLIST
Using MongoDB at x.ai
matt casey
@xdotai
Magically
Schedule Meetings
Just Cc: Amy@x.ai
Pain Solution
Jane John Jane Amy@x.ai John
CC: Amy @ x.ai
“Amy, please set
something up for
Jane and I next
week.”
System characteristics
● Need quick response
● Learning algorithms require large training data set
● # meetings scale linearly with # users
● 1 user meets with N people
● People share meeting, places, times and company
● Social relationships (assistants, coordinators)
Technical challenges
● Natural language understanding with extremely high
accuracy
● Natural conversation over email with people
● Complex data relationship
● Optimize for sparse data
● Speed of development and change
Stack
Queue Based Architecture
Data Access
● Schema Standardization
● Mongoose
● Supported by Elastic Load
Balancer
Data Models: Let’s Get Started
Here’s a few tips:
● Mongoose.js ODM for CRUD services
● mongoose-express-restify to provide a REST API
● MongoDB MMS for monitoring and backups
● Keep old data up-to-date with schema changes
● Don’t over-design, use models that are easy to change
How Do We Model Time?
Use Case:
● Capture times and constraints from an email for scheduling a meeting
● Data is created, saved, and read by a human
“Let’s meet the first week of February next year”
time: {
start: new Date(2017, 1, 1),
end: new Date(2017, 1, 2)
}
Try Again: Our Second Model
Use Case:
● A nested model captures the original intent
● Makes it possible for machines to detect and respond
timeV2: {
within: {
reference: {
weekOfYear: 4,
year: 2017
}
}
}
Still, some room for improvement
● Storing data closer to text makes it easier to improve accuracy
● But is harder to interpret and validate without tools
timeV3: [
{ timeId: id0, text: ‘first week’, weekOfMonth: 1 },
{ timeId: id0, text: ‘February’, month: 2 },
{ timeId: id0, text: ‘next year’, year: 2017 }
]
Edge cases dealing with time
● Scheduling phone calls for people in different timezones
● Dealing with partial information
○ “5pm is good” or “next week”
● Irrelevant times
○ “Can’t wait to see you next month. Can we talk tomorrow?”
● Need context for timezones: natural language vs tz database
○ Identify: “IST,” “CST,” “EST”
○ “I’m -5 from Dave” or “3pm London time”
○ Etc/GMT+3 is not what you think
Going from human to machine
● Machine detects and predicts predicts times
● Human work shifts from feeding new information to
correcting and re-affirming
● Focus shifts to generating more information and
ensuring product quality
Scala & JS - Happily Ever After
Data
Science
Intense Data
Processing
Functional
Programming
Type Safety
Complex UI
Interfacing w/
the World
● ODM
● Easy to set up, convenient
● Schema enforcement
● Use with caution: getters, setters,
virtual fields
● Client side joins across collections
and databases
● Nested documents with automatic
_id reference
Mongoose.js
object modeling and querying library for mongodb with node.js.
● Applies to only one language
● `populate` mutates the object
● Functional Programming expects
domain “events” vs objects
● Impossible to ‘clone’ instances
● using schema requires db connection
● No read validation - errors on updates
{
"messageId" : "<somerandommessage@xxxx>",
"classifierLabels" : [
{
"classifier" : "MeetingClassifier",
"class" : "A"
}
],
"featureVector" : [
{
"featureType" : "TaggedOneGram",
"featureKeys" : [
{
"featureKey" : "work",
"featureValue" : 3
},
{
"featureKey" : "that",
"featureValue" : 1
},
...
]
},
...
]
}
case class Features (
messageId: String,
classes: List[Classes],
featureVector: List[FeatureVectors]
)
case class Classes (
classifier: String,
class: String
)
case class FeatureVectors(
featureType: String,
featureKeys: List[FeatureKeys]
)
case class FeatureKey(
featureKey: String,
featureValue: 1
)
ScalaMongo
Mongo & Scala
Schemaless vs. Typesafe
● Discrepancy between models in Scala versus stored data in mongo.
○ MongoDB stores free form nested structures
○ Scala relies on strict, type safed models for data
● Models in flux
○ Same format for both data extraction and decision making layers
○ We are continuously learning new edge case about scheduling
meetings
Updating schemas in production
Step #1: Design new schema
● Avoid mutating the meaning of a field, add a new one intead
timeEntities => timeEntitiesV2
● Don’t stop writing to the old field until other processes are updated
Step #2: Migrate old documents, if possible, or use a ‘blacklisted’ field for training
Step #3: Transition all processes that read from schema to look for the new model
Design Tips:
● Avoid saving state to the db, keep the data as close to reality as possible.
● Separate collections for each service
● Future-proof your model, a list of objects is more extendable than a list of strings:
actions: [‘SENT’, ‘DELIVERED’]
actions: [{ timestamp, action: ‘SENT’ }, { timestamp, action ‘DELIVERED’ }]
Lessons Learned
● Let the data tell the story, keep track of what was
said or happened
● Avoid foreign keys to “moving” documents
● Be cautious of too much buy-in to your tools
● Write tests from Day One
● Make backups!
matt @ human.x.ai
cto & founder
25 Broadway. 9th Floor
New York, 10004 NY
E: hello@human.x.ai
T: @xdotai
Visit x.ai to join the waitlist
Embedding vs. Referencing
{
host :
{
name : {.....},
nicknames : [String],
phones : [{Type: String}]
primaryEmail : String,
secondaryEmails : [String],
title : String,
signatures: [String],
…...
},
travelTime : String,
status : String,
timezone : String,
duration : Number,
…...
}
{
host : Participant,
travelTime : String,
status : String,
timezone : String,
duration : Number,
…...
}
Participant
{
name : {.....},
nicknames : [String],
phones : [{Type: String}]
primaryEmail : String,
secondaryEmails : [String],
title : String,
signatures: [String],
…...
},
Embedding Referencing
Considerations
● Query patterns
● Access to embedded doc
● # references to a doc
● Application level join
● 1-way or 2-way
referencing
Unsolved Schema issues
- Keeping schema changes in sync with Mongoose and Scala
- No libraries or tests to make sure the contracts don’t break
- ProtoBuffers or Avro?
- discriminator key
- Managing a ‘context collection’ where multiple copies of production are
stored per email
- Deploying schema changes sometimes requires re-training a model
- Schema validation from Scala supercedes Mongoose

More Related Content

Similar to MongoDB Days Silicon Valley: Building an Artificial Intelligence Startup with MongoDB at x.ai

Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)javier ramirez
 
Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler Kwame Porter Robinson
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revisedMongoDB
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessMongoDB
 
Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopTom Bocklisch
 
Deep Learning and Text Mining
Deep Learning and Text MiningDeep Learning and Text Mining
Deep Learning and Text MiningWill Stanton
 
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al Mes
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al MesAyudando a los Viajeros usando 500 millones de Reseñas Hoteleras al Mes
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al MesBig Data Colombia
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornbcantrill
 
Building Services With gRPC, Docker and Go
Building Services With gRPC, Docker and GoBuilding Services With gRPC, Docker and Go
Building Services With gRPC, Docker and GoMartin Kess
 
When GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jWhen GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jJean-Francois James
 
BreizhCamp 2013 - Pimp my backend
BreizhCamp 2013 - Pimp my backendBreizhCamp 2013 - Pimp my backend
BreizhCamp 2013 - Pimp my backendHoracio Gonzalez
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge GraphTrey Grainger
 
Pre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDBPre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDBRackspace
 
Benefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSsBenefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSsMongoDB
 
NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020Thodoris Bais
 
Typescript - why it's awesome
Typescript - why it's awesomeTypescript - why it's awesome
Typescript - why it's awesomePiotr Miazga
 
540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdf540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdfhamzadamani7
 
Google apps script
Google apps scriptGoogle apps script
Google apps scriptSimon Su
 

Similar to MongoDB Days Silicon Valley: Building an Artificial Intelligence Startup with MongoDB at x.ai (20)

Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
 
Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler Labeling all the Things with the WDI Skill Labeler
Labeling all the Things with the WDI Skill Labeler
 
Web Development
Web DevelopmentWeb Development
Web Development
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
 
Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData Workshop
 
Deep Learning and Text Mining
Deep Learning and Text MiningDeep Learning and Text Mining
Deep Learning and Text Mining
 
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al Mes
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al MesAyudando a los Viajeros usando 500 millones de Reseñas Hoteleras al Mes
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al Mes
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicorn
 
Building Services With gRPC, Docker and Go
Building Services With gRPC, Docker and GoBuilding Services With gRPC, Docker and Go
Building Services With gRPC, Docker and Go
 
When GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jWhen GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4j
 
BreizhCamp 2013 - Pimp my backend
BreizhCamp 2013 - Pimp my backendBreizhCamp 2013 - Pimp my backend
BreizhCamp 2013 - Pimp my backend
 
quanphancv
quanphancvquanphancv
quanphancv
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Pre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDBPre-Aggregated Analytics And Social Feeds Using MongoDB
Pre-Aggregated Analytics And Social Feeds Using MongoDB
 
Benefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSsBenefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSs
 
NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020
 
Typescript - why it's awesome
Typescript - why it's awesomeTypescript - why it's awesome
Typescript - why it's awesome
 
540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdf540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdf
 
Google apps script
Google apps scriptGoogle apps script
Google apps script
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

MongoDB Days Silicon Valley: Building an Artificial Intelligence Startup with MongoDB at x.ai

  • 1. x.ai a personal assistant who schedules meetings for you MONGO DAY 2015 SV VISIT X.AI TO JOIN THE WAITLIST Using MongoDB at x.ai matt casey @xdotai
  • 3. Pain Solution Jane John Jane Amy@x.ai John CC: Amy @ x.ai “Amy, please set something up for Jane and I next week.”
  • 4. System characteristics ● Need quick response ● Learning algorithms require large training data set ● # meetings scale linearly with # users ● 1 user meets with N people ● People share meeting, places, times and company ● Social relationships (assistants, coordinators)
  • 5. Technical challenges ● Natural language understanding with extremely high accuracy ● Natural conversation over email with people ● Complex data relationship ● Optimize for sparse data ● Speed of development and change
  • 8. Data Access ● Schema Standardization ● Mongoose ● Supported by Elastic Load Balancer
  • 9. Data Models: Let’s Get Started Here’s a few tips: ● Mongoose.js ODM for CRUD services ● mongoose-express-restify to provide a REST API ● MongoDB MMS for monitoring and backups ● Keep old data up-to-date with schema changes ● Don’t over-design, use models that are easy to change
  • 10. How Do We Model Time? Use Case: ● Capture times and constraints from an email for scheduling a meeting ● Data is created, saved, and read by a human “Let’s meet the first week of February next year” time: { start: new Date(2017, 1, 1), end: new Date(2017, 1, 2) }
  • 11. Try Again: Our Second Model Use Case: ● A nested model captures the original intent ● Makes it possible for machines to detect and respond timeV2: { within: { reference: { weekOfYear: 4, year: 2017 } } }
  • 12. Still, some room for improvement ● Storing data closer to text makes it easier to improve accuracy ● But is harder to interpret and validate without tools timeV3: [ { timeId: id0, text: ‘first week’, weekOfMonth: 1 }, { timeId: id0, text: ‘February’, month: 2 }, { timeId: id0, text: ‘next year’, year: 2017 } ]
  • 13. Edge cases dealing with time ● Scheduling phone calls for people in different timezones ● Dealing with partial information ○ “5pm is good” or “next week” ● Irrelevant times ○ “Can’t wait to see you next month. Can we talk tomorrow?” ● Need context for timezones: natural language vs tz database ○ Identify: “IST,” “CST,” “EST” ○ “I’m -5 from Dave” or “3pm London time” ○ Etc/GMT+3 is not what you think
  • 14. Going from human to machine ● Machine detects and predicts predicts times ● Human work shifts from feeding new information to correcting and re-affirming ● Focus shifts to generating more information and ensuring product quality
  • 15. Scala & JS - Happily Ever After Data Science Intense Data Processing Functional Programming Type Safety Complex UI Interfacing w/ the World
  • 16. ● ODM ● Easy to set up, convenient ● Schema enforcement ● Use with caution: getters, setters, virtual fields ● Client side joins across collections and databases ● Nested documents with automatic _id reference Mongoose.js object modeling and querying library for mongodb with node.js. ● Applies to only one language ● `populate` mutates the object ● Functional Programming expects domain “events” vs objects ● Impossible to ‘clone’ instances ● using schema requires db connection ● No read validation - errors on updates
  • 17. { "messageId" : "<somerandommessage@xxxx>", "classifierLabels" : [ { "classifier" : "MeetingClassifier", "class" : "A" } ], "featureVector" : [ { "featureType" : "TaggedOneGram", "featureKeys" : [ { "featureKey" : "work", "featureValue" : 3 }, { "featureKey" : "that", "featureValue" : 1 }, ... ] }, ... ] } case class Features ( messageId: String, classes: List[Classes], featureVector: List[FeatureVectors] ) case class Classes ( classifier: String, class: String ) case class FeatureVectors( featureType: String, featureKeys: List[FeatureKeys] ) case class FeatureKey( featureKey: String, featureValue: 1 ) ScalaMongo Mongo & Scala
  • 18. Schemaless vs. Typesafe ● Discrepancy between models in Scala versus stored data in mongo. ○ MongoDB stores free form nested structures ○ Scala relies on strict, type safed models for data ● Models in flux ○ Same format for both data extraction and decision making layers ○ We are continuously learning new edge case about scheduling meetings
  • 19. Updating schemas in production Step #1: Design new schema ● Avoid mutating the meaning of a field, add a new one intead timeEntities => timeEntitiesV2 ● Don’t stop writing to the old field until other processes are updated Step #2: Migrate old documents, if possible, or use a ‘blacklisted’ field for training Step #3: Transition all processes that read from schema to look for the new model Design Tips: ● Avoid saving state to the db, keep the data as close to reality as possible. ● Separate collections for each service ● Future-proof your model, a list of objects is more extendable than a list of strings: actions: [‘SENT’, ‘DELIVERED’] actions: [{ timestamp, action: ‘SENT’ }, { timestamp, action ‘DELIVERED’ }]
  • 20. Lessons Learned ● Let the data tell the story, keep track of what was said or happened ● Avoid foreign keys to “moving” documents ● Be cautious of too much buy-in to your tools ● Write tests from Day One ● Make backups!
  • 21. matt @ human.x.ai cto & founder 25 Broadway. 9th Floor New York, 10004 NY E: hello@human.x.ai T: @xdotai Visit x.ai to join the waitlist
  • 22. Embedding vs. Referencing { host : { name : {.....}, nicknames : [String], phones : [{Type: String}] primaryEmail : String, secondaryEmails : [String], title : String, signatures: [String], …... }, travelTime : String, status : String, timezone : String, duration : Number, …... } { host : Participant, travelTime : String, status : String, timezone : String, duration : Number, …... } Participant { name : {.....}, nicknames : [String], phones : [{Type: String}] primaryEmail : String, secondaryEmails : [String], title : String, signatures: [String], …... }, Embedding Referencing Considerations ● Query patterns ● Access to embedded doc ● # references to a doc ● Application level join ● 1-way or 2-way referencing
  • 23. Unsolved Schema issues - Keeping schema changes in sync with Mongoose and Scala - No libraries or tests to make sure the contracts don’t break - ProtoBuffers or Avro? - discriminator key - Managing a ‘context collection’ where multiple copies of production are stored per email - Deploying schema changes sometimes requires re-training a model - Schema validation from Scala supercedes Mongoose