5. History of MongoDB University
Started in 2012 with a fork of edX
MySQL DB, Python Django, XML
Django is designed for SQL databases
Future option to use MongoDB for course materials
7. Maybe we shouldn’t
Site works fine
SQL is fine
A lot of work to move to MongoDB
MongoDB is not a great fit for Django
We don’t use many of MongoDB’s standout features (sharding)
8. Eat your own dog food
If we think MongoDB is good, then we should use it
Help test MongoDB products
9. MongoDB is good for University too
MongoDB is closer to application data
Arrays
Subclasses (flexible schema)
Ease of development (pymongo)
Integration with other MongoDB tools (Atlas, Compass, Charts)
10. "While you are attending PyCon, please visit the MongoDB booth to learn about PyMongo!"
#PyCon #Cleveland #MongoDB #Python
MongoDB:
{
  "text": "While you are attending PyCon, please visit the MongoDB booth to learn about PyMongo!",
  "tags": ["PyCon", "Cleveland", "MongoDB", "Python"]
}
SQL:
text                                  id
"While you are attending PyCon..."    1

blog_id  tag
1        "PyCon"
1        "Cleveland"
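The contrast on this slide can be sketched in plain Python. This is only an illustration: the table names `posts` and `tags` are hypothetical (the slide shows column names only), and the standard-library `sqlite3` module stands in for any SQL database. The MongoDB side is shown as the plain dictionary that PyMongo would accept.

```python
import sqlite3

TEXT = ("While you are attending PyCon, please visit the MongoDB "
        "booth to learn about PyMongo!")
TAGS = ["PyCon", "Cleveland", "MongoDB", "Python"]

# Document model: the tags live inside the post as an array.
post_document = {"text": TEXT, "tags": TAGS}


def load_post_from_sql(conn, post_id):
    """Rebuild the same shape from two tables (hypothetical table names)."""
    text = conn.execute(
        "SELECT text FROM posts WHERE id = ?", (post_id,)
    ).fetchone()[0]
    tags = [row[0] for row in conn.execute(
        "SELECT tag FROM tags WHERE blog_id = ? ORDER BY rowid", (post_id,)
    )]
    return {"text": text, "tags": tags}


# Tabular model: one row for the post, one row per tag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("CREATE TABLE tags (blog_id INTEGER, tag TEXT)")
conn.execute("INSERT INTO posts (id, text) VALUES (1, ?)", (TEXT,))
conn.executemany(
    "INSERT INTO tags (blog_id, tag) VALUES (1, ?)", [(t,) for t in TAGS]
)

post_from_sql = load_post_from_sql(conn, 1)
```

The SQL version needs two statements and some reassembly to produce what the document model stores directly.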
11. PyMongo vs Python SQL connector
> user = database.collection.find_one({'user_id': 1})
> user
{
  "email": "john.yu@mongodb.com",
  "address": {
    "street": "1633 Broadway",
    "city": "New York",
    "state": "NY",
    "country": "United States"
  }
}
> user['address']['country']
"United States"

> user = connection.execute('SELECT * FROM people WHERE id=1').fetchone()
> user
('john.yu@mongodb.com', '1633 Broadway', 'New York', 'NY', 'United States')
> user[4]
"United States"
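The difference in access style can be reproduced without a running server. The sketch below is an assumption-laden stand-in: a literal dictionary plays the role of the document PyMongo returns, and the standard-library `sqlite3` module plays the role of the "Python SQL connector". The `people` table layout is inferred from the slide, not taken from the real schema.

```python
import sqlite3

# Document as a driver like PyMongo returns it: a nested dict, addressed by name.
user_doc = {
    "email": "john.yu@mongodb.com",
    "address": {
        "street": "1633 Broadway",
        "city": "New York",
        "state": "NY",
        "country": "United States",
    },
}
country_from_doc = user_doc["address"]["country"]

# Row from a DB-API driver: a flat tuple, addressed by column position.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER, email TEXT, street TEXT, "
    "city TEXT, state TEXT, country TEXT)"
)
conn.execute(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)",
    (1, "john.yu@mongodb.com", "1633 Broadway", "New York", "NY",
     "United States"),
)
user_row = conn.execute(
    "SELECT email, street, city, state, country FROM people WHERE id = 1"
).fetchone()
country_from_row = user_row[4]  # breaks silently if the column order changes
```

Many SQL drivers can also return name-addressable rows (for example `sqlite3.Row`), but the result is still flat rather than nested.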
12. Flexible Schema Within a Collection
Analogous to subclasses in programming languages
Example: Multiple-choice problem vs Text (free-response) problem
{
  type: "multiple-choice",
  question: "Who was the first president of the US?",
  choices: [
    {
      "text": "Barack Obama",
      "is_correct": false
    },
    {
      "text": "George Washington",
      "is_correct": true
    }
  ]
}
{
  type: "text",
  question: "Who was the president during the civil war?",
  answer: "Abraham Lincoln"
}
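Because both shapes can sit in one collection, application code typically branches on the `type` field, much as it would dispatch on a subclass. The following is a minimal sketch of that idea. The function name `is_correct` and the case-insensitive matching rule for text answers are assumptions, not something the slides specify.

```python
multiple_choice = {
    "type": "multiple-choice",
    "question": "Who was the first president of the US?",
    "choices": [
        {"text": "Barack Obama", "is_correct": False},
        {"text": "George Washington", "is_correct": True},
    ],
}

text_problem = {
    "type": "text",
    "question": "Who was the president during the civil war?",
    "answer": "Abraham Lincoln",
}


def is_correct(problem, submission):
    """Grade a submission against either kind of problem document."""
    if problem["type"] == "multiple-choice":
        # submission is the index of the chosen entry in `choices`
        return problem["choices"][submission]["is_correct"]
    if problem["type"] == "text":
        # submission is free text; this comparison rule is an assumption
        return submission.strip().lower() == problem["answer"].lower()
    raise ValueError(f"unknown problem type: {problem['type']!r}")
```

For example, `is_correct(multiple_choice, 1)` grades the second choice, while `is_correct(text_problem, "abraham lincoln")` compares free text.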
13. Summary
MongoDB can be normalized like SQL, but you can also have arrays and embedded documents.
MongoDB gives you more options than a tabular DB.
15. Flexible schema is great, but there are more decisions to make.
Top-down design
How will the data be CRUDed?
- What operations are needed to render a web page?
Optimize for queries, since querying happens more often than creating/updating/deleting
16. What is a course?
Course
A course has one or more chapters
A chapter has one or more lessons
A lesson has one or more units
- Lecture
- Problems (multiple-choice, text)
Student progress (progress, grades)
Did the student view a chapter/lesson/problem?
Student submissions for problems
- Problem ID
- Answer submitted
- Submitted date
Student grade
Many students per course
17. Old Way: Course
<course id="M101P" title="MongoDB for Python Developers" start="Aug 12 17:00 UTC 2013" end="Sep 21 17:00 UTC 2013">
  <chapter title="Week 1: Introduction" start="Aug 12 17:00 UTC 2013" end="Aug 19 17:00 UTC 2013">
    <lesson title="Welcome to M101P">
      <problem id="5d4340a7eba" title="Quiz: MongoDB vs SQL">
        Which DB should you use?
        <choice correct="false">MySQL</choice>
        <choice correct="false">Postgres</choice>
        <choice correct="true">MongoDB</choice>
      </problem>
    </lesson>
    <lesson> … </lesson>
  </chapter>
  <chapter title="Week 2: CRUD operations">…</chapter>
</course>
18. New Way: Course
{
  start: ISODate("2018-08-01T17:00:00Z"),
  end: ISODate("2018-09-01T17:00:00Z"),
  title: "US History",
  chapters: [
    {
      title: "Chapter 1: Introduction",
      lessons: [
        {
          title: "First President",
          video: "youtube.com/123456",
          problem: {
            type: "multiple-choice",
            question: "Who was the first president of the US?",
            choices: [
              {
                "text": "Barack Obama",
                "is_correct": false
              },
              {
                "text": "George Washington",
                "is_correct": true
              }
            ]
          }
        },
        {
          title: "Second president",
          video: "youtube.com/2334566",
          problem: {
            type: "text",
            question: "Who was the president during the civil war?",
            answer: "Abraham Lincoln"
          }
        }
      ]
    },
    {
      title: "Chapter 2: Wars",
      lessons: []
    }
  ]
}
Good:
- Courseware needs bits of the entire offering
- Can project just the fields we need
Bad:
- Offering can be a big document
Note: previously, offerings were in memory, not DB
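"Project just the fields we need" could look roughly like the sketch below, which asks only for the titles needed to render a course outline and leaves the problems and videos on the server. The helper names and the choice of fields are assumptions for illustration. `collection` is expected to be a PyMongo `Collection`, whose `find_one` accepts a filter followed by a projection.

```python
def course_outline_query(course_title):
    """Build the filter and projection for a course outline page."""
    course_filter = {"title": course_title}
    projection = {
        "_id": 0,
        "title": 1,
        "chapters.title": 1,          # chapter titles only
        "chapters.lessons.title": 1,  # lesson titles only, no problems
    }
    return course_filter, projection


def fetch_outline(collection, course_title):
    """Run the query; `collection` is assumed to be a PyMongo Collection."""
    course_filter, projection = course_outline_query(course_title)
    return collection.find_one(course_filter, projection)
```

Even though the offering is stored as one large document, a page that only lists chapters and lessons never has to transfer the whole thing.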
19. Old way: Student Progress
student_id  course_id          problem_id      state
71495       "M101P/2019_July"  "5d4340a7eba"   '{"answer": [0,1,2,3], "score": 1, "submit_date": "2019-07-21 09:15 UTC"}'
13789       "M101/2015_May"    "21b172e26113"  '{"answer": "Barack Obama", "score": 0, "submit_date": "2015-05-12 10:15 UTC"}'
?
20. Approach 1: Mechanically move SQL tables to MongoDB collections
{
  student_id: 71495,
  course_id: "M101P/2019_July",
  problem_id: "5d4340a7eba",
  state: {
    "answer": [0,1,2,3],
    "score": 1,
    "submit_date": ISODate("2019-07-21T09:15:00Z")
  }
},
{
  student_id: 13789,
  course_id: "M101P/2015_May",
  lecture_id: "21b172e26113",
  state: {
    "last_viewed": ISODate("2015-05-12T10:15:00Z")
  }
}
Good:
- Easy to migrate
- Already better than the previous table
Bad:
- Many queries required per page
21. Approach 2A: All progress for a course in 1 document
{
  course_id: "M101P/2019_May",
  students: [
    {
      user_id: 11111,
      units: [
        {
          id: "Problem 1",
          attempts: [
            {
              date: ISODate("2016-06-02T15:02:00Z"),
              index: 2
            },
            {
              date: ISODate("2017-06-05T11:08:00Z"),
              index: 0
            }
          ]
        }
      ]
    },
    {
      user_id: 22222,
      units: [
        {
          id: "Problem 1",
          attempts: [
            {
              date: ISODate("2016-06-02T15:02:00Z"),
              index: 2
            },
            {
              date: ISODate("2017-06-05T11:08:00Z"),
              index: 0
            }
          ]
        }
      ]
    }
  ]
}
Good:
- ?
Bad:
- Doesn't fit common use case
- Grows without bound
22. Approach 2B: All courses for a student in 1 document
{
  user_id: 11111,
  courses: [
    {
      course_id: "M101P",
      units: {
        "lecture_1": {
          last_viewed: ISODate("2016-06-01T10:10:00Z")
        },
        "problem_1": {
          attempts: [
            {
              date: ISODate("2016-06-02T15:02:00Z"),
              index: 2
            },
            {
              date: ISODate("2017-06-05T11:08:00Z"),
              index: 0
            }
          ]
        }
      }
    },
    {
      course_id: "M101P",
      units: {
        "lecture_1": {
          last_viewed: ISODate("2016-06-01T10:10:00Z")
        },
        "problem_1": {
          attempts: [
            {
              date: ISODate("2016-06-02T15:02:00Z"),
              index: 2
            },
            {
              date: ISODate("2017-06-05T11:08:00Z"),
              index: 0
            }
          ]
        }
      }
    }
  ]
}
Good:
- ?
Bad:
- Will probably grow larger than the document size limit (16 MB)
23. Better Approach: Fit to use case (progress)
{
  user_id: 71495,
  course_id: "M101P/2019_July",
  units: {
    "lecture_1": {
      last_viewed: ISODate("2016-06-01T10:10:00Z")
    },
    "problem_1": {
      attempts: [
        { date: ISODate("2016-06-02T15:02:00Z"), index: 2 },
        { date: ISODate("2017-06-05T11:08:00Z"), index: 0 }
      ]
    },
    "problem_2": {
      attempts: [
        { date: ISODate("2017-06-05T11:08:00Z"), text: "Barack Obama" }
      ]
    }
  }
}
Good:
- Courseware often needs multiple units at a time
- Grade student's progress in one document
- Can still update just parts of the document
Bad:
- Document can grow without bound
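"Update just parts of the document" can be sketched with MongoDB's dotted field paths and the `$push` and `$set` update operators. The helper names below are hypothetical. Each returns a `(filter, update)` pair that could be handed to PyMongo's `update_one(filter, update, upsert=True)`, so only the addressed unit is touched and the rest of the progress document is left alone.

```python
from datetime import datetime, timezone


def record_attempt(user_id, course_id, unit_id, attempt):
    """Build an update that appends one attempt to one unit's attempts array."""
    progress_filter = {"user_id": user_id, "course_id": course_id}
    update = {"$push": {f"units.{unit_id}.attempts": attempt}}
    return progress_filter, update


def record_view(user_id, course_id, unit_id, when):
    """Build an update that sets last_viewed on one unit only."""
    progress_filter = {"user_id": user_id, "course_id": course_id}
    update = {"$set": {f"units.{unit_id}.last_viewed": when}}
    return progress_filter, update


attempt_filter, attempt_update = record_attempt(
    71495,
    "M101P/2019_July",
    "problem_1",
    {"date": datetime(2017, 6, 5, 11, 8, tzinfo=timezone.utc), "index": 0},
)
```

Keying the document by `user_id` and `course_id` means one read returns everything needed to grade a student in a course, while writes stay small.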
24. ODM
We use PyMODM (https://github.com/mongodb/pymodm)
We can use Python classes instead of dictionaries
- Application-side schema validation
- MongoDB itself now offers schema validation as well
- Type checking
- Convenience
Downsides:
- New query language (but mimics Django ORM)
- Unclear when queries are actually being executed
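To make "classes instead of dictionaries" concrete without depending on an installed ODM, the sketch below uses a standard-library dataclass as a stand-in. It is not PyMODM's API. It only illustrates the two benefits listed above, application-side validation and type checking, for the `attempts` entries used in the progress documents. The class and method names are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class Attempt:
    """One multiple-choice attempt, validated before it reaches the database."""
    date: datetime
    index: int

    def __post_init__(self):
        # Application-side checks that a plain dict would not enforce.
        if not isinstance(self.date, datetime):
            raise TypeError("date must be a datetime")
        if not isinstance(self.index, int) or self.index < 0:
            raise ValueError("index must be a non-negative integer")

    def to_document(self):
        """Return the plain dict that would be stored in MongoDB."""
        return asdict(self)


attempt = Attempt(datetime(2016, 6, 2, 15, 2, tzinfo=timezone.utc), 2)
attempt_document = attempt.to_document()
```

A real ODM adds querying and persistence on top of this, which is where the downsides listed above (a separate query language, implicit query execution) come from.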
25. How about performance?
• Performance gains from data model
• Basic indexes on queries
27. Summary
SQL is fine
But MongoDB is also good, and sometimes better
We moved to MongoDB because it is great for developers
Beware of pitfalls with document DBs
28. Future Plans
• Move the rest of the SQL tables to Mongo
• Try newer MongoDB features
  - Schema validation
  - Transactions