3. SQL anti-patterns and related stuff
- The eternal tree (rows refer to the table itself; think
threaded discussion)
- Dynamic table creation (and dynamic query building)
- Table as cache (let's save it in another table)
- Table as queue (wtf)
- Extreme JOINs (app requires a warmed-up cache)
- Your schema must be printed on an A3 sheet
- Your ORM issues full queries for dataset iterations
4. The eternal tree
Problem: Most threaded discussion examples use a single
table which contains all threads and answers, related to
each other by an id. Usually the developer will come up with
their own binary-tree version to manage this mess.
id - parent_id - author  - text
1  - 0         - gleicon - hello world
2  - 1         - elvis   - shout!
NoSQL alternative: Document storage:
{ "thread_id": 1, "title": "the meeting", "author": "gleicon", "replies": [
  {
    "author": "elvis", "text": "shout", "replies": [{...}]
  }
]
}
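A minimal Python sketch of why the nested document is easier to work with than the parent_id table: walking the thread is a plain recursion, with no self-join. The field names follow the document above; the helper names `count_replies` and `render` are hypothetical, and a real document store would return the same structure as a dict.

```python
# Illustrative thread document, shaped like the one on the slide.
thread = {
    "thread_id": 1,
    "title": "the meeting",
    "author": "gleicon",
    "replies": [
        {"author": "elvis", "text": "shout", "replies": []},
    ],
}

def count_replies(node):
    """Count every reply nested under a thread or under another reply."""
    return sum(1 + count_replies(reply) for reply in node.get("replies", []))

def render(node, depth=0):
    """Return one indented line per reply, in thread order."""
    lines = []
    for reply in node.get("replies", []):
        lines.append("  " * depth + f"{reply['author']}: {reply['text']}")
        lines.extend(render(reply, depth + 1))
    return lines
```

The trade-off is that a very deep or very busy thread becomes one large document, so this fits best when a thread is read and written as a unit.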
5. Dynamic table creation
Problem: To avoid huge tables, one must come up with a
"dynamic schema". For example, let's think about a document
management company which is adding new facilities across the
country. For each storage facility, a new table is created:
item_id - row - column - stuff
1 - 10 - 20 - cat food
2 - 12 - 32 - trout
Now you have to come up with "dynamic queries", which will
probably query a "central storage" table and issue a huge join
to check if you have enough cat food across the country.
NoSQL alternative:
- Document storage, modeling a facility as a document
- Key/Value, modeling each facility as a SET
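As a rough sketch of the document option, each facility can be one document holding its own items, so the "enough cat food" question becomes a loop over documents instead of a join across per-facility tables. The facility keys (`facility:north`, `facility:south`) and the helper `stock_by_facility` are made up for illustration; the item fields mirror the table above.

```python
# Hypothetical facility documents; in a real store each would live under its own key.
facilities = {
    "facility:north": [
        {"item_id": 1, "row": 10, "column": 20, "stuff": "cat food"},
        {"item_id": 2, "row": 12, "column": 32, "stuff": "trout"},
    ],
    "facility:south": [
        {"item_id": 1, "row": 3, "column": 7, "stuff": "cat food"},
    ],
}

def stock_by_facility(facilities, stuff):
    """Return how many items of the given kind each facility holds."""
    return {
        name: sum(1 for item in items if item["stuff"] == stuff)
        for name, items in facilities.items()
    }

def total_stock(facilities, stuff):
    """Return the country-wide count for one kind of item."""
    return sum(stock_by_facility(facilities, stuff).values())
```

Adding a facility means adding a key, not creating a table or rewriting the query.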
6. Table as cache
Problem: Complex queries demand that a result be stored in a
separate table, so it can be queried quickly.
NoSQL alternative:
- Really ?
- Memcached
- Redis (for persistence)
- Denormalization
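What a cache server does for you can be sketched in a few lines of Python. Here an in-process dict with expiry stands in for Memcached or Redis; the names `TTLCache` and `cached_query` are hypothetical, and a real deployment would call the cache server's client library instead.

```python
import time

_MISSING = object()  # sentinel, so a cached None still counts as a hit

class TTLCache:
    """Tiny in-memory stand-in for a cache server with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return _MISSING
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return _MISSING
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)

def cached_query(cache, key, run_query):
    """Cache-aside: return the cached result, or run the query and store it."""
    value = cache.get(key)
    if value is _MISSING:
        value = run_query()
        cache.set(key, value)
    return value
```

Expiry is the part a hand-made cache table usually lacks: someone has to remember to purge or refresh it.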
7. Table as queue
Problem: A table which holds messages waiting to be processed.
Worse, they must be processed in order.
NoSQL alternative:
- RestMQ
- Any other message broker
- Redis (for LISTS)
- Use the right tool
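The ordering requirement is what a list-based queue gives you for free. The sketch below imitates the Redis LIST pattern (LPUSH on one end, RPOP on the other) with a Python deque; `ListQueue` is a hypothetical in-process stand-in, not a Redis client.

```python
from collections import deque

class ListQueue:
    """FIFO queue mimicking Redis LPUSH (producer) and RPOP (consumer)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, message):
        # Producers add at the head of the list.
        self._items.appendleft(message)

    def rpop(self):
        # Consumers take from the tail, so the oldest message comes out first.
        # Returns None when the queue is empty.
        return self._items.pop() if self._items else None

    def __len__(self):
        return len(self._items)
```

Compare this with the table version, where each consumer has to select the oldest row, lock it, and delete it, all without stepping on other consumers.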
8. Extreme JOINs
Problem: Business stuff modeled as tables. Table inheritance
(Product -> SubProduct_A). To find the complete data for a
user plan, one must issue gigantic queries with lots of JOINs.
NoSQL alternative:
- Document storage, such as MongoDB
- Denormalization
9. Your schema fits on an A3 sheet
Problem: Huge data schemas are difficult to manage. Extreme
specialization creates tables which converge to a key/value
model. The normal form gets priority over common sense.
Product_A      Product_B
id - desc      id - desc
NoSQL alternative:
- Denormalization
- Another schema?
- Document store
- Key/Value
10. Your ORM ...
Problem: Your ORM issues full queries for dataset iterations,
your ORM maps and creates tables which mimic your
classes, even the inheritance, and the performance is bad
because the queries are huge, etc.
NoSQL alternative:
Apart from denormalization and good old common sense,
ORMs are trying to bridge two models with an impedance
mismatch. Nothing in the relational model maps cleanly to
classes and objects, not even its basic unit, the
domain (set) of each column. Black magic?
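The "full queries for dataset iterations" problem can be reproduced without any ORM. The sketch below uses Python's built-in sqlite3 and a hypothetical two-table schema (`author`, `post`) to count how many statements a naive per-row lookup issues compared with a single JOIN; the `run` helper and the `stats` counter exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, text TEXT);
INSERT INTO author VALUES (1, 'gleicon'), (2, 'elvis');
INSERT INTO post VALUES (1, 1, 'hello world'), (2, 2, 'shout');
""")

stats = {"queries": 0}  # counts statements issued through run()

def run(sql, params=()):
    """Execute one statement, count it, and return all rows."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def posts_per_row_lookup():
    """What a lazy-loading iteration does: one extra query per row."""
    rows = []
    for _post_id, author_id, text in run(
        "SELECT id, author_id, text FROM post ORDER BY id"
    ):
        name = run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        rows.append((name, text))
    return rows

def posts_single_join():
    """The same result fetched with one statement."""
    return run(
        "SELECT a.name, p.text FROM post p "
        "JOIN author a ON a.id = p.author_id ORDER BY p.id"
    )
```

With two posts the first version issues three statements and the second issues one; the gap grows linearly with the number of rows iterated.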
11. No silver bullet
- Consider alternatives
- Think outside the norm
- Denormalize
- Simplify
12. Cycle of changes - Product A
1. There was the database model
2. Then, the cache was needed. Performance was no good.
3. Cache key: query, value: resultset
4. High or nonexistent expiration time [w00t]
(Now there's a turning point. Data didn't need to change often.
Denormalization was a given with the cache.)
5. The cache needs to be warmed or the app won't work.
6. Key/Value storage was a natural choice. No data on MySQL
anymore.
13. Cycle of changes - Product B
1. Postgres DB storing crawler results.
2. There was a counter in each row, and updating this counter
caused contention errors.
3. Memcached for reads. Performance is better.
4. First MongoDB test, no more deadlocks from counter
update.
5. Data model was simplified, the entire crawled doc was
stored.
14. Stuff to think about
Check whether the data you use is already denormalized (cached).
Most of the anti-patterns contain signs that the NoSQL route
(or at least a partial NoSQL route) may simplify things.
Are you dependent on cache? Does your application fail
when there is no cache? Does it just slow down?
Are you ready to think more about your data ?
Think about the way to put and to get back your data from the
database (be it SQL or NoSQL).