2. Outline
What are ORMs and Active Records?
Tradeoffs
Playing Nice with your Database
Managing Indexes
Eager Loading and Client-Side Joins
Lazy Loading
Conclusion
3. Object-Relational Mapper
Systems to bridge the gap between object-oriented languages
and relational databases
class Employee < ActiveRecord::Base
  belongs_to :office
end
class Office < ActiveRecord::Base
  has_one :employee
end
Inherently difficult:
Normalization (splitting data across tables)
Databases can only store scalar values
Add an extra layer of abstraction
4. Active Record Pattern
The ‘meat’ of an ORM that handles the CRUD work
Allows regular objects to be treated as persistent objects
Ideally, totally abstracts all database interaction
my_office = Office.new
my_office.number = 123
me = Employee.new
me.name = 'Eric Farrar'
me.office = my_office
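To make "handles the CRUD work" concrete, here is a minimal sketch of how an active record object might turn attribute assignments into SQL. ToyRecord and save_sql are hypothetical names invented for this illustration; this is not how Rails implements ActiveRecord, and nothing is actually sent to a database.

```ruby
# Toy illustration only: a real ORM also handles types, escaping,
# primary keys, and transactions.
class ToyRecord
  attr_reader :attributes

  def initialize(table)
    @table = table
    @attributes = {}
    @persisted = false
  end

  # Assign a column value, e.g. record[:number] = 123
  def []=(column, value)
    @attributes[column] = value
  end

  # Returns the SQL text a save would send; nothing is executed here.
  # The first call produces an INSERT, later calls produce an UPDATE.
  def save_sql
    columns = @attributes.keys
    if @persisted
      assignments = columns.map { |c| "#{c} = ?" }.join(", ")
      "UPDATE #{@table} SET #{assignments} WHERE id = ?"
    else
      @persisted = true
      placeholders = (["?"] * columns.size).join(", ")
      "INSERT INTO #{@table} (#{columns.join(', ')}) VALUES (#{placeholders})"
    end
  end
end

office = ToyRecord.new("offices")
office[:number] = 123
puts office.save_sql   # INSERT INTO offices (number) VALUES (?)
office[:number] = 124
puts office.save_sql   # UPDATE offices SET number = ? WHERE id = ?
```

The point of the sketch is that the caller only touches object attributes; choosing between INSERT and UPDATE is the record's job.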
5. Examples of ORMs/Active Records
LINQ (Language Integrated Query)
Hibernate / NHibernate
Django
Ruby on Rails (ActiveRecord)
Many more…
For our purposes, we will use Rails' ActiveRecord for the
examples
6. Trade-offs
Advantages
Easy to learn
Simplifies database creation and management
No context switching between languages
You don’t need to know about the database
Disadvantages
Performance suffers (up to 50% slower)
Often uses a lowest-common-denominator solution
Concurrency semantics are often very difficult
You don’t need to know about the database
7. Managing Indexes
Indexes are used to make things quick to look up
phone book vs. reverse look-up
Indexes should be present on anything you will search for
Searching for non-indexed properties will result in full table
scan
By default, indexes are usually only put on primary keys
Missing indexes often go unnoticed during development
The result is a gradual slowdown (as data volume increases)
rather than an avalanche failure
Why not put an index on everything?
Multi-column indexes vs. single column indexes
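The phone book analogy can be sketched in plain Ruby. This is only an illustration under stated assumptions: the "table" is an in-memory array, the "index" is a Hash (real databases typically use B-trees), and scan_by_name and lookup_by_name are hypothetical helper names.

```ruby
# Hypothetical in-memory "table": an array of row hashes.
employees = (1..10_000).map { |i| { id: i, name: "employee#{i}" } }

# Full table scan: examine rows one by one until the name matches.
# Returns the matching row and the number of rows examined.
def scan_by_name(rows, name)
  examined = 0
  row = rows.find do |r|
    examined += 1
    r[:name] == name
  end
  [row, examined]
end

# A toy "index": a hash from name to row, built once up front.
name_index = employees.each_with_object({}) { |r, idx| idx[r[:name]] = r }

# Indexed lookup: a single hash probe instead of a walk over the table.
def lookup_by_name(index, name)
  index[name]
end

row, examined = scan_by_name(employees, "employee10000")
puts "scan examined #{examined} rows, found id #{row[:id]}"
puts "index lookup found id #{lookup_by_name(name_index, 'employee10000')[:id]}"
```

The sketch also hints at why indexing everything is not free: the index is a second structure that takes space and has to be updated on every insert, update, and delete.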
8. Client-Side Join
Objects are usually ‘related’ to each other
belongs_to
has_one
has_many
has_and_belongs_to_many
ORMs use these relationships to allow object traversal
ex. me.office
Assuming 10000 employees, how many queries will this code
produce?
Employee.find(:all).each do |e|
  puts e.office.number
end
9. “Man, this is heavy!”
Answer: 10001
Employee.find(:all).each do |e| # <-- 1 query here
  puts e.office.number # <-- 10,000 queries here
end
Why? The application is doing the work of joining the data, not
the database. This is called a ‘client-side’ join
This is solved by giving a hint to the ORM and the database
that you intend to use the ‘office’ property
Employee.find(:all, :include => :office).each do |e|
  puts e.office.number
end
This pattern is called eager loading
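The query counts above can be reproduced without a database. The sketch below uses a hypothetical FakeDatabase class that only counts the queries it would issue; the two-query eager strategy is one possible approach (some ORMs issue a single JOIN instead), so treat the exact count as an assumption. Either way, the count no longer grows with the number of employees.

```ruby
# Hypothetical stand-in for a database connection that counts queries.
class FakeDatabase
  attr_reader :query_count

  def initialize(employee_count)
    @query_count = 0
    # One office per employee, mirroring the has_one/belongs_to example.
    @offices = {}
    (1..employee_count).each { |i| @offices[i] = { id: i, number: 100 + i } }
    @employees = (1..employee_count).map { |i| { id: i, office_id: i } }
  end

  # Stands in for: SELECT * FROM employees
  def all_employees
    @query_count += 1
    @employees
  end

  # Stands in for: SELECT * FROM offices WHERE id = ?
  def office_by_id(id)
    @query_count += 1
    @offices[id]
  end

  # Stands in for: SELECT * FROM offices WHERE id IN (...)
  def offices_by_ids(ids)
    @query_count += 1
    ids.map { |id| @offices[id] }
  end
end

# Client-side join: one query for the list, then one query per employee.
def lazy_office_numbers(db)
  db.all_employees.map { |e| db.office_by_id(e[:office_id])[:number] }
end

# Eager loading: fetch all needed offices in one extra query.
def eager_office_numbers(db)
  employees = db.all_employees
  offices = db.offices_by_ids(employees.map { |e| e[:office_id] })
  by_id = offices.each_with_object({}) { |o, h| h[o[:id]] = o }
  employees.map { |e| by_id[e[:office_id]][:number] }
end

lazy_db = FakeDatabase.new(10_000)
lazy_office_numbers(lazy_db)
puts lazy_db.query_count   # 10001

eager_db = FakeDatabase.new(10_000)
eager_office_numbers(eager_db)
puts eager_db.query_count  # 2
```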
10. Inviting the Database to the Party
Eager loading solves the N+1 problem, but it is still only
halfway there
In ORMs, the relations are defined inside the object models
The ORM may know that Employees and Offices are related,
but the database doesn’t know that
The database will obediently execute the query, but don’t
expect it to do anything clever
Modern query optimizers will use every statistic available when
determining query paths
Keeping them ignorant will result in bare-bones optimization
11. Lazy Loading
Eager loading deals with the case where you want more than
your class includes
What if you want less?
Suppose your Employee class includes a picture field that is a
high-resolution bitmap (~3 MB)
A plain find(:all) will actually return the picture as well, in order
to fully populate the object
Employee.find(:all).each do |e|
  puts e.name
end
This innocent code will naively return about 30 GB of data
(10,000 employees × ~3 MB each)
12. Be Lazy
Instead, lazily load your object properties
Employee.find(:all, :select => "name").each do |e|
  puts e.name
end
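Getting at e.picture later requires another database query
(ActiveRecord does not fetch an unselected column on its own,
so the record must be loaded again with that column)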
This simple example ignores potential problems with
concurrency
Use locking
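The difference in data volume is simple arithmetic, sketched below. The employee count and picture size come from the example above; the 32-byte average name size and the bytes_transferred helper are assumptions made up for this illustration, and decimal units (1 GB = 10^9 bytes) are used.

```ruby
# Figures from the example; name_bytes is a hypothetical average.
employee_count = 10_000
picture_bytes  = 3_000_000   # ~3 MB bitmap per employee
name_bytes     = 32          # assumed average size of the name column

# Rough estimate: rows times the summed size of the selected columns.
# Ignores protocol overhead and any other columns on the table.
def bytes_transferred(row_count, column_sizes)
  row_count * column_sizes.sum
end

full   = bytes_transferred(employee_count, [name_bytes, picture_bytes])
narrow = bytes_transferred(employee_count, [name_bytes])

puts format("all columns: %.1f GB", full / 1e9)    # all columns: 30.0 GB
puts format("name only:   %.1f KB", narrow / 1e3)  # name only:   320.0 KB
```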
13. Conclusions
ORMs and Active Records can provide large productivity
advantages, typically at the expense of performance
ORMs should never be seen as an alternative to learning
about databases (although they can be a good introduction)
At times, you will likely need to drop down to the database
level (profiling, etc) to diagnose problems
Ideally, a programmer using an ORM will always consider how
their code will actually look once it hits the database
Similarities to a C compiler
You should be able to answer “Yes!” to the question, “Do you
know where your queries are?”