In this talk I explain how to make the data in your application more bulletproof. It covers techniques you can use in your applications to reduce the chances of introducing incoherent data.
15. Learnings
No code modification required
Less complexity: you never have to deal with both nils and blank strings
Work on the assumption that body is never nil
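A minimal plain-Ruby sketch of why this simplifies code, assuming a `posts.body` column declared `NOT NULL DEFAULT ''`. `PostsTable` and `NotNullViolation` are hypothetical stand-ins for the table and the database error, so the example runs without Rails or a database. In a real Rails migration the constraint would typically come from `change_column_default :posts, :body, from: nil, to: ""` together with `change_column_null :posts, :body, false, ""`, whose fourth argument backfills existing NULLs.

```ruby
# Hypothetical in-memory stand-in for a table whose body column is
# declared NOT NULL DEFAULT '' in SQL. Not Rails, just an illustration.
class NotNullViolation < StandardError; end

class PostsTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Omitting :body falls back to the default ''; passing nil explicitly
  # is rejected, just as the database rejects an explicit NULL.
  def insert(attrs)
    row = { body: "" }.merge(attrs)
    raise NotNullViolation, 'null value in column "body"' if row[:body].nil?

    @rows << row
    row
  end
end

posts = PostsTable.new
posts.insert(title: "Draft")              # body defaults to ""
posts.insert(title: "Hello", body: "Hi")

# Callers only ever see strings, so a single check covers "no body".
empty_count = posts.rows.count { |row| row[:body].empty? }
puts empty_count # prints 1

begin
  posts.insert(title: "Broken", body: nil)
rescue NotNullViolation => e
  puts "rejected: #{e.message}"
end
```

The point of the constraint is the last line of each branch: readers of `body` never need `nil?` checks, because the invalid state cannot be stored.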
34. Learnings
The ideal fix never allows anyone to directly introduce orphan data, but still keeps the optimized cascading behavior when records are deleted through Active Record.
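A plain-Ruby sketch of the two properties, assuming the fix is a database foreign key declared with `ON DELETE CASCADE` (in Rails, something like `add_foreign_key :posts, :authors, on_delete: :cascade`). The `Database` class and its methods are hypothetical; they only model what such a constraint does: reject rows that reference a missing parent, and remove children in one step when the parent is deleted.

```ruby
# Hypothetical in-memory model of two tables linked by a foreign key
# posts.author_id -> authors.id with ON DELETE CASCADE.
class ForeignKeyViolation < StandardError; end

class Database
  attr_reader :authors, :posts

  def initialize
    @authors = {} # id => attributes
    @posts = {}   # id => attributes (always includes :author_id)
  end

  def insert_author(id, attrs = {})
    @authors[id] = attrs
  end

  # The foreign key check: a post may only reference an existing author,
  # so orphan rows can never be inserted directly.
  def insert_post(id, author_id:, **attrs)
    unless @authors.key?(author_id)
      raise ForeignKeyViolation, "author #{author_id} does not exist"
    end

    @posts[id] = attrs.merge(author_id: author_id)
  end

  # ON DELETE CASCADE: removing an author removes its posts in one step,
  # without loading and destroying each post individually.
  def delete_author(id)
    @authors.delete(id)
    @posts.delete_if { |_post_id, post| post[:author_id] == id }
  end
end

db = Database.new
db.insert_author(1, name: "Ada")
db.insert_post(10, author_id: 1, title: "First")

begin
  db.insert_post(11, author_id: 99, title: "Orphan")
rescue ForeignKeyViolation => e
  puts "rejected: #{e.message}"
end

db.delete_author(1)
puts db.posts.size # prints 0
```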
40. Ways of Removing Duplicate Data
Use SQL to arbitrarily remove duplicates
Use scripts to automatically merge content in rows
Manually merge content/remove duplicate rows
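The first option usually means keeping one row per duplicated key, often the one with the lowest id, and deleting the rest. A plain-Ruby sketch of that selection, using a hypothetical `Row` struct in place of rows read from the database:

```ruby
# Rows as they might come back from a hypothetical authors table.
Row = Struct.new(:id, :name)

# Keep the row with the lowest id for each duplicated key and return the
# ids of every other row, i.e. the ones an arbitrary cleanup would delete.
def duplicate_ids_to_delete(rows, key)
  rows
    .group_by { |row| row.public_send(key) }
    .values
    .flat_map { |group| group.sort_by(&:id).drop(1).map(&:id) }
end

authors = [
  Row.new(1, "Mr. Duplicate"),
  Row.new(2, "Ms. Unique"),
  Row.new(3, "Mr. Duplicate"),
  Row.new(4, "Mr. Duplicate")
]

p duplicate_ids_to_delete(authors, :name) # prints [3, 4]
```

Which row survives is arbitrary by design; when the duplicates carry different content, the merging options avoid losing it.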
41. Unique Index Protects Data from Having Duplicates
PG::Error: ERROR: duplicate key value violates unique constraint "index_authors_on_name"
DETAIL: Key (name)=(Mr. Duplicate) already exists.
This error is raised every time the Active Record uniqueness validation is bypassed, for example when two concurrent requests both pass the check.
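Why does the validation get bypassed? A uniqueness validation first queries for an existing row and only then inserts, so two concurrent requests can both pass the check before either row is written. The plain-Ruby sketch below replays that interleaving deterministically; `AuthorsTable` and `UniqueViolation` are hypothetical stand-ins for the table, its unique index, and the error the database raises.

```ruby
require "set"

class UniqueViolation < StandardError; end

# Hypothetical table with a unique index on name.
class AuthorsTable
  def initialize
    @names = Set.new # plays the role of the unique index
  end

  # What the uniqueness validation does: a read-only existence check.
  def name_taken?(name)
    @names.include?(name)
  end

  # What the unique index does: reject the duplicate write atomically.
  def insert(name)
    raise UniqueViolation, "Key (name)=(#{name}) already exists" unless @names.add?(name)

    name
  end
end

table = AuthorsTable.new

# Two requests interleaved: both validate before either one inserts.
request_a_valid = !table.name_taken?("Mr. Duplicate")
request_b_valid = !table.name_taken?("Mr. Duplicate")

table.insert("Mr. Duplicate") if request_a_valid

outcome =
  begin
    table.insert("Mr. Duplicate") if request_b_valid
    :inserted
  rescue UniqueViolation => e
    puts "index caught it: #{e.message}"
    :rejected
  end

p [request_a_valid, request_b_valid, outcome] # prints [true, true, :rejected]
```

Both validations pass, yet only the index stops the second row; that is the situation the retry helper on the next slides is meant to handle.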
42. Unique Index Protects Data from Having Duplicates
def save_with_retry_on_unique(*args)
  # retry_on_exception is an application-level helper, not part of Rails
  retry_on_exception(ActiveRecord::RecordNotUnique) do
    save(*args)
  end
end
Retries the save when the error is raised, so the uniqueness validation can take over and report the duplicate.
Retries only once: the block is called at most one more time.
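The slides don't show `retry_on_exception` itself. Below is a minimal sketch of what such a helper might look like, assuming it retries exactly once and lets a second failure propagate. `FakeRecordNotUnique` and `FlakyRecord` are hypothetical stand-ins so the example runs without Rails: the first `save` loses the race against the unique index, and on the retry the validation sees the existing row and `save` returns `false`.

```ruby
class FakeRecordNotUnique < StandardError; end

# Hypothetical helper: run the block, and if it raises one of the given
# exception classes, run it one more time. A second failure propagates.
def retry_on_exception(*exception_classes, retries: 1)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue *exception_classes
    retry if attempts <= retries
    raise
  end
end

# Stand-in for a model whose first save hits the unique index; on the
# retry the uniqueness validation fails and save returns false instead.
class FlakyRecord
  attr_reader :save_calls

  def initialize
    @save_calls = 0
  end

  def save
    @save_calls += 1
    raise FakeRecordNotUnique, "duplicate key" if @save_calls == 1

    false # validation failed: the duplicate is reported as an error
  end

  def save_with_retry_on_unique
    retry_on_exception(FakeRecordNotUnique) { save }
  end
end

record = FlakyRecord.new
p record.save_with_retry_on_unique # prints false
p record.save_calls                # prints 2
```

Capping the retries matters: if something other than a lost race keeps raising the error, the caller still sees the exception instead of looping forever.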
45. Learnings
Active Record validations are not meant for data integrity.
Incoherent data can still be introduced.
A unique index at the database level makes sure data is never duplicated.
Rails validations are often bypassed in concurrent situations, so always handle the underlying ActiveRecord::RecordNotUnique error.
Don’t forget to add a unique index on one-to-one relationships.
47. Polymorphic Association
class Post < ActiveRecord::Base
  has_many :comments, as: :commentable
end
class Comment < ActiveRecord::Base
  belongs_to :commentable, polymorphic: true
end
Both commentable_type and commentable_id are stored in the database.
There is no way to add foreign keys to polymorphic associations, because commentable_id can point at rows in more than one table.
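A plain-Ruby sketch of the consequence, using hypothetical in-memory tables: the parent is found by first choosing a table from the `commentable_type` string and then looking up `commentable_id`, conceptually what a polymorphic `belongs_to` does. Since the id is just an integer with no constraint behind it, nothing stops a comment from pointing at a row that does not exist.

```ruby
# Hypothetical in-memory tables keyed by the string stored in
# commentable_type.
TABLES = {
  "Post"  => { 1 => { title: "Hello" } },
  "Photo" => { 7 => { url: "cat.png" } }
}.freeze

comments = [
  { id: 1, commentable_type: "Post",  commentable_id: 1 },
  { id: 2, commentable_type: "Photo", commentable_id: 7 },
  # Nothing in the schema can reject this row: 999 is just an integer.
  { id: 3, commentable_type: "Post",  commentable_id: 999 }
]

# Resolve a comment's parent: pick the table from the type column,
# then look up the id in that table.
def commentable_for(comment)
  TABLES.fetch(comment[:commentable_type], {})[comment[:commentable_id]]
end

orphans = comments.select { |c| commentable_for(c).nil? }.map { |c| c[:id] }
p orphans # prints [3]
```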
49. Learnings
There is no standard SQL way of declaring polymorphic associations.
Referential integrity is compromised when we use this Active Record pattern.
Expensive to index: indexes have to cover both commentable_type and commentable_id.
The data distribution across types usually isn’t uniform.
Harder to JOIN in SQL.
51. Learnings
Adding one table for each child type maintains data integrity.
Foreign keys can be added.
Extract shared behavior into Ruby modules in the application.
Create a non-table-backed Ruby class for creating comments.
Use the class_name option to designate which class to use when retrieving records.
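A plain-Ruby sketch of the per-table approach: a shared `CommentBehavior` module, two comment classes that would each map to their own table (and so could each carry a real foreign key), and a non-table-backed `CommentCreator` that picks the right class for a parent. All names here are hypothetical. In Rails the parent side would use something like `has_many :comments, class_name: "PostComment"` so that `post.comments` still reads naturally.

```ruby
# Shared behavior extracted into a module; in a Rails app each including
# class would be an Active Record model backed by its own table.
module CommentBehavior
  attr_reader :body, :parent_id

  def initialize(body:, parent_id:)
    @body = body
    @parent_id = parent_id
  end

  def preview(length = 20)
    body.length > length ? "#{body[0, length]}..." : body
  end
end

class PostComment  # would map to a post_comments table
  include CommentBehavior
end

class PhotoComment # would map to a photo_comments table
  include CommentBehavior
end

Post = Struct.new(:id)
Photo = Struct.new(:id)

# Non-table-backed class: chooses the concrete comment class for a
# parent, so callers don't need to know there is one table per type.
class CommentCreator
  COMMENT_CLASSES = { Post => PostComment, Photo => PhotoComment }.freeze

  def self.create(parent, body)
    klass = COMMENT_CLASSES.fetch(parent.class)
    klass.new(body: body, parent_id: parent.id)
  end
end

comment = CommentCreator.create(Photo.new(7), "What a great shot of the sunset")
p comment.class   # prints PhotoComment
p comment.preview # prints "What a great shot of..."
```

Adding a new parent type means a new table, a new class, and a new entry in the lookup hash, which is the extra cost the next slide mentions.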
52. Learnings
Easier to grok and operate.
Harder to aggregate over all comments regardless of type.
More expensive to add another parent type.
Use specific tables if you care about data integrity.
If data integrity is a non-issue, use polymorphic associations. Event logging and activity feeds are good examples.
54. Data Integrity Test Suite
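MAX_ERRORS = 50
def test_posts_are_valid
  errors = []
  # find_each loads posts in batches, so this also works on large tables
  Post.find_each do |post|
    next if post.valid?
    errors << [post.id, post.errors.full_messages]
    # Stop early so a badly broken table doesn't produce a huge report
    break if errors.size >= MAX_ERRORS
  end
  assert_equal [], errors
end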
55. Data Integrity Test Suite
def test_post_bodies_are_not_nil
  assert_equal 0, Post.where(body: nil).count
end
56. Learnings
Proactive techniques work best
They’re not always feasible if you have bad data already
Reactive integrity checks are a good alternative
Run these regularly against production data to surface errors.
Avoid using complex constraints.
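A plain-Ruby sketch of a reactive check runner in the spirit of the test suite above: each check is a named predicate, and the runner reports the ids of rows that violate it, capped like `MAX_ERRORS`. `Check` and `run_integrity_checks` are hypothetical names, and the rows are plain hashes; in an application they would be read from production data.

```ruby
# A reactive check: scan existing rows and report every violation of an
# invariant instead of preventing it at write time.
Check = Struct.new(:name, :predicate)

def run_integrity_checks(rows, checks, max_errors: 50)
  checks.to_h do |check|
    failing_ids = rows.reject { |row| check.predicate.call(row) }
                      .first(max_errors)
                      .map { |row| row[:id] }
    [check.name, failing_ids]
  end
end

author_ids = [1, 2]
posts = [
  { id: 10, author_id: 1, body: "Hello" },
  { id: 11, author_id: 3, body: "Orphaned" },
  { id: 12, author_id: 2, body: nil }
]

checks = [
  Check.new(:body_not_nil, ->(post) { !post[:body].nil? }),
  Check.new(:author_exists, ->(post) { author_ids.include?(post[:author_id]) })
]

report = run_integrity_checks(posts, checks)
report.each { |name, ids| puts "#{name}: #{ids.inspect}" }
# prints:
# body_not_nil: [12]
# author_exists: [11]
```

Keeping each check a simple predicate fits the advice above: simple invariants are cheap to scan for, whereas complex constraints are better avoided.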
57. Recap
Not null constraints
Unique indexes
Foreign keys
Refactor polymorphic associations into separate tables
Reactive integrity checks