8. IBM Databases Run Amok
1.losing data
2.duplicate data
3.wrong data
4.crappy performance
5.downtime for database redesign
whenever anyone made an
application change
15. Domains (types)
Relation
(table, view, rowset)
Tuple (row) Tuple (row) Tuple (row)
INT TEXT INT TEXT INT Attribute
TEXT
DATE DATE Attribute
DATE
Tuple (row) Tuple (row)
INT Attribute
TEXT INT Attribute
TEXT
Attribute
DATE Attribute
DATE
24. What's Atomic?
The simplest form of a datum, which is not
divisible without loss of information.
name
Josh Berkus
SELECT SUBSTR(name,STRPOS(name, ' ')) ...
Status
ai
… WHERE status = 'a'???status = 'u' ...
… WHERE or ...
25. What's Atomic?
The simplest form of a datum, which is not
divisible without loss of information.
first_name last_name
Josh Berkus
active access
TRUE a
27. Atomic, Shmomic. Who Cares?
● Atomic Values:
– retain data
– make joins easier
– make constraints easier
● Non-atomic Values:
– make data loss more likely
– increase CPU usage
– make you more likely to forget something
29. Splitting a Bulletin Board Thread
the hard way
INSERT INTO threads VALUES ( .... );
If $dbh('success') then
for $these_posts.date > $cutdate loop
UPDATE posts SET thread = $newthread
WHERE id = $these_posts.id;
if not $dbh('success') then
for $these_posts.id > $last_id loop
UPDATE posts SET thread = $oldthread
WHERE id = $these_posts.id;
DELETE FROM threads
WHERE id = $newthread;
30. Splitting a Bulletin Board Thread
the transactional way
BEGIN;
INSERT INTO threads VALUES ( .... );
$newthread = curval();
UPDATE posts SET thread = $newthread
WHERE thread = $oldthread
AND date > $cutdate;
END;
31. name user_name
admin
Josh Berkus Josh Berkus
Berkus
Joshua Berkus
Berkus, Josh
user_name
Josh Berkus
Josh
Joshua Berkus
Problem 2:
Duplicate Data
34. A Good Key
● Should have to be unique because the
application requires it to be.
● Expresses a unique predicate which describes
the tuple (row):
– user with login “jberkus”
– post from “jberkus” on “2009-05-02 13:41:22” in
thread “Making your own wine”
● If you can't find a good key, your table design is
missing data.
37. Abby Normal
login level last_name
jberkus u Berkus
selena a Deckelman
login title posted level
jberkus Dinner? 09:28 u
selena Dinner? 09:37 u
jberkus Dinner? 09:44 a
38. How can I be “Normal”?
1. Each piece of data only appears in one relation
– except as a “foreign key” attribute
● No “repeated” attributes
login level last_name
jberkus u Berkus
selena a Deckelman
login title posted
jberkus Dinner? 09:28
selena Dinner? 09:37
jberkus Dinner? 09:44
42. Ensure Your Data is Consistent
no matter where it came from
first_name last_name email login password active level
Josh Berkus josh@pgexperts.com jberkus jehosaphat TRUE a
NULL Kelley kelley@ucb k NULL TRUE u
Mark Twain www.pm.org samuel halleys NULL I
S F gavin@sf.gov gavin twitter FALSE x
43. Users Under Constraint
● first_name (text) length() > 1
● last_name (text) length() > 1
● email (text) ILIKE '%@%.%'
● login (text) length() > 5
● password (text) length() > 5
● active (boolean) NOT NULL
● access (char) IN ( 'a','u' )
note: email and other validators would, of course, be more complex
45. Posts Table
● title (text) NOT NULL
REFERENCES threads ( title )
ON DELETE CASCADE ON UPDATE CASCADE
● posted (timestamp) NOT NULL
● user (text) NOT NULL
REFERENCES users ( login )
ON DELETE CASCADE ON UPDATE CASCADE
● content (text) NOT NULL
46. Beautiful Cascades
posts.content
Josh Berkus
What's up?
users.login
I'm going crazy!
Josh Berkus
jberkus
www.pornking.com
jerkyboy
Why?
selena
www.whitehouse.com
OSB! It's too much!
www.whiteslavery.com
www.lolcats.com
I told you so ...
48. Things We've Already Done
● Atomicization
– less CPU on parsing, calculations
● Normalization
– less data duplication
– smaller tables
● Transactions
– more batches, less iteration
– less locking
51. Stuff We've Already Done
to make our data “agile”
● Atomicization
– data isn't in specific interface version formats
● Normalization
– where to extend data is more obvious
– create a new table if you have to
● Transactions
– prevent partial failures from changed schema
53. Extending the Users Table
● first_name (text) CREATE VIEW oldapp.users
● last_name (text) AS
SELECT
● email (text) first_name || ' ' || last_name,
● login (text) email, login, password,
active, access
● password (text) FROM users;
● active (boolean)
● access (char)
● created (timestamp)
● last_login (timestamp)
54. The Rest
you already know
● Write Migrations
– deploy these in transactions, if supported
– if not, write rollback scripts
● Write Tests
– use a realistic staging environment
55. Some Other Tips
● Pick a naming scheme
– and stick to it
● Don't do your joins in the application
– the database does them better
● Repurposing fields will bite you
– sure you'll remember when that changed
● Don't micro-optimize
– INT3, anyone?
56. More Information
● me
– josh@pgexperts.com
– www.pgexperts.com
– it.toolbox.com/blogs/database-soup
● postgresql: www.postgresql.org
● tutorial at OSCON
– monday, 8:30 am!
– see you in San Jose!
This presentation copyright 2009 Josh Berkus, licensed for distribution under the
Creative Commons Attribution License, except for photos, most of which were
stolen from other people's websites via images.google.com. Thanks, Google!