When people ask "What makes a good database design?", the first thing that comes to mind is normalization and the rules we were taught back at university. Yet more and more often we hear that normalization is evil, that disks and memory are cheap now, and that a denormalized schema is simply the better choice.
But is that really so?
That is what this talk sets out to answer, looking at real examples and at the advantages and drawbacks of both normalization and denormalization.
Denis Reznik, "Relational Database Design. Normalize till it hurts, then Denormalize till it works"
1. Golden Database Design Rule:
Normalize till it hurts
Denormalize till it works
Denis Reznik
Data Architect at Intapp, Inc.
Microsoft Data Platform MVP
http://reznik.uneta.com.ua
@denisreznik
2. Database History
1960s: IMS, CODASYL
1970s: E.F. Codd's paper, System R, Ingres
1980s: SQL, RDBMS, RDBMS commercial success
1990s: Object databases
2000s: Google BigTable paper, Amazon Dynamo paper, NoSQL (Johan Oskarsson)
Nowadays: (?)
3. Relational Model
Relation (Table), Attribute (Column), Tuple (Row)
Matter
Id  UserId  Name  Date
1   1       Work  07/03/2014
2   2       Test  09/09/2015
4   2       Rest  12/08/2015
Client
Id  Name  Phone
1   Bill  +380678455732
2   John  NULL
3   Mike  +380501233427
4. Normalization
• Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy
(Diagram labels: Redundancy, Complexity, Table Count)
5. First Normal Form (1NF)
• Each cell contains an atomic value
Matters (not in 1NF)
Company       User            Phone
Microsoft     John Dow        Tel1: +380969785732, Tel2: +32345409123
Microsoft     Larry McGregor  Tel: +45678904692
Oracle Corp.  John Snow       +380988958371
Amazon        Jack Snack      Home: +23348902385 Work: +69058763287
Matters (in 1NF)
Company       User            Phone          Phone Type
Microsoft     John Dow        +380969785732  NULL
Microsoft     John Dow        +32345409123   NULL
Microsoft     Larry McGregor  +45678904692   NULL
Oracle Corp.  John Snow       +380988958371  NULL
Amazon        Jack Snack      +23348902385   Home
Amazon        Jack Snack      +69058763287   Work
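The step from packed cells to atomic values can be sketched in Python with the standard sqlite3 module. This is only an illustration, assuming the label formats shown on the slide; the helper `split_phones` and the column name `PhoneType` are hypothetical and not part of the talk.

```python
import re
import sqlite3

# Rows from the non-1NF table on the slide: several phones packed into one cell.
raw_rows = [
    ("Microsoft", "John Dow", "Tel1: +380969785732, Tel2: +32345409123"),
    ("Microsoft", "Larry McGregor", "Tel: +45678904692"),
    ("Oracle Corp.", "John Snow", "+380988958371"),
    ("Amazon", "Jack Snack", "Home: +23348902385 Work: +69058763287"),
]

def split_phones(cell):
    """Yield (phone, phone_type) pairs from a packed cell (hypothetical helper)."""
    # An optional "Label:" prefix followed by a number that starts with "+".
    for label, phone in re.findall(r"(?:(\w+):\s*)?(\+\d+)", cell):
        # Assumption: only Home/Work carry meaning; Tel/Tel1/Tel2 are positional.
        phone_type = label if label in ("Home", "Work") else None
        yield phone, phone_type

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Matters (Company TEXT, "User" TEXT, Phone TEXT, PhoneType TEXT)'
)
for company, user, cell in raw_rows:
    for phone, phone_type in split_phones(cell):
        # One row per phone number: every cell now holds a single atomic value.
        conn.execute(
            "INSERT INTO Matters VALUES (?, ?, ?, ?)",
            (company, user, phone, phone_type),
        )

rows = conn.execute("SELECT * FROM Matters").fetchall()
```

With atomic values, a lookup by phone number becomes a plain equality predicate instead of a substring search over a packed cell.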
6. Second Normal Form (2NF)
• Table has a Key (Key = Primary Key)
• All non-key columns of the relation depend on the whole Key
Matters (not in 2NF)
Key: (Client, Matter) = (Company, User)
Company       User            Company Address  Manager
Microsoft     John Dow        Redmond          Jane Dow
Microsoft     Duncan MacLeod  Redmond          John Dow
Microsoft     John Snow       Redmond          Tony Stark
Oracle Corp.  John Dow        California       Rick Brick
Amazon        Jack Snack      Seattle          George Black
Google        Dale Cooper     California       Diana Smith
Matters (in 2NF)
User            Company       Manager
John Dow        Microsoft     Jane Dow
Duncan MacLeod  Microsoft     John Dow
John Snow       Microsoft     Tony Stark
John Dow        Oracle Corp.  Rick Brick
Jack Snack      Amazon        George Black
Dale Cooper     Google        Diana Smith
Clients
Company       Address
Microsoft     Redmond
Oracle Corp.  California
Amazon        Seattle
Google        California
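The 2NF violation on this slide is a partial dependency: Company Address depends on Company alone, which is only part of the key. A minimal Python sketch of that check follows; the helper `holds` and the column name `CompanyAddress` are hypothetical. A dependency that holds in sample rows is only evidence, since the real rule comes from the domain.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, val) != val:
            return False
    return True

COLUMNS = ("Company", "User", "CompanyAddress", "Manager")
matters = [dict(zip(COLUMNS, r)) for r in [
    ("Microsoft", "John Dow", "Redmond", "Jane Dow"),
    ("Microsoft", "Duncan MacLeod", "Redmond", "John Dow"),
    ("Microsoft", "John Snow", "Redmond", "Tony Stark"),
    ("Oracle Corp.", "John Dow", "California", "Rick Brick"),
    ("Amazon", "Jack Snack", "Seattle", "George Black"),
    ("Google", "Dale Cooper", "California", "Diana Smith"),
]]

# CompanyAddress is determined by Company alone, i.e. by part of the key.
partial = holds(matters, ["Company"], ["CompanyAddress"])
# Manager needs the whole key: neither half determines it on its own.
manager_by_company = holds(matters, ["Company"], ["Manager"])
manager_by_user = holds(matters, ["User"], ["Manager"])
manager_by_key = holds(matters, ["Company", "User"], ["Manager"])

# Decomposition: move the partially dependent column into its own relation.
users = sorted({(r["User"], r["Company"], r["Manager"]) for r in matters})
companies = sorted({(r["Company"], r["CompanyAddress"]) for r in matters})
```

After the split, each company address is stored once (four rows in `companies`) instead of once per user.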
8. Third Normal Form (3NF)
• Every non-prime attribute of the relation is non-transitively dependent on every key of the relation
Matters (not in 3NF)
Key: (Client, Matter)
Company       User            Manager       Manager Age
Microsoft     John Dow        Peter Parker  23
Microsoft     Patrik Jones    Steven Wu     45
Microsoft     Jackie Adams    Steven Wu     45
Oracle Corp.  Ashley Grey     Jean Claude   67
Amazon        Scott McMillan  John Smith    34
Amazon        Mary Smith      John Smith    34
Matters (in 3NF)
Company       User            Manager
Microsoft     John Dow        Peter Parker
Microsoft     Patrik Jones    Steven Wu
Microsoft     Jackie Adams    Steven Wu
Oracle Corp.  Ashley Grey     Jean Claude
Amazon        Scott McMillan  John Smith
Amazon        Mary Smith      John Smith
Attorneys
Manager       Manager Age
Peter Parker  23
Steven Wu     45
Jean Claude   67
John Smith    34
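The transitive dependency here is (Company, User) → Manager → Manager Age. The sketch below uses Python's sqlite3 module with a subset of the slide's rows to show the effect of the split: the age is stored once per manager, and a join rebuilds the original wide table. The constraints and the column name `ManagerAge` are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attorneys (
    Manager    TEXT PRIMARY KEY,
    ManagerAge INTEGER NOT NULL
);
CREATE TABLE Matters (
    Company TEXT NOT NULL,
    "User"  TEXT NOT NULL,
    Manager TEXT NOT NULL REFERENCES Attorneys (Manager),
    PRIMARY KEY (Company, "User")
);
INSERT INTO Attorneys VALUES
    ('Peter Parker', 23), ('Steven Wu', 45), ('John Smith', 34);
INSERT INTO Matters VALUES
    ('Microsoft', 'John Dow', 'Peter Parker'),
    ('Microsoft', 'Patrik Jones', 'Steven Wu'),
    ('Microsoft', 'Jackie Adams', 'Steven Wu'),
    ('Amazon', 'Scott McMillan', 'John Smith'),
    ('Amazon', 'Mary Smith', 'John Smith');
""")

# The age lives in exactly one row, so changing it is a single-row update.
updated = conn.execute(
    "UPDATE Attorneys SET ManagerAge = 46 WHERE Manager = 'Steven Wu'"
).rowcount

# A join reconstructs the original wide table whenever it is needed.
wide = conn.execute("""
    SELECT m.Company, m."User", m.Manager, a.ManagerAge
    FROM Matters AS m
    JOIN Attorneys AS a ON a.Manager = m.Manager
    ORDER BY m.Company, m."User"
""").fetchall()
```

In the unnormalized table the same change would have to touch every row that mentions Steven Wu, and missing one of them would leave two different ages for the same person.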
9. Fourth Normal Form (4NF)
• Eliminates independent many-to-one relationships between columns
Matters (before)
Id  Company    Consultant
1   Microsoft  Peter Partner
2   Microsoft  John Dow
3   Microsoft  Amy Chen
4   Oracle     Jim Beam
5   Amazon     John Snow
6   Google     John Snow
Matters (after)
Id  Company
1   Microsoft
2   Oracle
3   Amazon
4   Google
Attorneys
Id  Consultant
1   Peter Partner
2   John Dow
3   Amy Chen
4   Jim Beam
5   John Snow
MatterAttorneys
CompanyId  ConsultantId
1          1
1          2
1          3
2          4
3          5
4          5
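The three tables on this slide can be tried out as follows. This is a sketch using Python's sqlite3 module with the slide's data; the UNIQUE, NOT NULL and key constraints are assumptions, not something the slide states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Matters (
    Id      INTEGER PRIMARY KEY,
    Company TEXT NOT NULL UNIQUE
);
CREATE TABLE Attorneys (
    Id         INTEGER PRIMARY KEY,
    Consultant TEXT NOT NULL UNIQUE
);
-- Link table: one row per (company, consultant) pair and nothing else.
CREATE TABLE MatterAttorneys (
    CompanyId    INTEGER NOT NULL REFERENCES Matters (Id),
    ConsultantId INTEGER NOT NULL REFERENCES Attorneys (Id),
    PRIMARY KEY (CompanyId, ConsultantId)
);
INSERT INTO Matters VALUES
    (1, 'Microsoft'), (2, 'Oracle'), (3, 'Amazon'), (4, 'Google');
INSERT INTO Attorneys VALUES
    (1, 'Peter Partner'), (2, 'John Dow'), (3, 'Amy Chen'),
    (4, 'Jim Beam'), (5, 'John Snow');
INSERT INTO MatterAttorneys VALUES
    (1, 1), (1, 2), (1, 3), (2, 4), (3, 5), (4, 5);
""")

# Which companies does a given consultant work with?
companies_of_john_snow = [row[0] for row in conn.execute("""
    SELECT m.Company
    FROM MatterAttorneys AS ma
    JOIN Matters   AS m ON m.Id = ma.CompanyId
    JOIN Attorneys AS a ON a.Id = ma.ConsultantId
    WHERE a.Consultant = ?
    ORDER BY m.Company
""", ("John Snow",))]

# How many consultants are attached to a given company?
microsoft_consultants = conn.execute("""
    SELECT COUNT(*)
    FROM MatterAttorneys AS ma
    JOIN Matters AS m ON m.Id = ma.CompanyId
    WHERE m.Company = ?
""", ("Microsoft",)).fetchone()[0]
```

Each company name and each consultant name is now stored exactly once, and the composite primary key on the link table rules out duplicate pairs.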
10. Foreign Keys
Users
Key: (User, Company)
User            Company       Manager
John Dow        Microsoft     Jane Dow
Duncan MacLeod  Microsoft     John Dow
John Snow       Microsoft     Tony Stark
John Dow        Oracle Corp.  Rick Brick
Jack Snack      Amazon        George Black
Dale Cooper     Google        Diana Smith
Companies
Company       Address
Microsoft     Redmond
Oracle Corp.  California
Amazon        Seattle
Google        California
FK_USERS_COMPANY: Users.Company references Companies.Company
11. Foreign Keys
UPDATE Users SET Company = 'Microsoft'
WHERE "User" = 'Dale Cooper'
AND Company = 'Google'
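To see what FK_USERS_COMPANY does to an update like this one, the sketch below recreates the two tables with Python's sqlite3 module. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; the company name 'Initech' is a made-up value used to trigger a violation, and the NOT NULL constraints are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Companies (
    Company TEXT PRIMARY KEY,
    Address TEXT NOT NULL
);
CREATE TABLE Users (
    "User"  TEXT NOT NULL,
    Company TEXT NOT NULL,
    Manager TEXT NOT NULL,
    PRIMARY KEY ("User", Company),
    CONSTRAINT FK_USERS_COMPANY
        FOREIGN KEY (Company) REFERENCES Companies (Company)
);
INSERT INTO Companies VALUES
    ('Microsoft', 'Redmond'), ('Oracle Corp.', 'California'),
    ('Amazon', 'Seattle'), ('Google', 'California');
INSERT INTO Users VALUES
    ('John Dow', 'Microsoft', 'Jane Dow'),
    ('Duncan MacLeod', 'Microsoft', 'John Dow'),
    ('John Snow', 'Microsoft', 'Tony Stark'),
    ('John Dow', 'Oracle Corp.', 'Rick Brick'),
    ('Jack Snack', 'Amazon', 'George Black'),
    ('Dale Cooper', 'Google', 'Diana Smith');
""")

# The update from the slide: 'Microsoft' exists in Companies, so it is accepted.
moved = conn.execute("""
    UPDATE Users SET Company = 'Microsoft'
    WHERE "User" = 'Dale Cooper' AND Company = 'Google'
""").rowcount

# Pointing a user at a company that is not in Companies is rejected.
try:
    conn.execute(
        """UPDATE Users SET Company = 'Initech' WHERE "User" = 'Jack Snack'"""
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

dale_company = conn.execute(
    """SELECT Company FROM Users WHERE "User" = 'Dale Cooper'"""
).fetchone()[0]
```

The constraint guarantees only that the referenced company exists; it does not check whether the other columns of the row, such as Manager, still make sense after the move.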
1NF Goal: Columns, rows, and cells are used for what they were designed for
Attribute: Stores domain (structural) information
Tuple: Stores data
Cell: Stores an atomic value
2NF Goal: Keys should be irreplaceable
Example: A car and a key