This is a presentation on how to read a data model and understand the data and business rules it contains. It is intended for non-technical people.
3. Goal
• To develop basic literacy about data models.
– To understand what a data model contains.
– To understand how the information in it can be used more effectively.
• We will not touch upon the technicalities of developing a data model.
4. Session Structure
• What is a data model, its need and context.
• Different types of data models
• Semantics of data models
• How to read a data model
• How to use data models more effectively
• Questions and answers.
5. Why Model?
• John Boyd (1927-1997): military strategist and thinker.
• Often described as the most original military thinker since Sun Tzu (c. 500 BC).
• OODA (Observe, Orient, Decide, Act) loop: every organization or organism uses an OODA loop to adapt to its surroundings and survive.
6. Why model?
• Observation is information gathering.
• Orientation is developing a mental framework of information by understanding its structure and relationships.
• Models are observation as well as orientation tools which use symbols for real-world facts.
• Models are effective because the human mind absorbs more information visually than textually.
• Models in business and IT: Enterprise Models, Business Process Models, Workflow Models, Interaction Models, Network Models etc.
7. Why model data?
• Data is a distinct component of an information system; the other component is application logic.
• It needs to be described in such a way that it is clearly and precisely communicated to all stakeholders: information analysts, application developers, data analysts, database administrators etc.
• Every data element must have a defined business purpose.
• A data model is an unambiguous and precise description of data, its structure and relationships, agreed upon by all stakeholders.
8. What is a data model?
• It is a paper sheet with coloured rectangles and a tangled web of crow's-foot lines joining them...
• For a given information system, it is a graphical representation of the data elements, their relationships and the constraints governing the data.
[Figure: sample data model with the entities InReportOther, InReportHospital, HospitalConsultationReport, HospitalOtherReport and HospitalImagingReport. Each entity lists its columns with data types and NULL / NOT NULL rules, and each carries inReportID as a foreign key.]
10. Types of data models
• Data can be described from different perspectives: Object-Role Models, Entity-Relationship Diagrams (ERDs), Data Flow Diagrams (DFDs), UML Class Diagrams etc.
• Entity-Relationship (ER) diagrams are the most popular for data modeling, as they can easily be converted into relational database designs.
[Figure: sample ER diagram with the entities InReportHospital, HospitalConsultationReport, HospitalOtherReport, HospitalImagingReport, InReportPsychTest, InReportAmb, NeuroPsychTest and GlasgowComaScale. Each entity lists its columns with data types and NULL / NOT NULL rules, and each carries inReportID as a foreign key.]
11. Types of ERD – domain model
• Domain Model (Subject Area Model): a very high-level (10,000 feet) conceptual model showing the major entities and their relationships in a business or problem domain.
• Only entities are shown
12. Scope of domain models
• Business Domain Models or Business Subject
Area Models – Very high level covering entire
business
• Application Domain Models or Application
Subject Area Models – covering an
application/package.
13. Types of ERD – logical models
• Logical Models: Showing entities and their logical
relationships for a given information system.
[Figure: logical model with the entities TOTAL LOSS REQUEST RECORD, CLAIM FILE, CLAIM FILE ESTIMATE GROUP, ESTIMATE DAIS CHUNKS, VEHICLE REPAIR LOG and ESTIMATE PRINT IMAGE LINE. Attributes are named in business terms (for example Claim File Id, ICBC Claim Number, Loss Date, Policy Number), and Claim File Id appears as a foreign key in the entities related to CLAIM FILE.]
14. Types of ERD-physical models
• Physical Models: models showing the physical implementation of the logical model at the data storage level.
• They contain columns for implementing relationships and for fast data access.
• Most tools can create schema scripts from physical models.
[Figure: physical model with the tables CLAIM_FILE, AUTOSOURCE_REQUEST, VEHICLE_REPAIR_LOG and AS_REQ_CONDITION. Columns carry abbreviated names, data types and null rules, for example CLF_ID: DECIMAL(15,0) NOT NULL, CLF_ICBC_CLM_NUM: CHAR(7) NOT NULL and CLF_LOSS_DTE: DATE NULL. ASR_CLF_ID, VRL_CLF_ID and ASRC_CLF_ID are marked as foreign keys.]
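As a rough illustration of what "creating a schema script from a physical model" means, the sketch below turns a small description of three CLAIM_FILE columns into a CREATE TABLE statement. The `Column` class and `create_table_sql` function are hypothetical helpers written for this example, not part of any modeling tool, and treating CLF_ID as the primary key is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """One column of a physical model (hypothetical helper for this sketch)."""
    name: str
    data_type: str
    nullable: bool


def create_table_sql(table: str, columns: list[Column], primary_key: list[str]) -> str:
    """Build a CREATE TABLE script from a table name and its column definitions."""
    lines = []
    for col in columns:
        null_rule = "NULL" if col.nullable else "NOT NULL"
        lines.append(f"    {col.name} {col.data_type} {null_rule}")
    lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


# Three columns of CLAIM_FILE as they appear in the physical model.
claim_file = [
    Column("CLF_ID", "DECIMAL(15,0)", nullable=False),
    Column("CLF_ICBC_CLM_NUM", "CHAR(7)", nullable=False),
    Column("CLF_LOSS_DTE", "DATE", nullable=True),
]

# Assumption: CLF_ID is the primary key of CLAIM_FILE.
ddl = create_table_sql("CLAIM_FILE", claim_file, ["CLF_ID"])
print(ddl)
```

A real modeling tool would also emit foreign keys, indexes and storage options; the point here is only that the physical model already holds everything the script needs.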
15. Semantics of data models
• Data models use graphical notations and text strings called
‘Verb Phrases’.
• The semantics of notations depends upon the modeling
technique followed and the tool being used.
16. Entities
• A thing of significance to the business about which data has to be stored and manipulated.
• Nouns representing objects, events, concepts, relationships, actions...
• In data models, entities are represented as rectangles.
• Examples: Insurance policy, Claim, Vehicle, Event etc.
17. Entity sub-types
• Some entities have many sub types.
• PERSON and ORGANIZATION entities are sub types of the PARTY entity.
• FULL TIME EMPLOYEE and CONTRACT EMPLOYEE are sub types of the EMPLOYEE entity.
• Sub types are depicted either as contained in the main entity or as children of the main entity.
[Figure: PARTY with sub types PERSON and ORGANIZATION; EMPLOYEE with sub types FULL TIME and CONTRACT.]
18. Attributes
• The properties of entities for which data has to be collected and stored.
• Attributes are represented as text strings contained inside the entities in data models.
• Examples: policy holder's name, event date, claim amount etc.
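For readers who find code easier to picture than boxes, an entity with its attributes can be thought of roughly as a record type. The sketch below is only an analogy: the `Claim` class and its field names are made up for this example from the attributes listed above.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal


@dataclass
class Claim:
    """The entity: the rectangle on the diagram."""
    # The attributes: the text strings listed inside the rectangle.
    policy_holder_name: str
    event_date: date
    claim_amount: Decimal


# One entity instance, i.e. one stored record (illustrative values).
claim = Claim("A. Example", date(2020, 1, 15), Decimal("1250.00"))

# Listing the attribute names is like reading the inside of the rectangle.
attribute_names = [f.name for f in fields(Claim)]
print(attribute_names)
```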
19. Relationships
• Relationships represent how entities interact and
create, use, modify or delete each other.
• They are represented by different types of lines going
from one entity to another.
[Figure: examples of relationship lines, drawn as solid and dashed lines.]
20. Cardinality of relationship
• The cardinality of a relationship is the number of instances of the entities at the two ends of the relationship.
• It is represented by three domain values: Zero, One or Many.
• It may be shown as a circle, a vertical line and a crow's foot at the end of relationship lines, or as some other symbol.
• Sometimes it is written as ‘0’, ‘1’ or ‘n’ on the relationship lines.
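[Figure: two example relationships, Policy to Claim and Product to Line Item, with the cardinalities 1 and 0…n marked on the relationship line.]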
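To make cardinality concrete, the sketch below counts instances on each side of a relationship in some sample data. It assumes the rule reads "one Policy to zero, one or many Claims"; the identifiers P1, C1 and so on are invented for illustration.

```python
from collections import Counter

# Sample data (made up): three policies and three claims.
policies = {"P1", "P2", "P3"}
claims = [("C1", "P1"), ("C2", "P1"), ("C3", "P3")]  # (claim id, policy id)

# "Many" side: how many claims does each policy have? Zero is allowed.
claims_per_policy = Counter(policy_id for _claim_id, policy_id in claims)
counts = {policy_id: claims_per_policy[policy_id] for policy_id in sorted(policies)}

# "One" side: every claim must point at exactly one existing policy.
orphan_claims = [claim_id for claim_id, policy_id in claims if policy_id not in policies]

print(counts)         # claims found per policy
print(orphan_claims)  # claims that break the "exactly one policy" rule
```

Here P2 has no claims, which the "zero" in the cardinality permits, while a claim without a policy would show up in `orphan_claims`.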
21. Optionality of relationships
• The optionality of a relationship states whether the entity ‘may be present’ or ‘must be present’ in the relationship.
• It may be represented by a ‘solid line’ or ‘broken line’ segment of the relationship line (or in some other way).
[Figure: Policy and Claim joined by a relationship line that is solid at the Policy end and dashed at the Claim end.]
23. Verb phrases
• Verb Phrases describe relationship between two
entities going from one entity to another in both
directions.
[Figure: ORGANIZATION employs EMPLOYEE; EMPLOYEE works for ORGANIZATION. CLAIM is paid to POLICY HOLDER; POLICY HOLDER makes CLAIM.]
24. Keys
• Keys are used to navigate through data and retrieve information.
• Primary keys: a primary key is a group of one or more attributes that uniquely identifies an entity instance. Every entity has exactly one primary key.
• Foreign keys: a foreign key is a group of attributes in a child entity that holds the primary key value of a related parent entity, so that one can navigate from one entity to the attributes of another. FK attributes implement relationships; they originate in, and are owned by, the parent entity.
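A minimal sketch of how keys behave once a model is implemented, using Python's built-in sqlite3 module. The `policy` and `claim` tables and their columns are assumptions made for this example; they are not taken from any model in this presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Parent entity: policy_id is its primary key.
conn.execute(
    "CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, holder_name TEXT NOT NULL)"
)
# Child entity: policy_id is a foreign key pointing back at the parent.
conn.execute(
    """CREATE TABLE claim (
           claim_id  INTEGER PRIMARY KEY,
           policy_id INTEGER NOT NULL REFERENCES policy(policy_id),
           amount    REAL
       )"""
)
conn.execute("INSERT INTO policy VALUES (1, 'A. Example')")
conn.execute("INSERT INTO claim VALUES (10, 1, 500.0)")

# Navigation: follow the foreign key from a claim to its policy holder.
row = conn.execute(
    """SELECT p.holder_name
       FROM claim c JOIN policy p ON p.policy_id = c.policy_id
       WHERE c.claim_id = 10"""
).fetchone()

# A claim that points at a non-existent policy is rejected.
try:
    conn.execute("INSERT INTO claim VALUES (11, 99, 75.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(row, rejected)
```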
26. Domains
• A domain is a named set of data values, all of the same data type, from which the actual value of an attribute instance is drawn.
• Every attribute must be defined on exactly one underlying domain. Multiple attributes may be based on the same underlying domain.
• Examples of domains:
– Gender – M, F
– Province – Varchar(2) – BC, AB, ON, NL, QC, MB, SK, YT
– Short Description – Varchar(40)
– Long Description – Varchar(2000)
– Unique Identifier – Integer(9)
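The idea that several attributes share one named domain can be sketched as follows. The `Domain` class, the `check` function and the attribute names are hypothetical and exist only for this illustration; the Gender and Short Description rules follow the examples above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Domain:
    """A named set of allowed values, expressed as a validity test."""
    name: str
    is_valid: Callable[[object], bool]


GENDER = Domain("Gender", lambda v: v in {"M", "F"})
SHORT_DESCRIPTION = Domain(
    "Short Description", lambda v: isinstance(v, str) and len(v) <= 40
)

# Each attribute is defined on exactly one domain;
# two attributes here share the Short Description domain.
attribute_domains = {
    "product_name": SHORT_DESCRIPTION,
    "service_name": SHORT_DESCRIPTION,
    "policy_holder_gender": GENDER,
}


def check(attribute: str, value: object) -> bool:
    """Return True if the value belongs to the attribute's domain."""
    return attribute_domains[attribute].is_valid(value)


print(check("policy_holder_gender", "M"))  # value inside the domain
print(check("policy_holder_gender", "X"))  # value outside the domain
print(check("product_name", "x" * 41))     # longer than 40 characters
```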
27. Cost of wrong domains
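• NASA's Mars Climate Orbiter, launched in December 1998, was lost on arrival at Mars in September 1999. One piece of software supplied thruster impulse in US customary units (pound-force seconds), whereas the software receiving it expected SI units (newton seconds). Total mission cost: $327.6 million.
• The first flight of the European Ariane 5 expendable launch system ended in an explosion about 37 seconds after launch in 1996. Wrong use of a domain (converting a floating-point value to a 16-bit integer) caused an integer overflow. The lost rocket and payload are commonly valued at several hundred million dollars; the often-quoted multi-billion-dollar figures refer to the cost of developing Ariane 5.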
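Both failures can be shown in miniature. The sketch below is an illustration only: the impulse value and the 40000.0 reading are made-up numbers, and `to_int16` is a hypothetical helper, not the flight software.

```python
# Unit domains: 1 pound-force second expressed in newton seconds.
LBF_S_TO_N_S = 4.4482216152605

impulse_lbf_s = 100.0                         # value produced in US customary units
impulse_n_s = impulse_lbf_s * LBF_S_TO_N_S    # what an SI-based consumer needs

# Reading the raw number as if it were already in SI understates it by this factor.
understated_by = impulse_n_s / impulse_lbf_s

INT16_MIN, INT16_MAX = -32768, 32767          # domain of a signed 16-bit integer


def to_int16(value: float) -> int:
    """Convert to a signed 16-bit integer, refusing values outside its domain."""
    as_int = int(value)
    if not INT16_MIN <= as_int <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return as_int


try:
    to_int16(40000.0)   # larger than 32767
    overflowed = False
except OverflowError:
    overflowed = True

print(round(impulse_n_s, 2), overflowed)
```

Declaring the domain of each data element (its unit, type and range) in the data model is what lets such mismatches be caught at design time rather than in operation.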
28. Types of notations
• Different notations are available for ER diagramming:
– Chen notation
– IDEF1X
– Information Engineering (IE)
– Barker notation
29. Types of notations-IDEF1X
[Figure: IDEF1X symbols for independent and dependent entities, attributes, identifying relationships (solid lines), non-identifying relationships (dashed lines), many-to-many relationships, complete and incomplete categories with a discriminator, the cardinality markers (zero, one or many; P for one or more; Z for zero or one), and optional versus mandatory relationships.]
30. Types of notations-IDEF1X
• Supported by most of the available tools.
• More geared towards developing physical database designs.
• Needs a combination of notations to capture rules.
• These combinations are not easily understood by business people, which makes them difficult to use in joint application design (JAD) sessions.
32. Types of notations – Information Engineering (IE)
[Figure: Information Engineering symbols for entities and attributes, super types and sub types, identifying and non-identifying relationships, the cardinalities zero-or-one, one and only one, one to many, zero, one or many, and many to many, and the exclusive OR used in the Finkelstein variation.]
33. Types of notations – Information Engineering (IE)
• Two variations: Clive Finkelstein's and James Martin's.
• Different tools implement different variations of the notation.
• In the original version, attributes are not shown on the entities but in a separate document, such as Martin's bubble chart.
• Supported by most of the available modeling tools.
• Easy-to-understand notation.
• Suitable for JAD sessions.
35. Types of notations- Barker
[Figure: Barker symbols for entities, super types and sub types, solid and dashed lines for optionality, the cardinalities one or more, zero or one, zero or more and one to one, and exclusive OR.]
36. Types of notations: Barker
• A # before an attribute marks a unique identifier attribute.
• Solid circles mark required attributes.
• Blank circles mark optional attributes.
• Sub types are mutually exclusive.
• Sub types are always complete.
• A line across a relationship means the relationship is identifying.
37. Types of notations- Barker
• Developed by Richard Barker in the UK in 1986.
• Adopted by Oracle for its CASE methodology.
• Simple and easily understood by business people.
• Not supported by all tools.
40. Reading business rules
Each <Entity 1>
{may be | must be}                (Optionality)
<relationship>                    (Verb Phrase)
{zero | only one | one or more}   (Cardinality)
<Entity 2>

Examples:
• An EMPLOYEE must be staff of only one DEPARTMENT.
• A DEPARTMENT may be composed of one or more EMPLOYEE.
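The reading template can be sketched as a small function that assembles a business rule sentence from the four things read off the diagram. The function name `read_rule` and its parameters are hypothetical, chosen for this illustration.

```python
def read_rule(entity_1: str, optional: bool, verb_phrase: str,
              cardinality: str, entity_2: str) -> str:
    """Assemble a business rule sentence from what the diagram shows."""
    optionality = "may be" if optional else "must be"
    return f"Each {entity_1} {optionality} {verb_phrase} {cardinality} {entity_2}"


# The two directions of the EMPLOYEE / DEPARTMENT relationship.
rule_a = read_rule("EMPLOYEE", False, "staff of", "only one", "DEPARTMENT")
rule_b = read_rule("DEPARTMENT", True, "composed of", "one or more", "EMPLOYEE")

print(rule_a)
print(rule_b)
```

Every relationship line yields two such sentences, one per direction, which is why each line carries two verb phrases.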
41. Reading business rules
• A CLAIM FILE may contain zero, one or more TOTAL LOSS REQUEST RECORDs.
• A TOTAL LOSS REQUEST RECORD must be on only one CLAIM FILE.
42. Reading business rules
• A CLAIM FILE may have vehicle detail in zero, one or more VEHICLE RECORDs.
• A VEHICLE RECORD must be (..?..) one and only one CLAIM FILE.
43. Reading a data model
• Find out which notation is being used.
• Get a chart of the notation that gives each graphical symbol and its description.
• Look at the important entities in the model: the entities at the center of many relationships.
• Look at the definition of each entity. The definition should convey the role the entity plays in the business.
• Following the relationship lines and reading the verb phrases, move from one entity to another.
• Note the relationships implemented in the model.
• Note the cardinality and optionality rules.
• Read the business rules implemented for the entities.
45. Reading a data model - gleaning the business rules
• It is an attributed logical model.
• It uses Information Engineering (IE) notation.
• A PARTY may place zero, one or many PURCHASE ORDERs.
• A PURCHASE ORDER must be received from only one PARTY.
• A PARTY must be of either PERSON or ORGANIZATION type.
• A PURCHASE ORDER may contain zero, one or many LINE ITEMs.
• A LINE ITEM must be placed on only one PURCHASE ORDER.
• A PRODUCT may be on zero, one or more LINE ITEMs.
• A LINE ITEM must show only one PRODUCT.
• A PRODUCT may be of SOURCED PRODUCT or SERVICE type.
• Party Identifier is the key identifier for PARTY.
• Product Identifier is the key identifier for PRODUCT.
46. Reading a data model - gleaning the business rules
• Purchase Order Number combined with Party Identifier is the primary identifier for PURCHASE ORDER.
• Line Item Number, Product Identifier, Party Identifier and Purchase Order Number combined are the primary identifier for LINE ITEM.
• Surname is an attribute of PERSON only.
• Business Number is an attribute of ORGANIZATION only.
• Sourced From is an attribute of SOURCED PRODUCT only.
• Cost Amount is an attribute of SOURCED PRODUCT only.
• Service Location is an attribute of SERVICE only.
• Rate Per Hour is an attribute of SERVICE only.
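The combined identifiers described above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names are invented, sub types and non-key attributes are left out, and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE party   (party_id   INTEGER PRIMARY KEY);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY);

    -- Purchase Order Number combined with Party Identifier identifies an order.
    CREATE TABLE purchase_order (
        party_id  INTEGER NOT NULL REFERENCES party(party_id),
        po_number INTEGER NOT NULL,
        PRIMARY KEY (party_id, po_number)
    );

    -- All four identifiers combined identify a line item.
    CREATE TABLE line_item (
        party_id         INTEGER NOT NULL,
        po_number        INTEGER NOT NULL,
        product_id       INTEGER NOT NULL REFERENCES product(product_id),
        line_item_number INTEGER NOT NULL,
        PRIMARY KEY (party_id, po_number, product_id, line_item_number),
        FOREIGN KEY (party_id, po_number)
            REFERENCES purchase_order(party_id, po_number)
    );

    INSERT INTO party VALUES (1), (2);
    INSERT INTO product VALUES (100);
    -- Two parties may reuse the same order number.
    INSERT INTO purchase_order VALUES (1, 5001), (2, 5001);
    INSERT INTO line_item VALUES (1, 5001, 100, 1);
    """
)

# The same party cannot reuse an order number.
try:
    conn.execute("INSERT INTO purchase_order VALUES (1, 5001)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A line item must be placed on an existing purchase order.
try:
    conn.execute("INSERT INTO line_item VALUES (2, 9999, 100, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

po_count = conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
print(duplicate_rejected, orphan_rejected, po_count)
```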
47. Reading a data model - deriving real value
• A very important exercise for flushing out hidden and missing business rules; it minimizes ‘later day change requests’.
• The value is in the critical examination of business rules.
– A PURCHASE ORDER must be received from only one PARTY:
• Can a party transfer its purchase order to another party?
• What if a party is dissolved, merged or acquired by another party after placing a purchase order? Do we need to know about the original party?
• Can two parties place a combined order to obtain a volume discount?
– Business Number is an attribute of ORGANIZATION only:
• There are individuals who are incorporated and have a business number. Should we capture their business number?
– A PRODUCT may be of SOURCED PRODUCT or SERVICE type:
• What about sourced products requiring installation service and support? Should we invoice the service on a separate purchase order?
49. Data models – maximizing ROI.
• Make data modeling a mandatory part of the development life cycle.
• Standardize on one data modeling tool so that everybody is familiar with its semantics.
• Provide training to users in the modeling tool and its semantics.
• Capture additional business rules in separate documents for completeness.
• Keep data models up to date.
50. Further readings
• Help section of the data modeling tools: most tools come with good supporting documentation on modeling methodology and notations.
• Books:
– Data Model Patterns: Conventions of Thought, by David C. Hay
– Data Modeling Made Simple: A Practical Guide for Business and IT Professionals, by Steve Hoberman
– Data Modeling for the Business: A Handbook for Aligning the Business with IT using High-Level Data Models (Take It with You Guides), by Steve Hoberman, Donna Burbank and Chris Bradley