A database is a collection of organized information that can be accessed and managed efficiently. Clinical databases aim to accurately capture and store patient data to facilitate analysis and reporting. Relational databases are commonly used as they allow data to be organized into tables and linked together through common identifiers. Data entry involves transferring paper records into electronic format in the database. Double data entry checks for errors by having two people enter the same data, while single entry relies more on in-built validation checks. Databases must be designed carefully to collect only necessary variables and ensure high data quality.
Data base and data entry presentation by mj n somya
1. DATA BASE AND DATA ENTRY
Presented By:-
Mukesh Jaiswal
Somya Verma
ICRI, Dehradun.
2. Clinical Data Base
• A database is a method of organizing and
analyzing information.
• A database is a collection of information that
is organized so that it can easily be
accessed, managed, and updated. In one
view, databases can be classified according to
types of content: bibliographic, full-
text, numeric, and images.
3. Cont…
• In computing, databases are sometimes classified
according to their organizational approach. The
most prevalent approach is the relational
database, a tabular database in which data is
defined so that it can be reorganized and
accessed in a number of different ways. A
distributed database is one that can be dispersed
or replicated among different points in a network.
An object-oriented programming database is one
that is congruent with the data defined in object
classes and subclasses.
4. Cont…
• The main objective of database design is to
capture and store clinical data accurately.
• The essential features of good design are ease
of data capture, efficient creation of analysis
datasets and accommodation of source data
transfer formats.
5. Why use a database?
• Organize and analyze information in different
ways
– Sorting
– Grouping
– Querying
– Reporting
– Exporting for statistical analysis
• Computerized database
– Speed
– Quality control
– Precision
– Automate repetitive tasks
6. Databases versus Excel
• Excel has some limited capabilities to sort data but its primary
function is to create financial spreadsheets
– Can create “what if” scenarios to determine financial consequences
– Can be used for small and limited research data sets and simple lists
– Not multi-user: only one person can work on the file at a time
• Databases are designed to collect, sort, and manipulate data
– Databases can process large amounts of data and are usually limited only by
hardware constraints
– Structure is in the same format for each member record of a table
– Data quality control features ensure that valid data is entered
– A relational database allows for linking of an unlimited number of
tables
– Databases are multi-user because the data can reside on a server and
multiple people can have access at the same time
– Many databases offer web interfaces thereby eliminating the need for
each user to have a copy of the program on their computer
7. Cont…
• Many databases offer audit functions required by
certain regulatory agencies
• Tracks date record created and modified
• Tracks original and changed values
• Requires user to give reason for the change
• Databases are more suitable for importing data
from multiple sources
• More robust in connecting to different data sources
• Imports of different data types into different tables can be
linked via common identifiers such as subject ID
• Merging multiple data sources into Excel so that the rows line
up properly in a flat file format can be a challenge
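The audit functions listed above (creation and modification dates, original and changed values, a mandatory reason for change) can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table names, column names, and the helper functions insert_result and update_hemoglobin are assumptions made for the example, not features of any particular clinical data management system.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lab_result (
        subject_id  TEXT PRIMARY KEY,
        hemoglobin  REAL,
        created_at  TEXT,   -- date record created
        modified_at TEXT    -- date record last modified
    );
    CREATE TABLE audit_log (
        subject_id TEXT,
        field_name TEXT,
        old_value  TEXT,    -- original value
        new_value  TEXT,    -- changed value
        reason     TEXT NOT NULL,
        changed_at TEXT
    );
""")

def now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()

def insert_result(subject_id, hemoglobin):
    """Create a record and stamp its creation time."""
    ts = now()
    conn.execute(
        "INSERT INTO lab_result VALUES (?, ?, ?, ?)",
        (subject_id, hemoglobin, ts, ts),
    )

def update_hemoglobin(subject_id, new_value, reason):
    """Change a value, refusing the change unless a reason is given."""
    if not reason or not reason.strip():
        raise ValueError("a reason for the change is required")
    row = conn.execute(
        "SELECT hemoglobin FROM lab_result WHERE subject_id = ?",
        (subject_id,),
    ).fetchone()
    if row is None:
        raise KeyError(subject_id)
    ts = now()
    conn.execute(
        "UPDATE lab_result SET hemoglobin = ?, modified_at = ? WHERE subject_id = ?",
        (new_value, ts, subject_id),
    )
    # Keep both the original and the changed value in the audit log.
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)",
        (subject_id, "hemoglobin", str(row[0]), str(new_value), reason, ts),
    )

insert_result("S001", 31.5)
update_hemoglobin("S001", 13.5, "transcription error on CRF page 4")
audit_rows = conn.execute(
    "SELECT subject_id, field_name, old_value, new_value, reason FROM audit_log"
).fetchall()
print(audit_rows)
```

The current value lives in one table and every change is appended to a second table, so the history of a field can be reconstructed without overwriting anything.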
8. How is a database organized?
• One or more tables
• Tables store records
– Patient identifiers
– Demographics and history
– Test results
– Etc…..
• A record is a collection of fields
– Patient identifiers
• Name, DOB, address, …..are stored in separate fields
10. Differences between a clinical and
research database
• Clinical database
– Form or report oriented so data is displayed for
clinical decision making
– Emphasis on displaying or reporting of individual
data rather than accumulating multiple records
• Research database
– Table oriented so that data is accumulated for
eventual export to a statistical package for data
analysis and reporting
– Less emphasis on individual records
11. Types of Database
• Flat-File:- The flat-file style of database is
ideal for small amounts of data that need to
be human readable or edited by hand.
A flat-file database is essentially a set of
strings in one or more files that can be parsed
to get the information they store. It is great for
storing simple lists and data values, but can
get complicated when you try to replicate
more complex data structures.
12. Cont…
• Relational:- Relational databases, such as
MySQL, Microsoft SQL Server and Oracle, have a much
more logical structure in the way they store data.
Tables can be used to represent real world
objects, with each field acting like an attribute.
• One major advantage of the relational model is that, if
a database is designed efficiently, there should be no
duplication of any data, which helps to maintain database
integrity. This can also represent a large saving in file
size, which is important when dealing with large
volumes of data.
13. Cont…
• Relational databases also have functions "built
in" that help them to retrieve, sort and edit
the data in many different ways. These
functions save script designers from having to
worry about filtering out the results that they
get, and so can go quite some way to speeding
up the development and production of web
applications.
14. Advantages of a Relational Database
• Elimination of Multiple Value Data – a relational database allows
creation of relationships for subordinate data. For example, a table
for laboratory testing and another table for clinical findings would
each have multiple subjects, but the subject demographic
information is maintained in a separate table.
• Avoiding Update Anomalies – since data is stored in only one
place, it is easy to update (no other copies to remember to update).
• Avoiding Data Entry Anomalies – like updates, since data is only
stored in one place, it needs to be inserted in one place.
• Avoiding Data Deletion Anomalies – once again, since data is in
one place only, it is deleted only once.
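A small sketch of the points above, using Python's built-in sqlite3 module: demographics are stored once in a subject table, laboratory results are stored in a separate table, and the two are linked by subject ID. The table names, column names, and values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (
        subject_id TEXT PRIMARY KEY,
        sex        TEXT,
        birth_year INTEGER
    );
    CREATE TABLE lab_test (
        subject_id TEXT REFERENCES subject(subject_id),  -- declares the link
        test_name  TEXT,
        result     REAL
    );
""")

# Demographics are entered once, no matter how many tests a subject has.
conn.execute("INSERT INTO subject VALUES ('S001', 'F', 1980)")
conn.executemany(
    "INSERT INTO lab_test VALUES (?, ?, ?)",
    [("S001", "glucose", 5.4), ("S001", "hemoglobin", 13.5)],
)

# A correction is made in one place only; there are no other copies to update.
conn.execute("UPDATE subject SET birth_year = 1981 WHERE subject_id = 'S001'")

# The join brings the tables together through the common identifier.
joined = conn.execute("""
    SELECT s.subject_id, s.birth_year, t.test_name, t.result
    FROM subject AS s
    JOIN lab_test AS t ON t.subject_id = s.subject_id
    ORDER BY t.test_name
""").fetchall()
print(joined)
```

Every joined row shows the corrected birth year even though it was changed in a single record, which is the update anomaly the relational design avoids.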
15. Advantages of a database
• Collection of data in a centralized location
• Controls redundant data
• Data stored so as to appear to users in one
location
– Data can be stored in multiple tables and come
from multiple sources
– A relational database brings it all together
16. Database Design Considerations
• What to collect
– What questions are to be answered?
– Think of the data tables in your future publications
• Focus on the key data elements rather than collect as much as
possible
• What statistical package will be used
– Format of the data file to which the data will be exported
• Allowable characters
• Format for certain analyses
– For example, gender can be recorded in the database as M or F but
statistical package may require 0 and 1
• Length of data field labels
• Long or wide format
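The gender example above can be sketched as a recoding step applied at export time. The mapping of M to 0 and F to 1 is an assumption for illustration (the slide only says the package may require 0 and 1), and the field names are hypothetical.

```python
import csv
import io

# Assumed mapping; the real codes depend on the statistical package and analysis plan.
SEX_CODES = {"M": 0, "F": 1}

def recode_for_export(records):
    """Return copies of the records with gender recoded to numeric values."""
    exported = []
    for rec in records:
        row = dict(rec)                          # leave the database record unchanged
        row["gender"] = SEX_CODES[rec["gender"]]  # KeyError flags an unexpected code
        exported.append(row)
    return exported

records = [
    {"subject_id": "S001", "gender": "F"},
    {"subject_id": "S002", "gender": "M"},
]
exported = recode_for_export(records)

# Write the export file in CSV form (here to an in-memory buffer).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["subject_id", "gender"], lineterminator="\n")
writer.writeheader()
writer.writerows(exported)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping M and F in the database and recoding only on export keeps the stored data readable while still meeting the format the analysis software expects.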
17. Long versus Wide Format
Long: each year is represented as its own observation, in a separate record
Wide: each family is a record and each year is a field within that record
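The two layouts can be illustrated with a small sketch in plain Python. The field names (family_id, income_1996, and so on) and the values are made up for the example; the functions wide_to_long and long_to_wide are hypothetical helpers, not part of any package.

```python
# Wide: one record per family, one field per year.
wide = [
    {"family_id": 1, "income_1996": 40000, "income_1997": 40500},
    {"family_id": 2, "income_1996": 45000, "income_1997": 45400},
]

def wide_to_long(rows, id_field, prefix):
    """Turn each year field into its own record."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                long_rows.append({
                    id_field: row[id_field],
                    "year": int(key[len(prefix):]),
                    "income": value,
                })
    return long_rows

def long_to_wide(rows, id_field, prefix):
    """Collect all records for one family back into a single record."""
    by_id = {}
    for row in rows:
        record = by_id.setdefault(row[id_field], {id_field: row[id_field]})
        record[f"{prefix}{row['year']}"] = row["income"]
    return list(by_id.values())

# Long: one record per family per year.
long_format = wide_to_long(wide, "family_id", "income_")
for row in long_format:
    print(row)
```

Converting back with long_to_wide(long_format, "family_id", "income_") reproduces the wide records, so the choice between the two is about what the statistical package expects rather than about what information is stored.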
18. Quality Control of Data Before
Study
• Collect only needed variables
• Select appropriate computer hardware and
software
• Plan analyses with dummy tabulations
• Develop study forms
– Precode responses
– Format boxes for data entry
– Label each page with date, time, ID
– Consider scan technology
19. What needs to be in the research
database?
Research variables directly related to the
hypotheses being tested-YES
Clinical measures used for screening-MAYBE
◦ Blood work, ECG, medical history
Administrative data-NO
◦ Contact information
◦ Scheduling
20. What Do You Do With the Data?
• Ongoing monitoring
• Safety/adverse event reporting
• IRB reports/sponsor reports
• FDA reports
• Early analysis/late analysis
21. Data Entry
• Refers to the process of transferring data from
the paper CRF (case report form) to the database.
• This is also referred to as transcribing the data.
• Data entry results in the creation of electronic data
that corresponds to the CRF data.
• Once the data is entered into the database, it
is reviewed and validated by the data editor.
• Data entry can be done as double entry or
single entry.
22. Double Entry
• This involves entry of the same CRF page by two
independent data entry operators.
• The first operator keys the data
into the database. Later, a second independent
operator keys in the same data.
• In the case of a difference or discrepancy between
the first and second entry, a ‘pop up’ box appears,
prompting the second operator
either to key in what they see or to accept what the
first operator entered.
23. Cont…
• Another option is to have a third person
review the differences/discrepancies and
resolve them.
• Thus double data entry serves as a quality
check in the data that is entered into the
database.
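The comparison step of double data entry can be sketched as below. This is a minimal illustration, assuming each entry pass is held as a dictionary of field values; the function names compare_entries and resolve and the field names are hypothetical.

```python
def compare_entries(first, second):
    """List every field where the two entry passes disagree."""
    discrepancies = []
    for field in sorted(set(first) | set(second)):
        v1 = first.get(field)
        v2 = second.get(field)
        if v1 != v2:
            discrepancies.append((field, v1, v2))
    return discrepancies

def resolve(first, discrepancies, decisions):
    """Build the final record from the reviewer's decision for each discrepancy."""
    final = dict(first)
    for field, _v1, _v2 in discrepancies:
        final[field] = decisions[field]  # KeyError if a discrepancy was left unresolved
    return final

first_pass = {"subject_id": "S001", "weight_kg": "72", "sbp": "120"}
second_pass = {"subject_id": "S001", "weight_kg": "27", "sbp": "120"}

diffs = compare_entries(first_pass, second_pass)
print(diffs)

# The second operator, or a third reviewer, checks the CRF and decides.
final_record = resolve(first_pass, diffs, {"weight_kg": "72"})
print(final_record)
```

Only the fields that differ are shown to the reviewer, and the final record cannot be built until each of them has a decision.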
24. Cont…
• The system allowed design of data entry forms that
satisfied the needs of our
clinicians, biostatisticians, and administrative staff. The
system drastically reduced the time required to enter
patient exam, demographic, and laboratory
measurement data onto the study database, and
provided tools for verifying that the data were scanned
accurately. The system improved both the quality of
patient care and the integrity of clinical patient
data, allowing clinicians to quickly and easily retrieve
patient records, and permitted our biostatisticians to
generate periodic recruitment monitoring, patient
safety, protocol adherence, and data quality assurance
reports in a timely fashion.
25. Single Entry
• This involves entry by a single data entry
operator.
• This process is used when there are sufficient
and extensive checks built into the database
to detect errors that might be
missed by the data entry operator.
• Single data entry is extensively used in EDC (electronic data capture)
and RDC (remote data capture) systems.
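The built-in checks that single entry relies on might look like the sketch below. The specific rules (a required subject ID, an age range of 0 to 120, an ISO-format visit date that is not in the future) and the field names are assumptions chosen for illustration; real studies define their own checks.

```python
from datetime import date

def check_record(record):
    """Return a list of problems found in one entered record (empty if none)."""
    errors = []

    # Required field check.
    if not record.get("subject_id"):
        errors.append("subject_id is required")

    # Range check.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be a whole number between 0 and 120")

    # Format and plausibility check.
    try:
        visit = date.fromisoformat(record.get("visit_date"))
        if visit > date.today():
            errors.append("visit_date cannot be in the future")
    except (TypeError, ValueError):
        errors.append("visit_date must be in YYYY-MM-DD format")

    return errors

good = check_record({"subject_id": "S001", "age": 54, "visit_date": "2020-03-15"})
bad = check_record({"subject_id": "", "age": 540, "visit_date": "15/03/2020"})
print(good)
print(bad)
```

Checks like these catch out-of-range and badly formatted values at the moment of entry, but they cannot catch a plausible value that was simply mistyped, which is the kind of error double entry is designed to find.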
26. Cont…
• Thus single data entry eliminates the need for data
entry personnel within the data management
unit.
• Once the data is keyed directly at site, it is
ready to be reviewed, edited and validated
by the data editor.
27. Cont…
The data entry could be of two types:-
• Data entry is done locally at the site database and
transmitted periodically to the central database via
internet or using a dial-up line. Sometimes the data is
sent using other electronic media such as a CD, floppy
or as an e-mail attachment.
• Data entry is done online directly into the central
database via internet. Usually these systems are web-
based and the data is available in real time for review.
28. Rules for Data Entry
• Each variable has a field in the dataset
• Categorical and nominal values require a number or
string code
• Continuous values are entered directly
• Missing values must be coded differently from any real
response
– Common formats are “99” or bullets “·”
– Don’t know is a response—do not leave blank
– “0” is not the same as missing
• Coding instructions should be on form
• Avoid open-ended questions
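The missing-value rules above can be illustrated with a short sketch. It assumes a pain score recorded on a 0 to 10 scale, with 99 used for missing and 98 for "don't know"; the 98 code and the variable names are hypothetical and only serve the example.

```python
MISSING_CODE = 99      # value was not recorded
DONT_KNOW_CODE = 98    # assumed code: subject answered "don't know"

def mean_excluding_special(values, special=(MISSING_CODE, DONT_KNOW_CODE)):
    """Average only the real responses, skipping the special codes."""
    valid = [v for v in values if v not in special]
    return sum(valid) / len(valid) if valid else None

# Two subjects really scored 0; one value is missing; one is "don't know".
pain_scores = [0, 3, 99, 7, 0, 98]

naive_mean = sum(pain_scores) / len(pain_scores)   # wrongly treats 99 and 98 as scores
correct_mean = mean_excluding_special(pain_scores)

print(naive_mean)
print(correct_mean)
```

The real zeros stay in the calculation while the special codes are excluded, which is why the missing code must lie outside the range of possible responses and why "0" cannot double as missing.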
29. Avoid open-ended questions
• Enter the subject’s
gender:___________________
• Enter the subject's level of
education:__________