Sponsors and CROs know the value of having a consolidated and regulatory-compliant data warehouse, such as Oracle’s Life Sciences Data Hub (LSH), as well as the importance of consistently loading data into that warehouse quickly and accurately.
However, as the data structures in the source files change over time, modifying the corresponding structures in the warehouse can be very time-consuming. Additionally, for the large groups of SAS datasets that are typical of a clinical trial, the out-of-the-box load times can be quite long, because the data is loaded one dataset at a time.
Perficient has the answer. In this webinar, we discussed and demonstrated an autoloader tool that greatly simplifies the data loading process for LSH. We showed how the autoloader can automatically load files, detect metadata changes, upgrade target structures, and load data, all with no human intervention. In addition, we demonstrated how Perficient’s autoloader tool can load multiple datasets in parallel to minimize load times.
How to Load Data More Quickly and Accurately into Oracle's Life Sciences Data Hub
1. How to Load Data More Quickly and Accurately
into Oracle Life Sciences Data Hub
2. 2
ABOUT PERFICIENT
Perficient is a leading information
technology consulting firm serving
clients throughout North America.
We help clients implement digital experience, business
optimization, and industry solutions that cultivate and captivate
customers, drive efficiency and productivity, integrate business
processes, reduce costs, and create a more agile enterprise.
3. 3
Founded in 1997
Public, NASDAQ: PRFT
2014 revenue $456.7 million
Major market locations:
Allentown, Atlanta, Ann Arbor, Boston, Charlotte,
Chattanooga, Chicago, Cincinnati, Columbus,
Dallas, Denver, Detroit, Fairfax, Houston,
Indianapolis, Lafayette, Milwaukee, Minneapolis,
New York City, Northern California, Oxford (UK),
Southern California, St. Louis, Toronto
Global delivery centers in China and India
>2,600 colleagues
Dedicated solution practices
~90% repeat business rate
Alliance partnerships with major technology vendors
Multiple vendor/industry technology and growth awards
PERFICIENT PROFILE
4. 4
Business Process
Management
Customer Relationship
Management
Enterprise Performance
Management
Enterprise Information
Solutions
Enterprise Resource
Planning
Experience Design
Portal / Collaboration
Content Management
Information Management
Mobile
BUSINESS SOLUTIONS
50+ PARTNERS
Safety / PV
Clinical Data
Management
Electronic Data Capture
Medical Coding
Data Warehousing
Data Analytics
Clinical Trial
Management
Precision Medicine
CLINICAL / HEALTHCARE IT
Consulting
Implementation
Integration
Migration
Upgrade
Managed Services
Private Cloud Hosting
Validation
Study Setup
Project Management
Application Development
Software Licensing
Application Support
Staff Augmentation
Training
SERVICES
OUR SOLUTIONS PORTFOLIO
5. 5
WELCOME & INTRODUCTION
Extensive clinical trial software implementation experience
• 20 years of experience in the life sciences industry
• Extensive experience with Oracle’s clinical data warehousing, analytics, and
precision medicine applications
• Expertise in improving and standardizing business processes to support best
practices and the ever-changing regulatory requirements
Kathryn Hanson
Solutions Architect, Life Sciences
Perficient
6. 6
WHAT’S TRENDING IN TECHNOLOGY?
• Big Data
• How do we acquire data from other sources?
• How do we manage high volume data?
• Data analytics
• What conclusions can we draw from the raw data?
• Data privacy and security
• How do we control who has access to our data?
7. 7
WHICH ISSUES ARE WE FACING?
The pharmaceutical industry has many of these same technology issues:
• How do we acquire data from external sources?
• How do we manage high volume data?
• How can we present that data for analysis?
• How can we secure our data against unauthorized access?
How can we acquire and manage the data we
receive from many different sources?
8. 8
WHAT WOULD WE LIKE IN A SOLUTION?
Hands off and automated
• After the initial setup, only routine monitoring is needed
Flexible
• Adapts as data changes over time
• Handles multiple file types
• Can start other jobs as needed when the load is complete
Reliable and secure
Efficient and performs well on high-volume data
Simple to implement
9. 9
THE SOLUTION: AUTOMATED FILE LOAD
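[Diagram: a data file moves through four numbered steps involving the Secure Staging Area, the File Load Utility, Quality Assurance, and the Warehouse. Within the Warehouse, the Study holds staged data, transformed data, and analysis programs.]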
10. 10
WHAT DO I NEED TO GET STARTED?
• A repository to receive and manage the clinical data
(in this presentation that’s the Oracle Life Sciences Warehouse)
• Resources to set up and monitor the system
• Secure directories to receive and process data files
• Utility software to process the files and load the data into the repository
• Scheduling software to control when, where, and how jobs run
• A way to register new data sources to the utility
11. 11
HOW DO I BEGIN LOADING DATA?
• Work with the vendor to
• Understand the file format, data structures, file naming conventions, etc.
• Provide secure access to the download area
• Receive a sample data file
• Register the new data source in the utility
• Set up the storage areas in the repository
• Test the new data source to verify it loads correctly into the repository
• Complete any other setup needed so authorized users can access the
data (for transformations, visualizations, etc.)
12. 12
SETTING UP THE DIRECTORIES
<root directory>
    stagedir: The data file is dropped into this watched directory
    processdir: The pre-processed files are moved here for final processing and loading into the warehouse
    rejectdir: The data file is moved here if the file load fails
    successdir: The data file is moved here when the job finishes successfully
    scripts: Utility software is stored in this directory
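The way a file moves between these directories can be sketched in a few lines of Python. This is only an illustration, not the utility's actual code: the directory names come from the slide, while `create_layout`, `route_file`, and the `load` callable are hypothetical names introduced here.

```python
import shutil
from pathlib import Path

# Directory names are taken from the slide; everything else is illustrative.
SUBDIRS = ("stagedir", "processdir", "rejectdir", "successdir", "scripts")


def create_layout(root: Path) -> dict[str, Path]:
    """Create the five working directories under root and return them by name."""
    paths = {name: root / name for name in SUBDIRS}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


def route_file(data_file: Path, dirs: dict[str, Path], load) -> Path:
    """Move a staged file to processdir, load it, then file it by outcome.

    `load` is a hypothetical callable that loads the file into the
    warehouse and raises an exception if the load fails.
    """
    working = Path(shutil.move(str(data_file), str(dirs["processdir"] / data_file.name)))
    try:
        load(working)
    except Exception:
        target = dirs["rejectdir"]   # the load failed
    else:
        target = dirs["successdir"]  # the load finished successfully
    return Path(shutil.move(str(working), str(target / working.name)))
```

Keeping rejected and successful files in separate directories means routine monitoring can be as simple as checking whether rejectdir is empty.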
13. 13
SETTING UP STUDY REGISTRATION
The first 3 attributes identify the study
and data type
These 3 attributes tell the utility where
to store the data in the repository
There are many options that control
how the data should be loaded
14. 14
NAMING CONVENTIONS FOR THE DATA FILE
File naming conventions ensure that the utility can identify
the registered study
CDISC01 – The study name
FULL – The type of data that will be loaded
DEV – Is this development, test, or production data?
201509211010 – A unique date and time stamp
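As a rough sketch of how a file name built from these four parts could be parsed: the slide text lists the parts but not how they are joined, so the underscore separator, the file extension, and the names `FileName` and `parse_file_name` are all assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple


class FileName(NamedTuple):
    study: str
    data_type: str
    environment: str
    timestamp: datetime


# Assumed layout: <study>_<type>_<environment>_<YYYYMMDDHHMM>.<extension>
# The underscore separator and the extension are assumptions, not from the slides.
PATTERN = re.compile(r"^([A-Za-z0-9]+)_([A-Za-z0-9]+)_([A-Za-z0-9]+)_(\d{12})\.\w+$")


def parse_file_name(name: str) -> FileName:
    """Split a data file name into its four parts, or raise ValueError."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    study, data_type, environment, stamp = match.groups()
    return FileName(study, data_type, environment,
                    datetime.strptime(stamp, "%Y%m%d%H%M"))
```

Under these assumptions, `parse_file_name("CDISC01_FULL_DEV_201509211010.zip")` yields the study `CDISC01`, the data type `FULL`, the environment `DEV`, and a timestamp of 21 September 2015 at 10:10.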
15. 15
ADDING OTHER PROCESSING OPTIONS
The utility lets you specify how you want to handle the data:
• Running another job after the data loads
• Handling blinded data
• Sending out notifications
• Processing large files
• Managing changes in data structures
• Identifying file formats for text files
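One way to picture a study registration together with its options is as a single record. The sketch below is purely illustrative: every field name and default is hypothetical, including the guess that the three identifying attributes mirror the file name parts, and none of it is taken from the utility itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StudyRegistration:
    # Attributes that identify the study and data type (assumed to mirror
    # the parts of the data file name).
    study: str
    data_type: str
    environment: str
    # Attributes that say where the data is stored in the repository
    # (hypothetical names).
    domain: str
    application_area: str
    work_area: str
    # Processing options (hypothetical names and defaults).
    post_load_job: Optional[str] = None   # job to run after the data loads
    blinded: bool = False                 # handle as blinded data
    notify: Tuple[str, ...] = ()          # who receives notifications
    large_file: bool = False              # use large-file processing
    allow_metadata_updates: bool = False  # accept compatible structure changes
    text_delimiter: Optional[str] = None  # file format hint for text files

    def key(self) -> Tuple[str, str, str]:
        """Lookup key used to match an incoming file to its registration."""
        return (self.study, self.data_type, self.environment)


# A registry keyed by the identifying attributes.
registration = StudyRegistration("CDISC01", "FULL", "DEV",
                                 "Clinical", "CDISC01", "Staging",
                                 allow_metadata_updates=True)
registry = {registration.key(): registration}
```

With a registry like this, an incoming file is matched to its registration by key, and every later step reads its behavior from that one record.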
16. 16
SETTING UP THE REPOSITORY
The data will be loaded into the work area under which
you registered the study.
Warehouse
Study
Staged data
Transformed data
Analysis programs
17. 17
WHAT THE UTILITY DOES
Your vendor has uploaded a data file; now the utility…
1. Detects the file and runs a set of preprocessing checks
2. Extracts all the datasets (text files, etc.)
3. Extracts the metadata for each dataset
4. Verifies the metadata for each dataset. If the dataset has been
loaded before, either
• The new metadata must match that in the previous load
OR
• The study allows compatible metadata updates
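The check in step 4 can be sketched as follows. The slides do not define "compatible", so this sketch assumes a change is compatible when it only adds columns and leaves existing columns and their types untouched. The function name, the dict-based metadata, and the type strings are hypothetical.

```python
def check_metadata(previous, incoming, allow_updates):
    """Return the names of columns to add, or raise ValueError to reject the load.

    `previous` and `incoming` map column names to type strings.
    `previous` is None when the dataset has never been loaded before.
    """
    if previous is None or previous == incoming:
        return []  # first load, or metadata matches the previous load
    if not allow_updates:
        raise ValueError("metadata changed and updates are not allowed")
    # Assumed rule: every existing column must still be present with the same type.
    for column, col_type in previous.items():
        if incoming.get(column) != col_type:
            raise ValueError(f"incompatible change to column {column!r}")
    return sorted(set(incoming) - set(previous))
```

Rejecting anything other than added columns is the conservative choice here, since dropping or retyping a column could break programs that read the existing table.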
18. 18
WHAT THE UTILITY DOES
The utility continues if everything checks out by …
5. Creating a load set for each dataset in the data file
(if one doesn’t already exist)
6. Updating the repository metadata, if required
7. Starting each of the load sets
8. Monitoring the running jobs for errors
9. Sending notifications to users, as required, when all
the jobs are done
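Steps 7 through 9 amount to starting the load sets together, watching for errors, and reporting once everything has finished. A minimal Python sketch of that pattern, using a thread pool, is shown below. It is not the utility's implementation: `run_load_sets`, `load_one`, and the `max_workers` default are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_load_sets(datasets, load_one, max_workers=4):
    """Run one load per dataset in parallel and collect the outcomes.

    `load_one` is a hypothetical callable that loads a single dataset
    and raises an exception on failure.
    Returns (results, errors), both keyed by dataset name.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(load_one, name): name for name in datasets}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                errors[name] = exc  # record the failure, keep the other loads going
    return results, errors
```

Because the function returns only after every job has finished, a notification sent right after it can summarize both `results` and `errors` in a single message.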
20. 20
THE RESULTS
…and when all the jobs are done the data is loaded and
available in the repository.
21. 21
WHAT HAPPENS IF THE METADATA CHANGES?
• One of the options you can choose is
whether or not to allow changes to a
table’s metadata
• If that flag is “Y”, the utility will accept
and process compatible changes
• For example, you need to add 2 new
columns to the table…
22. 22
WHAT HAPPENS IF THE METADATA CHANGES?
The table in the repository now has those two additional columns:
23. 23
DOES THE UTILITY MEET OUR GOALS?
Automated and hands off
Flexible
Efficient
Simple to implement