The presentation was delivered during the 1st International Conference on Health Information Science (HIS 2012) on April 9th, 2012 in Beijing, China.
Abstract:
In cytomics, bookkeeping of the data generated during lab experiments is crucial. The current approach in cytomics is to conduct High-Throughput Screening (HTS) experiments so that cells can be tested under many different experimental conditions. Given the large number of different conditions and the readout of the conditions through images, the HTS approach requires a proper data management system to reduce the time needed for experiments and the chance of man-made errors. As different types of data exist, the experimental conditions need to be linked to the images produced by the HTS experiments, together with their metadata and the results of further analysis. Moreover, HTS experiments never stand by themselves: as more experiments are lined up, the amount of data and the computation needed to analyze them increase rapidly. To that end, cytomic experiments call for automated and systematic solutions that provide convenient and robust features for scientists to manage and analyze their data. In this paper, we propose a platform for managing and analyzing HTS images resulting from cytomics screens, taking the automated HTS workflow as a starting point. This platform seamlessly integrates the whole HTS workflow into a single system. The platform relies on a modern relational database system to store user data and process user requests, while providing a convenient web interface to end-users. By implementing this platform, the overall workload of HTS experiments, from experiment design to data analysis, is reduced significantly. Additionally, the platform provides the potential for data integration to accomplish genotype-to-phenotype modeling studies.
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and Management in High-Throughput Screening Experiments
1. Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and Management in High-Throughput Screening Experiments
HIS 2012: The 1st International Conference on Health Information Science
Enrique Larios, Kuan Yan, Fons J. Verbeek (LIACS, Leiden University, The Netherlands)
Ying Zhang, Fabian Groffen (CWI, Amsterdam, The Netherlands)
Zi Di, Sylvia LeDévédec (Department of Toxicology, Leiden University, The Netherlands)
Centrum Wiskunde & Informatica
Leiden University. The university to discover.
2. Introduction
§ The current approach in cytomics is to conduct High Throughput Screening (HTS) experiments so that cells can be tested under different experimental conditions.
§ In HTS experiments, as more experiments are lined up, the amount of data and computation needed to analyze these increases rapidly.
[Figure: Cytomics image types. Static Images: 2D, 3D. Time Lapse Sequence Images: 2D + T, 3D + T.]
3. Workflow in HTS experiments
[Diagram: Experiment planning (Scientist); Plate design (Scientist)]
4. Workflow in HTS experiments
[Diagram: HTS process and Storage. The Scientist sets up the microscope (BD Pathway, Nikon 1, Nikon 2, Nikon 3); the microscopes produce tiff files, ics & ids files, and nd2 files, which are stored on the File Server.]
5. Workflow in HTS experiments
Image Analysis (Bioinformaticians)
§ High-throughput image analysis provides an automated quantification of dynamic cell behavior at both the cellular level and the structural level.
[Figure: Cell masks and motion trajectories]
Data Analysis (Scientist)
§ By employing pattern recognition theory, the system provides objective statistical conclusions to support biological hypotheses.
[Figure: Data Map]
6. Problems identified
§ Duration of the experiments can take months.
§ Software tools used are not suitable for the work performed.
§ Components used in the experiment are not integrated.
§ Data (images and metadata) is not linked properly.
There is no platform that helps scientists learn from experience: a Knowledge Discovery System is lacking.
7. Objectives
§ Develop an integrated platform to automate data management and image analysis of cytomic HTS experiments.
§ Design a database to store almost all data produced and used in the HTS experiments.
§ Establish an automated workflow system of the HTS experiments.
8. Workflow of the HTS System
9. Which data should be stored in the database?
HTS Database:
§ Users
§ Experiment details
§ Plates & Wells
§ Raw images
§ Results of Image Analysis
§ Results of Data Analysis
10. Description of the System Architecture
§ GUI layer: HTS Analysis GUI
§ Web Services layer (Glassfish - IIS): Plate Design API, Image Analysis API, Pattern recognition tools API
§ Data storage / Processing layer: Scientific Super Computer
11. Row storage and column storage
} Row storage:
+ easy to add/modify a record
- might read in unnecessary data
} Column storage:
+ only need to read in relevant data
- tuple writes require multiple accesses
} Column storage is suitable for read-mostly, read-intensive, large data repositories.
} MonetDB is an open-source database system for high-performance applications in data mining, OLAP, GIS, XML Query, text and multimedia retrieval. MonetDB often achieves a 10-fold raw speed improvement for SQL and XQuery over competitor RDBMSs.
by Peter Boncz (CWI)
12. Row storage and column storage
[Figure: ROW STORAGE stripe versus COLUMN STORAGE stripe, by Peter Boncz (CWI)]
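The trade-off between the two layouts can be illustrated with a toy sketch. This is not how MonetDB is implemented: plain Python lists stand in for storage, the field names (`well`, `compound`, `cell_count`, `mean_speed`) are hypothetical, and "values read" or "writes" are simple counters meant only to mirror the plus and minus points listed on the slides.

```python
# Toy model of row storage versus column storage (illustrative only).
WELLS = [
    {"well": "A1", "compound": "control", "cell_count": 120, "mean_speed": 0.8},
    {"well": "A2", "compound": "control", "cell_count": 80, "mean_speed": 0.6},
    {"well": "A3", "compound": "control", "cell_count": 100, "mean_speed": 0.7},
]


def to_columns(records):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def mean_from_rows(records, field):
    """Average one field in a row layout; every field of every row is read."""
    values_read = 0
    total = 0.0
    for record in records:
        values_read += len(record)  # the whole record is fetched
        total += record[field]
    return total / len(records), values_read


def mean_from_columns(columns, field):
    """Average one field in a column layout; only that column is read."""
    column = columns[field]
    return sum(column) / len(column), len(column)


def append_to_rows(records, record):
    """Adding a record to a row layout is a single write."""
    records.append(dict(record))
    return 1


def append_to_columns(columns, record):
    """Adding a record to a column layout touches one list per field."""
    for name, value in record.items():
        columns[name].append(value)
    return len(record)


if __name__ == "__main__":
    print(mean_from_rows(WELLS, "cell_count"))
    print(mean_from_columns(to_columns(WELLS), "cell_count"))
```

For an analysis query over a single measurement, the column layout reads one value per record, while the row layout reads every field; for inserts the situation is reversed.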
13. HTS System Database Schema
14. How is data organized in the schema?
Users
Experiment details
15. How is data organized in the schema?
Plates & Wells
16. How is data organized in the schema?
Raw images
17. How is data organized in the schema?
Results of Image Analysis
Results of Data Analysis
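The schema diagrams themselves are not recoverable from the slide text, so the following is only a minimal sketch of how the listed data groups could be linked in a relational schema. All table and column names are hypothetical, and SQLite (through Python's standard `sqlite3` module) is used purely as a stand-in for the platform's actual database system.

```python
import sqlite3

# Hypothetical tables, one per data group named on the slides.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(user_id),
    title TEXT NOT NULL
);
CREATE TABLE plates (
    plate_id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiments(experiment_id),
    n_rows INTEGER NOT NULL,
    n_cols INTEGER NOT NULL
);
CREATE TABLE wells (
    well_id INTEGER PRIMARY KEY,
    plate_id INTEGER NOT NULL REFERENCES plates(plate_id),
    label TEXT NOT NULL,
    condition_name TEXT
);
CREATE TABLE raw_images (
    image_id INTEGER PRIMARY KEY,
    well_id INTEGER NOT NULL REFERENCES wells(well_id),
    file_name TEXT NOT NULL
);
CREATE TABLE image_analysis_results (
    result_id INTEGER PRIMARY KEY,
    image_id INTEGER NOT NULL REFERENCES raw_images(image_id),
    kind TEXT NOT NULL,
    file_name TEXT NOT NULL
);
CREATE TABLE data_analysis_results (
    result_id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiments(experiment_id),
    csv_text TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up record per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users VALUES (1, 'scientist')")
    conn.execute("INSERT INTO experiments VALUES (1, 1, 'demo screen')")
    conn.execute("INSERT INTO plates VALUES (1, 1, 8, 12)")
    conn.execute("INSERT INTO wells VALUES (1, 1, 'A1', 'control')")
    conn.execute("INSERT INTO raw_images VALUES (1, 1, 'A1_t000.tif')")
    conn.execute(
        "INSERT INTO image_analysis_results "
        "VALUES (1, 1, 'binary mask', 'A1_t000_mask.tif')"
    )
    return conn


def images_with_conditions(conn, experiment_id):
    """List each raw image of an experiment with its well, condition and result kind."""
    query = """
        SELECT w.label, w.condition_name, r.file_name, a.kind
        FROM plates p
        JOIN wells w ON w.plate_id = p.plate_id
        JOIN raw_images r ON r.well_id = w.well_id
        LEFT JOIN image_analysis_results a ON a.image_id = r.image_id
        WHERE p.experiment_id = ?
        ORDER BY w.label, r.file_name
    """
    return conn.execute(query, (experiment_id,)).fetchall()


if __name__ == "__main__":
    print(images_with_conditions(build_demo_db(), 1))
```

The foreign keys express the linkage the abstract asks for: an image can always be traced back to its well, plate, experiment, and experimental condition.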
18. How does the platform work?
Authentication
[Diagram: New idea; Decision making; Web User interface (GUI); Plate layout design (GUI); Users]
User Roles
§ System Administrator: audit; maintenance of users, roles, and conditions.
§ Administrator: create Projects, Experiments, and Plates; upload the images from the microscope; perform data and image analysis.
§ Expert User: upload the images from the microscope; perform data and image analysis.
§ Analyst User: perform data and image analysis and link the results to the experiment.
§ Every user needs to log in to the platform and is the administrator of their own Projects and Experiments.
§ A user can also grant other users a specific role (Administrator, Expert User, or Analyst User) to create a collaborative environment.
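The role model described above could be sketched roughly as follows. The action names are hypothetical labels for the tasks listed per role, and the rule that only a project's Administrator may grant roles is an assumption made for this sketch; the slides do not say how granting is restricted.

```python
from enum import Enum


class Role(Enum):
    SYSTEM_ADMINISTRATOR = "System Administrator"
    ADMINISTRATOR = "Administrator"
    EXPERT_USER = "Expert User"
    ANALYST_USER = "Analyst User"


# Hypothetical action names, derived from the tasks listed for each role.
PERMISSIONS = {
    Role.SYSTEM_ADMINISTRATOR: {
        "audit", "maintain_users", "maintain_roles", "maintain_conditions",
    },
    Role.ADMINISTRATOR: {
        "create_project", "create_experiment", "create_plate",
        "upload_images", "run_analysis",
    },
    Role.EXPERT_USER: {"upload_images", "run_analysis"},
    Role.ANALYST_USER: {"run_analysis", "link_results"},
}

# Roles that one user can grant to another within a project.
GRANTABLE = {Role.ADMINISTRATOR, Role.EXPERT_USER, Role.ANALYST_USER}


class Project:
    """A project whose creator is its Administrator."""

    def __init__(self, name, owner):
        self.name = name
        self.roles = {owner: Role.ADMINISTRATOR}

    def grant(self, granter, user, role):
        """Give `user` a role; assumed here to require an Administrator."""
        if self.roles.get(granter) is not Role.ADMINISTRATOR:
            raise PermissionError(f"{granter} cannot grant roles in {self.name}")
        if role not in GRANTABLE:
            raise ValueError(f"{role.value} cannot be granted per project")
        self.roles[user] = role

    def can(self, user, action):
        """Return True if the user's role in this project allows the action."""
        role = self.roles.get(user)
        return role is not None and action in PERMISSIONS[role]
```

A project created by one user can then be shared, for example by calling `project.grant("alice", "bob", Role.ANALYST_USER)`, after which `bob` may run analyses but not upload images.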
19. How does the platform work?
Web User interface (GUI)
Administration option:
• Create / Edit / Delete users
• Assign Roles to a user
20. How does the platform work?
Web User interface (GUI)
Project option:
• Create, Edit, Delete Projects
• Visualize Project’s metadata
21. How does the platform work?
Web User interface (GUI)
Experiments option:
• Create, Edit, Delete Experiments
• Visualize Experiment’s metadata
22. How does the platform work?
Web User interface (GUI)
Conditions option:
• Create, Edit, Delete, Import Coating parameters, Cell line tissues, Compounds, siRNA, and Antibodies/reagents.
23. How does the platform work?
Web User interface (GUI)
Plates option:
• Create, Edit, Delete Plates
• Visualize Plate’s metadata.
24. How does the platform work?
Web User interface (GUI)
Reports option:
• Perform custom queries across different datasets.
• Visualize predefined reports about Project, Experiment, Plate, and Well metadata.
25. How does the platform work?
Web User interface (GUI)
Analysis option:
• Invoke the Data and Image Analysis APIs.
• Visualize the results of the data and image analysis.
26. How does the platform work?
Steps in the new Workflow System
§ Create a Project.
§ Create an Experiment.
§ Design the layout of a culture plate (4x6 wells, 6x8 wells, 8x12 wells, etc.).
§ Assign the experimental conditions applied to the wells (drag and drop).
§ Allow other users access to your project by assigning them a specific role.
[Diagram: Wet lab experiment using the plate design; HTS; Time-Lapse Image Sequence / Static Images]
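The plate design step can be sketched as a small data structure. This is only an illustration under assumptions: the row-letter plus column-number well labels (A1, A2, ...) follow common lab practice rather than anything stated on the slides, and the function names are hypothetical.

```python
from string import ascii_uppercase


def well_labels(n_rows, n_cols):
    """Return well labels such as 'A1' for a plate with n_rows x n_cols wells."""
    if not 1 <= n_rows <= len(ascii_uppercase) or n_cols < 1:
        raise ValueError("unsupported plate size")
    return [
        f"{ascii_uppercase[row]}{col}"
        for row in range(n_rows)
        for col in range(1, n_cols + 1)
    ]


def design_plate(n_rows, n_cols):
    """Create an empty layout: every well starts without conditions."""
    return {label: [] for label in well_labels(n_rows, n_cols)}


def assign_condition(plate, labels, condition):
    """Attach one experimental condition to the selected wells."""
    for label in labels:
        if label not in plate:
            raise KeyError(f"no well {label} on this plate")
        plate[label].append(condition)


if __name__ == "__main__":
    plate = design_plate(8, 12)  # 8x12 wells
    assign_condition(plate, ["A1", "A2"], "control")
    print(len(plate), plate["A1"])
```

In the GUI, the drag and drop action would correspond to a call like `assign_condition` for the wells the user selected.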
27. How does the platform work?
Upload HTS Images
[Diagram: HTS; Time-Lapse Image Sequence / Static Images; Raw Images]
§ The files generated by the microscope follow a standard naming convention.
§ Through the GUI, the images are uploaded to the platform.
§ The platform links the imported images to the experiment and the plate designed previously.
§ The platform also reads the microscope settings from the headers of the files.
§ Depending on the microscope used, the image’s metadata has a particular structure that is also stored in the database.
Raw Images
§ 2D (XY): [1] Frame [1] Image [1..n] Channels
§ 2D+T (XY+T): [1] Video [1..n] Frames [1] Image [1..n] Channels
§ 3D (XYZ): [1] Frame [1..n] Sections [1] Image [1..n] Channels
§ 3D+T (XYZ+T): [1] Video [1..n] Frames [1..n] Sections [1] Image [1..n] Channels
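The four raw image structures differ only in how many frames, sections, and channels they contain, which can be captured in a small sketch. The class and attribute names are hypothetical, and classifying a stack by whether a count exceeds one is an assumption of this sketch, since the slide's [1..n] ranges also allow a single frame or section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawImageStack:
    n_frames: int    # time points; more than one is treated as a video (+T)
    n_sections: int  # z sections; more than one is treated as 3D
    n_channels: int  # channels per image

    def __post_init__(self):
        if min(self.n_frames, self.n_sections, self.n_channels) < 1:
            raise ValueError("frames, sections and channels must all be >= 1")

    @property
    def modality(self):
        """Return '2D', '2D+T', '3D' or '3D+T' based on the counts."""
        base = "3D" if self.n_sections > 1 else "2D"
        return base + "+T" if self.n_frames > 1 else base

    @property
    def n_planes(self):
        """Total number of single-channel image planes in the stack."""
        return self.n_frames * self.n_sections * self.n_channels


if __name__ == "__main__":
    stack = RawImageStack(n_frames=10, n_sections=5, n_channels=2)
    print(stack.modality, stack.n_planes)
```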
28. How does the platform work?
Image Analysis
§ Through the GUI it is possible to invoke the API for the image analysis process.
§ As a result of the image analysis, auxiliary images are generated: binary masks or trajectories.
§ These auxiliary images are linked to the plates, wells, and raw images in the GUI.
[Figure: Images uploaded; Auxiliary images: Binary mask, Trajectories]
29. How does the platform work?
Data Analysis
§ Measurements extracted from the image analysis are further analyzed using pattern recognition tools.
§ Through the GUI it is possible to invoke the API for the data analysis process.
§ As a result, CSV files are generated and stored in the database so that graphical representations can be produced later.
[Figure: Image Analysis output: Binary mask, Trajectories. Example Results: Cell migration analysis, Structure dynamic analysis.]
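One way to keep CSV output queryable for later plotting is to load its rows into a table. The sketch below is an assumption about how this might look, not the platform's actual storage format: the CSV columns (`well`, `feature`, `value`) and the `measurements` table are hypothetical, and SQLite again stands in for the real database.

```python
import csv
import io
import sqlite3


def store_csv(conn, experiment_id, csv_text):
    """Parse CSV text and insert one row per measurement; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements ("
        "experiment_id INTEGER, well TEXT, feature TEXT, value REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (experiment_id, row["well"], row["feature"], float(row["value"]))
        for row in reader
    ]
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def feature_means(conn, experiment_id, feature):
    """Return (well, mean value) pairs, ready to feed into a plot."""
    query = (
        "SELECT well, AVG(value) FROM measurements "
        "WHERE experiment_id = ? AND feature = ? "
        "GROUP BY well ORDER BY well"
    )
    return conn.execute(query, (experiment_id, feature)).fetchall()


if __name__ == "__main__":
    demo = "well,feature,value\nA1,speed,1.0\nA1,speed,3.0\nA2,speed,2.0\n"
    connection = sqlite3.connect(":memory:")
    store_csv(connection, 1, demo)
    print(feature_means(connection, 1, "speed"))
```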
30. Conclusions
} Using this platform for image analysis and management in HTS, it is possible to avoid typical man-made errors in the experiments.
} Using this system, the time invested in post-experiment analysis has been reduced considerably. It now takes less than a week to accomplish the data analysis that previously easily took more than a month with commercial software, or a year by manual observation.
} The platform allows end-users to perform high-profile cytomics with minimal prior experience in image analysis and machine learning.
} The system uses web services; the framework is therefore very flexible, as it allows connection to other web services.
} The platform can eventually evolve into a sophisticated interdisciplinary platform for cytomics.
} Having the HTS information comprehensively organized in a sophisticated and scalable database is fertile ground for knowledge discovery.