transmart-data
Management of tranSMART’s Environment

Gustavo Lopes
The Hyve B.V.

November 6, 2013

Gustavo Lopes (The Hyve B.V.)

transmart-data

November 6, 2013

1 / 22
Outline

1 Problems
    Reproducibility
    Version Control
    Automation
    Why?!
    tranSMART Foundation’s Version

2 Solution: transmart-data
    General Description
    Configuration
    Database Schema Management
    Seed Data
    ETL
    RModules Analyses
    Rserve
    Solr
    transmartApp Configuration

3 Limitations

Typical Branch Distribution

Grails Code
transmartApp (without full repo history, always with wrong ancestry information ⇒ merging quite difficult)
RModules (if you’re lucky), but analysis definitions in DB not provided

Database
SQL scripts on top of GPL 1.0 dump or later. Probably insufficient/won’t apply
Stored procedures for ETL. Overlapping definitions with yours, but no history ⇒ merging quite difficult
Manual fixups always required (even if just permissions/synonyms)

Typical Branch Distribution (II)

ETL
High variability in strategies
Instructions/sample data rarely provided
Kettle scripts are problematic

Solr/Rserve/Configuration
Solr schemas/dataimport.xml perpetually forgotten
Same for information on R packages
Sample configuration rarely provided

Version Control

Version control is used ONLY for Grails code...
But history is often squashed and carries wrong ancestor information.
Forget about the database, Solr, and most of the ETL.

Result
Merges are very difficult
Changes cannot easily be tracked
The reasons behind changes are unknown
Regressions are introduced silently (no merge conflicts to flag them)
Collaboration is based on e-mail attachments

Automation
Even with all the pieces...
Setting up a new branch takes days; weeks for non-basic functionality
No reproducibility in the process!

Result
Devs driven away from a fully local environment (too much work)
Robust environment for CI passed over (too much work)
Bugs cannot be reliably reproduced (see also: no consistent usage of VCS)
Time wasted on deployment-specific mistakes/inconsistencies

Why?!

The “source code” for a work means
the preferred form of the work for
making modifications to it.
— GPL v3, section 1

Is everyone holding back “source code”?
More likely explanation:
No appropriate tooling being used
Guillaume Duchenne (public domain)

Situation for tranSMART 1.1
The situation is much better!
Some problems remain, though.

The Good
Create/populate DB is easy
Most stuff is versioned
CI for builds
Image available
Public issue tracking

The Bad
No Oracle support
Changes to DB scripts/seed data are ad hoc (lax structure)
No mechanism to support/compare schemas with other branches
R analyses are JSON blobs in TSVs
No VCS for Solr or Rserve/images’ setup
Setting up Solr/Rserve is time-consuming
Populating the DB with sample data is still time-consuming
Config changes required for dev

Description of transmart-data

We developed transmart-data to address most of these problems.
transmart-data is a set of:
scripts for managing tranSMART’s environment, and
certain application data (e.g. Solr schemas, DDL, seed data), which is used by the scripts and sometimes generated by them.
It has a Makefile-based interface.

transmart-data: Purposes

Purposes of transmart-data:
1. Allow setting up a complete dev environment quickly (< 30 min)
2. Bring versioning to the database schema and Solr files
3. Set up the Solr runtime
4. Invoke ETL pipelines
5. Set up Rserve

Target audience: Programmers

transmart-data: Non-purposes

Non-purposes of transmart-data:
1. Setting up a production environment (some components can be used)
2. Serving new users evaluating tranSMART (use a pre-built image)
3. Building transmartApp or its plugin dependencies (build them yourself or use artifacts from Bamboo/Nexus)

Configuration
Environment-variable-based configuration

cp vars.sample vars
vim vars  # edit file
source vars

PGHOST=/tmp
PGPORT=5432
PGDATABASE=transmart
PGUSER=$USER
PGPASSWORD=
TABLESPACES=$HOME/pg/tablespaces/
PGSQL_BIN=$HOME/pg/bin/
ORAHOST=localhost
ORAPORT=1521
ORASID=orcl
ORAUSER="sys as sysdba"
ORAPASSWORD=mypassword
ORACLE_MANAGE_TABLESPACES=0
# continues...
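
Because every target reads its settings from the environment, a forgotten `source vars` tends to surface late. The following is a minimal sketch of a pre-flight check, not part of transmart-data itself: `check_vars` is a hypothetical helper name, and the list of variables to pass is whatever your setup needs.

```shell
#!/bin/sh
# check_vars (hypothetical helper): report which of the named environment
# variables are unset or empty. Prints "ok" when all are set; otherwise
# prints the missing names and returns 1.
check_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Example call for the PostgreSQL settings from the vars file
# (PGPASSWORD is left out because it may legitimately be empty):
# check_vars PGHOST PGPORT PGDATABASE PGUSER TABLESPACES PGSQL_BIN
```

Running the check right after `source vars` and before the first `make` keeps configuration mistakes separate from genuine build or load failures.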

Database Schema Management
Support for Oracle and Postgres

Oracle
Queries dba_* tables
Dumps DDL w/ DBMS_METADATA

# Dump
make -C oracle/ddl dump
# Load
make oracle

Postgres
Uses pg_dump(all)
Parses the dump files

# Dump
make -C postgres/ddl dump
make -C postgres/ddl/GLOBAL extensions.sql roles.sql
# Load
make -C postgres/ddl load

Seed Data
Only Postgres for now
# Dump
# Tables to dump in postgres/data/<schema>_lst
make -C postgres/data dump
make -C postgres/common minimize_diffs
# Load
make -C postgres/data load
# Load DDL and data
make postgres

Only for basic stuff with no ETL!
Pretty fast (DDL+data loaded in 10s)

ETL (I)
Unified interface for ETL

Prepare dataset
1. Prepare ETL-specific source files
2. Prepare file with ETL-specific params
3. Upload dataset to CDN (optional)
For each new ETL pipeline, support must be added

Load dataset
make -C samples/{oracle,postgres} load_<type>_<study_id>
# Example:
make -C samples/postgres load_clinical_GSE8581

Everything is automated!
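
Since each load is a single make target named `load_<type>_<study_id>`, loading several studies is a matter of generating target names. The sketch below only prints the commands (a dry run); `load_targets` is a hypothetical helper, and the only study ID and type taken from the slides are `GSE8581` and `clinical`.

```shell
#!/bin/sh
# load_targets (hypothetical helper): print one make invocation per study,
# following the load_<type>_<study_id> target naming.
# Usage: load_targets <oracle|postgres> <type> <study_id>...
load_targets() {
    db=$1
    type=$2
    shift 2
    for study in "$@"; do
        echo "make -C samples/$db load_${type}_${study}"
    done
}

# Dry run: review the generated commands first; pipe them to sh to execute.
load_targets postgres clinical GSE8581
```

Printing instead of executing keeps the helper safe to run anywhere; appending `| sh` from the transmart-data checkout would run the loads one after another.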

ETL (II)
Show TM_CZ logs:
$ make -C samples/postgres showdblog
make: Entering directory `/home/gustavo/repos/transmart-data/samples/postgres'
groovy -cp postgresql-9.2-1003.jdbc4.jar ../common/dump_audit.groovy postgres `tput cols`
Procedure       | Description                      | Stat | Recs   | Date                 | Time spent
----------------------------------------------------------------------------------------------------
alysis_data.kjb | GSE8581                          | DONE |      1 | 2013-10-15 13:23:22. | 0.0
.load_ext_files | Drop null samples rows           | Done |      0 | 2013-10-15 13:23:23. | 0.450529
.load_ext_files | Drop null cohorts rows           | Done |      0 | 2013-10-15 13:23:23. | 0.043125
.load_ext_files | Drop null analysis rows          | Done |      0 | 2013-10-15 13:23:23. | 0.066097
.load_ext_files | Read analysis file               | Done |      1 | 2013-10-15 13:23:23. | 0.048055
.load_ext_files | Read cohort file                 | Done |      3 | 2013-10-15 13:23:23. | 0.085535
.load_ext_files | Read samples file                | Done |     57 | 2013-10-15 13:23:23. | 0.049993
.load_ext_files | Write rwg_cohorts_ext            | Done |      3 | 2013-10-15 13:23:23. | 0.099452
.load_ext_files | Write rwg_analysis_ext           | Done |      1 | 2013-10-15 13:23:23. | 0.047331
.load_ext_files | Write rwg_samples_ext            | Done |     57 | 2013-10-15 13:23:23. | 0.044567
.load_ext_files | Read analysis data file          | Done | 436898 | 2013-10-15 13:23:27. | 3.911089
.load_ext_files | Drop null analysis_data rows     | Done | 382223 | 2013-10-15 13:23:27. | 0.067765
.load_ext_files | Write rwg_analysis_data_ext      | Done |  54675 | 2013-10-15 13:23:28. | 1.332746
IMPORT_FROM_EXT | Start FUNCTION                   | Done |      0 | 2013-10-15 13:23:29. | 0.117319
IMPORT_FROM_EXT | Delete existing records from TM_ | Done |      0 | 2013-10-15 13:23:29. | 0.035825
IMPORT_FROM_EXT | Delete existing records from TM_ | Done |      0 | 2013-10-15 13:23:29. | 6.26E-4
IMPORT_FROM_EXT | Delete existing records from TM_ | Done |      0 | 2013-10-15 13:23:29. | 4.84E-4
IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_An | Done |      1 | 2013-10-15 13:23:29. | 0.001079
IMPORT_FROM_EXT | Update bio_assay_analysis_id on  | Done |      0 | 2013-10-15 13:23:29. | 0.030793
IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_Co | Done |      3 | 2013-10-15 13:23:29. | 8.28E-4
... (continues)

Errors are also shown (if any)
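
On a long load, the rows that did not finish are the interesting ones. The sketch below filters a pipe-delimited audit table for rows whose status column is anything other than "Done" (case-insensitive). It rests on two assumptions the slides do not confirm: that failed steps carry a different value in the Stat column, and that `showdblog` writes the table to standard output. `failed_steps` is a hypothetical helper name.

```shell
#!/bin/sh
# failed_steps (hypothetical helper): read a pipe-delimited audit table on
# stdin and print the rows whose third field (status) is not "Done",
# ignoring case. Non-table lines and the header row are skipped.
failed_steps() {
    awk -F'|' '
        NF < 6 { next }                       # not a table row (banner, separator)
        { status = $3; gsub(/[ \t]/, "", status) }
        tolower(status) == "stat" { next }    # header row
        tolower(status) != "done" { print }
    '
}

# Usage, under the assumptions stated above:
# make -C samples/postgres showdblog | failed_steps
```

An empty result means every logged step reported Done; anything printed is a starting point for looking at the error output.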

RModules Analyses (transmartApp-DB)
Situation in transmartApp-DB:
update searchapp.plugin_module
set params = '{"id":"survivalAnalysis","converter":{"R":["source(''||PLUGINSCRIPTDIRECTORY
||Common/dataBuilders.R'')","source(''||PLUGINSCRIPTDIRECTORY||Common/
ExtractConcepts.R'')","source(''||PLUGINSCRIPTDIRECTORY||Common/collapsingData.R'')
","source(''||PLUGINSCRIPTDIRECTORY||Common/BinData.R'')","source(''||
PLUGINSCRIPTDIRECTORY||Survival/BuildSurvivalData.R'')","\tSurvivalData.build(\n
\tinput.dataFile=''||TEMPFOLDERDIRECTORY||Clinical/clinical.i2b2trans'',\n
\tconcept.time=''||TIME||'',\n\tconcept.category=''||CATEGORY||'',\n\tconcept.
eventYes=''||EVENTYES||'',\n\tbinning.enabled=''||BINNING||'',\n\tbinning.bins=''||
NUMBERBINS||'',\n\tbinning.type=''||BINNINGTYPE||'',\n\tbinning.manual=''||
BINNINGMANUAL||'',\n\tbinning.binrangestring=''||BINNINGRANGESTRING||'',\n\tbinning
.variabletype=''||BINNINGVARIABLETYPE||'',\n\tinput.gexFile=''||
TEMPFOLDERDIRECTORY||mRNA/Processed_Data/mRNA.trans'',\n\tinput.snpFile=''||
TEMPFOLDERDIRECTORY||SNP/snp.trans'',\n\tconcept.category.type=''||TYPEDEP||'',\n
\tgenes.category=''||GENESDEP||'',\n\tgenes.category.aggregate=''||AGGREGATEDEP
||'',\n\tsample.category=''||SAMPLEDEP||'',\n\ttime.category=''||TIMEPOINTSDEP
||'',\n\tsnptype.category=''||SNPTYPEDEP||'')\n\t"]},"name":"Survival Analysis","
dataFileInputMapping":{"CLINICAL.TXT":"TRUE","SNP.TXT":"snpData","MRNA_DETAILED.TXT
":"mrnaData"},"dataTypes":{"subset1":["CLINICAL.TXT"]},"pivotData":false,"view":"
SurvivalAnalysis","processor":{"R":["source(''||PLUGINSCRIPTDIRECTORY||Survival/
CoxRegressionLoader.r'')","CoxRegression.loader(input.filename=''outputfile'')","
source(''||PLUGINSCRIPTDIRECTORY||Survival/SurvivalCurveLoader.r'')","SurvivalCurve
.loader(input.filename=''outputfile'',concept.time=''||TIME||'')"]},"renderer":{"
GSP":"/survivalAnalysis/survivalAnalysisOutput"},... (goes on)'
where module_name = 'pgsurvivalAnalysis';

Not very nice...

RModules Analyses (transmart-data)
In transmart-data:
One file per analysis
Files can be generated from DB data
Sanely formatted
But we really want to remove this from the DB!

array (
  'id' => 'heatmap',
  'name' => 'Heatmap',
  'dataTypes' =>
  array (
    'subset1' =>
    array (
      0 => 'CLINICAL.TXT',
    ),
  ),
  'dataFileInputMapping' =>
  array (
    'CLINICAL.TXT' => 'FALSE',
    'SNP.TXT' => 'snpData',
    'MRNA_DETAILED.TXT' => 'TRUE',
  ),
  'pivotData' => false,
  ...

Rserve
Targets for Rserve:
Download/build R
Install R packages
Start Rserve
Install System V init script for Rserve
Install systemd unit for Rserve

cd R
make -j8 bin/root/R
# some packages don't support concurrent builds
make install_packages
make start_Rserve
make start_Rserve.dbg
TRANSMART_USER=tomcat7 sudo -E make install_rserve_init
TRANSMART_USER=tomcat7 sudo -E make install_rserve_unit

Solr
Solr (4.5.0) automatically downloaded and configured
Solr cores automatically created
User only needs to create a schema file and dataconfig.xml

# setup & solr (psql)
make start
# just configure
make solr_home
make <core>_full_import
make <core>_delta_import
make clean_cores
ORACLE=1 make start

transmartApp Configuration

Out-of-tree config management:
Targets for installing files
Zero configuration for dev!
Customization allowed without touching the target files
Only supports our branches
But a lot of configuration should be in-tree instead!

# install everything
# previous files are backed up
make install
# just one file:
make install_Config.groovy
make install_BuildConfig.groovy
make install_DataSource.groovy
# customizations in:
# Config-extra.php
# BuildConfig.groovy (limited)

Current Limitations

DB upgrades not handled
Only a few ETL pipelines supported
Oracle support is behind PostgreSQL
Tooling shares repository with application data
© Joost J. Bakker, CC BY 2.0

tranSMART Community Meeting 5-7 Nov 13 - Session 5: EMIF (European Medical In...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...
tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...
tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...
 
Community
CommunityCommunity
Community
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMART
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMARTtranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMART
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMART
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker DiscoverytranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding CattranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...
 

Dernier

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Dernier (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart-data

  • 1. transmart-data: Management of tranSMART’s Environment. Gustavo Lopes, The Hyve B.V. November 6, 2013.
  • 2. Outline
      1. Problems: Reproducibility; Version Control; Automation; Why?!; tranSMART Foundation’s Version
      2. Solution: transmart-data: General Description; Configuration; Database Schema Management; Seed Data; ETL; RModules Analyses; Rserve; Solr; transmartApp Configuration
      3. Limitations
  • 3. Typical Branch Distribution
      Grails Code:
        - transmartApp (without full repo history, always with wrong ancestry information ⇒ merging quite difficult)
        - RModules (if you’re lucky), but the analysis definitions in the DB are not provided
      Database:
        - SQL scripts on top of the GPL 1.0 dump or later. Probably insufficient/won’t apply
        - Stored procedures for ETL. Overlapping definitions with yours, but no history ⇒ merging quite difficult
        - Manual fixups always required (even if just permissions/synonyms)
  • 4. Typical Branch Distribution (II)
      ETL:
        - High variability in strategies
        - Instructions/sample data rarely provided
        - Kettle scripts are problematic
      Solr/Rserve/Configuration:
        - Solr schemas/dataimport.xml perpetually forgotten
        - The same goes for information on R packages
        - Sample configuration rarely provided
  • 5. Version Control
      Version control is used ONLY for the Grails code...
      But the history is often squashed and carries wrong ancestry information.
      Forget about the database, Solr, and most of the ETL.
      Result:
        - Merges are very difficult
        - Changes cannot easily be tracked
        - The reasons behind changes are unknown
        - Regressions are introduced (no conflicts)
        - Collaboration is based on e-mail attachments
  • 6. Automation
      Even with all the pieces...
        - Setting up a new branch takes days; weeks for non-basic functionality
        - No reproducibility in the process!
      Result:
        - Devs are driven away from a fully local environment (too much work)
        - A robust environment for CI is passed over (too much work)
        - Bugs cannot be reliably reproduced (see also: no consistent usage of VCS)
        - Time is wasted on deployment-specific mistakes/inconsistencies
  • 7. Why?!
      “The ‘source code’ for a work means the preferred form of the work for making modifications to it.” (GPL v3, section 1)
      Is everyone holding back “source code”?
      More likely explanation: no appropriate tooling is being used.
      (Image: Guillaume Duchenne, public domain)
  • 8. Situation for tranSMART 1.1
      The situation is much better! Some problems remain, though.
      The Good:
        - Creating/populating the DB is easy
        - Most stuff is versioned
        - CI for builds
        - Image available
        - Public issue tracking
      The Bad:
        - No Oracle support
        - Changes to DB scripts/seed data are ad hoc (lax structure)
        - No mechanism to support/compare schemas with other branches
        - R analyses are JSON blobs in TSVs
        - No VCS for Solr or for the Rserve/images’ setup
        - Setting up Solr/Rserve is time-consuming
        - Populating the DB with sample data is still time-consuming
        - Config changes required for dev
  • 9. Description of transmart-data
      We developed transmart-data to address most of these problems:
      transmart-data is a set of scripts for managing tranSMART’s environment and certain application data (e.g. Solr schemas, DDL, seed data), which is used by the scripts and sometimes generated by them.
      It has a makefile-based interface.
  • 10. transmart-data: Purposes
      1. Allow setting up a complete dev environment quickly (< 30 min)
      2. Bring versioning to the database schema and Solr files
      3. Set up the Solr runtime
      4. Invoke ETL pipelines
      5. Set up Rserve
      Target audience: programmers
  • 11. transmart-data: Non-purposes
      1. Setting up a production environment (some components can be used)
      2. Serving new users who are evaluating tranSMART (use a pre-built image)
      3. Building transmartApp or its plugin dependencies (build them yourself or use artifacts from Bamboo/Nexus)
  • 12. Configuration
      Environment-variable-based configuration:
        cp vars.sample vars
        vim vars    # edit file
        source vars
      Sample vars file:
        PGHOST=/tmp
        PGPORT=5432
        PGDATABASE=transmart
        PGUSER=$USER
        PGPASSWORD=
        TABLESPACES=$HOME/pg/tablespaces/
        PGSQL_BIN=$HOME/pg/bin/
        ORAHOST=localhost
        ORAPORT=1521
        ORASID=orcl
        ORAUSER="sys as sysdba"
        ORAPASSWORD=mypassword
        ORACLE_MANAGE_TABLESPACES=0
        # continues...
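Because every later make target reads its settings from the environment, a quick check after sourcing the vars file can catch a missing value early. The following is a minimal sketch, not part of transmart-data: the helper name `missing_vars` is hypothetical, and only the variable names are taken from the slide.

```shell
#!/bin/sh
# Hypothetical helper (not provided by transmart-data): print the names of
# the given variables that are unset or empty, separated by single spaces.
missing_vars() {
    missing=""
    for name in "$@"; do
        # POSIX sh has no ${!name}, so look the variable up through eval.
        # Only pass trusted, literal variable names to this function.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Drop the leading space before printing.
    echo "${missing# }"
}

# Possible use after `. ./vars` (PGPASSWORD is left out on purpose, since
# the sample on the slide leaves it empty):
#   missing_vars PGHOST PGPORT PGDATABASE PGUSER
```

An empty result means all listed variables are set; otherwise the output names the ones still to be filled in.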
  • 13. Database Schema Management
      Support for Oracle and Postgres.
      Postgres:
        - Uses pg_dump(all)
        - Parses the dump files
        # Dump
        make -C postgres/ddl dump
        make -C postgres/ddl/GLOBAL extensions.sql roles.sql
        # Load
        make -C postgres/ddl load
      Oracle:
        - Queries dba_* tables
        - Dumps DDL w/ DBMS_METADATA
        # Dump
        make -C oracle/ddl dump
        # Load
        make oracle
  • 14. Seed Data
      Only Postgres for now.
        # Dump
        # Tables to dump are listed in postgres/data/<schema>_lst
        make -C postgres/data dump
        make -C postgres/common minimize_diffs
        # Load
        make -C postgres/data load
        # Load DDL and data
        make postgres
      Only for basic stuff with no ETL!
      Pretty fast (DDL + data loaded in 10 s)
  • 15. ETL (I)
      Unified interface for ETL.
      Prepare dataset:
        1. Prepare ETL-specific source files
        2. Prepare a file with ETL-specific params
        3. Upload the dataset to a CDN (optional)
      For each new ETL pipeline, support must be added.
      Load dataset:
        make -C samples/{oracle,postgres} load_<type>_<study_id>
        # Example:
        make -C samples/postgres load_clinical_GSE8581
      Everything is automated!
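The unified interface boils down to a naming convention for make targets. The sketch below composes that command line for a given dataset and only prints it, so it runs without a transmart-data checkout. It assumes the parts of the target name are joined with underscores (the separators are not legible in the extracted slide text), and the function name `load_command` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the make invocation for loading one dataset.
# Arguments: <db> <kind> <study_id>, where db is oracle or postgres.
# Assumption: the target is spelled load_<type>_<study_id>.
load_command() {
    db=$1
    kind=$2
    study=$3
    case "$db" in
        oracle|postgres) ;;
        *) echo "unsupported database: $db" >&2; return 1 ;;
    esac
    echo "make -C samples/$db load_${kind}_${study}"
}

load_command postgres clinical GSE8581
# prints: make -C samples/postgres load_clinical_GSE8581
```

Printing the command instead of running it keeps the sketch side-effect free; piping the output to `sh` from the repository root would execute it.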
  • 16. ETL (II)
      Show TM_CZ logs:
        $ make -C samples/postgres showdblog
        make: Entering directory `/home/gustavo/repos/transmart-data/samples/postgres'
        groovy -cp postgresql-9.2-1003.jdbc4.jar ../common/dump_audit.groovy postgres `tput cols`
        Procedure       | Description                      | Stat | Recs   | Date                 | Time spent
        ----------------------------------------------------------------------------------------------------
        alysis_data.kjb | GSE8581                          | DONE | 1      | 2013-10-15 13:23:22. | 0.0
        .load_ext_files | Drop null samples rows           | Done | 0      | 2013-10-15 13:23:23. | 0.450529
        .load_ext_files | Drop null cohorts rows           | Done | 0      | 2013-10-15 13:23:23. | 0.043125
        .load_ext_files | Drop null analysis rows          | Done | 0      | 2013-10-15 13:23:23. | 0.066097
        .load_ext_files | Read analysis file               | Done | 1      | 2013-10-15 13:23:23. | 0.048055
        .load_ext_files | Read cohort file                 | Done | 3      | 2013-10-15 13:23:23. | 0.085535
        .load_ext_files | Read samples file                | Done | 57     | 2013-10-15 13:23:23. | 0.049993
        .load_ext_files | Write rwg_cohorts_ext            | Done | 3      | 2013-10-15 13:23:23. | 0.099452
        .load_ext_files | Write rwg_analysis_ext           | Done | 1      | 2013-10-15 13:23:23. | 0.047331
        .load_ext_files | Write rwg_samples_ext            | Done | 57     | 2013-10-15 13:23:23. | 0.044567
        .load_ext_files | Read analysis data file          | Done | 436898 | 2013-10-15 13:23:27. | 3.911089
        .load_ext_files | Drop null analysis_data rows     | Done | 382223 | 2013-10-15 13:23:27. | 0.067765
        .load_ext_files | Write rwg_analysis_data_ext      | Done | 54675  | 2013-10-15 13:23:28. | 1.332746
        IMPORT_FROM_EXT | Start FUNCTION                   | Done | 0      | 2013-10-15 13:23:29. | 0.117319
        IMPORT_FROM_EXT | Delete existing records from TM_ | Done | 0      | 2013-10-15 13:23:29. | 0.035825
        IMPORT_FROM_EXT | Delete existing records from TM_ | Done | 0      | 2013-10-15 13:23:29. | 6.26E-4
        IMPORT_FROM_EXT | Delete existing records from TM_ | Done | 0      | 2013-10-15 13:23:29. | 4.84E-4
        IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_An | Done | 1      | 2013-10-15 13:23:29. | 0.001079
        IMPORT_FROM_EXT | Update bio_assay_analysis_id on  | Done | 0      | 2013-10-15 13:23:29. | 0.030793
        IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_Co | Done | 3      | 2013-10-15 13:23:29. | 8.28E-4
        ... (continues)
      Errors are also shown (if any).
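Since the log is a pipe-separated table, it can be post-processed with standard tools. As a rough illustration (not something the deck or transmart-data provides), the hypothetical function below totals the last column, "Time spent", of such output read from standard input. It assumes six pipe-separated columns with the header on the first line, as in the listing on the slide.

```shell
#!/bin/sh
# Hypothetical post-processing step: sum the sixth ("Time spent") column of
# the pipe-separated log read from stdin. The header (first line) is skipped,
# and lines without exactly six fields (separator, "continues") are ignored.
total_time_spent() {
    awk -F'|' 'NR > 1 && NF == 6 { sum += $6 } END { printf "%.2f\n", sum }'
}

# Small self-contained demonstration with made-up timings:
printf '%s\n' \
    'Procedure | Description | Stat | Recs | Date | Time spent' \
    '------------------------------------------------------' \
    '.load_ext_files | Read cohort file | Done | 3 | 2013-10-15 13:23:23. | 0.25' \
    '.load_ext_files | Read samples file | Done | 57 | 2013-10-15 13:23:23. | 0.5' |
    total_time_spent
# prints: 0.75
```

In practice the function would be fed from the `showdblog` target's output; the two rows above are invented purely to keep the example runnable on its own.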
  • 17. RModules Analyses (tsApp-DB)
      Situation in transmartApp-DB:
        update searchapp.plugin_module set params = '{"id":"survivalAnalysis","converter":{"R":["source(''||PLUGINSCRIPTDIRECTORY||Common/dataBuilders.R'')","source(''||PLUGINSCRIPTDIRECTORY||Common/ExtractConcepts.R'')","source(''||PLUGINSCRIPTDIRECTORY||Common/collapsingData.R'')","source(''||PLUGINSCRIPTDIRECTORY||Common/BinData.R'')","source(''||PLUGINSCRIPTDIRECTORY||Survival/BuildSurvivalData.R'')","\tSurvivalData.build(\n\tinput.dataFile=''||TEMPFOLDERDIRECTORY||Clinical/clinical.i2b2trans'',\n\tconcept.time=''||TIME||'',\n\tconcept.category=''||CATEGORY||'',\n\tconcept.eventYes=''||EVENTYES||'',\n\tbinning.enabled=''||BINNING||'',\n\tbinning.bins=''||NUMBERBINS||'',\n\tbinning.type=''||BINNINGTYPE||'',\n\tbinning.manual=''||BINNINGMANUAL||'',\n\tbinning.binrangestring=''||BINNINGRANGESTRING||'',\n\tbinning.variabletype=''||BINNINGVARIABLETYPE||'',\n\tinput.gexFile=''||TEMPFOLDERDIRECTORY||mRNA/Processed_Data/mRNA.trans'',\n\tinput.snpFile=''||TEMPFOLDERDIRECTORY||SNP/snp.trans'',\n\tconcept.category.type=''||TYPEDEP||'',\n\tgenes.category=''||GENESDEP||'',\n\tgenes.category.aggregate=''||AGGREGATEDEP||'',\n\tsample.category=''||SAMPLEDEP||'',\n\ttime.category=''||TIMEPOINTSDEP||'',\n\tsnptype.category=''||SNPTYPEDEP||'')\n\t"]},"name":"Survival Analysis","dataFileInputMapping":{"CLINICAL.TXT":"TRUE","SNP.TXT":"snpData","MRNA_DETAILED.TXT":"mrnaData"},"dataTypes":{"subset1":["CLINICAL.TXT"]},"pivotData":false,"view":"SurvivalAnalysis","processor":{"R":["source(''||PLUGINSCRIPTDIRECTORY||Survival/CoxRegressionLoader.r'')","CoxRegression.loader(input.filename=''outputfile'')","source(''||PLUGINSCRIPTDIRECTORY||Survival/SurvivalCurveLoader.r'')","SurvivalCurve.loader(input.filename=''outputfile'',concept.time=''||TIME||'')"]},"renderer":{"GSP":"/survivalAnalysis/survivalAnalysisOutput"},... (goes on)' where module_name = 'pgsurvivalAnalysis';
      Not very nice...
  • 18. RModules Analyses (transmart-data)
      In transmart-data:
        - One file per analysis
        - Files can be generated from DB data
        - Sanely formatted
        - But we really want to remove this from the DB!
      Example:
        array (
          'id' => 'heatmap',
          'name' => 'Heatmap',
          'dataTypes' =>
          array (
            'subset1' =>
            array (
              0 => 'CLINICAL.TXT',
            ),
          ),
          'dataFileInputMapping' =>
          array (
            'CLINICAL.TXT' => 'FALSE',
            'SNP.TXT' => 'snpData',
            'MRNA_DETAILED.TXT' => 'TRUE',
          ),
          'pivotData' => false,
          ...
  • 19. Rserve
      Targets for Rserve:
        - Download/build R
        - Install R packages
        - Start Rserve
        - Install a System V init script for Rserve
        - The same for systemd
        cd R
        make -j8 bin/root/R
        # some packages don't support concurrent builds
        make install_packages
        make start_Rserve
        make start_Rserve.dbg
        TRANSMART_USER=tomcat7 sudo -E make install_rserve_init
        TRANSMART_USER=tomcat7 sudo -E make install_rserve_unit
  • 20. Solr
      - Solr (4.5.0) is automatically downloaded and configured
      - Solr cores are automatically created
      - The user only needs to create a schema file and dataconfig.xml
        # setup & solr (psql)
        make start
        # just configure
        make solr_home
        make <core>_full_import
        make <core>_delta_import
        make clean_cores
        ORACLE=1 make start
  • 21. transmartApp Configuration
      Out-of-tree config management:
        - Targets for installing files
        - Zero configuration for dev!
        - Customization allowed without touching the target files
        - Only supports our branches
        - But a lot of configuration should be in-tree instead!
        # install everything
        # previous files are backed up
        make install
        # just one file:
        make install_Config.groovy
        make install_BuildConfig.groovy
        make install_DataSource.groovy
        # customizations in:
        #   Config-extra.php
        #   BuildConfig.groovy (limited)
  • 22. Current Limitations
      - DB upgrades not handled
      - Only a few ETL pipelines supported
      - Oracle support is behind PostgreSQL
      - Tooling shares a repository with application data
      (Image: © Joost J. Bakker, CC BY 2.0)