D02-NextGenSeq-MOLGENIS
1. Large-scale NGS pipelines using the MOLGENIS platform: processing the Genome of the Netherlands. Morris Swertz, UMC Groningen, Netherlands, and members of BBMRI-NL, NBIC, MOLGENIS. BOSC 2011, July 15, Vienna.
2. At BOSC 2010 we demonstrated the MOLGENIS software toolkit: Model (xml) –> Generator (java) –> Use (web). Example applications: Animal Observatory, NextGenSeq, Mutation database, Model organisms. Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12, http://www.molgenis.org
3. Get stuff for free as others build it already: connect to annotation services; plug in rich analysis tools; connect to statistics; UML documentation of your model; edit & trace your data; import/export to Excel.
R session shown on the slide:
find.investigation()   (102 downloaded)
obs<-find.observedvalue(   (43,920 downloaded)
#some calculation
add.inferredvalue(res)   (36 added)
4. Three steps: Model –> Generate –> Use
5. Three steps: Model –> Generate –> Use
Generator log shown on the slide:
9200 INFO [FormScreenGen] generated generatedavaicreenopMenuainrotocolsForm.java
9293 INFO [FormScreenGen] generated generatedavaicreenopMenuainrotocolsrotocolMenuarametersForm.java
9325 INFO [FormScreenGen] generated generatedavaicreenopMenuainrotocolsrotocolMenurotocolComponentsForm.java
9496 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesntologyTermsForm.java
9528 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesntologySourcesForm.java
9606 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesntologySourcesntologyTermsForm.java
9638 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesodeListsForm.java
9700 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesodeListsodesForm.java
9965 INFO [MenuScreenGen] generated generatedavaicreenopMenuMenu.java
10012 INFO [MenuScreenGen] generated generatedavaicreenopMenuainMenu.java
10059 INFO [MenuScreenGen] generated generatedavaicreenopMenuainnvestigationsnvestigationMenuMenu.java
10152 INFO [MenuScreenGen] generated generatedavaicreenopMenuainnvestigationsnvestigationMenurotocolApplicationsrotocolApplicationMenuMenu.java
10230 INFO [MenuScreenGen] generated generatedavaicreenopMenuainbservationTargetsMenu.java
10293 INFO [MenuScreenGen] generated generatedavaicreenopMenuainrotocolsrotocolMenuMenu.java
10324 INFO [MenuScreenGen] generated generatedavaicreenopMenuainntologiesMenu.java
11354 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuaineportPlugin.java
11557 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuainntologiesntologyManagerPlugin.java
11604 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuodel_documentationPlugin.java
11604 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuprojectApiPlugin.java
11620 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuttpApiPlugin.java
11635 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuebServicesApiPlugin.java
11651 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavalugineportnvestigationOverview.ftl
11807 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginntologyBrowserntologyBrowserPlugin.ftl
11807 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuocumentationScreen.ftl
11807 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuprojectApiScreen.ftl
11823 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuttpAPiScreen.ftl
11823 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuoapApiScreen.ftl
11854 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavalugineportnvestigationOverview.java
12057 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginntologyBrowserntologyBrowserPlugin.java
12072 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuocumentationScreen.java
12088 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuprojectApiScreen.java
12088 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuttpAPiScreen.java
12088 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuoapApiScreen.java
12103 INFO [MolgenisServletContextGen] generated WebContentETA-INFontext.xml
12259 INFO [SoapApiGen] generated generatedavaioapApi.java
12353 INFO [CsvExportGen] generated generatedavaoolssvExport.java
12431 INFO [CsvImportByNameGen] generated generatedavaoolssvImportByName.java
12636 INFO [CopyMemoryToDatabaseGen] generated generatedavaioolsopyMemoryToDatabase.java
Real example: generates 150 files, 30k lines of Java, MySQL, CXF, Tomcat config, and R code + docs.
6. Three steps: Model –> Generate –> Use
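To make the Model –> Generate –> Use idea concrete, here is a minimal sketch in Python of the "Generate" step: a small XML model is read and turned into SQL table definitions. The model format, the `SQL_TYPES` mapping, and the `generate_sql` function are hypothetical simplifications made for illustration; they are not the actual MOLGENIS model schema or generator, which produces Java, MySQL, R code and more.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified model file (NOT the real MOLGENIS schema).
MODEL_XML = """
<model name="demo">
  <entity name="Investigation">
    <field name="id" type="int"/>
    <field name="name" type="string"/>
  </entity>
  <entity name="ObservedValue">
    <field name="id" type="int"/>
    <field name="value" type="decimal"/>
  </entity>
</model>
"""

# Assumed mapping from model field types to SQL column types.
SQL_TYPES = {"int": "INTEGER", "string": "VARCHAR(255)", "decimal": "DOUBLE"}


def generate_sql(model_xml):
    """Return one CREATE TABLE statement per <entity> in the model."""
    root = ET.fromstring(model_xml)
    statements = []
    for entity in root.findall("entity"):
        columns = [
            "  {} {}".format(field.get("name"), SQL_TYPES[field.get("type")])
            for field in entity.findall("field")
        ]
        statements.append(
            "CREATE TABLE {} (\n{}\n);".format(entity.get("name"), ",\n".join(columns))
        )
    return statements


if __name__ == "__main__":
    for statement in generate_sql(MODEL_XML):
        print(statement)
```

The point of the pattern is that the model stays the single source of truth: changing one field in the XML and regenerating updates every derived artifact at once.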
7. Currently: towards an integrated app suite: XGAP for GWAS/GWL; disease-specific databases; BBMRI biobank catalogue; GWAS central data manager; NGS cyber infrastructure; MAGE-TAB microarray; AnimalDB.
19. More insight into the specific genetic architecture of individual populations is crucial. First analysis of 1000G project data (Durbin et al., Nature 2010). Chart categories: common, known.
20. More insight into the specific genetic architecture of individual populations is crucial. First analysis of 1000G project data shows that the majority of the newly identified and rare variants are population specific (and there are no Dutch in 1000G) (Durbin et al., Nature 2010). Chart categories: common, known, new.
22. Idea 2: let's impute 100,000 existing Dutch GWAS data. Imputation is the process of inferring any missing or untyped genetic variants from typed flanking genetic variants, based on the known local linkage disequilibrium (LD) relationship.
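As a toy illustration of the imputation definition above, the Python sketch below fills untyped sites in a haplotype by looking up reference haplotypes that agree with the sample at the typed flanking sites, then taking the most common allele among them. The function name `impute_haplotype`, the `window` parameter, and the 0/1 allele encoding are assumptions made for this example. Production imputation tools use probabilistic haplotype models rather than exact matching, and the slides do not say which tool the project used.

```python
from collections import Counter

MISSING = None  # marker for an untyped site


def impute_haplotype(sample, reference_panel, window=2):
    """Fill missing alleles by majority vote among reference haplotypes that
    match the sample at typed sites within `window` positions on each side."""
    imputed = list(sample)
    for i, allele in enumerate(sample):
        if allele is not MISSING:
            continue
        lo, hi = max(0, i - window), min(len(sample), i + window + 1)
        # Typed flanking sites carry the local LD information.
        flank = [j for j in range(lo, hi) if sample[j] is not MISSING]
        matches = [
            ref for ref in reference_panel
            if all(ref[j] == sample[j] for j in flank)
        ]
        # Fall back to the whole panel when no reference haplotype matches.
        donors = matches or reference_panel
        imputed[i] = Counter(ref[i] for ref in donors).most_common(1)[0][0]
    return imputed


if __name__ == "__main__":
    panel = [
        [0, 0, 1, 0, 1],
        [0, 0, 1, 0, 1],
        [1, 1, 0, 1, 0],
        [1, 1, 0, 1, 0],
    ]
    print(impute_haplotype([0, 0, MISSING, 0, 1], panel))
    print(impute_haplotype([1, MISSING, 0, 1, MISSING], panel))
```

A population-specific reference panel matters here because the lookup can only return alleles that the panel contains: variants absent from the reference cannot be imputed.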
30. Challenge 2: Alignment, Variant Calling, and QC pipelines.
Alignment (~1 week): alignment to human genome (Build 37); clean up alignment (mark duplicates, realignment, recalibration); quality control.
Variant calling (~1 week): SNP calling; indel calling; variant filtering.
QC: Immunochip concordance.
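The concordance QC mentioned on this slide can be sketched as follows: compare the genotypes called from sequencing with array genotypes for the same sample, at the sites both platforms cover, and report the fraction that agree. This is a minimal illustration in Python. The function name, the dictionary layout (site ID mapped to a pair of alleles), and the example rs IDs are hypothetical and are not taken from the project's pipeline.

```python
def genotype_concordance(seq_calls, array_calls):
    """Fraction of shared sites where sequencing and array genotypes agree.

    Both arguments map a site ID to a pair of alleles, e.g. ("A", "G").
    Allele order within a genotype is ignored.
    """
    shared = [site for site in seq_calls if site in array_calls]
    if not shared:
        raise ValueError("no overlapping sites between the two call sets")
    agree = sum(
        1 for site in shared
        if sorted(seq_calls[site]) == sorted(array_calls[site])
    )
    return agree / len(shared)


if __name__ == "__main__":
    # Made-up genotypes for one sample, for illustration only.
    seq = {"rs1": ("A", "G"), "rs2": ("C", "C"), "rs3": ("T", "T"), "rs4": ("G", "G")}
    array = {"rs1": ("G", "A"), "rs2": ("C", "T"), "rs3": ("T", "T")}
    print("concordance: {:.2f}".format(genotype_concordance(seq, array)))
```

A low concordance for a sample is a useful warning sign, for example of a sample swap or a failed run, and it can be checked before any downstream analysis.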
33. Challenge 4: Did we analyze it all? Correctly? Completely? Batches: UModqR 60; HUMcriR 90; HUMhxsR 222; HUMrutR 235; HUMjxbR 153; HUMsnrR 10.
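One simple way to answer "did we analyze it all?" is to compare the expected number of items per batch with the number that actually finished. The Python sketch below uses the batch names and numbers from this slide, on the assumption that the numbers are per-batch sample counts. The `find_incomplete` function and the `completed` tallies are hypothetical and only illustrate the bookkeeping.

```python
# Batch names and counts as listed on the slide (assumed to be sample counts).
EXPECTED = {
    "UModqR": 60,
    "HUMcriR": 90,
    "HUMhxsR": 222,
    "HUMrutR": 235,
    "HUMjxbR": 153,
    "HUMsnrR": 10,
}


def find_incomplete(expected, completed):
    """Return {batch: number still missing} for every batch not fully done."""
    return {
        batch: count - completed.get(batch, 0)
        for batch, count in expected.items()
        if completed.get(batch, 0) < count
    }


if __name__ == "__main__":
    # Hypothetical tallies of finished analyses, for illustration only.
    completed = dict(EXPECTED)
    completed["HUMhxsR"] = 220
    del completed["HUMsnrR"]
    print("expected total:", sum(EXPECTED.values()))
    print("incomplete:", find_incomplete(EXPECTED, completed))
```

This only covers "completely"; checking "correctly" needs per-sample quality measures in addition to counts.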