Bottoms bosc2010 bio_snp_inherit

BOSC 2010

5 million data “items”
one CPU: 2+ days
eight CPUs: 1-2 days

SNP ID  Sample ID  Base1  Base2
1       1          A      A
1       2          A      A
1       3          A      G
…       …          …      …
1       5000       A      A
2       1          C      C
…       …          …      …
…       …          …      …
1106    5000       GG     GG

“Matrix” data file format

SNP ID  1   2   3   …  5000
SNP1    AA  AA  AG  …  AA
SNP2    CC  GG  GG  …  CG

Using new data format

ID’s file

ID  Name    Group
1   B73     B73
2   B73xZ1  NAMF1
3   Mo17    Control
4   M100    IBM
5   Bob     B73xZ1

“Human Parsed” ID’s file

ID Name Group A (ID) B (ID) AxB (ID)
1 B73 B73
2 B73xZ1 NAMF1
3 Mo17 Control
4 M100 IBM 1 3
5 Bob B73xZ1 1 2

Lessons learned

Acknowledgements

1. SNP Allele Designations (Bio::SNP::Inherit) Christopher Bottoms BOSC 2010

Editor's notes

The data file had to be read into the database, and the information from the database was then used to determine inheritance codes.
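As a rough illustration of what determining an inheritance code can look like, here is a minimal sketch in Python. The coding scheme (A, B, H, "-" and "?") and the function name `designate` are assumptions made for this example; they are not taken from the talk or from Bio::SNP::Inherit.

```python
def designate(parent_a: str, parent_b: str, offspring: str) -> str:
    """Return a hypothetical inheritance code for one offspring genotype call.

    Assumed codes (illustrative only):
      "A" = matches parent A, "B" = matches parent B,
      "H" = heterozygous (one allele from each parent),
      "-" = SNP is not informative for this cross,
      "?" = call does not fit either parent.
    """
    def is_homozygous(genotype: str) -> bool:
        return len(genotype) == 2 and genotype[0] == genotype[1]

    # The SNP is only informative when both parents are homozygous and differ.
    if not (is_homozygous(parent_a) and is_homozygous(parent_b)):
        return "-"
    if parent_a == parent_b:
        return "-"
    if offspring == parent_a:
        return "A"
    if offspring == parent_b:
        return "B"
    # One allele from each parent, in either order.
    if sorted(offspring) == sorted(parent_a[0] + parent_b[0]):
        return "H"
    return "?"


for call in ("AA", "GG", "AG", "CC"):
    print(call, designate("AA", "GG", call))
```

The point of the sketch is only that each designation needs the parental genotypes for the same SNP, which is why knowing the relationships among samples matters.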
We had 5000 samples of data associated with each “SNP ID”, and we had over 1000 SNP IDs, making our data file over 5 million lines long. It was actually much messier looking than this, and I ended up processing each line and storing the results in a database. After I talked with my boss about this, he gave me the same data in a different format.
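To see how the same calls fit into far fewer lines, here is a small Python sketch that pivots one-call-per-line records into one row per SNP. The function name `long_to_matrix`, the tuple layout, and the "--" marker for a missing call are assumptions for illustration, not part of the original pipeline.

```python
from collections import defaultdict


def long_to_matrix(rows):
    """Pivot (snp_id, sample_id, base1, base2) records into a matrix.

    Returns a list of rows: a header row, then one row per SNP.
    """
    calls = defaultdict(dict)
    samples = set()
    for snp_id, sample_id, base1, base2 in rows:
        calls[snp_id][sample_id] = base1 + base2
        samples.add(sample_id)

    sample_ids = sorted(samples)
    header = ["SNP ID"] + [str(s) for s in sample_ids]
    body = [
        # "--" is an assumed placeholder for a sample with no call.
        [f"SNP{snp_id}"] + [calls[snp_id].get(s, "--") for s in sample_ids]
        for snp_id in sorted(calls)
    ]
    return [header] + body


long_rows = [
    (1, 1, "A", "A"),
    (1, 2, "A", "A"),
    (1, 3, "A", "G"),
    (2, 1, "C", "C"),
    (2, 2, "G", "G"),
    (2, 3, "G", "G"),
]
matrix = long_to_matrix(long_rows)
for line in matrix:
    print("\t".join(line))
```

Six input lines become three output lines here; with thousands of samples per SNP the reduction is correspondingly larger.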
This format really condensed the data file: from 800 MB to less than 15 MB, in fact. However, each “data point” is no longer “tagged”, so some additional preprocessing was needed.
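One way to picture that preprocessing: in the matrix layout a call is identified only by its position, so the SNP ID has to be recovered from the row and the sample ID from the header. The Python sketch below assumes a tab-delimited file and uses a hypothetical helper named `tag_matrix`; the actual preprocessing used in the project is not described in these notes.

```python
def tag_matrix(lines):
    """Yield (snp_id, sample_id, call) for every cell of a matrix file.

    Assumes tab-delimited lines whose first line is the header
    ("SNP ID" followed by the sample IDs).
    """
    header, *rows = [line.rstrip("\n").split("\t") for line in lines]
    sample_ids = header[1:]
    for row in rows:
        snp_id, calls = row[0], row[1:]
        if len(calls) != len(sample_ids):
            raise ValueError(f"{snp_id}: expected {len(sample_ids)} calls, got {len(calls)}")
        # Pair each call with the sample ID from the same column.
        for sample_id, call in zip(sample_ids, calls):
            yield snp_id, sample_id, call


matrix_text = (
    "SNP ID\t1\t2\t3\n"
    "SNP1\tAA\tAA\tAG\n"
    "SNP2\tCC\tGG\tGG\n"
)
tagged = list(tag_matrix(matrix_text.splitlines()))
for record in tagged:
    print(record)
```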
The sample IDs I showed you earlier each represented a different individual corn plant. Knowing the relationships among the different plants was required for processing the data. Here, since I’m a human familiar with the genetic system, I know that IBM stands for an Intermated B73 x Mo17 population. This is a simplified example of a manifest file. Z1, M100, and “Bob” are just made-up names, and any similarity to known names is purely coincidental. When you start looking at these, you see that the relationships were defined in multiple ways. There isn’t anything here that directly tells you that IBM, Mo17, and B73 are related. To take advantage of this information I wrote a long series of rules. Well, the breakthrough came with the realization that I couldn’t keep this up forever. Instead of telling the computer how to understand these relationships, I decided to just tell the computer what the relationships are (next slide).
This is organized in a way that is simple for both humans and computer programs to understand.
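A sketch of why explicit parent columns are easy for a program: once the parents are written down as IDs, finding them is a lookup rather than a set of naming rules. The Python example below uses a partly hypothetical, tab-delimited manifest containing only the B73, Mo17 and M100 rows; the column names follow the slide, while the function name `load_parents` and the exact file layout are assumptions.

```python
import csv
import io

# Illustrative manifest (assumed tab-delimited; blank cells mean "not given").
MANIFEST = (
    "ID\tName\tGroup\tA (ID)\tB (ID)\tAxB (ID)\n"
    "1\tB73\tB73\t\t\t\n"
    "3\tMo17\tControl\t\t\t\n"
    "4\tM100\tIBM\t1\t3\t\n"
)


def load_parents(text):
    """Return (names, parents) dictionaries keyed by sample ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    names = {}
    parents = {}
    for row in reader:
        names[row["ID"]] = row["Name"]
        # Record parents only when both parent columns are filled in.
        if row["A (ID)"] and row["B (ID)"]:
            parents[row["ID"]] = (row["A (ID)"], row["B (ID)"])
    return names, parents


names, parents = load_parents(MANIFEST)
for child, (a, b) in parents.items():
    print(f"{names[child]}: parent A = {names[a]}, parent B = {names[b]}")
```

No rule about what “IBM” means is needed; the relationship is simply read from the file.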
Configuration files are great for some tasks that are easy for humans but more difficult to program. They are also great for things that are variable. Setting up the configuration file only takes minutes. If we don’t know what these relationships are to start with, then we’re in trouble anyway. Simple for humans ≠ simple for computers. Something else I didn’t put up here is that reducing your dependencies sure makes it easier to install.
