© 2013 DBCLS Licensed under CC BY 2.1 JAPAN
Technology development of database integration to make sense of big data in lifescience
Hidemasa Bono
Database Center for Lifescience (DBCLS)
Research Organization of Information and Systems (ROIS), JAPAN
Who we are: togoDB
• The integrated database project in Japan
• Collaborative effort to recycle data
–Provide data that can be easily reused
–Preserve data that form part of the ‘public data’
(Diagram: TogoHeadquarters, technology developer, DNA data archiver, data organizer, universities & institutes)
http://biosciencedbc.jp/
NBDC portal
http://biosciencedbc.jp/
http://integbio.jp/dbcatalog/
Photo by @hirabat (1st Bono Conference, 2013-01-13)
Free!
• No registration required
• Open not only to academia but also to for-profit users
Big data in lifescience
• Output comes mostly from machines
–NGS (next-generation sequencers)
• Over 100M lines and 2 GB in size per sample
• Ethical issues: personal human genomes
• Wide variation in...
–Data format
–Application: re-sequencing, de novo sequencing, RNA-seq, ...
–Annotation: granularity of metadata
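Files of this size cannot sensibly be loaded into memory at once. A minimal sketch of a streaming reader, assuming the common four-line FASTQ layout with no wrapped sequence lines (the slide itself does not name a format): under that assumption, 100M lines corresponds to about 25 million reads. The function names are hypothetical, and gzip compression is detected only from a `.gz` suffix.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four lines, so memory use does not grow with file size."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        # Basic sanity checks on the four-line record structure.
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def open_fastq(path: str):
    """Open plain or gzip-compressed FASTQ as text (gzip assumed from the .gz suffix)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_reads_and_bases(records: Iterable[FastqRecord]) -> tuple[int, int]:
    """Consume records one at a time and tally reads and total bases."""
    reads = bases = 0
    for rec in records:
        reads += 1
        bases += len(rec.sequence)
    return reads, bases
```

For example, `with open_fastq("sample.fastq.gz") as fh: reads, bases = count_reads_and_bases(parse_fastq(fh))`, where the file name is hypothetical. Because `parse_fastq` accepts any iterable of lines, the same code works on an open file handle or an in-memory list.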
(Figure: NGS (SRA), GEO, ArrayExpress; genome, metagenome, RNA-seq, ChIP-seq, microarray (GeneChip, oligoarray). Pictures from Togo Picture Gallery, http://g86.dbcls.jp/togopic/)
Making sense of big data...
1. Exhaustive, but functional, index
2. Search engine for lifescience
3. Highly curated dataset
What we have developed
1. Yellow pages for archived NGS data
http://SRA.dbcls.jp/
2. Search engine for nucleotide sequences
http://GGRNA.dbcls.jp/
3. Summarization and visualization of reference transcriptome data
http://RefEx.dbcls.jp/
1. DBCLS SRA
• Yellow pages for archived NGS data
–Indexed by metadata; search by:
• Statistics
• Publications
• Diseases
–Direct links to the original database (SRA)
• Precalculated QC data
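The slide says DBCLS SRA indexes archived NGS data by metadata and lets users search by statistics, publications, and diseases. One simple way to support such lookups is an inverted index from (field, value) pairs to study accessions. This is a minimal sketch, not a description of how DBCLS SRA is built: the record fields, accession numbers, and publication IDs below are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    # Hypothetical subset of study metadata.
    accession: str
    study_type: str
    pubmed_ids: list[str] = field(default_factory=list)
    diseases: list[str] = field(default_factory=list)


class MetadataIndex:
    """Inverted index: (field, value) -> set of study accessions."""

    def __init__(self, records: list[StudyRecord]):
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)
        for r in records:
            self._postings[("study_type", r.study_type)].add(r.accession)
            for pmid in r.pubmed_ids:
                self._postings[("publication", pmid)].add(r.accession)
            for disease in r.diseases:
                # Disease names are case-normalized so "Breast cancer" == "breast cancer".
                self._postings[("disease", disease.lower())].add(r.accession)

    def search(self, **criteria: str) -> list[str]:
        """AND-combine criteria, e.g. search(disease="breast cancer", study_type="RNA-Seq")."""
        result = None
        for fld, value in criteria.items():
            key = (fld, value.lower() if fld == "disease" else value)
            hits = self._postings.get(key, set())  # .get does not insert into the defaultdict
            result = hits if result is None else result & hits
        return sorted(result or [])

    def statistics(self, fld: str) -> Counter:
        """Number of studies per value of one field (for summary tables)."""
        return Counter({value: len(accs) for (f, value), accs in self._postings.items() if f == fld})
```

The same postings serve both the search pages and the statistics pages; a production system would more likely keep this in a database or search engine than in memory.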
Pipeline to help users re-use public NGS data: search data → download → quality check → data processing → analysis
http://SRA.dbcls.jp/
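The slide lists precalculated QC data and a quality-check step but does not say which metrics are computed. A minimal sketch of one common check, assuming Phred+33 base-quality encoding (Sanger / Illumina 1.8+): it reports the fraction of reads whose mean quality reaches a threshold and the mean quality at each read position. The threshold and function names are illustrative assumptions.

```python
from typing import Iterable

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding; older Phred+64 files would need 64


def phred_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def qc_summary(qualities: Iterable[str], min_mean_q: float = 20.0) -> dict:
    """Per-read pass rate and per-position mean quality, computed in one pass."""
    n_reads = n_pass = 0
    pos_sum: list[int] = []    # running sum of scores at each read position
    pos_count: list[int] = []  # number of reads long enough to reach that position
    for qual in qualities:
        scores = phred_scores(qual)
        if not scores:
            continue  # skip zero-length reads
        n_reads += 1
        if sum(scores) / len(scores) >= min_mean_q:
            n_pass += 1
        for i, q in enumerate(scores):
            if i == len(pos_sum):  # first read reaching this position
                pos_sum.append(0)
                pos_count.append(0)
            pos_sum[i] += q
            pos_count[i] += 1
    return {
        "reads": n_reads,
        "pass_fraction": n_pass / n_reads if n_reads else 0.0,
        "mean_q_by_position": [s / c for s, c in zip(pos_sum, pos_count)],
    }


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(qc_summary(["IIII", "II##"], min_mean_q=30))
```

Because it only needs an iterable of quality strings, this can consume reads from a streaming parser, so precomputing such summaries for many archived runs does not require holding any file in memory.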
Statistics: studies
http://SRA.dbcls.jp/
Statistics: samples
http://SRA.dbcls.jp/
Search by publications
Search by diseases
Search by diseases (cont.)
http://SRA.dbcls.jp/
2. GGRNA: GooGle-like RNA search engine
http://GGRNA.dbcls.jp/
• Quickly finds nucleotide sequences, as well as other fields, in RefSeq transcripts using a suffix array
• Easily highlights PCR primers, microarray probes, and siRNA target sequences
Naito Y. & Bono H. Nucleic Acids Res. (2012) 40: W592-6. doi: 10.1093/nar/gks448
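GGRNA is described as finding sequences in RefSeq transcripts with a suffix array. A minimal sketch of exact-match search with a suffix array over a single toy transcript; the naive construction, the function names, and the sequence are illustrative assumptions, not GGRNA's implementation. Because every occurrence of a pattern is a prefix of some suffix, all matches sit in one contiguous block of the sorted suffixes, found by two binary searches.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort every start position by the suffix beginning there.
    # Fine for a sketch; real indexes use O(n log n) or linear-time algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return sorted start positions of exact matches of `pattern` in `text`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix in sa[start:lo] begins with `pattern`.
    return sorted(sa[start:lo])


def reverse_complement(seq: str) -> str:
    # A reverse PCR primer anneals to the opposite strand, so the transcript
    # contains the primer's reverse complement rather than the primer itself.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    transcript = "ACGTACGTGGCCAACGT"  # hypothetical toy sequence
    sa = build_suffix_array(transcript)
    print(find_occurrences(transcript, sa, "ACGT"))                      # [0, 4, 13]
    print(find_occurrences(transcript, sa, reverse_complement("TTGG")))  # [10]
```

The index is built once and then answers each query in logarithmic time, which is what makes interactive searches for primers, probes, or siRNA targets practical.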
Genome version of GGRNA? Yes, we can!
GooGle-like Genome search engine http://GGGenome.dbcls.jp/
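The slide does not say how GGGenome indexes a genome. One hypothetical way to extend a transcript-style suffix-array search to a genome with several chromosomes is to join them with a separator character that never occurs in a DNA query, index the result once, and map each hit back to a (chromosome, position) pair. A minimal sketch under that assumption; positions are 0-based, and the names and sequences are illustrative only.

```python
import bisect


def build_genome_index(chromosomes: dict[str, str]):
    """Concatenate chromosomes with '$' (never in a DNA query) and index once."""
    names, starts = [], []
    offset = 0
    for name, seq in chromosomes.items():
        names.append(name)
        starts.append(offset)
        offset += len(seq) + 1  # +1 skips the '$' separator
    text = "$".join(chromosomes.values())
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    return text, sa, names, starts


def search_genome(index, pattern: str) -> list[tuple[str, int]]:
    """Return (chromosome, 0-based position) for every exact match of `pattern`."""
    text, sa, names, starts = index
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound of the matching block
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # upper bound of the matching block
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    for pos in sorted(sa[first:lo]):
        chrom = bisect.bisect_right(starts, pos) - 1  # last chromosome starting at or before pos
        hits.append((names[chrom], pos - starts[chrom]))
    return hits


if __name__ == "__main__":
    idx = build_genome_index({"chr1": "ACGTTT", "chr2": "GGACGT"})  # hypothetical toy genome
    print(search_genome(idx, "ACGT"))  # [('chr1', 0), ('chr2', 2)]
```

The separator guarantees that no match spans two chromosomes. A real genome-scale index would also need a compact construction algorithm and would typically search both strands.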
3. RefEx: Reference Expression Dataset
• Datasets covering 40 organs, measured with 4 different methods, combined with BodyParts3D
–Reference of gene expression in normal organs throughout the mammalian body
–A practical example of reusing valuable public data
• Search for "tissue-specific genes"
The four methods:
EST: classical Expressed Sequence Tags
GeneChip: Affymetrix’s microarray
CAGE: Cap Analysis of Gene Expression
RNAseq: transcriptome sequencing
http://RefEx.dbcls.jp/
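The slide mentions searching for tissue-specific genes but does not state which score RefEx uses. As a swapped-in illustration, a minimal sketch using the widely used tau specificity index, computed here on log2(x + 1) expression values: tau is 0 for a gene expressed equally everywhere and 1 for a gene expressed in a single tissue. The gene names, expression values, and the 0.8 cutoff are hypothetical.

```python
import math


def tau(expression: list[float], log_transform: bool = True) -> float:
    """Tissue-specificity index tau: 0 = uniform expression, 1 = one tissue only."""
    if len(expression) < 2:
        raise ValueError("need at least two tissues")
    values = [math.log2(x + 1) if log_transform else x for x in expression]
    peak = max(values)
    if peak == 0:
        return 0.0  # not expressed anywhere: treat as non-specific
    # Average shortfall of each tissue relative to the highest-expressing tissue.
    return sum(1 - v / peak for v in values) / (len(values) - 1)


def tissue_specific_genes(table: dict[str, dict[str, float]], threshold: float = 0.8):
    """Return (gene, top tissue, tau) for genes with tau >= threshold, most specific first."""
    hits = []
    for gene, by_tissue in table.items():
        tissues = list(by_tissue)
        values = [by_tissue[t] for t in tissues]
        score = tau(values)
        if score >= threshold:
            top = tissues[values.index(max(values))]  # tissue with the highest expression
            hits.append((gene, top, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    table = {  # hypothetical expression values per organ
        "GENE_A": {"liver": 120.0, "brain": 0.0, "heart": 0.5},
        "GENE_B": {"liver": 30.0, "brain": 28.0, "heart": 35.0},
    }
    for gene, tissue, score in tissue_specific_genes(table):
        print(gene, tissue, round(score, 2))
```

Since the four methods measure expression on different scales, a sketch like this would be applied within one method's data at a time rather than across methods.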
http://RefEx.dbcls.jp/
What we have developed and are developing
1. Yellow pages for archived NGS data
http://SRA.dbcls.jp/
2. Search engine for nucleotide sequences
http://GGRNA.dbcls.jp/
3. Summarization and visualization of reference transcriptome data
http://RefEx.dbcls.jp/
TogoTV
Archive of talks and tutorial videos explaining how to use biological databases and tools
http://togotv.dbcls.jp/en/
Acknowledgement
• Members of DBCLS, for technology development
• NBDC, for funding; DDBJ, for storage and CPU time
• Everyone who shares precious data
