nowomics: making science easy to follow
& mongoDB
Richard Smith
richard@nowomics.com
www.nowomics.com
OVERVIEW
• About nowomics
• Why mongo?
• Data model
• Aggregation Framework
• A bit about replica sets
Ask lots of questions
Biomedical data are being generated and published at an unprecedented rate
model organisms

1500
literature

biological databases

proteins
pathways

genome annotation
gene expression
interactions

~20,000 journal articles a week
mutations
diseases
THE SOLUTION

BRCA2 (gene): Follow
diabetes, type 2 (disease): Follow
neuron development (process): Follow
HOW NOWOMICS WORKS
• Fetch data every day from literature & databases
• Work out what’s changed, link to original data source
• Users follow what they work on
• Organise by gene, disease, process, author, etc
• Personalised News Feed & email alerts
Alpha - summer 2013
Beta - November 2013
CEDAR Enterprise Fellowship - December 2013
TECH
Python: pymongo
MongoDB: ~20GB (data & indexes)
Amazon: EC2, S3, SES
Elasticsearch
WHY MONGO?
• ‘schema less’ - data will change over time
• horizontal scaling
• rich query system
• ease of development
DATA MODEL
Gene - Publication
Gene - Disease
Tracking these relationships
gene: PPARG
disease: diabetes
experiment: GWAS
probability: 0.012
date: 17 Nov 2013
source: NCBI
+ more fields
+ more types
???
COLLECTIONS
links (12.5m, short field names)
{t1: gene, n1: 101, t2: pub, n2: 201, date: 2013-11-17, type: pub}
gene (200k)
{id: 101, symbol: PPARG}
pub (1.4m)
{id: 201, identifier: 24386954}
disease (12k)
{id: 201, name: diabetes}
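A minimal Python sketch of why short field names matter for a collection this size. BSON stores every field name inside every document, so shorter names shrink each record. The helper names and the longer field spellings (`type_1`, `id_1`, and so on) are hypothetical, chosen only for comparison. They are not the deck's schema.

```python
from datetime import datetime

# Short keys as shown on the slide, mapped to hypothetical longer spellings.
SHORT_TO_LONG = {
    "t1": "type_1",
    "n1": "id_1",
    "t2": "type_2",
    "n2": "id_2",
}


def make_link(t1, n1, t2, n2, date, link_type):
    """Build one document shaped like the slide's links example."""
    return {"t1": t1, "n1": n1, "t2": t2, "n2": n2,
            "date": date, "type": link_type}


def key_bytes_saved(n_docs):
    """Bytes of field-name text saved across n_docs documents.

    Each field name is repeated in every document, so the saving
    per document is multiplied by the collection size.
    """
    per_doc = sum(len(long_name) - len(short_name)
                  for short_name, long_name in SHORT_TO_LONG.items())
    return per_doc * n_docs


link = make_link("gene", 101, "pub", 201, datetime(2013, 11, 17), "pub")
saved = key_bytes_saved(12_500_000)  # 12.5m links, as on the slide
```

Under these assumed names the saving is 12 bytes per document, about 150 MB over 12.5m links, before counting any index or padding effects.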
(slide image with four numbered page elements, 1 to 4)
1 + 4 - precalculate with aggregation framework
3 - wasn’t using correct index, needed hint
2 - uses aggregation framework, doesn’t support hint
AGGREGATION FRAMEWORK I
• New in 2.2 - alternative to map reduce
• map reduce was slow and complex
• Analogous to SQL group by
• Run a pipeline of commands
db.links.aggregate([
{$match: {t2: 'pub', t1: 'gene'}},
{$group: {_id : '$n2', count: {$sum: 1}}} ])
AGGREGATION EXAMPLE
Count of genes linked to each publication

db.links.aggregate([
{$match: {t2: 'pub', t1: 'gene'}},
{$group: {_id : '$n2', count: {$sum: 1}}} ])

(actually precalculate for all and store results in collection)
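To make the pipeline's semantics concrete, here is a small Python sketch. `PIPELINE` is the slide's pipeline written as Python literals, the form pymongo's `Collection.aggregate()` accepts. `count_genes_per_pub` is a hypothetical in-memory equivalent over made-up sample links, so it runs without a database.

```python
from collections import Counter

# The slide's pipeline as Python literals.
PIPELINE = [
    {"$match": {"t2": "pub", "t1": "gene"}},
    {"$group": {"_id": "$n2", "count": {"$sum": 1}}},
]


def count_genes_per_pub(links):
    """Pure-Python equivalent of PIPELINE over an iterable of link dicts."""
    # $match keeps gene-to-pub links; $group counts them per publication id.
    counts = Counter(
        link["n2"] for link in links
        if link.get("t1") == "gene" and link.get("t2") == "pub"
    )
    return [{"_id": pub_id, "count": n} for pub_id, n in counts.items()]


# Made-up sample links, not real nowomics data.
sample = [
    {"t1": "gene", "n1": 101, "t2": "pub", "n2": 201},
    {"t1": "gene", "n1": 102, "t2": "pub", "n2": 201},
    {"t1": "gene", "n1": 101, "t2": "pub", "n2": 202},
    {"t1": "gene", "n1": 101, "t2": "disease", "n2": 301},
]
result = count_genes_per_pub(sample)
```

With a live collection, passing `PIPELINE` to `db.links.aggregate(...)` would run the same grouping on the server. The shape of the return value differs between pymongo versions.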
AGGREGATION EXAMPLE
Count of updates per month
db.links.aggregate([
{$match: {date: {$gte: new ISODate('2013-02-01')},
't1': 'gene', 'n1': 530}},
{$project: {_id: 0, month: {$month: '$date'},
year: {$year: '$date'}}},
{$group: {'_id': {m: '$month', y: '$year'},
count: {$sum: 1}}} ] )

(actually precalculate for all and store results in collection)
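The same three stages can be sketched in plain Python to show what each one contributes: filter by gene and start date, project out month and year, then group on the compound (month, year) key. The function name and sample data are hypothetical. Only the field names and the gene id 530 come from the slide.

```python
from collections import Counter
from datetime import datetime


def updates_per_month(links, gene_id, since):
    """In-memory equivalent of the $match / $project / $group pipeline."""
    counts = Counter(
        # $project: keep only the year and month of each date
        (link["date"].year, link["date"].month)
        for link in links
        # $match: one gene, on or after the start date
        if link["t1"] == "gene" and link["n1"] == gene_id
        and link["date"] >= since
    )
    # $group: one output document per (month, year) pair
    return [{"_id": {"m": month, "y": year}, "count": n}
            for (year, month), n in sorted(counts.items())]


# Made-up sample links, not real nowomics data.
sample = [
    {"t1": "gene", "n1": 530, "date": datetime(2013, 1, 15)},
    {"t1": "gene", "n1": 530, "date": datetime(2013, 2, 3)},
    {"t1": "gene", "n1": 530, "date": datetime(2013, 2, 20)},
    {"t1": "gene", "n1": 530, "date": datetime(2013, 11, 17)},
    {"t1": "gene", "n1": 101, "date": datetime(2013, 3, 1)},
]
monthly = updates_per_month(sample, 530, datetime(2013, 2, 1))
```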
AGGREGATION ISSUES
• No explain() (coming in 2.6)
• Can’t use index hints
• 16MB result limit - run in batches (coming in 2.6)
• Can’t output results to collection (coming in 2.6)
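One way to "run in batches" is to split the work by id range so each aggregate call returns a small result. The sketch below only builds the pipelines. Splitting on ranges of `n2` is an assumption, as are the function name and the batch size, since the deck does not say how its batches were chosen.

```python
def batched_pipelines(min_id, max_id, batch_size):
    """Yield one pipeline per id range [start, end).

    The $group key is n2, so ranges of n2 never split a group
    across two batches.
    """
    for start in range(min_id, max_id + 1, batch_size):
        end = min(start + batch_size, max_id + 1)
        yield [
            {"$match": {"t1": "gene", "t2": "pub",
                        "n2": {"$gte": start, "$lt": end}}},
            {"$group": {"_id": "$n2", "count": {"$sum": 1}}},
        ]


# Hypothetical id range and batch size, for illustration only.
pipelines = list(batched_pipelines(1, 250, 100))
```

Each pipeline would then be run on its own and its results written to the precalculated collection by the application, since the aggregation itself cannot output to a collection in this version.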
EC2 ARCHITECTURE
mongoDB replica set
primary: EC2 large
secondary: EC2 large
arbiter: EC2 micro
app: EC2 large
PERFORMANCE
Indexes & data (20GB) bigger than RAM (8GB)
• main indexes in RAM would be OK
• Loading data uses different indexes
• Slow page loads
ReadPreference.SECONDARY_PREFERRED
• send links queries to secondary
• indexes stay in RAM
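A minimal sketch of asking for secondary-preferred reads through the connection string, using only the standard library. The host names and the replica set name `rs0` are hypothetical. `replicaSet` and `readPreference=secondaryPreferred` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string for a replica set."""
    query = urlencode({"replicaSet": replica_set,
                       "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + query


# Hypothetical hosts standing in for the primary and secondary.
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017"], "rs0")
```

A URI like this applies the preference to every read on the connection. The slide routes only the links queries to the secondary, which in recent pymongo versions could instead be done per collection, for example with `db.get_collection("links", read_preference=ReadPreference.SECONDARY_PREFERRED)`.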
nowomics
making science easy to follow

richard@nowomics.com

www.nowomics.com

Nowomics & MongoDB