31. Cloud in Genomics
Concrete cloud use cases in genomics
Service category | Service name | Description | KT
IaaS | Amazon Web Services | Actively supports PaaS and SaaS providers; indirect support through public datasets | ucloud biz
IaaS | Google Cloud | Compute and storage | ucloud biz
IaaS | NeCTAR Research Cloud | OpenStack-based private cloud for researchers | ucloud biz
SaaS | DNAnexus | NGS data analysis pipelines | g-Analysis
SaaS | Seven Bridges Genomics | NGS data analysis pipelines | g-Analysis
SaaS | GotCloud | NGS data analysis pipelines | g-Analysis
SaaS | Globus Genomics | Galaxy offered on AWS | g-Galaxy
SaaS | GenomeSpace | Storage-based bioinformatics services | g-Storage
PaaS | CloudMan | Storage-based support for bioinformatics tools | -
PaaS | SeqWare | Base platform for genome analysis | -
PaaS | StarCluster | AWS-based HPC cluster computing environment | -
PaaS | Cycle Computing | AWS-based HPC cluster computing environment | -
PaaS | Google Genomics, BigQuery | APIs for NGS data analysis | g-Insight
50. Components of an HPC Cloud - Web Service
http://seqware.github.io/docs/
[Diagram labels: LIMS; Object Storage; High Speed File Transfer; IaaS; HPC; Private Cloud; OpenStack…; Job Schedule; Linux…; Bioinformatics; Hadoop and Database]
57. Google BigQuery with plot
# Compute the Ti/Tv ratio for BRCA1.
SELECT
  transitions,
  transversions,
  transitions/transversions AS titv
FROM (
  SELECT
    SUM(IF(mutation IN ('A->G', 'G->A', 'C->T', 'T->C'),
        INTEGER(num_snps),
        INTEGER(0))) AS transitions,
    SUM(IF(mutation IN ('A->C', 'C->A', 'G->T', 'T->G',
                        'A->T', 'T->A', 'C->G', 'G->C'),
        INTEGER(num_snps),
        INTEGER(0))) AS transversions,
  FROM (
    SELECT
      CONCAT(reference_bases,
        CONCAT(STRING('->'),
          alternate_bases)) AS mutation,
      COUNT(alternate_bases) AS num_snps,
    FROM
      [google.com:biggene:1000genomes.variants1kG]
    WHERE
      contig = '17'
      AND position BETWEEN 41196312
      AND 41277500
      AND vt = 'SNP'
    GROUP BY
      mutation
    ORDER BY
      mutation));

library(bigrquery)
# `sql` holds the query text above; `billing_project` is the ID of the project to bill.
result <- query_exec(project = "google.com:biggene", dataset = "1000genomes",
                     query = sql, billing = billing_project)
[Plot: Ti/Tv ratio in BRCA1]
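The classification done by the two `SUM(IF(...))` expressions in the query can be sketched locally in Python. This is only an illustration of the Ti/Tv arithmetic, not part of the original slides: the function name `titv_ratio` and the sample counts are made up for the example, and it assumes the per-mutation SNP counts have already been fetched.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def titv_ratio(snp_counts):
    """Return (transitions, transversions, ratio).

    snp_counts maps (reference_base, alternate_base) -> number of SNPs,
    mirroring the (mutation, num_snps) rows of the inner query.
    """
    transitions = 0
    transversions = 0
    for (ref, alt), count in snp_counts.items():
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-base substitution: {ref}->{alt}")
        if (ref, alt) in TRANSITIONS:
            transitions += count
        else:
            transversions += count
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions, transversions, transitions / transversions


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not 1000 Genomes results).
    example = {("A", "G"): 6, ("C", "T"): 4, ("A", "C"): 3, ("G", "T"): 2}
    print(titv_ratio(example))
```

Keeping the transition set explicit and treating everything else as a transversion matches the two `IN (...)` lists in the query, which together cover all twelve single-base substitutions.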
58. Google BigQuery with R
# Count the variation for each sample including phenotypic traits
SELECT
  samples.genotype.sample_id AS sample_id,
  gender,
  population,
  super_population,
  COUNT(samples.genotype.sample_id) AS num_variants_for_sample,
  SUM(IF(samples.af >= 0.05,
      INTEGER(1),
      INTEGER(0))) AS common_variant,
  SUM(IF(samples.af < 0.05
      AND samples.af > 0.005,
      INTEGER(1),
      INTEGER(0))) AS middle_variant,
  SUM(IF(samples.af <= 0.005
      AND samples.af > 0.001,
      INTEGER(1),
      INTEGER(0))) AS rare_variant,
  SUM(IF(samples.af <= 0.001,
      INTEGER(1),
      INTEGER(0))) AS very_rare_variant,
FROM
  FLATTEN([google.com:biggene:1000genomes.variants1kG],
    genotype) AS samples
JOIN
  [google.com:biggene:1000genomes.sample_info] p
ON
  samples.genotype.sample_id = p.sample
WHERE
  samples.vt = 'SNP'
  AND (samples.genotype.first_allele > 0
    OR samples.genotype.second_allele > 0)
GROUP BY
  sample_id,
  gender,
  population,
  super_population
ORDER BY
  sample_id;

library(ggplot2)
# `result` is the data frame returned by running the query above.
ggplot(result, aes(x = population, y = common_variant, fill = super_population)) +
  geom_boxplot() +
  ylab("Count of common variants per sample") +
  ggtitle("Common Variants (Minimum Allelic Frequency 5%)")
[Plot label: Variant type]
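The four allele-frequency bins used by the query can be checked locally with a small Python sketch. It is an illustration under assumptions, not the slides' own code: the helper names `af_bin` and `count_bins` are hypothetical, and the input is assumed to be a plain list of allele frequencies for one sample.

```python
from collections import Counter


def af_bin(af):
    """Map an allele frequency to the bin name used in the query.

    Boundaries follow the SQL: >= 0.05 is common, (0.005, 0.05) is middle,
    (0.001, 0.005] is rare, and <= 0.001 is very rare.
    """
    if af >= 0.05:
        return "common_variant"
    if af > 0.005:
        return "middle_variant"
    if af > 0.001:
        return "rare_variant"
    return "very_rare_variant"


def count_bins(allele_frequencies):
    """Count how many variants of one sample fall into each bin."""
    return Counter(af_bin(af) for af in allele_frequencies)


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    print(count_bins([0.5, 0.2, 0.01, 0.003, 0.0005]))
```

Checking the bins in descending order means each variant lands in exactly one bin, which is the same partition the four `SUM(IF(...))` columns produce.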