Overview
1. Introduction to HPC
2. Hadoop (HDFS, MapReduce)
3. AWS toolkit (Amazon S3, Amazon EMR, Amazon Redshift)
4. Case study
Why?
Bottlenecks in Genome Analysis
Large data files from sequencers.
Computational bottleneck.
Processing time.
Data persistence and reliability.
Data security.
How?
Introduction
“High Performance Computing (HPC) most generally refers to the practice of
aggregating computing power in a way that delivers much higher performance
than one could get out of a typical desktop computer or workstation in order to
solve large problems in science, engineering, or business.”
Forms of HPC
Dedicated supercomputer.
Commodity HPC cluster.
Grid computing.
HPC in cloud.
What?
Hadoop
Open-source, Java-based framework for reliable, scalable, distributed
computing.
Created by Doug Cutting and Mike Cafarella and developed at Yahoo! during
2006-08; inspired by Google's 2003 paper on the Google File System (GFS).
Key Components
Hadoop Distributed File System (HDFS)
MapReduce
Hadoop
HDFS (Hadoop Distributed File System)
Data management layer
Master-Slave architecture
Fault Tolerant
Key Components:
NameNode
SecondaryNameNode
DataNode
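HDFS gets its fault tolerance by splitting each file into fixed-size blocks and storing several copies of every block on different worker nodes (DataNodes in HDFS terminology). The sketch below only does the arithmetic. It assumes a 128 MB block size and a replication factor of 3, the usual defaults in recent Hadoop releases; the slides do not state either value, and the function names are made up for illustration.

```shell
#!/bin/sh
# Number of HDFS blocks needed for a file (ceiling division).
# $1 = file size in MB, $2 = block size in MB (assumed default: 128).
hdfs_block_count() {
    file_mb=$1
    block_mb=${2:-128}
    echo $(( (file_mb + block_mb - 1) / block_mb ))
}

# Total block replicas stored across the cluster.
# $3 = replication factor (assumed default: 3).
hdfs_replica_count() {
    blocks=$(hdfs_block_count "$1" "${2:-128}")
    echo $(( blocks * ${3:-3} ))
}

hdfs_block_count 1000      # prints 8
hdfs_replica_count 1000    # prints 24
```

With these assumed defaults, a 1000 MB file occupies 8 blocks, and the cluster stores 24 block replicas in total, so losing a single node still leaves two copies of every block.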
Hadoop - Continued
MapReduce
Mappers and Reducers
Batch oriented
Key Components
JobTracker
TaskTracker
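The mapper/reducer split can be illustrated with a word count written as a plain shell pipeline. This is a single-machine analogy and not Hadoop itself: the function names are hypothetical, and `sort` stands in for the shuffle step that the framework performs between the map and reduce phases.

```shell
#!/bin/sh
# Map: emit one lowercase word per line; each word acts as a key.
map_words() {
    tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | grep -v '^$'
}

# Shuffle: bring identical keys next to each other.
shuffle() {
    sort
}

# Reduce: collapse each run of identical keys into "word<TAB>count".
reduce_counts() {
    uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}

wordcount() {
    map_words | shuffle | reduce_counts
}

printf 'the cat saw the dog\nThe dog ran\n' | wordcount
```

For the two sample lines this prints five words with their counts, including `the` with 3 and `dog` with 2. In a real job the map and reduce stages run as many parallel tasks on different nodes, which is what makes the model batch oriented.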
Hadoop - Architecture
AWS ToolKit - Amazon Elastic MapReduce (EMR)
Managed Hadoop framework.
Runs popular distributed frameworks such as Apache Spark, HBase, Presto, and
Flink.
Elastic.
Flexible data storage (S3, HDFS, Redshift, Glacier, RDS).
Secure and reliable.
Full control and root access.
AWS ToolKit - Amazon EMR
# Launch a two-instance EMR 4.5.0 cluster with Hive and Spark installed.
aws emr create-cluster \
  --name "demo" \
  --release-label emr-4.5.0 \
  --instance-type m3.xlarge \
  --instance-count 2 \
  --ec2-attributes KeyName=YOUR-AWS-SSH-KEY \
  --use-default-roles \
  --applications Name=Hive Name=Spark
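# Launch a cluster (1 master, 2 core nodes) with Hive and Pig, then run a Pig
# script from S3 as a step. The Args lines are not indented so that the
# backslash continuations join them into one argument; \$ stops the shell
# from expanding $INPUT and $OUTPUT.
aws emr create-cluster \
  --name "Test cluster" \
  --ami-version 2.4 \
  --applications Name=Hive Name=Pig \
  --use-default-roles --ec2-attributes KeyName=myKey \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
  --steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE,\
Args=[-f,s3://mybucket/scripts/pigscript.pig,-p,\
INPUT=s3://mybucket/inputdata/,-p,\
OUTPUT=s3://mybucket/outputdata/,\
\$INPUT=s3://mybucket/inputdata/,\
\$OUTPUT=s3://mybucket/outputdata/]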
AWS ToolKit - Amazon S3 (Simple Storage Service)
Virtually infinite storage.
Single object size up to 5 TB.
Why use S3?
Durable, Low Cost, Scalable, High Performance, Secure, Integrated, Easy to Use.
Decouple storage and computation resources.
EMRFS lets EMR read and write data directly in S3, removing the need to keep it in HDFS.
AWS ToolKit - Amazon Redshift
Fast, simple petabyte-scale data warehouse.
Queried with SQL.
Massively parallel.
Relational.
Architecture - leader node and compute nodes.
Fast - 4 GB/sec/node.
Case Study - Rail-RNA
Cloud-enabled spliced aligner that analyzes many samples at once.
Architecture - Amazon S3, Amazon EMR.
~50,000 human RNA samples (from the NCBI archive) analyzed using Rail-RNA - 150 Tbps.
Input to result - 2 weeks.
Cost - ~$1.40/sample.
Paper - Splicing across SRA.
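As a rough check on what the per-sample figure implies for the whole run, the two approximate numbers on this slide can simply be multiplied. The helper name is hypothetical, and the result is only as precise as the "~" values it starts from.

```shell
#!/bin/sh
# Total cost in whole US dollars.
# $1 = number of samples, $2 = cost per sample in cents (avoids floating point).
total_cost_usd() {
    samples=$1
    cents_per_sample=$2
    echo $(( samples * cents_per_sample / 100 ))
}

total_cost_usd 50000 140   # prints 70000
```

At about $1.40 per sample, roughly 50,000 samples come to around $70,000 in total.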
Thank you

High Performance Computing (HPC) in cloud
