変通 [hen-tsoo]
noun
1. Resourcefulness – the quality of being able to cope with a difficult situation
2. Adaptability – the ability to change (or be changed) to fit changed circumstances
3. Agility – the power of moving quickly and easily; nimbleness
INFINITELY SCALABLE CLUSTERS
Grid computing on public cloud
WELCOME TO HENTSŪ
AGENDA
• Grid computing overview
• Trusted tools moving into public cloud
• Alternative cloud services
SOME BACKGROUND
TERMINOLOGY
• Public Cloud (AWS, Azure, Google)
• Private Cloud (Your datacentre)
• High Performance Computing (HPC)
• Grid computing
• Compute cluster
• MathWorks MATLAB
• CPUs / Processors / Cores
• RAM (processor storage)
• Disk (physical storage)
• IaaS (virtual hardware and
networking)
• PaaS (software services)
WHAT IS PUBLIC CLOUD?
“A service provider makes resources, such as virtual machines, applications and
storage, available to the general public.”
• Utility model
• No contracts
• Shared hardware / multi tenant
• Self managed
WHAT IS GRID COMPUTING?
Traditional resource limitations:
• Data store performance
• PC Processor / Memory / Storage
• Network bandwidth
The researcher may wait a long time for results.
• Grid computing moves the computational work from the
PC to a cluster of servers
• The cluster processes the data on behalf of the
researcher and returns the results
• Processing time is reduced
• Larger datasets can be tackled
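The effect of moving work from one PC to a cluster can be illustrated with a small scheduling sketch. It assumes the job splits into independent tasks and that each task goes to whichever worker is free first; the task count, task length and worker count are hypothetical values chosen for illustration, not figures from the talk.

```python
import heapq


def makespan(task_seconds, workers):
    """Wall-clock time to finish all tasks when each task is handed to
    the worker that becomes idle first (simple greedy scheduling)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    free_at = [0.0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    for seconds in task_seconds:
        start = heapq.heappop(free_at)            # earliest idle worker
        heapq.heappush(free_at, start + seconds)  # busy until task ends
    return max(free_at)


# Hypothetical workload: 100 independent tasks of 60 seconds each.
tasks = [60.0] * 100
single_pc = makespan(tasks, workers=1)   # everything runs back to back
cluster = makespan(tasks, workers=20)    # tasks spread over 20 workers
print(f"single PC: {single_pc:.0f} s, 20-worker cluster: {cluster:.0f} s")
```

The sketch ignores data transfer and scheduling overhead, which in practice reduce the gain.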
KEY CONCEPTS
The challenges (number of tasks vs. size of data):
• Big Data
• High Throughput Computing
• MapReduce
• High Performance Computing
The workflows:
• Ingest, Process, Analyse, Visualise, Store
CHOICE OF TOOLS AND PLATFORMS
TRUSTED TOOLS & PUBLIC
CLOUD
HARDWARE INFLEXIBILITY
• Buy 22 core processors at 2.2GHz or 6
core processors at 3.6GHz?
• Buy 8GB, 16GB or 32GB memory
modules (RAM per core ratio)?
• Graphical Processing Units (GPUs)?
• How much local storage per server?
• What network devices between
servers (32 or 48 port switches?)
• What size file server?
Grid usage varies depending on research priorities:
[Chart: jobs per day (0 to 120) by day of the week, Monday to Sunday]
PROFILING MATLAB RESOURCE USAGE
• MATLAB uses one processor core at a
time (50% utilisation on a 2 vCPU
machine). Use the Parallel Computing
Toolbox on multicore PCs.
• MATLAB stores all data in RAM, very
little I/O while processing
• I/O spike when writing out results
SysInternals Process Explorer
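The 50% figure above follows from simple arithmetic: a process that keeps exactly one core busy can show at most 100/N percent overall CPU on an N-vCPU machine. A minimal sketch, with a hypothetical function name:

```python
def single_core_cpu_percent(vcpus):
    """Highest overall CPU utilisation a monitoring tool would report for
    a process that keeps exactly one core busy on a machine with `vcpus`."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return 100.0 / vcpus


for n in (1, 2, 4, 8):
    print(f"{n} vCPU: up to {single_core_cpu_percent(n):.1f}% CPU")
```

A reading close to this ceiling in a process monitor suggests the job is limited to one core and may benefit from parallel workers.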
MATLAB GRID WITH
PUBLIC CLOUD
- Pay only for what you use
- Scale compute resource up
AND down
- Minimal capital outlay on
hardware
- Experiment with grid
computing platforms
quickly, cheaply and with
no commitment
A DAY IN A PUBLIC
CLOUD CLUSTER
[Chart: number of workers and tasks in queue (0 to 180) over one day, 00:30 to 23:10]
- Cluster consisting of 32 x 4-core nodes
- Max 128 worker nodes
- Ramps up as jobs get submitted
- Tears down nodes when jobs
finish
- Minimising costs when not in use
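The ramp-up and tear-down behaviour listed above can be sketched as a sizing rule. This is an illustrative assumption, not the scheduler used in the talk: it reads the slide as 32 nodes of 4 cores (128 workers in total) and assumes one worker per core.

```python
import math

CORES_PER_NODE = 4  # from the slide: 4-core nodes
MAX_NODES = 32      # from the slide: 32 nodes, i.e. at most 128 workers


def desired_nodes(queued_tasks, running_tasks):
    """Number of nodes to keep running so that every queued or running
    task has a core, capped at the cluster maximum (hypothetical policy)."""
    demand = queued_tasks + running_tasks
    return min(MAX_NODES, math.ceil(demand / CORES_PER_NODE))


print(desired_nodes(0, 0))      # idle cluster: no nodes, no cost
print(desired_nodes(10, 0))     # small burst of jobs
print(desired_nodes(500, 100))  # large backlog: capped at MAX_NODES
```

A real policy would also add a cool-down period before tearing nodes down, so that short gaps between jobs do not cause nodes to be restarted repeatedly.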
IDEAL CLUSTER SIZE?
[Chart: job run time in seconds (0 to 1,400) against number of cores (8, 16, 32, 64, 96, 128, 160, 192, 224)]
Workflow: Ingest, Process, Analyse, Visualise, Store
Optimise other parts of the workflow?
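One common way to reason about the ideal cluster size is Amdahl's law: the part of a job that cannot be parallelised sets a floor on run time, however many cores are added. The sketch below is a standard model, not the measurements from the chart; the single-core run time and serial fraction are hypothetical.

```python
def amdahl_run_time(single_core_seconds, serial_fraction, cores):
    """Estimated run time when only the parallel part of a job speeds up
    with more cores (Amdahl's law)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = single_core_seconds * serial_fraction
    parallel = single_core_seconds * (1.0 - serial_fraction)
    return serial + parallel / cores


# Hypothetical job: 10,000 s on one core, 5% of which is serial
# (for example ingest and store steps that run on a single machine).
for cores in (8, 16, 32, 64, 96, 128, 160, 192, 224):
    print(cores, round(amdahl_run_time(10_000.0, 0.05, cores)))
```

Under these assumptions the run time flattens towards the 500-second serial part, which is why speeding up the other workflow steps can matter more than adding cores.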
RUNNING MATLAB CLUSTER ON IAAS
AWS vCPUs are hyper-threaded™
Each vCPU is a hyper thread of an Intel Xeon core for 2nd generation instance types
(M4, M3, C4, C3, R3, HS1, G2, I2, and D2)
https://aws.amazon.com/ec2/instance-types/
Azure does not overcommit memory or
cores. vCPUs are physical cores.
Azure does not use hyper-threading.
https://aws.amazon.com/ec2/instance-types/
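When sizing a cluster, the distinction above changes how many physical cores sit behind a given vCPU count. A minimal sketch, taking the slide's statements at face value (two hyper-threads per core on the listed AWS instance types, one vCPU per physical core on Azure); the function name is hypothetical.

```python
def physical_cores(vcpus, threads_per_core):
    """Physical cores behind a vCPU count, given how many hardware
    threads the provider exposes per core."""
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    return vcpus // threads_per_core


# Per the slide: AWS vCPU = one hyper-thread, Azure vCPU = one core.
print("AWS, 8 vCPUs:", physical_cores(8, threads_per_core=2), "cores")
print("Azure, 8 vCPUs:", physical_cores(8, threads_per_core=1), "cores")
```

Provider behaviour changes over time, so the threads-per-core value should be checked against current documentation for the chosen instance type.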
GRID DEPLOYMENT OPTIONS
1. Infrastructure as a Service (IaaS) DIY
Spin up a compute cluster on VMs for additional capacity and new workloads
2. Burst
Use existing on premises compute cluster and burst on cloud as required
3. Software as a Service (SaaS)
Software vendors and Managed Service Providers provide their own SaaS
solutions. Pay for compute and application software per hour
4. Platform as a Service (PaaS)
Cloud providers’ data analytics platform as a service:
Google BigQuery & Datalab, Microsoft HDInsight, Amazon EMR
CLOUD HOSTED DATA AND
ANALYTICS AS A SERVICE
GOOGLE BIG DATA REFERENCE ARCHITECTURE
WHAT IS BIGQUERY?
Hadoop based “service that enables
interactive analysis of massively large
datasets”
• Distributed File System - Stores data
that’s larger than can fit on a single
machine
• Map Reduce – Distributes processing
across multiple systems
http://blogs.forrester.com/mike_gualtieri/13-06-07-what_is_hadoop
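The MapReduce idea can be sketched in a few lines with a word count. This is a single-machine illustration of the pattern, not how any particular service implements it; on a real cluster each chunk would be mapped on a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    for chunk in chunks:  # each chunk stands in for data on one machine
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


print(word_count(["grid cloud grid", "cloud cluster"]))
```

Because each chunk is mapped independently, the map step scales with the number of machines holding data.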
GOOGLE BIGQUERY AND
DATALAB DEMO
DON’T FORGET SECURITY
Security considerations:
• Secure transfer and storage of data and code
• Secure remote access to cloud hosted environment
• Secure authentication
• Windows AD credentials
• AWS IAM credentials
• Google accounts
• Microsoft accounts
• Auditing (who accessed what, who changed what)
SUMMARY
• Traditional grid and HPC tools can benefit from moving into cloud
• Vast landscape of available tools
• Off-the-shelf PaaS offerings
• Integrations and ecosystems
• Cheap and very quick to experiment
Hentsu Ltd
1 Fore Street
London EC2Y 9DT
hello@hentsu.com
https://hentsu.com
MORE INFORMATION?
NEXT EVENT:
JANUARY 2017
Intellectual Property (IP)
security for Public Cloud
Services
Securing mobile email and
cloud based file storage

Infinitely Scalable Clusters - Grid Computing on Public Cloud - London

  • 1. 変通 [hen-tsoo] noun 1. Resourcefulness – the quality of being able to cope with a difficult situation 2. Adaptability – the ability to change (or be changed) to fit changed circumstances 3. Agility – the power of moving quickly and easily; nimbleness INFINITELY SCALABLE CLUSTERS Grid computing on public cloud
  • 3. AGENDA • Grid computing overview • Trusted tools moving into public cloud • Alternative cloud services
  • 5. TERMINOLOGY • Public Cloud (AWS, Azure, Google) • Private Cloud (Your datacentre) • High Performance Computing (HPC) • Grid computing • Compute cluster • Mathworks MATLAB • CPUs / Processors / Cores • RAM (processor storage) • Disk (physical storage) • IaaS (virtual hardware and networking) • PaaS (software services)
  • 6. WHAT IS PUBLIC CLOUD? “A service provider makes resources, such as virtual machines, applications and storage, available to the general public.” • Utility model • No contracts • Shared hardware / multi tenant • Self managed
  • 7. WHAT IS GRID COMPUTING? Traditional resource limitations: • Data store performance • PC Processor / Memory / Storage • Network bandwidth The researcher may wait a long time for results. • Grid computing moves the computational work from the PC to a cluster of servers • The cluster processes the data on behalf of the researcher and returns the results • Processing time is reduced • Larger datasets can be tackled
  • 8. KEY CONCEPTS The challenges [Diagram: number of tasks against size of data, positioning Big Data, High Throughput Computing, MapReduce and High Performance Computing] The workflows: Ingest, Process, Analyse, Visualise, Store
  • 9. CHOICE OF TOOLS AND PLATFORMS
  • 10. TRUSTED TOOLS & PUBLIC CLOUD
  • 11. HARDWARE INFLEXIBILITY • Buy 22 core processors at 2.2GHz or 6 core processors at 3.6GHz? • Buy 8GB, 16GB or 32GB memory modules (RAM per core ratio)? • Graphics Processing Units (GPUs)? • How much local storage per server? • What network devices between servers (32 or 48 port switches)? • What size file server? Grid usage varies depending on research priorities: [Chart: jobs per day, Monday to Sunday]
  • 12. PROFILING MATLAB RESOURCE USAGE • MATLAB uses one processor core at a time (shown as 50% CPU on a 2 vCPU machine). Use the Parallel Computing Toolbox to make use of multicore PCs. • MATLAB holds all data in RAM, so there is very little I/O while processing • I/O spikes when results are written out [Screenshot: SysInternals Process Explorer]
  • 13. MATLAB GRID WITH PUBLIC CLOUD - Pay only for what you use - Scale compute resource up AND down - Minimal capital outlay on hardware - Experiment with grid computing platforms quickly, cheaply and with no commitment
  • 14. A DAY IN A PUBLIC CLOUD CLUSTER [Chart: workers and tasks in queue over one day, 00:30 to 23:10] - Cluster consisting of 32 nodes x 4 cores - Max 128 workers - Ramps up as jobs get submitted - Tears down nodes when jobs have finished - Minimises costs when not in use
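The ramp-up and tear-down behaviour on slide 14 can be sketched as a simple sizing rule. This is an illustrative sketch only, not the scheduler used in the talk: the function name `desired_nodes` and the one-task-per-core rule are assumptions, while the limits of 32 nodes and 4 cores per node come from the slide.

```python
import math

MAX_NODES = 32        # from the slide: cluster of 32 nodes
CORES_PER_NODE = 4    # from the slide: 4 cores per node


def desired_nodes(queued_tasks: int, running_tasks: int = 0) -> int:
    """Return how many nodes to keep running (hypothetical policy).

    Assumes one worker per core and one task per worker, so the
    cluster never exceeds MAX_NODES * CORES_PER_NODE workers.
    """
    if queued_tasks < 0 or running_tasks < 0:
        raise ValueError("task counts must be non-negative")
    total_tasks = queued_tasks + running_tasks
    # Round up: a partly used node still has to be running.
    needed = math.ceil(total_tasks / CORES_PER_NODE)
    # Cap at the cluster limit; zero tasks means zero nodes (and zero cost).
    return min(MAX_NODES, needed)


if __name__ == "__main__":
    for queued in (0, 3, 50, 500):
        print(queued, "tasks ->", desired_nodes(queued), "nodes")
```

With an empty queue the rule returns zero nodes, which is the "minimising costs when not in use" point on the slide. A production policy would normally add a cool-down period before tearing nodes down.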
  • 15. IDEAL CLUSTER SIZE? [Chart: job run time in seconds (0 to 1400) against number of cores (8 to 224)] Workflow: Ingest, Process, Analyse, Visualise, Store. Optimise other parts of the workflow?
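One common way to reason about why run time stops improving as cores are added is Amdahl's law. The slides do not name it, so treat it here as an assumed model rather than the presenter's analysis. The serial fraction and single-core run time below are made-up illustrative values, not measurements from the talk; only the list of core counts is taken from the chart axis.

```python
def amdahl_runtime(single_core_seconds: float, serial_fraction: float, cores: int) -> float:
    """Estimate run time on `cores` cores under Amdahl's law.

    serial_fraction is the share of the job that cannot be parallelised
    (for example ingest or store steps that run on one machine).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    serial_part = single_core_seconds * serial_fraction
    parallel_part = single_core_seconds * (1.0 - serial_fraction)
    # Only the parallel part shrinks as cores are added.
    return serial_part + parallel_part / cores


if __name__ == "__main__":
    # Hypothetical job: 10,000 s on one core, 1% of it serial.
    for cores in (8, 16, 32, 64, 96, 128, 160, 192, 224):
        print(cores, "cores:", round(amdahl_runtime(10_000.0, 0.01, cores), 1), "s")
```

Under this model the run time can never drop below the serial part, which is one reading of the slide's question about optimising other parts of the workflow.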
  • 16. RUNNING MATLAB CLUSTER ON IAAS AWS vCPUs are hyper-threaded™: each vCPU is a hyper-thread of an Intel Xeon core for 2nd generation instance types (M4, M3, C4, C3, R3, HS1, G2, I2, and D2). https://aws.amazon.com/ec2/instance-types/ Azure does not overcommit memory or cores: vCPUs are physical cores, and Azure does not use hyper-threading. https://aws.amazon.com/ec2/instance-types/
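The distinction on slide 16 matters when sizing a cluster, because the same vCPU count can mean a different number of physical cores. A minimal sketch of that arithmetic, using the provider behaviour as described on the slide and assuming two hyper-threads per physical core (not stated on the slide). The names `THREADS_PER_CORE` and `physical_cores` are hypothetical.

```python
# vCPUs per physical core, as described on slide 16.
THREADS_PER_CORE = {
    "aws": 2,    # each vCPU is a hyper-thread; 2 threads per core is an assumption
    "azure": 1,  # slide: vCPUs are physical cores, no hyper-threading
}


def physical_cores(provider: str, vcpus: int) -> float:
    """Convert a vCPU count into the equivalent number of physical cores."""
    key = provider.lower()
    if key not in THREADS_PER_CORE:
        raise ValueError(f"unknown provider: {provider}")
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return vcpus / THREADS_PER_CORE[key]


if __name__ == "__main__":
    for provider in ("aws", "azure"):
        print(provider, "8 vCPUs ->", physical_cores(provider, 8), "physical cores")
```

Provider behaviour changes over time, so the table should be checked against current documentation before it is used for sizing.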
  • 17. GRID DEPLOYMENT OPTIONS 1. Infrastructure as a Service (IaaS) DIY: spin up a compute cluster on VMs for additional capacity and new workloads 2. Burst: use an existing on-premises compute cluster and burst to the cloud as required 3. Software as a Service (SaaS): software vendors and Managed Service Providers provide their own SaaS solutions. Pay for compute and application software per hour 4. Platform as a Service (PaaS): cloud providers’ data analytics platforms as a service, such as Google BigQuery & Datalab, Microsoft HDInsight, Amazon EMR
  • 18. CLOUD HOSTED DATA AND ANALYTICS AS A SERVICE
  • 19. GOOGLE BIG DATA REFERENCE ARCHITECTURE
  • 20. WHAT IS BIGQUERY? Hadoop based “service that enables interactive analysis of massively large datasets” • Distributed File System - Stores data that’s larger than can fit on a single machine • Map Reduce – Distributes processing across multiple systems http://blogs.forrester.com/mike_gualtieri/13-06-07-what_is_hadoop
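The MapReduce idea on slide 20 can be illustrated on a single machine. This is a toy sketch in plain Python, not Hadoop or BigQuery code: the function names are hypothetical, and a real framework would run the map and reduce phases on separate machines over a distributed file system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs: List[Tuple[str, int]] = []
    for chunk in chunks:  # each chunk could be processed on a different machine
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["grid cloud", "Cloud cluster cloud"]))
```

Because each chunk is mapped independently, the map phase is the part that scales out across a cluster.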
  • 22. DON’T FORGET SECURITY Security considerations: • Secure transfer and storage of data and code • Secure remote access to cloud hosted environment • Secure authentication (Windows AD credentials, AWS IAM credentials, Google accounts, Microsoft accounts) • Auditing (who accessed what, who changed what)
  • 23. SUMMARY • Traditional grid and HPC tools can benefit from moving into cloud • Vast landscape of available tools • Off-the-shelf PaaS offerings • Integrations and ecosystems • Cheap and very quick to experiment
  • 24. MORE INFORMATION? Hentsu Ltd, 1 Fore Street, London EC2Y 9DT, hello@hentsu.com, https://hentsu.com
  • 25. NEXT EVENT: JANUARY 2017 Intellectual Property (IP) security for Public Cloud Services Securing mobile email and cloud based file storage