Multi-Tenant Research Clusters

                                 v 2.0

                                         1
I’m Chris.

I’m an infrastructure geek.

I work for the BioTeam.

             www.bioteam.net - Twitter: @chris_dag   2
Your substitute host ...
Speaking on behalf of others

 ‣ Original speaker can’t make
   it today
 ‣ Stepping in as a substitute
   due to involvement in the
   Assessment & Deployment
   project phases
 ‣ Just about everything in
   this presentation is the
   work of other, smarter
   people!
                               3
Our Case Study:
Sanofi S.A.




                  4
Pharma Multi-Tenant HPC
Case Study: Sanofi S.A.




 ‣ Sanofi: multinational pharmaceutical with
   worldwide research & commercial locations
 ‣ 7 major therapeutic areas: cardiovascular,
   central nervous system, diabetes, internal
   medicine, oncology, thrombosis and vaccines
 ‣ Other Sanofi S.A. companies: Merial, Chattem,
   Genzyme & Sanofi Pasteur

                                                  5
History
Case Study: Sanofi S.A.

 ‣ The system discussed here is among the first major
   outcomes of a late-2011 global review effort
   called “HPC-O” (HPC Optimization)
 ‣ HPC-O involved:
   •   Revisiting prior HPC recommendations
   •   Intensive data-gathering & cataloging of HPC resources
   •   North America: Interviews with 30+ senior scientists, IT &
       scientific leadership, system operators & all levels of global
       IT and infrastructure support services
   •   Similar effort across EU operations as well
                                                                       6
HPC-O Recommendations
Case Study: Sanofi S.A.



 ‣ Build a new shared-services HPC environment
   •   Model/prototype for future global HPC
   •   Designed to meet scientific & business requirements
   •   Multiple concurrent users, groups and business units
    •   ... use IT building blocks that are globally approved and
        supported 24x7 by the Global IS (“GIS”) organization
 ‣ Site the initial system in the Boston area
                                                                   7
Current Status

 ‣ Online since
   November 2012
 ‣ Approaching end of
   initial round of testing,
   optimization and user
   acceptance work
 ‣ Today: running large-scale
   workloads but not yet
   formally in Production
   status
                               8
Why Multi-Tenant Cluster?
Case Study: Sanofi S.A.




 ‣ HPC Mission & Scope Creep
   •   11+ separate HPC systems in North America alone
   •   Wildly disparate technology, product & support models
   •   “Islands” of HPC used by single business units
   •   Almost no detectable cross-system or cross-site usage
   •   Huge variance in refresh/upgrade cycles

                                                               9
Why Multi-Tenant Cluster, cont.
Case Study: Sanofi S.A.


 ‣ Utilization & Efficiency
   •   Islands of HPC tended to be underutilized most of the
       time and oversubscribed by a single business unit during
       peak demand times
   •   Hardware age and capability varied widely due to huge
       differences in maintenance cycles by business unit
   •   Cost of commercial software licensing (globally) hugely
       significant; difficult to maximize ROI & use of very
       expensive software entitlements across “islands” of HPC
                                                                  10
Why Multi-Tenant Cluster, cont.
Case Study: Sanofi S.A.

 ‣ Need for “Opportunistic Capacity”
   •   Difficult to perform exploratory research outside the
       normal scope of business unit activities
 ‣ Avoid “Shadow IT” problems
   •   Frustrated users will find their own DIY solutions
   •   ... “the cloud” is just a departmental credit card away
 ‣ “Chaperoned” cloud-bursting
   •   Centrally managed “chaperoned” utilization of IaaS cloud
       resources for specific workloads & data
                                                                  11
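The “chaperoned” cloud-bursting idea above can be sketched as a central policy gate that every outbound job must pass. Everything in this sketch (function name, workload labels, the approval list) is hypothetical and purely illustrative, not Sanofi’s actual tooling:

```python
# Hypothetical "chaperone" gate: a job may burst to IaaS only if its
# workload type and data classification are on a centrally managed
# approved list. All names and values here are illustrative.
APPROVED = {
    ("blast", "public"),
    ("docking", "public"),
}

def may_burst(workload: str, data_class: str) -> bool:
    """Central policy check run before any job leaves the premises."""
    return (workload, data_class) in APPROVED

print(may_burst("blast", "public"))        # True: approved workload + data
print(may_burst("docking", "restricted"))  # False: data class not approved
```

The point of the pattern is that the allow-list is owned centrally, so individual business units cannot route arbitrary data to the cloud.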
In-house vs. Cloud
Case Study: Sanofi S.A.

 ‣ Cloud options studied intensively; still made
   decision to invest in significant regional HPC
 ‣ Some Reasons
   •   Baseline “always available” capability
   •   Ability to obsessively tune performance
   •   Security
   •   Cost control (many factors here ...)
   •   Data size, movement & lifecycle issues
   •   Agility
                                                   12
In-house vs. Cloud
Case Study: Sanofi S.A.

 ‣ HPC Storage: “Center of Gravity” for Scientific Data
    •   Compute power is pretty easy to provision and not super expensive
    •   Storage needs, even just for the Boston region, are peta-scale
   •   Mapping data flows and access patterns reveals a very complex web of
       researcher, instrument, workstation and pipeline interactions with storage
 ‣ In a nutshell:
   •   Engineer the heck out of a robust peta-scale R&D storage platform for
       the Boston region
   •   Drop a reasonable amount of HPC capability near this storage
   •   Bias all engineering/design efforts to facilitate agility/change
   •   Use the cloud only when best-fit
                                                                                    13
Picture Tour
What it actually looks like ...




                                  14
(Slides 15–21: facility photos.)
Key Enabling Technologies




                            22
Enabling Technologies: Facility
Case Study: Sanofi S.A.


 ‣ A Sanofi company has a suitable local colo suite
   •   ... already under long-term lease
   •   ... and with a bit of server consolidation, lots of space for
       HPC compute and storage
   •   ... plenty of room for “adjunct” systems that will likely be
       attracted to storage “center of gravity”
 ‣ Can’t reveal exact size but this facility can handle
   double-digit numbers of additional HPC compute,
   storage and network cabinets
                                                                       23
Enabling Technologies: WAN
Case Study: Sanofi S.A.


 ‣ Regional consolidated HPC is not possible
   without MAN/WAN efforts to connect all sites
   and users
   •   ... direct routing required; not optimal to route HPC
       traffic through Corporate Tier-1 facilities that may be
       thousands of miles away
   •   Existing MAN/WAN network links upgraded if there was
       a business/scientific justification
    •   All other MAN/WAN links verified to confirm expansion is
        easy/possible should a business need arise
                                                                 24
Enabling Technologies: WAN
Case Study: Sanofi S.A.

 ‣ Regional Networking Result
   •       Most sites: bonded 1-Gigabit path to regional HPC hub
   •       A Cambridge building has direct 10-Gigabit Ethernet link to
           the HPC hub; used for heavy data movement as well as
           ingest of data arriving on physical media
    •       Special routing (HTTP, FTP) in place for satellite locations
            not yet on the converged Enterprise WAN/MAN
   •       HPC Hub Facility:
       -     Dedicated HPC-only internet link for open-data downloads
       -     Internet-2 connection being pursued for EDU collaboration
                                                                           25
Architecture
Case Study




               26
Architecture
Philosophy

 ‣ Intense desire to keep things simple
 ‣ Commodity works very well; avoid the expensive and
   the exotic when we can
 ‣ Extra commodity capacity compensates for
   performance lost by not choosing the exotic
   competition
   •   Also delivers more agility and easier reuse/repurposing
 ‣ If we build from globally-blessed IT components we can
   eventually turn basic operation, maintenance and
   monitoring over to the Global IS organization
   •   ... freeing Research IT staff to concentrate on science & users
                                                                         27
Architecture
HPC Stack


 ‣ Explicit decision made to source the HPC cluster
   stack from a commercial provider
  •   This is actually a radical departure from prior HPC efforts

 ‣ Many evaluated; one chosen
 ‣ Primary drivers:
  •   24x7 commercial support
  •   Research IT staff needs to concentrate on apps/users
   •   “Single SKU” out-of-the-box functionality and features (bare
       metal provisioning, etc.) that reduce operational burden
                                                                     28
Architecture
HPC Stack - Bright Computing

 ‣ Bright Computing selected
   •   Hardware neutral
   •   Scheduler neutral
   •   Full API, CLI and lightweight
       monitoring stack
   •   Web GUIs for non-experts
   •   Single dashboard for advanced
       monitoring and management
   •   Data-aware scheduling & native
       support for AWS cloud bursting
                                        29
Architecture
Compute Hardware




 ‣ Compute Hardware




                      30
Architecture
Compute Hardware



 ‣ Key Design Goals
   •   Use common server
       config for as many nodes
       as possible
   •   Modular & extensible
       design
   •   “Blessed” by Global IS
       (GIS) organization

                                 31
Architecture
Compute Hardware

 ‣ HP C7000 Blade
   Enclosures
   •   Our basic building block
   •   Very flexible on network,
       interconnect and blade
       configuration
   •   Sanofi GIS approved
   •   “Lights-out” facility approved
   •   Pre-negotiated preferential
       pricing on almost everything
       we needed
                                        32
Architecture
Compute Hardware

 ‣ HP C7000 Blade Enclosure
   becomes the smallest
   modular unit in HPC design
 ‣ Big cluster built from
   smaller preconfigured
   “blocks” of C7000s
 ‣ 4 standard “blocks”:
   •   M-Block
   •   C-Block
   •   G-Block
   •   X-Block
                                33
Architecture
Compute Hardware

 ‣ M-Block (Mgmt)
   •       HP BL460c Blades
       -    Dual-socket quad-core
            with 96GB RAM & 1TB
            mirrored OS disks

 ‣ 2x HA Master Node(s)
 ‣ 1x Mgmt Node
 ‣ 3x HPC Login Node(s)
 ‣ ... plenty of room ...
                                    34
Architecture
Compute Hardware


 ‣ C-Block (Compute)
   •       HP BL460c Blades
       -    Dual-socket quad-core
            with 96GB RAM & 1TB
            mirrored OS disks

 ‣ Fully populated with 16
   blades per enclosure
 ‣ Set of 8 C-Blocks =
   1024 CPU Cores
                                    35
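The C-Block arithmetic on this slide can be double-checked directly; all figures are taken from the slides, with Python used only as a calculator:

```python
# C-Block arithmetic: fully populated C7000 enclosure with
# dual-socket quad-core BL460c blades, as described on the slide.
BLADES_PER_ENCLOSURE = 16   # fully populated C7000
SOCKETS_PER_BLADE = 2       # dual-socket BL460c
CORES_PER_SOCKET = 4        # quad-core

cores_per_block = BLADES_PER_ENCLOSURE * SOCKETS_PER_BLADE * CORES_PER_SOCKET
print(cores_per_block)      # 128 cores per C-Block
print(8 * cores_per_block)  # 1024 cores for a set of 8 C-Blocks
```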
Architecture
Compute Hardware


 ‣ G-Block (GPU)
   •   No C7000; HP s6500
       enclosure used for G-Block
       units
 ‣ HP SL250s Servers
 ‣ 3x Tesla GPUs per SL250s
   Server
 ‣ ... 15 Tflop per G-Block
                                    36
Architecture
Compute Hardware

 ‣       X-Block C7000
     •     Hosting of “Adjunct Servers”
      •     X-block for unique requirements
            that don’t fit into a standard C, G or
            M-block configuration, or for
            servers supplied by business units
 ‣       Big Memory Nodes
 ‣       Virtualization Platform(s)
 ‣       Big SMP Nodes
 ‣       Graphics/Viz Nodes
 ‣       Application Servers
 ‣       Database Servers
                                                  37
Architecture
Compute Hardware

 ‣ Modular design can
   grow into double-digit
   numbers of datacenter
   cabinets
   •   C-blocks and G-blocks for
       compute; M-blocks and X-
       blocks for Mgmt and special
       cases
   •   8-core 96GB RAM, 1TB
       BL460c blade is standard
       individual server config;
       deviation only when required
                                      38
Architecture
Network Hardware




 ‣ Network




                   39
Architecture
Network Hardware




 ‣ Network
 ‣ Key design decision:
   •   10Gigabit Ethernet only
   •   No Infiniband*
 ‣ Fully redundant Cisco
   Nexus 10Gb Fabric

                                 40
Architecture
Network Hardware

 ‣ Cisco Nexus 10G
   •   Redundant Everything
   •   C7000 enclosures (M, X and
       C-blocks) have 40Gb uplinks
       (80Gb possible)
   •   G-Blocks and misc systems
       have 10Gb links
   •   20Gb bandwidth to each
       storage node
   •   Easily expanded; centrally
       managed, monitored and
       controlled
                                     41
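Those uplink numbers imply a modest oversubscription ratio at the enclosure edge. A quick sketch, assuming (this is not stated on the slide) one 10 Gb NIC per blade in a fully populated C7000:

```python
# Worst-case edge oversubscription for a fully populated C7000.
BLADES = 16
BLADE_NIC_GB = 10        # assumed per-blade NIC speed (not on the slide)
UPLINK_GB = 40           # per the slide; 80 Gb possible

edge_demand = BLADES * BLADE_NIC_GB    # 160 Gb aggregate blade-side demand
print(edge_demand / UPLINK_GB)         # 4.0 -> 4:1 oversubscribed at 40 Gb
print(edge_demand / (2 * UPLINK_GB))   # 2.0 -> 2:1 with 80 Gb uplinks
```

Ratios in this range are typical for throughput-oriented (rather than latency-oriented) HPC fabrics, which fits the 10GbE-only, no-InfiniBand design decision.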
Architecture
Storage Hardware




 ‣ Storage




                   42
Architecture
Storage Hardware

 ‣ EMC Isilon Scale-out NAS
   •   ~1 petabyte raw for active use
   •   ~1 petabyte raw for backup
 ‣ Why Isilon?
   •   Large, single-namespace scaling
       beyond our most aggressive capacity
       projections
   •   Easy to manage / GIS Approved
   •   Aggregate throughput increases with
       capacity expansion
   •   Tiering & SSD options
                                             43
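Note that “raw” is not “usable”: data-protection overhead eats into each petabyte. A rough sketch; the 20% overhead figure here is an assumption chosen for illustration, since actual Isilon overhead depends on node count and the protection level configured:

```python
# Rough usable-capacity estimate for ~1 PB raw of scale-out NAS.
RAW_PB = 1.0
PROTECTION_OVERHEAD = 0.20   # assumed; varies with node count & protection level

usable_pb = RAW_PB * (1 - PROTECTION_OVERHEAD)
print(usable_pb)             # 0.8 PB usable under this assumption
```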
Architecture
External Connectivity

 ‣       Dedicated Internet circuit for
         new HPC Hub
     •     Direct download/ingest of large
           public datasets without affecting
           other business users
     •     Downloads don’t hit MAN/WAN
           networks & avoid the centrally routed
           Enterprise internet egress point
           located hundreds of miles away
     •     Very handy for Cloud/VPN efforts as
           well
 ‣       Internet 2
     •     I2 and other high speed academic
           network connectivity planned
                                                   44
Architecture
Physical data ingest

 ‣ Large Scale Data
   Ingest & Export
   •   Often overlooked; very
       important!

 ‣ Dedicated Data Station
   •   10 Gig link to HPC Hub
   •   Fast CPUs for checksum and
       integrity operations
   •   Removable SATA/SAS bays
   •   Lots of USB & eSATA ports
                                    45
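The “fast CPUs for checksum and integrity operations” point can be made concrete: an ingest station typically streams every arriving file through a cryptographic hash before releasing it to HPC storage. A minimal sketch using Python’s standard library (illustrative, not Sanofi’s actual tooling):

```python
import hashlib

def file_sha256(path, chunk_size=4 * 1024 * 1024):
    """Stream a (potentially huge) file through SHA-256 in 4 MB
    chunks so the integrity check never loads whole files into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

On a real data station the resulting digest would be compared against a manifest shipped alongside the physical media; only matching files move on to the Isilon tier.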
One More Thing ...




                     46
One more thing ...
Not just a single cluster




 ‣ Single cluster? Nope.




                            47
One more thing ...
Not just a single cluster
 ‣ Single cluster? Nope.
 ‣ The secret sauce is in the
   facility, storage and
   network core
 ‣ Petabytes of scientific
   data have a “gravitational
   pull” within an enterprise
 ‣ ... we expect many new
   users and use cases to
   follow
                                48
One more thing ...
Not just a single cluster

 ‣       We can support:
     •     Additional clusters & analytic platforms
           grafted onto our network and storage core
     •     Validated server, software and cluster
           environments collocated in close proximity
     •     Integration with private cloud and
           virtualization environments
     •     Integration with public IaaS clouds
     •     Dedicated Hadoop / Big Data
           environments
     •     On-demand reconfiguration of C-Blocks
           into HDFS/Hadoop-optimized mini clusters
     •     And much more ...


                                                        49
Beyond the hardware bits ...

                               50
Beyond the hardware ...
Many other critical factors involved




 ‣ Let’s Discuss:
   •   Requirements Gathering
   •   Building Trust
   •   Governance
   •   Support Model


                                       51
Requirements Gathering

‣ When seven-figure CapEx amounts are involved
  you can’t afford to make a mistake
‣ Capturing business & scientific requirements is
  non-trivial
 •   ... especially when trying to account for future needs
‣ Not a 1 person / 1 department job
 •   ... requires significant expertise and insider knowledge
     spanning science, software, business plans and both
     research and global IT staff
                                                               52
Requirements Gathering
Our approach


 1. Keep the core project team small & focused
  • Engage niche resources (legal, security, etc.) on demand
 2. Promiscuous (“meet with anyone”) data gathering,
    meeting & discussion philosophy
 3. Strong project management / oversight
 4. Public support from senior leadership
 5. Frequent sync-up with key leaders & groups
   • Global facility/network/storage/support orgs, Research budget
      & procurement teams, senior scientific leadership, etc.
                                                                     53
Building Trust
Consolidated HPC requires trust

 ‣ Previous: Many independent islands of HPC
   •   ... often built/supported/run by local resources
 ‣ Moving to a shared-services model requires great
   trust among users & scientific leadership
   •   Researchers have low tolerance for BS/incompetence
   •   Informatics is essential; users need to be reassured that
       current capabilities will be maintained while new capabilities will
       be gained
   •   Enterprise IT must be willing to prove it understands & can
       support the unique needs and operational requirements of
       research informatics
                                                                             54
Building Trust
Our approach

 ‣ Our Approach:
   •       Strong project team with deep technical & institutional
           experience. Team members could answer any question
           coming from researchers or business units professionally
           and with an aura of expertise & competence
   •       Explicit vocal support from senior IT and research
           leadership (“We will make this work. Promise.”)
   •       Willingness to accept & respond to criticism & feedback
       -     ... especially when someone smashes a poor assumption or
             finds a gap in the planned design
                                                                        55
Governance




‣ Tied for first place among “reasons why
  centralized HPC deployments fail”
‣ Multi-Tenant HPC Governance is essential
‣ ... and often overlooked



                                             56
Governance
The basic issue

 ‣ ... in research HPC settings there are certain
   things that should NEVER be dictated by IT
 ‣ It is not appropriate for an IT SysAdmin to ...
   •   Create or alter resource allocation policies & quotas
   •   Decide what users/groups get special treatment
   •   Decide what software can and cannot be used
   •   ... etc.
 ‣ A governance structure involving scientific
   leadership and user representation is essential
                                                               57
Governance
Our Approach



 ‣ Two committees: “Ops” and “Overlord”
 ‣ Ops Committee: Users & HPC IT staff
   coordinating HPC operations jointly
 ‣ Overlord Committee: Invoked as needed. Makes
   tiebreaker decisions, busts through political/
   organizational walls and approves funding/
   expansion decisions

                                                    58
Governance
Our Approach


 ‣ Ops Committee communicates frequently and is
   consulted before any user-affecting changes occur
   •   Membership is drawn from interested/engaged HPC “power
       users” from each business unit + the HPC Admin Team

 ‣ Ops Committee “owns” HPC scheduler & queue
   policies and approves/denies any requests for
   special treatment. All scheduler/policy changes
   are blessed by Ops before implementation
 ‣ This is the primary ongoing governance group

                                                                59
Governance
Our Approach


 ‣ Overlord Committee meets only as needed
   •       Membership: the scariest heavy hitters we could recruit
           from senior scientific and IT leadership
       -     VP or Director level is not unreasonable
   •       This group needs the most senior people you can find.
           Heavy hitters required when mediating between
           conflicting business units or busting through political/
           organizational barriers
       -     Committee does not need to be large, just powerful
                                                                     60
Support Model
Our Approach

 ‣ Often overlooked or under-resourced
 ‣ We are still working on this ourselves
 ‣ General model
   •   Transition server, network and storage maintenance & monitoring
       over to Global IS as soon as possible
   •   Free up rare HPC Support FTE resources to concentrate on
       enabling science & supporting users
   •   Offer frequent training and local “HPC mentor” attention
   •   Online/portal tools that facilitate user communication, best practice
       advice and collaborative “self-support” for common issues
   •   Still TBD: Helpdesk, Ticketing & Dashboards
                                                                               61
end; Thanks!
Slides: http://slideshare.net/chrisdag/
                                          62

Потоковая обработка больших данныхCEE-SEC(R)
 
Big data talk barcelona - jsr - jc
Big data talk   barcelona - jsr - jcBig data talk   barcelona - jsr - jc
Big data talk barcelona - jsr - jcJames Saint-Rossy
 
Webinar: Is Your Storage Ready for Commercial HPC? – Three Steps to Take
Webinar: Is Your Storage Ready for Commercial HPC? – Three Steps to TakeWebinar: Is Your Storage Ready for Commercial HPC? – Three Steps to Take
Webinar: Is Your Storage Ready for Commercial HPC? – Three Steps to TakeStorage Switzerland
 
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...Grid Dynamics
 
Is Your Storage Ready for Commercial HPC? - Three Steps to Take
Is Your Storage Ready for Commercial HPC? - Three Steps to TakeIs Your Storage Ready for Commercial HPC? - Three Steps to Take
Is Your Storage Ready for Commercial HPC? - Three Steps to TakePanasas
 
Webinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloud
Webinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloudWebinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloud
Webinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloudThomas Francis
 
"Performance Evaluation, Scalability Analysis, and Optimization Tuning of A...
"Performance Evaluation,  Scalability Analysis, and  Optimization Tuning of A..."Performance Evaluation,  Scalability Analysis, and  Optimization Tuning of A...
"Performance Evaluation, Scalability Analysis, and Optimization Tuning of A...Altair
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Michael Hiskey
 

Similaire à Multi-Tenant Pharma HPC Clusters (20)

How to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data ProjectHow to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data Project
 
ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing
 
QLogic - CrossIT - ACNC/JetStor FibricCache VMUG 2014
QLogic - CrossIT - ACNC/JetStor FibricCache VMUG 2014QLogic - CrossIT - ACNC/JetStor FibricCache VMUG 2014
QLogic - CrossIT - ACNC/JetStor FibricCache VMUG 2014
 
HPC DAY 2017 | Altair's PBS Pro: Your Gateway to HPC Computing
HPC DAY 2017 | Altair's PBS Pro: Your Gateway to HPC ComputingHPC DAY 2017 | Altair's PBS Pro: Your Gateway to HPC Computing
HPC DAY 2017 | Altair's PBS Pro: Your Gateway to HPC Computing
 
Open Marketing Meeting 03/27/2013
Open Marketing Meeting 03/27/2013Open Marketing Meeting 03/27/2013
Open Marketing Meeting 03/27/2013
 
HP Enterprises in Hana Pankaj Jain May 2016
HP Enterprises in Hana Pankaj Jain May 2016HP Enterprises in Hana Pankaj Jain May 2016
HP Enterprises in Hana Pankaj Jain May 2016
 
start_your_datacenter_sds_v3
start_your_datacenter_sds_v3start_your_datacenter_sds_v3
start_your_datacenter_sds_v3
 
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
 
Pro Tips: Designing and Deploying End-to-End HPC and AI Solutions
Pro Tips: Designing and Deploying End-to-End HPC and AI SolutionsPro Tips: Designing and Deploying End-to-End HPC and AI Solutions
Pro Tips: Designing and Deploying End-to-End HPC and AI Solutions
 
Penguin computing designing and deploying end to end HPC and AI Solutions
Penguin computing designing and deploying end to end HPC and AI SolutionsPenguin computing designing and deploying end to end HPC and AI Solutions
Penguin computing designing and deploying end to end HPC and AI Solutions
 
Penguin Computing Designing and Deploying End to End HPC and AI Solutions
Penguin Computing Designing and Deploying End to End HPC and AI SolutionsPenguin Computing Designing and Deploying End to End HPC and AI Solutions
Penguin Computing Designing and Deploying End to End HPC and AI Solutions
 
Потоковая обработка больших данных
Потоковая обработка больших данныхПотоковая обработка больших данных
Потоковая обработка больших данных
 
Big data talk barcelona - jsr - jc
Big data talk   barcelona - jsr - jcBig data talk   barcelona - jsr - jc
Big data talk barcelona - jsr - jc
 
Webinar: Is Your Storage Ready for Commercial HPC? – Three Steps to Take
Webinar: Is Your Storage Ready for Commercial HPC? – Three Steps to TakeWebinar: Is Your Storage Ready for Commercial HPC? – Three Steps to Take
Webinar: Is Your Storage Ready for Commercial HPC? – Three Steps to Take
 
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
 
Is Your Storage Ready for Commercial HPC? - Three Steps to Take
Is Your Storage Ready for Commercial HPC? - Three Steps to TakeIs Your Storage Ready for Commercial HPC? - Three Steps to Take
Is Your Storage Ready for Commercial HPC? - Three Steps to Take
 
Webinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloud
Webinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloudWebinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloud
Webinar: Burst ANSYS Workloads to the Cloud with Univa & UberCloud
 
"Performance Evaluation, Scalability Analysis, and Optimization Tuning of A...
"Performance Evaluation,  Scalability Analysis, and  Optimization Tuning of A..."Performance Evaluation,  Scalability Analysis, and  Optimization Tuning of A...
"Performance Evaluation, Scalability Analysis, and Optimization Tuning of A...
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 

Plus de Chris Dagdigian

2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the TrenchesChris Dagdigian
 
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons LearnedBio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons LearnedChris Dagdigian
 
AWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchAWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchChris Dagdigian
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Chris Dagdigian
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the TrenchesChris Dagdigian
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationChris Dagdigian
 

Plus de Chris Dagdigian (6)

2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the Trenches
 
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons LearnedBio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
 
AWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchAWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating Research
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the Trenches
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow Orchestration
 

Dernier

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Dernier (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Multi-Tenant Pharma HPC Clusters

 ‣ Site the initial system in the Boston area
                                                                       7
Current Status

 ‣ Online since November 2012
 ‣ Approaching end of initial round of testing, optimization
   and user acceptance work
 ‣ Today: currently running large-scale workloads but not
   formally in Production status
                                                              8
Why Multi-Tenant Cluster?
Case Study: Sanofi S.A.

 ‣ HPC Mission & Scope Creep
   • 11+ separate HPC systems in North America alone
   • Wildly disparate technology, product & support models
   • “Islands” of HPC used by single business units
   • Almost no detectable cross-system or cross-site usage
   • Huge variance in refresh/upgrade cycles
                                                            9
Why Multi-Tenant Cluster, cont.
Case Study: Sanofi S.A.

 ‣ Utilization & Efficiency
   • Islands of HPC tended to be underutilized most of the time and
     oversubscribed by a single business unit during peak demand times
   • Hardware age and capability varied widely due to huge differences
     in maintenance cycles by business unit
   • Cost of commercial software licensing (globally) hugely significant;
     difficult to maximize ROI & use of very expensive software
     entitlements across “islands” of HPC
                                                                          10
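The underutilized-yet-oversubscribed pattern on this slide is worth making concrete. The sketch below uses invented demand figures (not Sanofi's actual numbers) to show why pooling helps: each island must be sized for its own peak, while a shared cluster only needs to cover the peak of the combined demand, which is smaller whenever peaks do not coincide.

```python
# Illustrative only: demand figures are invented, not Sanofi's actual numbers.
# Hourly core demand over one day for three hypothetical business units.
demand = {
    "chemistry": [40] * 8 + [400] * 4 + [40] * 12,   # peaks mid-morning
    "genomics":  [60] * 16 + [500] * 4 + [60] * 4,   # peaks in the evening
    "modeling":  [30] * 24,                           # flat background load
}

# Island model: every business unit buys capacity for its own peak.
island_cores = sum(max(series) for series in demand.values())

# Shared model: buy capacity for the peak of the *summed* demand.
combined = [sum(series[h] for series in demand.values()) for h in range(24)]
shared_cores = max(combined)

print(f"islands need {island_cores} cores, shared cluster needs {shared_cores}")
# → islands need 930 cores, shared cluster needs 570
```

The gap widens as more units with staggered peaks join the pool, which is the efficiency argument behind the consolidation.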
Why Multi-Tenant Cluster, cont.
Case Study: Sanofi S.A.

 ‣ Need for “Opportunistic Capacity”
   • Difficult to perform exploratory research outside the normal scope
     of business unit activities
 ‣ Avoid “Shadow IT” problems
   • Frustrated users will find their own DIY solutions
   • ... “the cloud” is just a departmental credit card away
 ‣ “Chaperoned” cloud-bursting
   • Centrally managed “chaperoned” utilization of IaaS cloud resources
     for specific workloads & data
                                                                        11
In-house vs. Cloud
Case Study: Sanofi S.A.

 ‣ Cloud options studied intensively; still made decision to
   invest in significant regional HPC
 ‣ Some Reasons
   • Baseline “always available” capability
   • Ability to obsessively tune performance
   • Security
   • Cost control (many factors here ...)
   • Data size, movement & lifecycle issues
   • Agility
                                                             12
In-house vs. Cloud
Case Study: Sanofi S.A.

 ‣ HPC Storage: “Center of Gravity” for Scientific Data
   • Compute power is pretty easy and not super expensive
   • Storage need, even just for the Boston region, is peta-scale
   • Mapping data flows and access patterns reveals a very complex web
     of researcher, instrument, workstation and pipeline interactions
     with storage
 ‣ In a nutshell:
   • Engineer the heck out of a robust peta-scale R&D storage platform
     for the Boston region
   • Drop a reasonable amount of HPC capability near this storage
   • Bias all engineering/design efforts to facilitate agility/change
   • Use the cloud only when best-fit
                                                                       13
Picture Tour
What it actually looks like ...
                                14

(Slides 15–21 are photographs only; no text content)
Enabling Technologies: Facility
Case Study: Sanofi S.A.

 ‣ A Sanofi company has a suitable local colo suite
   • ... already under long-term lease
   • ... and with a bit of server consolidation, lots of space for HPC
     compute and storage
   • ... plenty of room for “adjunct” systems that will likely be
     attracted to storage “center of gravity”
 ‣ Can’t reveal exact size but this facility can handle double-digit
   numbers of additional HPC compute, storage and network cabinets
                                                                       23
Enabling Technologies: WAN
Case Study: Sanofi S.A.

 ‣ Regional consolidated HPC is not possible without MAN/WAN efforts
   to connect all sites and users
   • ... direct routing required; not optimal to route HPC traffic
     through Corporate Tier-1 facilities that may be thousands of
     miles away
   • Existing MAN/WAN network links upgraded if there was a
     business/scientific justification
   • All other MAN/WAN links verified that expansion is easy/possible
     should a business need arise
                                                                      24
Enabling Technologies: WAN
Case Study: Sanofi S.A.

 ‣ Regional Networking Result
   • Most sites: bonded 1-Gigabit path to regional HPC hub
   • A Cambridge building has a direct 10-Gigabit Ethernet link to the
     HPC hub; used for heavy data movement as well as ingest of data
     arriving on physical media
   • Special routing (HTTP, FTP) in place for satellite locations not
     yet on the converged Enterprise WAN/MAN
   • HPC Hub Facility:
     - Dedicated HPC-only internet link for open-data downloads
     - Internet2 connection being pursued for EDU collaboration
                                                                       25
Architecture Philosophy

 ‣ Intense desire to keep things simple
 ‣ Commodity works very well; avoid the expensive and the exotic
   when we can
 ‣ Extra commodity capacity compensates for performance lost by not
   choosing the exotic competition
   • Also delivers more agility and easier reuse/repurposing
 ‣ If we build from globally-blessed IT components we can eventually
   turn basic operation, maintenance and monitoring over to the
   Global IS organization
   • ... freeing Research IT staff to concentrate on science & users
                                                                     27
Architecture
HPC Stack

 ‣ Explicit decision made to source the HPC cluster stack from a
   commercial provider
   • This is actually a radical departure from prior HPC efforts
 ‣ Many evaluated; one chosen
 ‣ Primary drivers:
   • 24x7 commercial support
   • Research IT staff needs to concentrate on apps/users
   • “Single SKU” out-of-the-box functionality and features (bare
     metal provisioning, etc.) that reduce operational burden
                                                                  28
Architecture
HPC Stack - Bright Computing

 ‣ Bright Computing selected
   • Hardware neutral
   • Scheduler neutral
   • Full API, CLI and lightweight monitoring stack
   • Web GUIs for non-experts
   • Single dashboard for advanced monitoring and management
   • Data-aware scheduling & native support for AWS cloud bursting
                                                                   29
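To make the "chaperoned cloud bursting" idea concrete: the decision logic boils down to a small policy check. The sketch below is NOT Bright Cluster Manager's API — it is a hypothetical, scheduler-agnostic illustration, with invented workload names, of the shape such a policy takes: burst only for vetted workloads, only when local capacity is exhausted, and only under a hard cost ceiling.

```python
# Hypothetical policy sketch; names and thresholds are invented examples.
CLOUD_APPROVED = {"blast-nightly", "docking-sweep"}  # workloads vetted for cloud

def should_burst(queued_jobs, idle_local_cores, workload_tag,
                 max_cloud_nodes=32, active_cloud_nodes=0):
    """Return True when one more cloud-burst node should be requested."""
    if workload_tag not in CLOUD_APPROVED:      # "chaperone": data & workload vetted
        return False
    if idle_local_cores > 0:                    # always prefer in-house capacity
        return False
    if active_cloud_nodes >= max_cloud_nodes:   # hard cost ceiling
        return False
    return queued_jobs > 0                      # only burst for real backlog

print(should_burst(queued_jobs=50, idle_local_cores=0, workload_tag="docking-sweep"))
```

The "chaperone" is the first check: nothing leaves the premises unless it has been centrally approved for a specific workload and dataset.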
Architecture
Compute Hardware

 ‣ Compute Hardware
                    30
Architecture
Compute Hardware

 ‣ Key Design Goals
   • Use common server config for as many nodes as possible
   • Modular & extensible design
   • “Blessed” by Global IS (GIS) organization
                                                            31
Architecture
Compute Hardware

 ‣ HP C7000 Blade Enclosures
   • Our basic building block
   • Very flexible on network, interconnect and blade configuration
   • Sanofi GIS approved
   • “Lights-out” facility approved
   • Pre-negotiated preferential pricing on almost everything we needed
                                                                        32
Architecture
Compute Hardware

 ‣ HP C7000 Blade Enclosure becomes the smallest modular unit in
   HPC design
 ‣ Big cluster built from smaller preconfigured “blocks” of C7000s
 ‣ 4 standard “blocks”:
   • M-Block
   • C-Block
   • G-Block
   • X-Block
                                                                   33
Architecture
Compute Hardware

 ‣ M-Block (Mgmt)
   • HP BL460c Blades
     - Dual-socket quad-core with 96GB RAM & 1TB mirrored OS disks
 ‣ 2x HA Master Node(s)
 ‣ 1x Mgmt Node
 ‣ 3x HPC Login Node(s)
 ‣ ... plenty of room ...
                                                                   34
Architecture
Compute Hardware

 ‣ C-Block (Compute)
   • HP BL460c Blades
     - Dual-socket quad-core with 96GB RAM & 1TB mirrored OS disks
 ‣ Fully populated with 16 blades per enclosure
 ‣ Set of 8 C-Blocks = 1024 CPU Cores
                                                                   35
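The core count on this slide checks out with simple arithmetic from the stated configuration:

```python
# Core-count arithmetic from the slide: dual-socket quad-core BL460c blades,
# 16 blades per fully populated C7000 enclosure, 8 C-Blocks per set.
cores_per_blade = 2 * 4          # dual-socket, quad-core
blades_per_cblock = 16
cblocks = 8

total_cores = cores_per_blade * blades_per_cblock * cblocks
print(total_cores)  # 1024, matching "Set of 8 C-Blocks = 1024 CPU Cores"
```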
Architecture
Compute Hardware

 ‣ G-Block (GPU)
   • No C7000; HP s6500 enclosure used for G-Block units
 ‣ HP SL250s Servers
 ‣ 3x Tesla GPUs per SL250s Server
 ‣ ... 15 Tflop per G-Block
                                                          36
Architecture
Compute Hardware

 ‣ X-Block C7000
   • Hosting of “Adjunct Servers”
   • X-Block for unique requirements that don’t fit into a standard
     C, G or M-Block configuration; or for servers supplied by
     business units
 ‣ Big Memory Nodes
 ‣ Virtualization Platform(s)
 ‣ Big SMP Nodes
 ‣ Graphics/Viz Nodes
 ‣ Application Servers
 ‣ Database Servers
                                                                    37
Architecture
Compute Hardware

 ‣ Modular design can grow into double-digit numbers of datacenter
   cabinets
   • C-Blocks and G-Blocks for compute; M-Blocks and X-Blocks for
     Mgmt and special cases
   • 8-core 96GB RAM, 1TB BL460c blade is standard individual server
     config; deviation only when required
                                                                     38
Architecture
Network Hardware

 ‣ Key design decision:
   • 10 Gigabit Ethernet only
   • No Infiniband*
 ‣ Fully redundant Cisco Nexus 10Gb Fabric
                                           40
Architecture
Network Hardware

 ‣ Cisco Nexus 10G
   • Redundant Everything
   • C7000 enclosures (M, X and C-Blocks) have 40Gb uplinks
     (80Gb possible)
   • G-Blocks and misc systems have 10Gb links
   • 20Gb bandwidth to each storage node
   • Easily expanded; centrally managed, monitored and controlled
                                                                  41
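The uplink figures imply a modest, predictable oversubscription ratio at the enclosure level, worth spelling out when sizing workloads:

```python
# Bandwidth arithmetic from the slide: a fully populated C7000 (16 blades)
# shares a 40 Gb uplink, with an 80 Gb option.
blades = 16
uplink_gb = 40

per_blade_gb = uplink_gb / blades
print(per_blade_gb)      # 2.5 Gb/s per blade if all 16 blades stream at once
print(80 / blades)       # 5.0 Gb/s per blade with the 80 Gb uplink option
```

In practice not all blades saturate the network simultaneously, so the worst case above is rarely hit; the numbers simply bound what a single enclosure can pull from the 20 Gb storage-node links.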
Architecture
Storage Hardware

 ‣ EMC Isilon Scale-out NAS
   • ~1 petabyte raw for active use
   • ~1 petabyte raw for backup
 ‣ Why Isilon?
   • Large, single-namespace scaling beyond our most aggressive
     capacity projections
   • Easy to manage / GIS Approved
   • Aggregate throughput increases with capacity expansion
   • Tiering & SSD options
                                                                43
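The "aggregate throughput increases with capacity expansion" point is the defining property of scale-out NAS: every node added for capacity also brings its own controllers and NICs. The per-node figures below are invented placeholders, not Isilon specifications; the sketch only illustrates the scaling relationship for the ~1 PB active tier named on the slide.

```python
# Illustrative scale-out NAS arithmetic. Per-node capacity and throughput
# are assumed placeholder values, NOT Isilon product specs.
node_capacity_tb = 36        # assumed raw TB per node (placeholder)
node_throughput_gbps = 2     # assumed Gb/s per node (placeholder)
target_raw_pb = 1.0          # "~1 petabyte raw for active use"

# Nodes needed to reach the capacity target (ceiling division).
nodes = -(-int(target_raw_pb * 1000) // node_capacity_tb)

# In a scale-out design, aggregate throughput grows with the node count,
# unlike a scale-up filer where capacity grows behind a fixed controller.
aggregate_gbps = nodes * node_throughput_gbps
print(nodes, aggregate_gbps)
```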
Architecture
External Connectivity

 ‣ Dedicated Internet circuit for new HPC Hub
   • Direct download/ingest of large public datasets without affecting
     other business users
   • Downloads don’t hit MAN/WAN networks & avoid the centrally routed
     Enterprise internet egress point located hundreds of miles away
   • Very handy for Cloud/VPN efforts as well
 ‣ Internet2
   • I2 and other high-speed academic network connectivity planned
                                                                       44
Architecture
Physical Data Ingest

 ‣ Large Scale Data Ingest & Export
   • Often overlooked; very important!
 ‣ Dedicated Data Station
   • 10 Gig link to HPC Hub
   • Fast CPUs for checksum and integrity operations
   • Removable SATA/SAS bays
   • Lots of USB & eSATA ports
                                                     45
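The "checksum and integrity operations" step is why the data station needs fast CPUs: every file arriving on physical media gets hashed before and after the copy onto HPC storage so the transfer can be verified end-to-end. A minimal sketch (the paths in the final comment are hypothetical examples):

```python
# Sketch of media-ingest integrity checking: build a checksum manifest for
# a directory tree, then compare source and destination manifests.
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks (files may be huge)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(root):
    """Map relative file path -> checksum for every file under root."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256sum(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

# After copying from the removable drive to HPC storage, the two manifests
# must match exactly (example paths are hypothetical):
# assert manifest("/mnt/ingest_drive") == manifest("/hpc/incoming/run42")
```

Streaming the hash in chunks keeps memory flat regardless of file size, which matters when instrument runs arrive as multi-terabyte files.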
One More Thing ...
                   46
One more thing ...
Not just a single cluster

 ‣ Single cluster? Nope.
 ‣ The secret sauce is in the facility, storage and network core
 ‣ Petabytes of scientific data have a “gravitational pull” within
   an enterprise
 ‣ ... we expect many new users and use cases to follow
                                                                   48
One more thing ...
Not just a single cluster

 ‣ We can support:
   • Additional clusters & analytic platforms grafted onto our network
     and storage core
   • Validated server, software and cluster environments collocated in
     close proximity
   • Integration with private cloud and virtualization environments
   • Integration with public IaaS clouds
   • Dedicated Hadoop / Big Data environments
   • On-demand reconfiguration of C-Blocks into HDFS/Hadoop-optimized
     mini clusters
   • And much more ...
                                                                       49
Beyond the hardware bits ...
                             50
Beyond the hardware ...
Many other critical factors involved

 ‣ Let’s Discuss:
   • Requirements Gathering
   • Building Trust
   • Governance
   • Support Model
                                      51
Requirements Gathering

 ‣ When seven-figure CapEx amounts are involved you can’t afford to
   make a mistake
 ‣ Capturing business & scientific requirements is non-trivial
   • ... especially when trying to account for future needs
 ‣ Not a 1 person / 1 department job
   • ... requires significant expertise and insider knowledge spanning
     science, software, business plans and both research and global IT
     staff
                                                                       52
• 53. Requirements Gathering Our approach 1. Keep the core project team small & focused • Engage niche resources (legal, security, etc.) on demand 2. Promiscuous (“meet with anyone”) data gathering, meeting & discussion philosophy 3. Strong project management / oversight 4. Public support from senior leadership 5. Frequent sync-up with key leaders & groups • Global facility/network/storage/support orgs, Research budget & procurement teams, senior scientific leadership, etc. 53
  • 54. Building Trust Consolidated HPC requires trust ‣ Previous: Many independent islands of HPC • ... often built/supported/run by local resources ‣ Moving to a shared-services model requires great trust among users & scientific leadership • Researchers have low tolerance for BS/incompetence • Informatics is essential; users need to be reassured that current capabilities will be maintained while new capabilities will be gained • Enterprise IT must be willing to prove it understands & can support the unique needs and operational requirements of research informatics 54
  • 55. Building Trust Our approach ‣ Our Approach: • Strong project team with deep technical & institutional experience. Team members could answer any question coming from researchers or business unit professionally and with an aura of expertise & competence • Explicit vocal support from senior IT and research leadership (“We will make this work. Promise.”) • Willingness to accept & respond to criticism & feedback - ... especially when someone smashes a poor assumption or finds a gap in the planned design 55
  • 56. Governance ‣ Tied for first place among “reasons why centralized HPC deployments fail” ‣ Multi-Tenant HPC Governance is essential ‣ ... and often overlooked 56
• 57. Governance The basic issue ‣ ... in research HPC settings there are certain things that should NEVER be dictated by IT ‣ It is not appropriate for an IT SysAdmin to ... • Create or alter resource allocation policies & quotas • Decide which users/groups get special treatment • Decide what software can and cannot be used • ... etc. ‣ A governance structure involving scientific leadership and user representation is essential 57
• 58. Governance Our Approach ‣ Two committees: “Ops” and “Overlord” ‣ Ops Committee: Users & HPC IT staff coordinating HPC operations jointly ‣ Overlord Committee: Invoked as needed. Makes tiebreaker decisions, busts through political/organizational walls and approves funding/expansion decisions 58
  • 59. Governance Our Approach ‣ Ops Committee communicates frequently and is consulted before any user-affecting changes occur • Membership is drawn from interested/engaged HPC “power users” from each business unit + the HPC Admin Team ‣ Ops Committee “owns” HPC scheduler & queue policies and approves/denies any requests for special treatment. All scheduler/policy changes are blessed by Ops before implementation ‣ This is the primary ongoing governance group 59
• 60. Governance Our Approach ‣ Overlord Committee meets only as needed • Membership: the scariest heavy hitters we could recruit from senior scientific and IT leadership - VP or Director level is not unreasonable • This group needs the most senior people you can find. Heavy hitters are required when mediating between conflicting business units or busting through political/organizational barriers - Committee does not need to be large, just powerful 60
• 61. Support Model Our Approach ‣ Often overlooked or under-resourced ‣ We are still working on this ourselves ‣ General model • Transition server, network and storage maintenance & monitoring over to Global IS as soon as possible • Free up scarce HPC Support FTE resources to concentrate on enabling science & supporting users • Offer frequent training and local “HPC mentor” attention • Online/portal tools that facilitate user communication, best practice advice and collaborative “self-support” for common issues • Still TBD: Helpdesk, Ticketing & Dashboards 61