BioITWorld 2013 presentation - Best practices for building multi-tenant HPC clusters for Pharma/BioTech
Essentially a mini case study of a recent deployment of a multi-petabyte, 1000+ CPU core Linux cluster in the Boston area.
Please email me at: chris@bioteam.net if you would like the actual PDF file itself.
2. I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
www.bioteam.net - Twitter: @chris_dag
3. Your substitute host ...
Speaking on behalf of others
‣ Original speaker can’t make it today
‣ Stepping in as substitute due to involvement in the Assessment & Deployment project phases
‣ Just about everything in this presentation is the work of other, smarter people!
5. Pharma Multi-Tenant HPC
Case Study: Sanofi S.A.
‣ Sanofi: multinational pharmaceutical with worldwide research & commercial locations
‣ 7 major therapeutic areas: cardiovascular, central nervous system, diabetes, internal medicine, oncology, thrombosis and vaccines
‣ Other Sanofi S.A. companies: Merial, Chattem, Genzyme & Sanofi Pasteur
6. History
Case Study: Sanofi S.A.
‣ System discussed here is among the first major outcomes of a late-2011 global review effort called “HPC-O” (HPC Optimization)
‣ HPC-O involved:
• Revisiting prior HPC recommendations
• Intensive data-gathering & cataloging of HPC resources
• North America: interviews with 30+ senior scientists, IT & scientific leadership, system operators & all levels of global IT and infrastructure support services
• A similar effort across EU operations
7. HPC-O Recommendations
Case Study: Sanofi S.A.
‣ Build a new shared-services HPC environment
• Model/prototype for future global HPC
• Designed to meet scientific & business requirements
• Multiple concurrent users, groups and business units
• ... use IT building blocks that are globally approved and supported 24x7 by the Global IS (“GIS”) organization
‣ Site the initial system in the Boston area
8. Current Status
‣ Online since November 2012
‣ Approaching the end of the initial round of testing, optimization and user acceptance work
‣ Today: running large-scale workloads but not formally in Production status
9. Why Multi-Tenant Cluster?
Case Study: Sanofi S.A.
‣ HPC Mission & Scope Creep
• 11+ separate HPC systems in North America alone
• Wildly disparate technology, product & support models
• “Islands” of HPC used by single business units
• Almost no detectable cross-system or cross-site usage
• Huge variance in refresh/upgrade cycles
10. Why Multi-Tenant Cluster, cont.
Case Study: Sanofi S.A.
‣ Utilization & Efficiency
• Islands of HPC tended to be underutilized most of the time and oversubscribed by a single business unit during peak demand times
• Hardware age and capability varied widely due to huge differences in maintenance cycles by business unit
• Cost of commercial software licensing (globally) hugely significant; difficult to maximize ROI & use of very expensive software entitlements across “islands” of HPC
11. Why Multi-Tenant Cluster, cont.
Case Study: Sanofi S.A.
‣ Need for “Opportunistic Capacity”
• Difficult to perform exploratory research outside the normal scope of business unit activities
‣ Avoid “Shadow IT” problems
• Frustrated users will find their own DIY solutions
• ... “the cloud” is just a departmental credit card away
‣ “Chaperoned” cloud-bursting
• Centrally managed, “chaperoned” utilization of IaaS cloud resources for specific workloads & data (a sketch follows below)
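The deck doesn’t show an implementation of “chaperoned” bursting, so here is only a minimal sketch of the idea: a central service, not end users, holds the cloud credentials and will only launch pre-vetted images for approved workloads. It uses the era’s boto library for AWS; the workload name, AMI ID and instance type are invented for illustration.

```python
# Hypothetical "chaperoned" cloud-bursting gatekeeper. The central HPC
# team holds the AWS credentials; users can only request approved,
# pre-vetted workload profiles. All identifiers below are illustrative.
import boto.ec2

APPROVED_WORKLOADS = {
    "docking-screen": {"ami": "ami-xxxxxxxx", "type": "c1.xlarge"},  # vetted image
}

def burst(workload, count):
    """Launch `count` worker instances for an approved workload profile."""
    if workload not in APPROVED_WORKLOADS:
        raise ValueError("workload not approved by HPC governance")
    spec = APPROVED_WORKLOADS[workload]
    conn = boto.ec2.connect_to_region("us-east-1")  # credentials held centrally
    return conn.run_instances(spec["ami"], min_count=count,
                              max_count=count, instance_type=spec["type"])
```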
12. In-house vs. Cloud
Case Study: Sanofi S.A.
‣ Cloud options studied intensively; still made the decision to invest in significant regional HPC
‣ Some Reasons
• Baseline “always available” capability
• Ability to obsessively tune performance
• Security
• Cost control (many factors here ...)
• Data size, movement & lifecycle issues
• Agility
13. In-house vs. Cloud
Case Study: Sanofi S.A.
‣ HPC Storage: “Center of Gravity” for Scientific Data
• Compute power is pretty easy and not super expensive
• Storage need, even just for the Boston region, is peta-scale
• Mapping data flows and access patterns reveals a very complex web of researcher, instrument, workstation and pipeline interactions with storage
‣ In a nutshell:
• Engineer the heck out of a robust peta-scale R&D storage platform for the Boston region
• Drop a reasonable amount of HPC capability near this storage
• Bias all engineering/design efforts to facilitate agility/change
• Use the cloud only when best-fit
23. Enabling Technologies: Facility
Case Study: Sanofi S.A.
‣ A Sanofi company has a suitable local colo suite
• ... already under long-term lease
• ... and with a bit of server consolidation, lots of space for HPC compute and storage
• ... plenty of room for “adjunct” systems that will likely be attracted to the storage “center of gravity”
‣ Can’t reveal exact size but this facility can handle double-digit numbers of additional HPC compute, storage and network cabinets
24. Enabling Technologies: WAN
Case Study: Sanofi S.A.
‣ Regional consolidated HPC is not possible without MAN/WAN efforts to connect all sites and users
• ... direct routing required; not optimal to route HPC traffic through Corporate Tier-1 facilities that may be thousands of miles away
• Existing MAN/WAN network links upgraded where there was a business/scientific justification
• All other MAN/WAN links verified to ensure expansion is easy/possible should a business need arise
25. Enabling Technologies: WAN
Case Study: Sanofi S.A.
‣ Regional Networking Result
• Most sites: bonded 1-Gigabit path to regional HPC hub
• A Cambridge building has a direct 10-Gigabit Ethernet link to the HPC hub; used for heavy data movement as well as ingest of data arriving on physical media
• Special routing (HTTP, FTP) in place for satellite locations not yet on the converged Enterprise WAN/MAN
• HPC Hub Facility:
- Dedicated HPC-only internet link for open-data downloads
- Internet2 connection being pursued for EDU collaboration
27. Architecture
Philosophy
‣ Intense desire to keep things simple
‣ Commodity works very well; avoid the expensive and the exotic when we can
‣ Extra commodity capacity compensates for performance lost by not choosing the exotic competition
• Also delivers more agility and easier reuse/repurposing
‣ If we build from globally-blessed IT components we can eventually turn basic operation, maintenance and monitoring over to the Global IS organization
• ... freeing Research IT staff to concentrate on science & users
28. Architecture
HPC Stack
‣ Explicit decision made to source the HPC cluster stack from a commercial provider
• This is actually a radical departure from prior HPC efforts
‣ Many evaluated; one chosen
‣ Primary drivers:
• 24x7 commercial support
• Research IT staff needs to concentrate on apps/users
• “Single SKU” out-of-the-box functionality and features (bare metal provisioning, etc.) that reduce operational burden
29. Architecture
HPC Stack - Bright Computing
‣ Bright Computing selected
• Hardware neutral
• Scheduler neutral
• Full API, CLI and lightweight monitoring stack (scriptable; see the sketch below)
• Web GUIs for non-experts
• Single dashboard for advanced monitoring and management
• Data-aware scheduling & native support for AWS cloud bursting
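The deck stops at the feature list, so the following is only a minimal sketch of what driving the stack from a script might look like, assuming Bright’s cmsh shell is on the PATH and accepts one-shot, semicolon-separated command strings via -c.

```python
# Minimal sketch: driving Bright Cluster Manager's cmsh shell from Python.
# Assumes cmsh accepts one-shot command strings via -c (an assumption
# about the deployed Bright version, not something shown in the deck).
import subprocess

def cmsh(commands):
    """Run a one-shot cmsh command string and return its raw output."""
    return subprocess.check_output(["cmsh", "-c", commands])

print(cmsh("device; list"))    # inventory of managed nodes/devices
print(cmsh("device; status"))  # per-node health at a glance
```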
31. Architecture
Compute Hardware
‣ Key Design Goals
• Use common server config for as many nodes as possible
• Modular & extensible design
• “Blessed” by Global IS (GIS) organization
32. Architecture
Compute Hardware
‣ HP C7000 Blade Enclosures
• Our basic building block
• Very flexible on network, interconnect and blade configuration
• Sanofi GIS approved
• “Lights-out” facility approved
• Pre-negotiated preferential pricing on almost everything we needed
33. Architecture
Compute Hardware
‣ HP C7000 Blade Enclosure becomes the smallest modular unit in the HPC design
‣ Big cluster built from smaller preconfigured “blocks” of C7000s
‣ 4 standard “blocks”:
• M-Block
• C-Block
• G-Block
• X-Block
34. Architecture
Compute Hardware
‣ M-Block (Mgmt)
• HP BL460c Blades
- Dual-socket quad-core with 96GB RAM & 1TB mirrored OS disks
‣ 2x HA Master Node(s)
‣ 1x Mgmt Node
‣ 3x HPC Login Node(s)
‣ ... plenty of room ...
35. Architecture
Compute Hardware
‣ C-Block (Compute)
• HP BL460c Blades
- Dual-socket quad-core with 96GB RAM & 1TB mirrored OS disks
‣ Fully populated with 16 blades per enclosure
‣ Set of 8 C-Blocks = 1024 CPU cores (arithmetic below)
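The 1024-core figure follows directly from the slide’s own numbers:

```python
# Core count per the slides: dual-socket quad-core BL460c blades,
# 16 blades per C7000 enclosure, 8 C-Block enclosures in a set.
cores_per_blade = 2 * 4                # dual-socket quad-core
blades_per_enclosure = 16
enclosures = 8
print(cores_per_blade * blades_per_enclosure * enclosures)  # -> 1024
```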
36. Architecture
Compute Hardware
‣ G-Block (GPU)
• No C7000; HP s6500 enclosure used for G-Block units
‣ HP SL250s Servers
‣ 3x Tesla GPUs per SL250s server
‣ ... 15 Tflop per G-Block
37. Architecture
Compute Hardware
‣ X-Block C7000
• Hosting of “Adjunct Servers”
• X-Block for unique requirements that don’t fit into a standard C-, G- or M-Block configuration, or for servers supplied by business units
‣ Big Memory Nodes
‣ Virtualization Platform(s)
‣ Big SMP Nodes
‣ Graphics/Viz Nodes
‣ Application Servers
‣ Database Servers
38. Architecture
Compute Hardware
‣ Modular design can grow into double-digit numbers of datacenter cabinets
• C-Blocks and G-Blocks for compute; M-Blocks and X-Blocks for Mgmt and special cases
• 8-core, 96GB RAM, 1TB BL460c blade is the standard individual server config; deviation only when required
43. Architecture
Storage Hardware
‣ EMC Isilon Scale-out NAS
• ~1 petabyte raw for active use
• ~1 petabyte raw for backup
‣ Why Isilon?
• Large, single-namespace scaling beyond our most aggressive capacity projections
• Easy to manage / GIS approved
• Aggregate throughput increases with capacity expansion
• Tiering & SSD options
44. Architecture
External Connectivity
‣ Dedicated Internet circuit for the new HPC Hub
• Direct download/ingest of large public datasets without affecting other business users
• Downloads don’t hit MAN/WAN networks & avoid the centrally routed Enterprise internet egress point located hundreds of miles away
• Very handy for Cloud/VPN efforts as well
‣ Internet2
• I2 and other high-speed academic network connectivity planned
45. Architecture
Physical data ingest
‣ Large Scale Data Ingest & Export
• Often overlooked; very important!
‣ Dedicated Data Station
• 10 Gig link to HPC Hub
• Fast CPUs for checksum and integrity operations (a sketch follows below)
• Removable SATA/SAS bays
• Lots of USB & eSATA ports
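The deck doesn’t describe the ingest tooling itself; as a minimal sketch of the checksum/integrity step such a data station performs, assuming media ship with a simple filename-and-MD5 manifest (the manifest format is an assumption, not Sanofi’s actual process):

```python
# Illustrative integrity check for physical-media ingest: hash every file
# listed in a shipped manifest and report mismatches. The manifest format
# (filename<TAB>md5 per line) is an assumption for this sketch.
import hashlib
import os

def md5sum(path, bufsize=1 << 20):
    """Stream a file through MD5 so large files don't exhaust RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_path, media_root):
    """Return the files whose on-media checksum disagrees with the manifest."""
    bad = []
    with open(manifest_path) as manifest:
        for line in manifest:
            name, expected = line.rstrip("\n").rsplit("\t", 1)
            if md5sum(os.path.join(media_root, name)) != expected:
                bad.append(name)
    return bad
```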
47. One more thing ...
Not just a single cluster
‣ Single cluster? Nope.
48. One more thing ...
Not just a single cluster
‣ Single cluster? Nope.
‣ The secret sauce is in the facility, storage and network core
‣ Petabytes of scientific data have a “gravitational pull” within an enterprise
‣ ... we expect many new users and use cases to follow
49. One more thing ...
Not just a single cluster
‣ We can support:
• Additional clusters & analytic platforms grafted onto our network and storage core
• Validated server, software and cluster environments collocated in close proximity
• Integration with private cloud and virtualization environments
• Integration with public IaaS clouds
• Dedicated Hadoop / Big Data environments
• On-demand reconfiguration of C-Blocks into HDFS/Hadoop-optimized mini clusters (a sketch follows below)
• And much more ...
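Bright’s bare-metal provisioning is what would make the C-Block-to-Hadoop reconfiguration practical. Purely as a hypothetical sketch (node names and the “hadoop” category are invented, and the cmsh semantics are assumed as in the earlier sketch), switching a block of nodes to a Hadoop-provisioned software image could look like this:

```python
# Hypothetical: repurpose one C7000's worth of blades into a Hadoop mini
# cluster by switching their Bright node category, so they pick up the
# Hadoop software image on their next provisioning cycle. Node names and
# the "hadoop" category are invented for illustration.
import subprocess

def cmsh(commands):
    return subprocess.check_output(["cmsh", "-c", commands])

for i in range(1, 17):  # 16 blades in one enclosure
    cmsh("device; use cnode%03d; set category hadoop; commit" % i)
```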
51. Beyond the hardware ...
Many other critical factors involved
‣ Let’s Discuss:
• Requirements Gathering
• Building Trust
• Governance
• Support Model
52. Requirements Gathering
‣ When seven-figure CapEx amounts are involved you can’t afford to make a mistake
‣ Capturing business & scientific requirements is non-trivial
• ... especially when trying to account for future needs
‣ Not a 1 person / 1 department job
• ... requires significant expertise and insider knowledge spanning science, software, business plans and both research and global IT staff
53. Requirements Gathering
Our approach
1. Keep the core project team small & focused
• Engage niche resources (legal, security, etc.) on demand
2. Promiscuous (“meet with anyone”) data gathering, meeting & discussion philosophy
3. Strong project management / oversight
4. Public support from senior leadership
5. Frequent sync-ups with key leaders & groups
• Global facility/network/storage/support orgs, Research budget & procurement teams, senior scientific leadership, etc.
54. Building Trust
Consolidated HPC requires trust
‣ Previous: Many independent islands of HPC
• ... often built/supported/run by local resources
‣ Moving to a shared-services model requires great trust among users & scientific leadership
• Researchers have low tolerance for BS/incompetence
• Informatics is essential; users need to be reassured that current capabilities will be maintained while new capabilities are gained
• Enterprise IT must be willing to prove it understands & can support the unique needs and operational requirements of research informatics
55. Building Trust
Our approach
‣ Our Approach:
• Strong project team with deep technical & institutional experience. Team members could answer any question coming from researchers or business units professionally and with an aura of expertise & competence
• Explicit vocal support from senior IT and research leadership (“We will make this work. Promise.”)
• Willingness to accept & respond to criticism & feedback
- ... especially when someone smashes a poor assumption or finds a gap in the planned design
56. Governance
‣ Tied for first place among “reasons why centralized HPC deployments fail”
‣ Multi-Tenant HPC Governance is essential
‣ ... and often overlooked
57. Governance
The basic issue
‣ ... in research HPC settings there are certain things that should NEVER be dictated by IT
‣ It is not appropriate for an IT SysAdmin to ...
• Create or alter resource allocation policies & quotas
• Decide which users/groups get special treatment
• Decide what software can and cannot be used
• ... etc.
‣ A governance structure involving scientific leadership and user representation is essential
58. Governance
Our Approach
‣ Two committees: “Ops” and “Overlord”
‣ Ops Committee: Users & HPC IT staff coordinating HPC operations jointly
‣ Overlord Committee: Invoked as needed. Makes tiebreaker decisions, busts through political/organizational walls and approves funding/expansion decisions
59. Governance
Our Approach
‣ Ops Committee communicates frequently and is consulted before any user-affecting changes occur
• Membership is drawn from interested/engaged HPC “power users” from each business unit + the HPC Admin Team
‣ Ops Committee “owns” HPC scheduler & queue policies and approves/denies any requests for special treatment. All scheduler/policy changes are blessed by Ops before implementation (illustrated in the sketch below)
‣ This is the primary ongoing governance group
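The deck never names the scheduler (the stack is scheduler-neutral), so the following is only a hypothetical illustration of the governance rule rather than any real scheduler’s API: queue policy is data anyone can read, but nothing changes without recorded Ops Committee sign-off.

```python
# Hypothetical "policy as data" illustration of Ops Committee ownership.
# Queue names, limits and shares are invented; the point is that every
# change is gated on a recorded committee approval.
QUEUE_POLICY = {
    "short": {"max_walltime_hours": 4,   "fairshare_weight": 1.0},
    "long":  {"max_walltime_hours": 168, "fairshare_weight": 0.5},
}

def apply_policy_change(queue, setting, value, ops_ticket=None):
    """Apply a scheduler policy change only with Ops Committee sign-off."""
    if ops_ticket is None:
        raise RuntimeError("scheduler policy changes require Ops Committee approval")
    QUEUE_POLICY[queue][setting] = value

# e.g. apply_policy_change("short", "max_walltime_hours", 8, ops_ticket="OPS-142")
```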
60. Governance
Our Approach
‣ Overlord Committee meets only as needed
• Membership: the scariest heavy hitters we could recruit from senior scientific and IT leadership
- VP or Director level is not unreasonable
• This group needs the most senior people you can find. Heavy hitters are required when mediating between conflicting business units or busting through political/organizational barriers
- Committee does not need to be large, just powerful
61. Support Model
Our Approach
‣ Often overlooked or under-resourced
‣ We are still working on this ourselves
‣ General model
• Transition server, network and storage maintenance & monitoring over to Global IS as soon as possible
• Free up rare HPC Support FTE resources to concentrate on enabling science & supporting users
• Offer frequent training and local “HPC mentor” attention
• Online/portal tools that facilitate user communication, best-practice advice and collaborative “self-support” for common issues
• Still TBD: Helpdesk, Ticketing & Dashboards