3. Outline
• Servers
– Concepts & Definitions
– Novel properties of servers
• Clusters
• The Cloud
4. Servers – Concepts & Definitions
“A computer or program that
supplies data or resources to other
machines on a network.”
• File Server
• Database Server
• Web Server
• Email Server
• iTunes Server
• Computing Server
server. (n.d.). Collins English Dictionary - Complete & Unabridged 10th Edition. Retrieved March 25, 2013,
from Dictionary.com website: http://dictionary.reference.com/browse/server
5. Servers – Concepts & Definitions
• Same hardware
components as your
Personal Computer
– Processor, Memory, Power Supply, Hard Drive
• Often stacked in a rack
Image from: http://www.stealth.com/industrial_rackmounts_sr1501datasheet.htm
6. Servers – Concepts & Definitions
• Same hardware
components as your
Personal Computer
– Processor, Memory, Power Supply, Hard Drive
• Often stacked in a rack
http://www.daystarinc.com/hosting-facility
7. Is this a server?
Image from: http://www.stealth.com/industrial_rackmounts_sr1501datasheet.htm
8. Is this a server?
Image from: http://mediapool.getthespec.com/media.jpg?m=gBLSSTJ6IbHLuZD1JNnmyw%3D%3D&v=HR
9. Is this a server?
Image from: http://www.maximumpc.com/articles/reviews/hardware
10. Is this a server?
Image from: http://www.phonearena.com/image.php?m=Articles.Images&f=name&id=28259&name=GT-I8520_1.jpg&caption=&title=Image+from+%22UPDATED%3A+Samsung+I8520+is+an+Android+phone+with+built-in+projector%22&kw=&popup=1
12. Common Attributes of a Server
• Often runs an Operating System geared
towards servers.
• Primarily accessed remotely
– Often “headless” (no monitor)
• Runs 24/7 with minimal downtime
• May be kept in a data center
– Superior cooling, increased security, etc.
• Redundancy (Power, Disk Storage)
• More powerful and expensive
13. Operating System
Client PCs
• Windows (XP, Vista, 7, 8)
• Mac OS
• Linux (Ubuntu, Mint, openSUSE)
Servers
• Linux (Red Hat, SUSE Enterprise, Ubuntu Server)
• Windows (Windows Server 2003, 2008, 2012)
• Non-Linux Unix (BSD, Solaris, AIX)
15. Remote Access & the Shell
• Typically don’t have
physical access to the
server, must access
over a network
• Windows is heavily
graphical, access
using “Remote
Desktop”
• Linux is less
graphical, access via
a “Shell”
Image from http://www.softsalad.com/software/remote-desktop-control.html
16. Shell Access
1. User logs in
2. User types command
3. Computer executes command and prints output
4. User types another command
5. …
6. User logs off
Modified from http://software-carpentry.org/4_0/shell/intro.html
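The command loop above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not how a real shell is implemented: the function name `run_session` and the sample `echo` commands are made up for illustration, and logging in and out is left out.

```python
import subprocess


def run_session(commands):
    """Run each command through the system shell and collect what it prints."""
    outputs = []
    for command in commands:  # steps 2 and 4: the user types a command
        # Step 3: the computer executes the command...
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True
        )
        # ...and its output is what the user would see on screen.
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Hypothetical session with two harmless commands.
    for text in run_session(["echo hello", "echo goodbye"]):
        print(text, end="")
```

A real shell reads commands interactively instead of taking them from a list, but the execute-then-print cycle is the same.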
20. Shell Access
• Steep learning curve
• Can often be confusing at first, requires a new way of thinking
• Ultimately very powerful and efficient
• Three reasons to use:
1. It’s your only choice for remote access on some non-graphical systems
2. Many software tools only offer command-line interfaces
3. Allows for powerful new combinations of tools
Modified from http://software-carpentry.org/4_0/shell/intro.html
21. Data Centers
• Redundant, independent power feeds
– Diesel generator backup
• Redundant Internet connections
• Redundant cooling
• 24/7/365 staffing, restricted access
22. RAID
• “Redundant Array of Independent Disks”
• Store information redundantly
• Support failure of one or more hard drives without losing data
Diagram: a RAID array made up of Disk 1 through Disk 5
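One way to survive a single disk failure is to store an XOR parity block alongside the data. The slide does not say which RAID level is meant, so the sketch below is an assumption: a simplified parity scheme with four data disks and one parity disk, with made-up block contents and function names.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# Four hypothetical data disks holding one 4-byte block each.
data = [b"ACGT", b"TTGA", b"CCAT", b"GGGA"]
# The fifth disk holds the parity of the other four.
parity = xor_blocks(data)

# Pretend the third disk failed and rebuild its block.
recovered = rebuild_missing(data[:2] + data[3:], parity)
assert recovered == data[2]
```

Real arrays work on much larger blocks and often spread the parity across all disks, but the recovery arithmetic follows the same idea.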
23. Server Computing Power
• Often very expensive machines
• Hardware designed to support more resources
than a PC
– May have dozens or hundreds of GB of RAM
– Very expensive powerful processor, or even
multiple processors
25. Example Problem
• Group of 10 researchers sharing a single server
• Too many concurrent users, so the server runs slowly
• Some jobs are very large
26. Naïve Solution
• Buy more independent
servers!
• Let people connect to
whichever server they
want
• Problems:
– Not sure which servers
are busiest
– Still takes weeks to run
big simulations
27. Clustered Solution
• Servers are “nodes” in a
cluster
• Log in via head node
• Head node manages
requested jobs
– Submits them to “worker”
or “slave” nodes
– Intelligently calculates
available resources on
each worker node
• Multiple nodes can work
on a single task
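The head node's bookkeeping can be sketched as a small simulation. This is a toy model under stated assumptions, not the scheduler any real cluster uses: the class `HeadNode`, its method names, and the strict first-come-first-served policy are all invented for illustration.

```python
from collections import deque


class HeadNode:
    """Toy head node: tracks free worker nodes and a first-come queue."""

    def __init__(self, worker_count):
        self.free_nodes = set(range(1, worker_count + 1))
        self.queue = deque()   # waiting (job name, nodes needed) pairs
        self.running = {}      # job name -> list of assigned nodes

    def submit(self, job_name, nodes_needed):
        """Queue a job, then start it right away if resources allow."""
        self.queue.append((job_name, nodes_needed))
        self._start_waiting_jobs()

    def finish(self, job_name):
        """Release the nodes of a finished job and re-check the queue."""
        self.free_nodes.update(self.running.pop(job_name))
        self._start_waiting_jobs()

    def _start_waiting_jobs(self):
        # Strict queue order: a job at the front that does not fit
        # blocks the jobs behind it (an assumption of this sketch).
        while self.queue and self.queue[0][1] <= len(self.free_nodes):
            job_name, nodes_needed = self.queue.popleft()
            assigned = sorted(self.free_nodes)[:nodes_needed]
            self.free_nodes.difference_update(assigned)
            self.running[job_name] = assigned


if __name__ == "__main__":
    head = HeadNode(worker_count=3)
    head.submit("user4", 1)
    head.submit("user9", 1)
    head.submit("user2", 2)   # only one node is free, so this job waits
    print(head.running, list(head.queue))
    head.finish("user9")      # frees a node; user2's job can now start
    print(head.running, list(head.queue))
```

Real schedulers also weigh priorities, per-node cores and memory, and time limits; the sketch only counts whole nodes.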
28. Job Submission
• Prepare a script to be executed (“myjob.sh”)
– Include specifications on resources required
• (“-l nodes=2:ppn=4”)
– Or what queue it should be submitted to
• Different queues have different priorities and permissions
• Submit that job to the head node (“qsub
myjob.sh”)
• Head node will begin executing as soon as it
has sufficient resources
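The `-l nodes=2:ppn=4` option and the `qsub` command suggest a PBS/Torque-style scheduler, which is an assumption here. Under that assumption, resource requests can be written into the script as `#PBS` directive lines. The helper below only builds the script text; the function name, the `./run_simulation` command, and the queue name are hypothetical.

```python
def build_job_script(command, nodes, ppn, queue=None):
    """Return the text of a PBS-style job script (assumed scheduler)."""
    lines = [
        "#!/bin/bash",
        # Resource request: number of nodes and processors per node.
        f"#PBS -l nodes={nodes}:ppn={ppn}",
    ]
    if queue is not None:
        # Optional: name the queue the job should be submitted to.
        lines.append(f"#PBS -q {queue}")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "./run_simulation" stands in for whatever program the job runs.
    print(build_job_script("./run_simulation", nodes=2, ppn=4), end="")
```

Saving the returned text as `myjob.sh` and running `qsub myjob.sh` on the head node would then hand the job to the scheduler, assuming the cluster accepts these directives.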
42. Clusters
• Solve problem of sharing resources
• Allow multiple nodes to collaborate on a single
job
– Programs must be specifically designed to run in
this fashion
• Can solve very large problems by combining
hundreds of nodes together
– Global weather forecasting, particle collisions at
CERN, etc.
43. HPC at UT Southwestern
• QBRC manages an 18-node cluster on campus.
• Have access to Texas Advanced Computing Center (TACC) at UT Austin
– 6,400-node cluster with > 100k cores
– Attracts many users, so jobs often wait in a queue before they run.
45. Cloud Computing
• Vendors with access to massive computing
resources began leasing their servers out
– Amazon, Microsoft, Google, Rackspace
– Charge per hour of use, usually just a few cents.
46. Cloud Computing - Advantages
• No up-front purchase/cost
• No hardware to manage
• 100 servers running in parallel for 1 hour cost the same as a single server running for 100 hours
– Can get parallel jobs done much more quickly
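The cost comparison is simple arithmetic when billing is per server-hour. The sketch below assumes a hypothetical rate of 10 cents per server-hour and a job that splits perfectly across servers; neither figure comes from a real price list.

```python
def lease_cost_cents(servers, hours_each, cents_per_server_hour):
    """Total cost when every server is billed for each hour it runs."""
    return servers * hours_each * cents_per_server_hour


RATE_CENTS = 10  # hypothetical price per server-hour

one_server = lease_cost_cents(1, 100, RATE_CENTS)      # 100 hours of waiting
hundred_servers = lease_cost_cents(100, 1, RATE_CENTS)  # 1 hour of waiting

# Same bill, but the parallel run finishes 100 times sooner.
assert one_server == hundred_servers == 1000
```

Jobs that do not parallelize perfectly will need more total server-hours than this, so the parallel run would cost somewhat more in practice.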
47. Cloud Computing - Disadvantages
• Data must be transferred over the Internet
– Can take hours to upload a large sequencing
experiment.
• Can be more expensive than internal clusters
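A rough estimate shows why uploads can take hours. The data size and link speed below are hypothetical, and the calculation ignores protocol overhead and shared bandwidth.

```python
def upload_hours(size_gigabytes, link_megabits_per_second):
    """Estimate upload time, taking 1 GB as 8,000 megabits."""
    size_megabits = size_gigabytes * 8 * 1000
    seconds = size_megabits / link_megabits_per_second
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical case: 100 GB of sequencing data over a 100 Mbit/s link.
    print(f"{upload_hours(100, 100):.1f} hours")
```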
Editor's notes
Lots of overlap between these – could have a single machine that runs all of these services.
Answer: we don’t know. Top row could be plugged in under my desk and used only to watch YouTube and edit Word documents. Bottom row could be formally installed in a data center and used to host a website. In reality, there’s not a clear distinction. Most “clients”/PCs actually run some “serving” software such as file or media sharing.
Most servers will meet most/all of these criteria. Of course, you could really argue that just about any technical device is a server of something, but this list defines typical usage.
We mostly see Linux in academia due to licensing concerns, among other reasons.
Only way to access many remote, non-graphical systems. Much more efficient. Many tools only have CLIs. Allows for unique combinations of tools, patched together. Graphical systems are truly easier to work with for many tasks, but when you start getting into bioinformatic analysis, you’ll begin to appreciate the power of the shell. Want to count how many times a particular motif occurs in a sequence file? Can be done in one line by combining two commands on a shell.
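The notes do not say which two commands are meant. One plausible pairing, assumed here, is `grep -o` piped into `wc -l`, as in `grep -o ACGT reads.txt | wc -l` (the motif and file name are made up). The sketch below builds the same pipeline from Python so that each half stays visible.

```python
import os
import subprocess
import tempfile


def count_motif(motif, path):
    """Count non-overlapping motif matches, like `grep -o motif path | wc -l`."""
    # First command: print every match on its own line.
    grep = subprocess.Popen(["grep", "-o", motif, path], stdout=subprocess.PIPE)
    # Second command: count the lines coming through the pipe.
    wc = subprocess.run(
        ["wc", "-l"], stdin=grep.stdout, capture_output=True, text=True
    )
    grep.stdout.close()
    grep.wait()
    return int(wc.stdout.strip())


if __name__ == "__main__":
    # Hypothetical sequence file with three occurrences of the motif.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("ACGTACGT\nTTACGT\n")
    print(count_motif("ACGT", handle.name))
    os.remove(handle.name)
```

Note that `grep` treats the motif as a regular expression and does not count overlapping matches.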
Makes sense to invest in a server everyone in the group can access and share remotely, rather than buying everyone a more powerful computer.
We have a group of 10 researchers. We have been sharing a single server, but have outgrown it. Too many people want to use it and it slows down when we all do. One group member is running a simulation that will take 2 weeks to complete if only run on this server.
Requires a lot of manual effort to inspect which server is the most available right now. If somebody starts running a big job on the server you’re on, it will slow down drastically. Still no way for these independent servers to collaborate on big jobs.
Not all software supports parallelization. Must be specifically written with that in mind.
User 4 logs in to the head node
User 4 creates a job.
Head node reads the job description and finds that the user will need one node. Sees that node #3 is not being used, assigns the job to node #3.
User 9 logs in
User 9 submits a job
Head node reads the job description and finds that the user will need one node. Sees that node #1 is not being used, assigns the job to node #1.
User 2 logs in
User 2 submits a job requiring 2 nodes to run collaboratively.
Head node checks existing nodes, finds that there aren’t 2 nodes available. Places User 2’s job in the queue.
Job #2 finishes
Head node receives notification, (optionally) notifies User 9 that his/her job is complete.
Head node now checks to see if there are sufficient resources to run the next job. There are, so it starts the job.