In this video from ISC'14, Trev Harmon from Adaptive Computing presents: HPC, Cloud & Big Workflow: What's New in Moab 8.0.
Learn more: http://adaptivecomputing.com
Watch the video presentation: http://wp.me/p3RLHQ-cvN
Other top 100 notables: Kraken, Jaguar, Cielo, Beeken, GAEA, HPCC, Red Sky, Big Red
Long history of running the most powerful systems in the world.
See our website for links to our awards.
The maker of the world’s most advanced workload and resource management system
Adaptive Computing has proven leadership in accelerating IT to speed and improve business, based on our 10+ years of delivering workload management and IT decision engine software. This leadership has been recognized in many ways with:
50+ patents filed or approved
51% revenue growth in 2010
Acceleration-centric investments in 2010 from leaders like Intel Capital, Epic Ventures and Tudor Ventures, who recognize our HPC market strength and our cloud solution leadership and opportunity
Global partnerships with organizations like HP, IBM, Cray, Microsoft and many others who utilize our innovative and leading decision engine to make their solutions more competitive and innovative for their customers
And most importantly by customers who use us to accelerate and manage their IT environments, the most dynamic and scale-intensive on the planet, a few of which are called out here. These are heterogeneous, extreme-scale environments spanning multiple data centers, with complex workloads, resources, priorities and decisions that we help accelerate in a self-optimizing environment.
In fact, our software is used on well over $2 billion of hardware.
Moab needs to improve its data-staging speed and reliability using several means for both grid and normal cluster operations.
Moab will stage input and output data using “system jobs”. System jobs execute on the Moab head node, are tracked separately from user jobs, and
<click> can have their own walltime limit, which Moab would calculate dynamically using the same information used in grid job scheduling.
<click> This allows Moab to stage data with dependencies between the user job and the system jobs.
<click> Moab needs to support other file transfer utilities for both grid and cluster operations; e.g., open-source products such as “rsync” and “gsiscp” and commercially available products, such as Aspera, that offer additional security, reliability, and possibly parallel operation.
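The dependency chain described above (stage-in system job, then the user job, then a stage-out system job) can be sketched as a small dependency graph. This is only an illustration of the ordering idea, not Moab's implementation; the function name and the ".stagein" / ".stageout" job-name suffixes are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def staging_order(job_id: str) -> list[str]:
    """Return an execution order that respects the staging dependencies.

    The suffixes below are made up for this sketch; they are not Moab names.
    """
    stage_in = f"{job_id}.stagein"
    stage_out = f"{job_id}.stageout"
    # Each key maps to the set of jobs that must finish before it may start.
    depends_on = {
        stage_in: set(),        # input staging has no prerequisites
        job_id: {stage_in},     # user job waits for its input data
        stage_out: {job_id},    # output staging waits for the user job
    }
    return list(TopologicalSorter(depends_on).static_order())


print(staging_order("job42"))
# prints ['job42.stagein', 'job42', 'job42.stageout']
```

Because the staging steps are ordinary nodes in the graph, a failed or slow transfer simply delays the user job rather than letting it start without its data.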
Different applications often require different clock frequencies in order to achieve maximum power savings, as evidenced by the dashed green line in the two graphs. The application on the left will consume the least power at the 1400 MHz clock frequency while the application on the right consumes the least power at the 1600 MHz frequency.
Administrators or users must benchmark an application at different clock frequencies to obtain runtime and power consumption measurements for each frequency in order to determine the optimal frequency for consuming the least amount of power.
Of course, the benefits of power savings - lower operational costs - must be balanced against the benefits of faster execution times - better ROI on capital costs - which really means balancing both benefits to obtain the minimum total cost of ownership (TCO).
For example, in the right graph the energy consumption at 1.6 GHz is ~16950 joules and at 1.8 GHz is ~17125 joules, which is about 1% more energy, but the job execution time drops from ~73 seconds to ~66 seconds, which is 7 seconds or about 10% less time. A user may choose to run the application at 1.8 GHz because it runs in less time at very little additional cost. To illustrate, the user can run the application 11 times at 1.8 GHz in the time it takes to run it 10 times at 1.6 GHz. Taking one run at 1.6 GHz as an energy cost of 1.0, the 11 runs at 1.8 GHz cost about 11.1, compared with 11.0 for 11 runs at 1.6 GHz, which the user may consider a great deal!
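The arithmetic behind this comparison can be checked with a few lines of Python. The joule and second figures are the approximate values quoted in these notes; the variable names are only for illustration.

```python
# Approximate per-run values quoted in the notes for the right-hand graph.
energy_j = {1600: 16950.0, 1800: 17125.0}   # joules per run, keyed by MHz
runtime_s = {1600: 73.0, 1800: 66.0}        # seconds per run, keyed by MHz

# Relative change when moving from 1.6 GHz to 1.8 GHz.
extra_energy = energy_j[1800] / energy_j[1600] - 1
time_saved = 1 - runtime_s[1800] / runtime_s[1600]

# Throughput view: 11 runs at 1.8 GHz versus 10 runs at 1.6 GHz.
window_fast = 11 * runtime_s[1800]
window_slow = 10 * runtime_s[1600]

# Energy cost of 11 runs, normalized so one run at 1.6 GHz costs 1.0.
cost_fast = 11 * energy_j[1800] / energy_j[1600]
cost_slow = 11 * 1.0

print(f"extra energy per run: {extra_energy:.1%}")
print(f"time saved per run:   {time_saved:.1%}")
print(f"11 runs @1.8 GHz take {window_fast:.0f} s; 10 runs @1.6 GHz take {window_slow:.0f} s")
print(f"normalized cost of 11 runs: {cost_fast:.1f} @1.8 GHz vs {cost_slow:.1f} @1.6 GHz")
```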
Kilby’s new CPU clock job submission option indicates the user’s desire to force the compute nodes allocated to the job to run with the specified clock frequency or power governor policy. This new option can be specified on both Moab’s msub and TORQUE’s qsub job submission commands.
In addition, an administrator can assign the CPU clock job submission option to a Moab job template, which means all job submissions matching the job template will inherit its CPU clock option value.
When a CPU clock job submission option sets the clock frequency of the job’s allocated compute nodes, TORQUE restores each node’s clock frequency to the value it had before the job ran once the job completes.
<click 1> The cpuclock= option can specify the absolute clock frequency, a Linux power governor policy, or a relative P-state or “performance-state” value.
<click 2> An absolute clock frequency is specified in units of megahertz and must have a value of 16 or higher. For example, a value of 2200 means the processor(s) on the job’s compute nodes will run at 2200 MHz or 2.2 GHz. If a processor does not support the exact clock frequency value, TORQUE will select the available clock frequency closest to the specified value.
<click 3> A Linux power governor policy is specified using one of the keywords identified in the slide. All compute nodes run the job with the specified power governor policy.
<click 4> A relative P-state number is a value in the range 0 to 15. The value 0 means to run a processor at its “turbo” frequency. The values 1-15 run from the fastest available non-turbo clock frequency to the slowest available non-turbo clock frequency, respectively. Processors may not support all 15 possible non-turbo P-states, so any P-state value larger than the last supported P-state defaults to the last supported P-state, which is the lowest and slowest clock frequency the processor supports.
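The three forms of the cpuclock= value can be told apart roughly as follows. This is a hedged sketch of the rules stated in these notes, not TORQUE's actual parser: the function names are hypothetical, the governor keyword set is an assumption (the notes only say the keywords are listed on the slide), and the tie-breaking rule in closest_frequency is arbitrary.

```python
# Assumed keyword set; the real list is the one shown on the slide.
GOVERNORS = {"performance", "powersave", "ondemand", "conservative"}


def classify_cpuclock(value: str) -> tuple[str, object]:
    """Classify a cpuclock value as a governor, a P-state, or a frequency in MHz."""
    text = value.strip().lower()
    if text in GOVERNORS:
        return ("governor", text)
    number = int(text)  # raises ValueError for anything that is not an integer
    if number < 0:
        raise ValueError(f"cpuclock value must not be negative: {value!r}")
    if number <= 15:
        return ("pstate", number)   # 0-15 is a relative P-state
    return ("mhz", number)          # 16 or higher is an absolute frequency


def closest_frequency(requested_mhz: int, available_mhz: list[int]) -> int:
    """Pick the supported frequency nearest to the request (first wins on a tie)."""
    return min(available_mhz, key=lambda f: abs(f - requested_mhz))


def pstate_to_frequency(pstate: int, turbo_mhz: int, non_turbo_mhz: list[int]) -> int:
    """Map a P-state to a frequency; non_turbo_mhz is ordered fastest to slowest."""
    if pstate == 0:
        return turbo_mhz
    # P-states past the last supported one fall back to the slowest frequency.
    index = min(pstate, len(non_turbo_mhz)) - 1
    return non_turbo_mhz[index]


print(classify_cpuclock("2200"))
print(closest_frequency(2250, [1600, 2000, 2200, 2400]))
print(pstate_to_frequency(15, 2600, [2400, 2200, 2000]))
```

With a processor that offers only three non-turbo frequencies, a request for P-state 15 resolves to the slowest of the three, matching the fallback rule described above.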
Moab’s “green” policy management understands only two so-called “power” states that are really scheduling availability states.
“On” means a compute node is available for scheduling
“Off” means a compute node is not available for scheduling
A new “Power Idle Node Action” parameter permits the Moab administrator to specify what the “off” or “not-schedulable” power state should actually be when Moab applies green policy to idle nodes in order to reduce energy consumption.
<click 1> Allowed values are standby, suspend, sleep, hibernate, shutdown, and off (currently the only value available). They are listed in least-power-savings to most-power-savings and least-time-to-awaken to most-time-to-awaken order.
Note the green power management feature saves energy by placing a compute node in a non-active state, meaning it is not executing software.
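The ordering of the allowed values can be captured in a simple ordered list. The sketch below only encodes the order given in these notes (least to most power savings, which is also least to most time to awaken); the helper names are hypothetical and it does not model what Moab does with each state.

```python
# Allowed values in the order given in the notes:
# least power savings / fastest to awaken first, most savings / slowest last.
POWER_IDLE_ACTIONS = ("standby", "suspend", "sleep", "hibernate", "shutdown", "off")


def saves_more_power(a: str, b: str) -> bool:
    """True if state a comes later in the ordering (saves more power) than state b."""
    return POWER_IDLE_ACTIONS.index(a) > POWER_IDLE_ACTIONS.index(b)


def deepest_allowed(candidates: list[str]) -> str:
    """Among the states a site permits, pick the one with the largest power savings."""
    return max(candidates, key=POWER_IDLE_ACTIONS.index)


print(saves_more_power("hibernate", "standby"))
print(deepest_allowed(["standby", "sleep"]))
```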
<click 2> As part of Moab handling additional low-power/no-power states, the Kilby release will include new power management “reference” scripts.
Better view into your system utilization and what Moab is doing
He figures his priority formula is not giving the user enough priority so he adjusts it in the priority policy “sound board” interface…
With it, he can quickly understand if changing one or more weights will influence his user’s job priority.
Notes for Engineering:
The mixing board sliders are logarithmic
When a component’s weight is set to 0, the whole control is supposed to be grayed out BUT the slider position for the sub components must be maintained (although their weight changes to 0)
The search bar on top needs to go away in the policy management page
Notice how the policy tabs are actually on the top – this is the design we must follow!
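The first two engineering notes (logarithmic sliders, and sub-component slider positions that survive a parent weight of 0) could be modeled along these lines. This is a sketch under stated assumptions, not the product's code: the weight range, the class and function names, and the "queuetime" sub-component key are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed weight range for the logarithmic scale; the notes do not give one.
MIN_WEIGHT = 1.0
MAX_WEIGHT = 1_000_000.0


def slider_to_weight(position: float) -> float:
    """Map a slider position in [0, 1] to a weight on a logarithmic scale."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("slider position must be between 0 and 1")
    return MIN_WEIGHT * (MAX_WEIGHT / MIN_WEIGHT) ** position


@dataclass
class Component:
    """A priority component with its own weight and sub-component sliders."""
    weight: float
    sub_positions: dict[str, float] = field(default_factory=dict)

    @property
    def grayed_out(self) -> bool:
        # The whole control is grayed out when the component weight is 0.
        return self.weight == 0

    def effective_sub_weight(self, name: str) -> float:
        # Sub-component weights read as 0 while grayed out,
        # but the stored slider positions are left untouched.
        if self.grayed_out:
            return 0.0
        return slider_to_weight(self.sub_positions[name])


service = Component(weight=0, sub_positions={"queuetime": 0.5})
print(service.grayed_out, service.effective_sub_weight("queuetime"))
service.weight = 10  # re-enabling restores the sub-weight from the kept position
print(service.grayed_out, service.effective_sub_weight("queuetime"))
```

Keeping the slider position separate from the effective weight is what lets the control return to its previous settings when the component weight is raised above 0 again.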