TeideHPC is a High Performance Computing infrastructure used for research and development tasks in a variety of areas such as weather forecasting, astrophysics, CFD and bioinformatics. We are also involved in other fields less closely related to R&D, such as rendering and streaming services, and in others with changing requirements, such as the evaluation of pilot environments for open government/open data, research institutes and companies from the engineering sector.
For those workloads we have found OpenNebula to be a great solution with many capabilities, but also with some limitations. The aim of this session is to share our experience with OpenNebula from a technical viewpoint, covering deployment and configuration with Cobbler/Chef, InfiniBand virtualization with SR-IOV, power management and cloud bursting limitations.
Author Biography
Carlos Ignacio González Vila is a Software and System Engineer. He worked for three years in the Research Support Computing Service of the same University, carrying out a wide variety of tasks such as developing research applications for data analysis, project management and system administration of the University's clusters and supercomputers.
Since 2012 he has worked as a High Performance Computing System Administrator at the Technological and Renewable Energies Institute (ITER) of Tenerife (Canary Islands), where he is involved in the TeideHPC supercomputer project, a computing infrastructure with more than 1100 computing nodes and nearly 40 TB of RAM. His job covers storage management, Ethernet and InfiniBand networks, scientific applications and cloud infrastructure.
2. ITER
Founded in 1990 by the Cabildo Insular de Tenerife, the island's administrative
authority. Its objectives are to promote research activities and technological
development, especially those related to the use of renewable energies.
15. Case study – InfiniBand virtualization
● OpenNebula Ecosystem
● KVM SR-IOV Driver
– Single Root I/O Virtualization
[Diagram labels: PCIe device, Virtual functions]
16. Case study – InfiniBand virtualization
# lspci
b0:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
b0:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
b0:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
● Enable the SR-IOV option in the BIOS
● Enable I/O virtualization on the kernel boot line
● Define the number of Virtual Functions when loading the kernel module
● Burn SR-IOV capable firmware onto the HCA (OEM)
● Upgrade the OFED version (2.3-2.0.5-rhel6.6)
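Once the steps above are done, one way to check the result is to look for Virtual Function entries in the `lspci` listing. The following is a minimal Python sketch of that check, assuming the output format shown on this slide (short `bus:device.function` addresses with no PCI domain prefix). The function name `split_functions` and the embedded sample text are illustrative and not part of the original talk.

```python
import re

# Sample text copied from the lspci output on the slide (wrapped lines joined).
SAMPLE_LSPCI = """\
b0:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
b0:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
b0:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
"""

# Matches "bus:device.function description" as printed by plain `lspci`.
LINE_RE = re.compile(r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<desc>.+)$")


def split_functions(lspci_text):
    """Return (physical, virtual) lists of PCI addresses for Mellanox devices."""
    physical, virtual = [], []
    for line in lspci_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or "Mellanox" not in match.group("desc"):
            continue  # skip unrelated devices and malformed lines
        if "Virtual Function" in match.group("desc"):
            virtual.append(match.group("addr"))
        else:
            physical.append(match.group("addr"))
    return physical, virtual


if __name__ == "__main__":
    pf, vf = split_functions(SAMPLE_LSPCI)
    print("physical functions:", pf)
    print("virtual functions:", vf)
```

On a real host the text could come from running `lspci` through `subprocess.run`; the sketch keeps the sample inline so it runs without the hardware.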
17. Case study – InfiniBand virtualization
● Great A'Tuin
– GPU device support was funded by SURFsara
OpenNebula 4.14 Release notes... Support for GPU consumables,
giving the ability to give exclusive PCI passthrough access to VMs to
GPU cards, for HPC computing.
19. Case study – Interoperability limitations
● 2 research centers
– Share resources
● Connectivity
● Federation
● Cloud bursting
20. Case study – Interoperability limitations
● End-to-end connection
– L2-L3
– Jumbo frames
● 1 lambda: 10 Gbps
● Ping < 40 ms
● IPsec VPN
● More than 600 Mbps between 1 Gbps hosts
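To put these figures in context, the bandwidth-delay product gives a rough idea of how much data must be in flight to keep such a link busy. The Python sketch below is only an illustrative calculation, assuming that the ping figure is a round-trip time and treating the slide's bounds (40 ms, 600 Mbps) as exact values. The function and constant names are hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to fill a link of the given rate and RTT."""
    # bits/s * ms / 1000 gives bits in flight; dividing by 8 gives bytes.
    return bandwidth_bps * rtt_ms // 8000


RTT_MS = 40                      # upper bound from the slide: ping < 40 ms
HOST_LINK_BPS = 1_000_000_000    # 1 Gbps host interface
LAMBDA_BPS = 10_000_000_000      # 10 Gbps lambda
MEASURED_BPS = 600_000_000       # lower bound from the slide: more than 600 Mbps

host_bdp = bandwidth_delay_product_bytes(HOST_LINK_BPS, RTT_MS)
lambda_bdp = bandwidth_delay_product_bytes(LAMBDA_BPS, RTT_MS)
utilization = MEASURED_BPS / HOST_LINK_BPS

if __name__ == "__main__":
    print(f"1 Gbps host, 40 ms RTT: {host_bdp / 1e6:.0f} MB in flight")
    print(f"10 Gbps lambda, 40 ms RTT: {lambda_bdp / 1e6:.0f} MB in flight")
    print(f"Measured throughput is at least {utilization:.0%} of the host link rate")
```

Under these assumptions, a single 1 Gbps host would need about 5 MB of data in flight to saturate its link at 40 ms, and the full 10 Gbps lambda about 50 MB.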
21. Case study – Interoperability limitations
Federation: tightly coupled
23. Case study – Interoperability limitations
OpenNebula 4.8 docs... “The remote provider could be a
commercial Cloud service, such as Amazon EC2, IBM
SoftLayer or Microsoft Azure, or a partner infrastructure
running a different OpenNebula instance”