Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Automated out-of-band management with Ansible and Redfish

2 096 vues

Publié le

Secure, automated and scalable out-of-band management using Ansible and Redfish with Dell EMC PowerEdge servers.

Publié dans : Logiciels
  • Soyez le premier à commenter

Automated out-of-band management with Ansible and Redfish

  1. 1. Talk Title Here Author Name, Company Automated Out-of-Band Management with Ansible and Redfish Jose Delarosa, Dell EMC October 23, 2017
  2. 2. Before we start • Thank you for coming • Please ask questions: It’s OK to interrupt • If time runs out, happy to talk to you afterwards
  3. 3. Who am I • Jose Delarosa (@jdelaros1) – Linux Engineer at Dell EMC (13 years) – Part-time technology evangelist – Part-time developer – Part-time systems engineer – Full-time problem solver
  4. 4. Agenda 1. iDRAC: provides out-of-band management 2. Redfish: provides scalability 3. Ansible: provides automation 4. Putting it all together:
  5. 5. iDRAC Overview
  6. 6. Integrated Dell Remote Access Controller (iDRAC) • Embedded chip on a PowerEdge server, independent of the server’s operating system and main power: – Provides device inventory – Detects hardware failure – Manage power: turn off, on, hard reset • Has its own Ethernet port, usually connected to separate management network • Referred to as “out-of-band” (OOB) management, as opposed to “in- band” management provided by the operating system.
  7. 7. Login
  8. 8. Dashboard
  9. 9. Storage Controller
  10. 10. Hard Drives
  11. 11. Fans
  12. 12. Temperatures
  13. 13. System Event Logs
  14. 14. Simple Out-of-Band Management Management Network
  15. 15. Redfish Overview
  16. 16. What is Redfish? • Open source, open industry standard specification published by the DMTF for hardware management. • A RESTful API used to obtain information and exert control over servers via an OOB controller. • Built on a modern tool-chain, which includes HTTPS, JSON and the OData standard. • A redfish “command” is sent as a URI request, so a client can be any application on a server, workstation or mobile device.
  17. 17. Scalable Out-of-Band Management Management Network https://<idrac>/redfish/v1/Systems/Systems.Embedded.1 { Health OK HealthRollup OK State Enabled }
  18. 18. What can we do with Redfish? • Get server health status • Alert on server health status changes • Retrieve hardware and firmware inventory • Reset, reboot, and power control servers • Access system logs • Configure OOB controller • Much more! https://www.dmtf.org/standards/redfish
  19. 19. Redfish API tree structure
  20. 20. iDRAC operation APIs Dell Redfish API URLs Comments /redfish/v1/Managers /redfish/v1/Managers/iDRAC.Embedded.1 /redfish/v1/Managers/iDRAC.Embedded.1/Actions/Manager.Reset Used to perform iDRAC reset /redfish/v1/Managers/iDRAC.Embedded.1/NetworkProtocol Reports information about iDRAC's network services. Includes Web server, SNMP, vMedia, Telnet, SSH, IPMI & KVM. /redfish/v1/ Managers/iDRAC.Embedded.1/SerialInterfaces iDRAC BMC serial interface /redfish/v1/ Managers/iDRAC.Embedded.1/SerialInterfaces/<Serial-key> /redfish/v1/Managers/iDRAC.Embedded.1/LogServices /redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel Access to server System Event Log /redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Lclog Access to Lifecycle Controller Log /redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel/Actions/LogService.ClearLog Used to clear LC Log /redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia Status of iDRAC virtual media /redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/<media-type> /redfish/v1/Managers/iDRAC.Embedded.1/EthernetInterfaces iDRAC network interface /redfish/v1/Managers/iDRAC.Embedded.1/EthernetInterfaces/<FQDD> /redfish/v1/Managers/iDRAC.Embedded.1/AccountService /redfish/v1/Managers/iDRAC.Embedded.1/Accounts iDRAC user accounts /redfish/v1/Managers/iDRAC.Embedded.1/Accounts/<Account-Id>
  21. 21. Chassis Inventory APIs Dell Redfish API URLs Comments /redfish/v1/Chassis /redfish/v1/Chassis/System.Embedded.1 Top-level URI for server chassis /redfish/v1/Chassis/System.Embedded.1/Thermal System temperatures /redfish/v1/Chassis/System.Embedded.1/Sensors/Fans Reports fan status for server and FX2 chassis /redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/<Fan-FQDD> /redfish/v1/Chassis/System.Embedded.1/Sensors/Temperatures Reports thermal data for server and FX2 chassis /redfish/v1/Chassis/System.Embedded.1/Sensors/Temperatures/<Sensor-FQDD> <Sensor-FQDD> addresses each temperature probe /redfish/v1/Chassis/System.Embedded.1/Power Power consumption and supply status /redfish/v1/Chassis/System.Embedded.1/Power/PowerControl /redfish/v1/Chassis/System.Embedded.1/Sensors/Voltages /redfish/v1/Chassis/System.Embedded.1/Sensors/Voltages/<Voltage-FQDD> <Voltage-FQDD> addresses each voltage output /redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies /redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies/<PSU-FQDD> <PSU-FQDD> addresses each power supply /redfish/v1/Chassis/System.Embedded.1/Power/Redundancy/<PSRedundancy-FQDD> <PSRedundancy-FQDD> addresses power supply redundancy
  22. 22. System status APIs Dell Redfish API URLs Comments /redfish/v1 Top-level API access /redfish/v1/Systems Server inventory and status information access /redfish/v1/Systems/<ServiceTag+nodeid> /redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset Server reset operation /redfish/v1/Systems/System.Embedded.1/Processors Details on CPUs /redfish/v1/Systems/System.Embedded.1/Processors/<Processor-FQDD> /redfish/v1/Systems/System.Embedded.1/EthernetInterfaces Reports NIC IP address, DHCP and DNS information. /redfish/v1/Systems/System.Embedded.1/EthernetInterfaces/<EthernetInterface-FQDD> Example <EthernetInterface-FQDD> = NIC.Embedded.1-1-1 /redfish/v1/Systems/System.Embedded.1/EthernetInterfaces/<EthernetInterface-FQDD>/Vlans /redfish/v1/Systems/System.Embedded.1/EthernetInterfaces/<EthernetInterface-FQDD>/Vlans/<Vlan- FQDD> /redfish/v1/Systems/System.Embedded.1/Storage/Controllers Manages storage controllers (i.e. PERC). /redfish/v1/Systems/System.Embedded.1/Storage/Controllers/<Controller-FQDD> Typical <Controller-FQDD>=RAID.Slot.N-1; describes details of controller, backplane, enclosure, attached drives
  23. 23. Registries, Sessions, Tasks and Event APIs Dell Redfish API URLs Comments /redfish/v1/Registries/Messages/En PowerEdge message registry /redfish/v1/odata Enables OData clients to navigate iDRAC Redfish resources /redfish/v1/$metadata Provides a metadata document describing the resources and collections that are available at the iDRAC Redfish service root URI /redfish/v1/$metadata#<Collection or a single resource> /redfish/v1/JSONSchemas Schema descriptions for all supplied data /redfish/v1/JSONSchemas/<file> /redfish/v1/SessionService Redfish session management /redfish/v1/Sessions /redfish/v1/Sessions/<SessionId> /redfish/v1/TaskService Redfish internal task management /redfish/v1/EventService Redfish event management /redfish/v1/EventService/Actions/EventService.SubmitTestEvent /redfish/v1/EventSubscriptions /redfish/v1/EventSubscriptions/<Subscription ID>
  24. 24. Example: Get system health $ curl -s https://<idrac>/redfish/v1/Systems/System.Embedded.1 -k -u root:calvin | jq .Status { "Health": "OK", "HealthRollUp": "OK", "State": "Enabled" }
  25. 25. Example: Get storage controller information $ curl -s https://<idrac>/redfish/v1/Systems/System.Embedded.1/Storage/Controllers/RAID.Slot.4-1 -k -u root:calvin | jq '{name: .Name, status: .Status.Health}’ { "name": "PERC H740P Adapter ", "status": "OK" }
  26. 26. Example: Get HDD information $ curl -s https://<idrac>/redfish/v1/Systems/System.Embedded.1/Storage/Controllers/RAID.Slot.4-1 -k -u root:calvin | jq .Devices [ { "CapacityBytes": 599550590976, "Manufacturer": "SEAGATE", "Model": "ST600MM0238", "Name": "Physical Disk 0:1:0", "Status": { "Health": "OK", "HealthRollup": "OK", "State": "Enabled" } }, { "CapacityBytes": 599550590976, "Manufacturer": "SEAGATE", "Model": "ST600MM0238", "Name": "Physical Disk 0:1:1", "Status": { "Health": "OK", "HealthRollup": "OK", "State": "Enabled" } },
  27. 27. Example: Get fan information $ curl -s https://<idrac>/redfish/v1/Chassis/System.Embedded.1/Thermal -k -u root:calvin | jq '.Fans[] | {name:.Name, reading: .Reading, health: .Status.Health}' { "name": "System Board Fan1", "reading": 5640, "health": "OK" } { "name": "System Board Fan2", "reading": 5640, "health": "OK" } { "name": "System Board Fan3", "reading": 5640, "health": "OK" } ..
  28. 28. Example: Get thermal information $ curl -s -k -u root:calvin | jq '.Temperatures[] | {name:.Name, readingCelsius: .ReadingCelsius, health: .Status.Health}' { "name": "CPU1 Temp", “readingCelsius": 29, "health": "OK" } { "name": "CPU2 Temp", “readingCelsius": 28, "health": "OK" } { "name": "System Board Exhaust Temp", “readingCelsius": 29, "health": "OK" } ..
  29. 29. Example: Get system event logs $ curl -s https://<idrac-ip>/redfish/v1/Managers/iDRAC.Embedded.1/Logs/Sel -k -u root:calvin | jq '.Members[] | {date: .Created, message: .Message, severity: .Severity}' .. { "date": "2017-09-26T13:33:00-05:00", "message": "Power supply redundancy is lost.", "severity": "Critical" } { "date": "2017-09-26T13:32:53-05:00", "message": "The power input for power supply 2 is lost.", "severity": "Critical" } { "date": "2017-09-16T10:37:59-05:00", "message": "Log cleared.", "severity": "Ok" }
  30. 30. Redfish Roadmap • Version 1.x focused on servers. Will expand over time to cover storage and network infrastructure. • Will add devices over time to cover new technologies (i.e. NVDIMMs, Multifunction Network Adapters) • SNIA is developing Swordfish, which builds upon Redfish’s local storage management to address advanced storage devices. • Open source efforts (http://github.com/dmtf) – Client libraries (Python, Java, PowerShell) – Redfish Mockup Creator / Server – Redfishtool (CLI utility similar to ipmitool)
  31. 31. Ansible Overview
  32. 32. What is Ansible • Automation software  makes repetitive tasks easy • Agentless  minimum footprint • No database back end  easy to install and use • Defines desired state  OK to run task more than once • Remote tasks are run in parallel  Efficient • Easier to learn and use than shell scripts
  33. 33. Ansible use cases OpenStack • Compute nodes • Storage nodes • Controller nodes IT Security Hardening • Firewall rules • Remove invalid users • Install latest updates Container Management • Stop/remove containers • Refresh container images • Deploy with new images  1-to-n management  Executes tasks in parallel
  34. 34. Popular Ansible use cases • Cloud infrastructure deployments • Application installation & configuration • Container management • Security compliance • IT audits • OOB systems management
  35. 35. Ansible definitions • Task: A task is the smallest unit of work. It can be an action like “Install a package”, “Remove a user”, “Create a firewall rule” or “Copy this file to this location”. • Play: A play is made up of tasks. For example, the Play “Prepare a database” is made up of tasks: – Task: “Install the database package” – Task: “Set password” – Task: “Create database” – Task: “Set database access”. • Playbook: A playbook is made up of Plays. For example the playbook “Prepare a web site with a database” is made of up Plays: 1) “Set up the database server” and 2) “Set up the web server”. Playbook: Setup my web application Play 1: Setup database Task 1: Install mysql package Task 2: Create database customer_db Play 2: Setup web server Task 1: Install httpd package Task 2: Configure site for TLS
  36. 36. Example: Automating routine tasks $ groupadd admin $ useradd -c Sys Admin -g admin -m sysman $ mkdir /opt/tools $ chmod 755 /opt/tools $ chown sysman /opt/tools $ yum -y install httpd $ yum -y update $ systemctl enable httpd $ systemctl start httpd $ rm /etc/motd - name: daily tasks hosts: my_100_daily_servers tasks: - group: name=admin state=present - user: name=sysman comment="Sys Admin" group=admin - file: path=/opt/tools state=directory owner=sysman mode=0755 - yum: name=httpd state=latest - yum: name=* state=latest - service: name=httpd state=started enabled=yes - file: path=/etc/motd state=absent Say you provision 100 servers every day and you run these commands in each server: The same commands can be placed in an Ansible playbook and executed in 100 servers: yaml format = easy to learn! yaml format = easy to organize!
  37. 37. One more Ansible definition • Module – An Ansible module is a program where a task is implemented. – A playbook is where you specify the instructions (tasks) you want to run; a module is the code to implement those instructions. – Modules can be written in any language but most popular is Python. – If you are an operator, you will work mostly with playbooks. – If you are a developer, you will work mostly with modules.
  38. 38. Scalable & Automated Out-of-Band Management Management Network { Health OK HealthRollup OK State Enabled } https://<idrac>/redfish/v1/Systems/Systems.Embedded.1
  39. 39. Putting it all together: iDRAC + Redfish + Ansible
  40. 40. Ansible module and playbooks for iDRAC • Manage your entire Dell EMC IT infrastructure (servers, routers, switches, storage) from an Ansible controller. • Automated monitoring, provisioning, firmware updates at scale. • Open source, so you can write your own extensions as needed and contribute back to the community. • To be included as an Ansible community module.
  41. 41. • Server Power On/Off; Reboot; Hard Reset • Install BIOS, Configure BIOS, Reset to Default • Configure iDRAC (CRUD operations): – User & Password Management – NTP and Time Zone settings – Storage (RAID, Physical Disks, Virtual Disks) • System Inventory – H/W, Firmware, Sensor • OS Deployment – remote file share, vMedia • Import / Export SCP – remote file share, vMedia • Backup and Restore – Server Profiles Key lifecycle management tasks • Upgrade using DSU (Dell Server Update) or DUEC (Dell Update Engine for Consoles) – Get list of available and applicable updates – Firmware Upgrade – BIOS Upgrade – OS Drivers Upgrade • Job Management – Check JOB status – Create JOB – Delete JOB – Create JOB Queue – Delete JOB Queue • Get Logs – Export LC logs – Export System Event Logs
  42. 42. Implementation example: get system inventory Playbook: Execute: Output:
  43. 43. Inventory spreadsheet • Collect inventory data, maintain in spreadsheet or database Server iDRAC IP Model IP address BIOS CPU Type RAM Service Tag Status webserver-1 PowerEdge R630 2.3.4 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 128 5W14Q47 OK webserver-2 PowerEdge R630 2.3.4 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 128 5XR7Q32 OK webserver-3 PowerEdge R630 2.3.2 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 128 5XT3QYY OK appserver-1 PowerEdge R830 2.3.2 4 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.60GHz 512 5XR7QXY OK dbserver-1 PowerEdge R730 2.1.2 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz 256 5XR7Q67 OK dbserver-2 PowerEdge R730 2.3.4 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz 256 5WT4Q37 OK dbserver-3 PowerEdge R730 2.3.4 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz 256 5WR4Q12 OK dbserver-4 PowerEdge R730 2.3.4 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz 256 5TT1Q21 OK
  44. 44. Implementation example: get HDD info and health Playbook: Output:
  45. 45. Implementation example: get BIOS boot order Playbook: Execute: Output:
  46. 46. https://github.com/dell/idrac-ansible-module Development is ongoing. Contributions are welcome!
  47. 47. Resources • iDRAC with Lifecycle Controller: http://dell.to/2qdBd0y • Redfish API specification: http://bit.ly/2gb9VBj • Dell EMC PowerEdge Redfish API Overview: http://dell.to/2odsH1p • iDRAC Redfish API Reference Guide: http://dell.to/2oyjMTy • Getting started with Ansible: http://bit.ly/2oCj5xy • jq JSON parser: https://stedolan.github.io/jq/
  48. 48. Thank you Q & A