Machine Learning in IT Operations - Sampath Manickam


Humans currently hold most of the responsibility for critical IT operations tasks, but platforms based on artificial intelligence and machine learning are expected to take on a more critical and efficient role, with humans supporting them. See my blog on machine learning in IT operations.


Machine learning is a branch of artificial intelligence (AI) that gives systems the ability to learn and improve from experience without being explicitly programmed. It can play a significant role in improving IT operations, particularly incident management, root cause analysis, runbook automation, and the avoidance of future problems, and in maintaining high IT service availability for end customers.

Many enterprises have begun introducing machine learning and AI platforms and automation as part of their IT operations journey. According to a study by the Boston Consulting Group and MIT Sloan Management Review, 83% of businesses say AI is a strategic priority today. Additionally, 63% of businesses say pressure to reduce costs will require them to use AI.

Humans currently hold significant responsibility for critical operations, but an AI-enabled future is possible in which machines play a more critical role and humans support them. Humans will be empowered to use a system at scale, leaving the autonomous system to handle routine IT operations.

In the context of this article, artificial intelligence means the use of big data analytics, machine learning, and other AI technologies to automate daily IT operations. Such an autonomous system will require us to create safety nets in case of incidents. As it is tuned (machine-learned) over time, it helps to monitor and correlate data and to gain deep insight into problems, so that issues can be identified and resolved or prevented as they come up.

Machine Learning in IT Operations

Machine learning, a subset of artificial intelligence, covers a range of analytics and algorithms that use sample data to make predictions or decisions without being explicitly programmed. In IT operations it can automate tasks such as event correlation for root cause analysis; analysis of tickets, alerts, and change execution; validation of planned changes against actual changes; and correlation of incoming logs, alerts, present and past events, and history from multiple sources across IT systems and tools.

Concerted use of ML in IT operations is still at a nascent stage and has a long way to go before it matures. However, many large enterprises and startups are taking steps in this direction. Gartner predicts that large enterprises' exclusive use of AIOps and digital experience monitoring tools to monitor applications and infrastructure will rise from 5% in 2018 to 30% in 2023. End-to-end automation that predicts problems and takes corrective action automatically as part of day-to-day IT operations may take years to build, and the methods will vary by organization and industry.

AI and ML are only as good as the data made available to the platform. Hence, one of the biggest challenges for enterprises is data management: what type of data to collect, where to collect it, whether to process it in real time or in batches, where to store it, how to establish basic relationships between the collected data sources, and how engineers feed the right information at the initial stage to tune the system as part of the machine learning exercise. Because the data is unstructured to varying degrees, the correlations are not obvious. This is a good task for a data science or data engineering team: creating rules between different data sources and determining how to correlate or group them, and when it makes sense to do so. It requires enterprises to put great effort into data governance and into maintaining and managing the complete platform, including the huge amount of performance data it produces.

Next comes choosing the right machine learning (ML) algorithms as part of creating the automation platform.
These algorithms serve as the baseline for the ML behavior needed to achieve the desired business goals and meet objectives in an automated way. Once the algorithms have been tuned on sample data over time, the system knows how to deliver results and we can decide what needs to be automated; in other words, the machine learns and performs as designed. ML makes use of all available data sources, aggregating and organizing the output data. Each data set can be collected, formatted, and cleaned so that noise and unnecessary data are reduced and trends, patterns, and problems can be found. With ML, IT operations become more proactive than reactive, automatically anticipating, identifying, and resolving issues in real time that a human might not have detected across multiple systems, dashboards, and metrics.

AI & ML Capabilities

A more proactive approach helps to detect issues at an early stage and makes root cause analysis faster and easier. Even when the data set is vast, AI can quickly build an overview and detect the relationships between events and issues, which allows for faster troubleshooting. This is especially useful for security, as AI can monitor for unusual processes or activities, prioritize them, and address possible malware. The algorithms not only flag unusual activity faster but also help to detect system capacity issues, predict system failures, and so on.

When properly implemented, AI frees IT operations staff from routine tasks and processes, allowing them to focus on more complex work. AI and ML can automate the management of IT infrastructure by scaling to forecast demand and anticipating requirements for storage, memory, and processing power based on historical data. By mapping the workload, the AI can recommend the right configuration and improve agility, productivity, and efficiency. An additional benefit is insight into the IT environment, along with streamlined communication between teams and business units.
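
To make the idea of flagging unusual activity a little more concrete, here is a minimal sketch of one simple approach: comparing each new metric sample against the mean and standard deviation of the samples just before it (a rolling z-score). This is only an illustration and not how any particular AIOps product works. The function name `find_anomalies`, the `window` and `threshold` parameters, and the CPU readings are all made up for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations.

    Hypothetical helper for illustration only.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to judge deviation
            # against, so this sketch simply skips the sample.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Invented CPU utilisation readings (percent) with one obvious spike.
cpu_percent = [41, 43, 40, 42, 44, 43, 41, 95, 42, 43]
print(find_anomalies(cpu_percent))  # prints [7]
```

A real platform would learn its baselines from far more history and many correlated signals, but the principle is the same: the system learns what "normal" looks like from past data and raises a flag when a new observation falls well outside it.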
Conclusion

As the world continues to evolve through digital transformation, operations skills will continue to be needed, but team sizes will shrink even as scale grows. Companies are adopting these techniques and technologies to stay competitive, cost-effective, and efficient. Managing large distributed systems with smaller teams will make organizations much more efficient. An organization can optimize its platforms with the right workload sizes and as little user intervention as possible. Instead of having to manage a crisis, humans can play a supervisory role and leave the AI to determine the course of action required based on the supporting data and metrics. With many such products in the industry and ever more innovation going into integrating AI and ML platforms with existing IT operations tools, the whole IT industry is moving towards autonomous systems that provide seamless IT operations.
