Confidential
Media High Availability Service
November 2020
Nazariy Mamrokha - Engineering Director, Media
● More than 13 years of experience in software
development and the Media domain, including 10 years of
software development and 6 years in management.
● Strong experience in Media and Broadcasting domain.
● Architecting solutions for:
○ Media OTT applications (mobile, living room, gaming
consoles, Smart TVs, Web),
○ Backend Services (CMS/CDN, Streaming Services,
Subscription Management, Ad-Tech solutions,
Billing and Monetization, Analytics, Application
Store).
● Leading Media Program with 150+ engineers working
on 25+ projects
Intro to Media
End-to-End Video Content Lifecycle
Get your content and data into the system
Perform all necessary content manipulations
Play on any device
Deliver to end-user
Monetize
Produce content
- Ingest
- Metadata
- Encode - Transcode
- ABR
- Codecs
- Store
- Host
- Organize
- Scale
- Backup
- CDN
- ABR
- Packaging
- Encrypt
- DRM
- CAS
- Algorithms
- Apps
- Any platform
- Any device
- Subscription management
- Ad Exchange
- Billing
- Manage
- Extract
- Archive
- Search
- Workflows
- Capture - Edit
- Effects
- Workflows
- Finishing
Analytics
Content providers (TV networks, studios, video bloggers, etc.)
Service providers & technology vendors (telecom, broadband, CDN, ISVs, etc.)
OEM (connected devices, consumer electronics, etc.)
Engineering, QA & Automation, DevOps, Migration, Design, Architecture, Support
Industry
Content
GlobalLogic
Engineering Services
Cloud video streaming platform
OTT-service-like cloud platform for video content delivery and monetization
VOD
Live
MAM
Metadata
management, CMS
cDVR SSAI
User management
CDN Clients
Metadata
VoD, Linear (HLS)
User profiles, Auth
API
Content with ads
Timeshifted TV,
Personal Recordings
VoD Library,
Scheduling, EPG
Ad management
Ad Insertion
Ad Tracking
Ad Decisions
Ingest, Transcode,
Playout, Package
Serhiy Onanchenko - NOC Team Leader
● Over 18 years of professional experience in the IT
industry
● Full-stack developer, DBA, Linux/Windows
system administrator, network
engineer
● Supported production-grade ecosystems in the Telecom
domain
● Managed support groups (30 members) providing
24/7 administration and monitoring services for 350+
customers
● Currently manages a NOC of 12 engineers monitoring
multiple highly loaded environments (up to 250K
RPS, 3000+ instances)
NOC from scratch
Agenda
1. NOC - who are we?
- team structure
- scope
2. Incident management
3. Monitoring toolset
4. Monitoring challenges and best practices
NOC - who are we?
Current Team structure
1 Team Leader
12 NOC Engineers (2 people per shift)
● Linux, Windows systems
administration, automation
scripting
● Cloud computing and networks
● Web applications and servers
architecture, HTTP, REST API
● Monitoring tools and principles
● Strong troubleshooting and
problem-solving skills
● Good English language skills
NOC setup
Questions to audience
Poll #1
What is the largest environment you have supported?
Scope
● 5+ products, 1000+ B2B customers
● 9+ AWS production environments
● Microservices, Kubernetes clusters
● 3000+ running instances
● up to 250K RPS
Availability target: up to 99.995% =
max 30.2 sec of downtime weekly
MTTA
(Mean Time to Acknowledge)
Target - 1 minute
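The arithmetic behind the availability target and the MTTA target can be checked with a short sketch. The function names and the sample acknowledgement delays below are hypothetical and chosen only for illustration; the one-week period and the 99.995% and 1-minute figures come from the slide.

```python
from datetime import timedelta

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def allowed_downtime_seconds(availability_pct: float,
                             period_seconds: int = SECONDS_PER_WEEK) -> float:
    """Return the downtime a given availability target allows within a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_seconds * (100 - availability_pct) / 100


def mean_time_to_acknowledge(ack_delays: list[timedelta]) -> timedelta:
    """Average delay between an alert firing and an engineer acknowledging it."""
    if not ack_delays:
        raise ValueError("need at least one acknowledged alert")
    return sum(ack_delays, timedelta()) / len(ack_delays)


if __name__ == "__main__":
    budget = allowed_downtime_seconds(99.995)
    print(f"weekly downtime budget: {budget:.1f} s")

    # Hypothetical acknowledgement delays for three alerts.
    delays = [timedelta(seconds=40), timedelta(seconds=55), timedelta(seconds=70)]
    mtta = mean_time_to_acknowledge(delays)
    within_target = mtta <= timedelta(minutes=1)
    print(f"MTTA: {mtta.total_seconds():.0f} s, within target: {within_target}")
```

A 99.995% target leaves 0.005% of the 604800 seconds in a week, which is roughly the 30.2 seconds quoted above.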
Responsibilities
● Infrastructure, Services monitoring
● Incident management and documenting
● Maintaining monitoring systems and checks,
implementing new metrics and monitoring scenarios
● Keeping and updating a directory of all 3rd parties
● A single focal point that always knows the service level and the status of issues
● Defining reliable and preventive monitoring requirements as part of the product development life cycle
● Communication, coordination, collaboration
Incident management
Incident management process
Questions to audience
Which incident management tools have you worked with?
Poll #2
Monitoring toolset
Monitoring toolset
OpsGenie
Prometheus
Grafana
Amazon CloudWatch
PRTG
Dotcom-Monitor
Foglight
Witbe robot
Youbora
Logz.io
+ multiple custom scripts/sensors
Questions to audience
Which monitoring tools do you use for
production environment monitoring?
Poll #3
Dotcom-monitor
Grafana
Logz.io
Youbora Analytics
Witbe Robots
Witbe robots run end-to-end test scenarios
on any device (PC, smartphone,
STB) and monitor Quality of Experience (QoE).
Monitoring challenges and best practices
Monitoring challenges
● Mix of infrastructure setups and products
● Black-box monitoring
● Noise and false positives
● Anomaly detection
● Multiple communication channels
● Complicated and long runbooks
● Human in the middle of real-time operations
SRE Golden Signals to monitor
There are three common methodologies:
● From the Google SRE book: Latency, Traffic, Errors, and
Saturation
● USE Method (from Brendan Gregg): Utilization, Saturation, and
Errors
● RED Method (from Tom Wilkie): Rate, Errors, and Duration
Useful references:
#1 #2 #3
The USE Method
Methodology for analyzing the performance of any system
A summary of USE is
“For every resource, check utilization, saturation, and errors.”
Resource: all physical server functional components (CPUs, disks, ...)
● Utilization: the average time the resource was busy servicing work
● Saturation: the degree to which the resource has extra work
which it can’t service, often queued
● Errors: the count of error events
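As a minimal sketch of the three USE checks, the snippet below summarizes a series of polling samples for a single resource. The `ResourceSample` shape, its field names, and the choice of average queue length as the saturation measure are assumptions made for illustration, not something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One polling interval for one resource (hypothetical shape)."""
    busy_seconds: float      # time the resource spent servicing work
    interval_seconds: float  # length of the polling interval
    queue_length: int        # work waiting that could not be serviced yet
    error_count: int         # error events seen during the interval


def use_summary(samples: list[ResourceSample]) -> dict[str, float]:
    """Summarize utilization, saturation, and errors for one resource."""
    total_interval = sum(s.interval_seconds for s in samples)
    if total_interval <= 0:
        raise ValueError("samples must cover a positive time span")
    return {
        # Fraction of the observed time the resource was busy.
        "utilization": sum(s.busy_seconds for s in samples) / total_interval,
        # Average amount of queued work per sample (an assumed proxy).
        "saturation": sum(s.queue_length for s in samples) / len(samples),
        # Total count of error events.
        "errors": float(sum(s.error_count for s in samples)),
    }


if __name__ == "__main__":
    samples = [
        ResourceSample(busy_seconds=30, interval_seconds=60, queue_length=2, error_count=0),
        ResourceSample(busy_seconds=45, interval_seconds=60, queue_length=4, error_count=1),
    ]
    print(use_summary(samples))
```

In practice the same three numbers would be computed per resource (each CPU, disk, or network interface) rather than once per host.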
The RED Method
Methodology for services analysis
A summary of RED is
“For every service, check rate, errors, and duration.”
● Rate: the number of requests per second
● Errors: the number of those requests that are failing
● Duration: the amount of time those requests take
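The three RED numbers can be derived from a window of request records, as in the sketch below. The `Request` record, the `red_summary` name, and the rule that an HTTP status of 500 or above counts as a failure are assumptions for illustration; a real service may define failure differently and would usually track duration percentiles rather than a plain average.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One handled request (hypothetical record shape)."""
    duration_seconds: float
    status_code: int


def red_summary(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute rate, errors, and duration for requests seen in one time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Assumption: a request is failing when the server answers with 5xx.
    failed = sum(1 for r in requests if r.status_code >= 500)
    total_duration = sum(r.duration_seconds for r in requests)
    return {
        "rate": len(requests) / window_seconds,  # requests per second
        "errors": float(failed),                 # number of failing requests
        "duration": total_duration / len(requests) if requests else 0.0,  # mean seconds
    }


if __name__ == "__main__":
    window = [
        Request(0.1, 200),
        Request(0.3, 200),
        Request(0.2, 500),
        Request(0.4, 503),
    ]
    print(red_summary(window, window_seconds=2))
```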
Anton Bil - Senior Software Engineer
● Over 8 years of professional experience in the IT industry
● Strong experience as a Linux/Windows
system administrator, DevOps engineer, and SRE
● As an SRE, supported highly loaded infrastructures with
more than 7,000 servers, including Media and CDN services
● Currently works as an SRE providing
support, optimization, and automation for highly loaded
environments (up to 250K RPS, 3000+ instances)
SRE - who is a Site Reliability Engineer?
Questions to audience
Poll #4
Does your organization formally use
Site Reliability Engineering?
Availability
Error budget
Questions to audience
Poll #5
How many incidents happen
during changes?
How to achieve stability in media
products?
“Day in the life of SRE”
1. Monitoring, Alerts management
2. Deployments
3. Automation
4. Processes/Documentation
5. Incident management
Official information sources by Google:
books
online course
Summary
● The Media/OTT streaming industry is growing constantly and has skyrocketed
because of COVID
● Reliability is key to sustaining the daily streaming of millions of hours
● This requires constant Quality of Service monitoring
● This requires 24x7 support across the world
Thank you!

The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

[GlobalLogic] Media High Availability Service

  • 2. Nazariy Mamrokha - Engineering Director, Media ● More than 13 years of experience in software development and the Media domain, including 10 years of software development and 6 years of management. ● Strong experience in the Media and Broadcasting domain. ● Architecting solutions for: ○ Media OTT applications (mobile, living room, gaming consoles, Smart TVs, Web), ○ Backend Services (CMS/CDN, Streaming Services, Subscription Management, Ad-Tech solutions, Billing and Monetization, Analytics, Application Store). ● Leading the Media Program with 150+ engineers working on 25+ projects.
  • 4. End-to-End Video Content Lifecycle Get your content and data into the system Perform all necessary content manipulations Play on any device Deliver to end-user Monetize Produce content - Ingest - Metadata - Encode - Transcode - ABR - Codecs - Store - Host - Organize - Scale - Backup - CDN - ABR - Packaging - Encrypt - DRM - CAS - Algorithms - Apps - Any platform - Any device - Subscription management - Ad Exchange - Billing - Manage - Extract - Archive - Search - Workflows - Capture - Edit - Effects - Workflows - Finishing Analytics Content providers (TV networks, studios, video bloggers, etc.) Service providers & technology vendors (telecom, broadband, CDN, ISVs, etc.) OEM (connected devices, consumer electronics, etc.) Engineering QA & Automation DevOps Migration Design Architecture Support Industry Content GlobalLogic Engineering Services
  • 5. Cloud video streaming platform: OTT-service-like cloud platform for video content delivery and monetization. VOD Live MAM Metadata management, CMS cDVR SSAI User management CDN Clients Metadata VoD, Linear (HLS) User profiles, Auth API Content with ads Timeshifted, TV Personal Recordings VoD Library, Scheduling, EPG Ad management Ad Insertion Ad Tracking Ad Decisions Ingest, Transcode, Playout, Package
  • 6. Serhiy Onanchenko - NOC Team Leader ● Over 18 years of professional experience in the IT industry ● Full stack developer, DBA, Linux/Windows system administrator, network engineer ● Supported production-grade ecosystems in the Telecom domain ● Managed support groups (30 members) providing administration and monitoring services (24/7) for 350+ customers ● Currently manages a NOC of 12 engineers monitoring multiple highly loaded environments (up to 250K RPS, 3000+ instances)
  • 8. Agenda 1. NOC: who we are, team structure, scope 2. Incident management 3. Monitoring toolset 4. Monitoring challenges and best practices
  • 10. Current team structure: 1 Team Leader, 12 NOC Engineers (2 people per shift) ● Linux and Windows systems administration, automation scripting ● Cloud computing and networks ● Web application and server architecture, HTTP, REST API ● Monitoring tools and principles ● Strong troubleshooting and problem-solving skills ● Good English language skills
  • 12. Questions to audience Poll #1 What is the largest environment you have supported?
  • 13. Scope ● 5+ products, 1000+ B2B customers ● 9+ AWS production environments ● Microservices, Kubernetes clusters ● 3000+ running instances ● up to 250K RPS Availability target: up to 99.995% = max 30.2 sec of downtime weekly MTTA (Mean Time to Acknowledge) target: 1 minute
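The weekly downtime figure on the Scope slide follows directly from the availability target. A minimal sketch of that arithmetic is shown here; the function name `downtime_budget_seconds` is hypothetical and chosen only for illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def downtime_budget_seconds(availability_percent: float,
                            period_seconds: int = SECONDS_PER_WEEK) -> float:
    """Return the maximum downtime allowed in the period for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return period_seconds * unavailable_fraction


weekly_budget = downtime_budget_seconds(99.995)
print(f"{weekly_budget:.2f} seconds of downtime per week")
```

For a 99.995% target this gives roughly 30.24 seconds per week, which matches the "max 30.2 sec" stated on the slide.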
  • 14. Responsibilities ● Infrastructure and services monitoring ● Incident management and documentation ● Maintaining monitoring systems and checks, implementing new metrics and monitoring scenarios ● Keeping and updating a directory of all third parties ● Acting as the single focal point that always knows the service level and issue status ● Defining reliable and preventive monitoring requirements as part of the product development life cycle ● Communication, coordination, collaboration
  • 17. Questions to audience Poll #2 What incident management tools have you worked with?
  • 19. Monitoring toolset: OpsGenie, Prometheus, Grafana, Amazon CloudWatch, PRTG, Dotcom-Monitor, Foglight, Witbe robot, Youbora, Logz.io + multiple custom scripts/sensors
  • 20. Questions to audience Poll #3 What monitoring tools do you use for production environment monitoring?
  • 25. Witbe Robots: Witbe robots for end-to-end scenario testing on any device (PC, smartphone, STB) and Quality of Experience (QoE) monitoring.
  • 27. Monitoring challenges ● Mix of infrastructure setups and products ● Black-box monitoring ● Noise and false positives ● Anomaly detection ● Multiple communication channels ● Complicated and long runbooks ● Human in the middle of real-time operations
  • 28. SRE Golden Signals to monitor. There are three common methodologies: ● From the Google SRE book: Latency, Traffic, Errors, and Saturation ● USE Method (from Brendan Gregg): Utilization, Saturation, and Errors ● RED Method (from Tom Wilkie): Rate, Errors, and Duration Useful references: #1 #2 #3
  • 29. The USE Method: a methodology for analyzing the performance of any system. A summary of USE is “For every resource, check utilization, saturation, and errors.” Resource: all physical server functional components (CPUs, disks, ...) ● Utilization: the average time the resource was busy servicing work ● Saturation: the degree to which the resource has extra work which it can’t service, often queued ● Errors: the count of error events
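As a rough illustration of the three USE definitions, the sketch below summarizes a series of samples for one resource. The `ResourceSample` type and its fields are hypothetical; it assumes each sample already reports busy time, queue length, and an error count for a fixed interval, and it uses mean queue length as one possible saturation measure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceSample:
    busy_seconds: float      # time the resource spent servicing work
    interval_seconds: float  # length of the sampling interval
    queue_length: int        # work waiting that could not be serviced yet
    error_count: int         # error events seen in the interval


def use_summary(samples: List[ResourceSample]) -> Dict[str, float]:
    """Summarize utilization, saturation, and errors for one resource."""
    total_time = sum(s.interval_seconds for s in samples)
    busy_time = sum(s.busy_seconds for s in samples)
    return {
        # Utilization: fraction of time the resource was busy
        "utilization": busy_time / total_time if total_time else 0.0,
        # Saturation: mean amount of queued work (an assumed measure)
        "saturation": (sum(s.queue_length for s in samples) / len(samples)
                       if samples else 0.0),
        # Errors: total count of error events
        "errors": sum(s.error_count for s in samples),
    }


disk_samples = [
    ResourceSample(busy_seconds=6, interval_seconds=10, queue_length=2, error_count=0),
    ResourceSample(busy_seconds=9, interval_seconds=10, queue_length=4, error_count=1),
]
disk_summary = use_summary(disk_samples)
print(disk_summary)
```

In practice these values would come from the monitoring stack rather than hand-built samples; the sketch only shows how the three signals relate to raw measurements.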
  • 30. The RED Method: a methodology for analyzing services. A summary of RED is “For every service, check rate, errors, and duration.” ● Rate: the number of requests per second ● Errors: the number of those requests that are failing ● Duration: the amount of time those requests take
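The RED definitions can be sketched over a window of request records as follows. The `RequestRecord` type is hypothetical, and treating HTTP 5xx responses as "failing" is an assumption made for this example; the slides do not define failure criteria.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RequestRecord:
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code returned


def red_summary(requests: List[RequestRecord], window_seconds: float) -> Dict[str, float]:
    """Compute rate, errors, and duration for one service over a time window."""
    total = len(requests)
    # Assumption: a request counts as failing when it returns HTTP 5xx
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_rps": total / window_seconds,                # Rate
        "errors": failed,                                  # Errors
        "error_ratio": failed / total if total else 0.0,
        "mean_duration_ms": (mean(r.duration_ms for r in requests)
                             if total else 0.0),           # Duration
    }


window = [
    RequestRecord(duration_ms=100, status=200),
    RequestRecord(duration_ms=200, status=200),
    RequestRecord(duration_ms=300, status=503),
    RequestRecord(duration_ms=400, status=200),
]
service_summary = red_summary(window, window_seconds=2)
print(service_summary)
```

Mean duration is used here for brevity; percentiles are often more informative for latency because averages hide slow outliers.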
  • 32. Anton Bil - Senior Software Engineer ● Over 8 years of professional experience in the IT industry ● Strong experience in Linux/Windows system administration, DevOps, and SRE ● As an SRE, supported highly loaded infrastructures with 7,000+ servers for Media and CDN services ● Currently works as an SRE providing support, optimization, and automation services in highly loaded environments (up to 250K RPS, 3000+ instances)
  • 33. SRE - who is a Site Reliability Engineer?
  • 34. Questions to audience Poll #4 Does your organization formally use Site Reliability Engineering?
  • 37. Questions to audience Poll #5 How many incidents happen during changes?
  • 38. How to achieve stability in media products?
  • 39. “Day in the life of SRE” 1. Monitoring, Alerts management 2. Deployments 3. Automation 4. Processes/Documentation 5. Incident management
  • 40. Official information sources by Google: books, online course
  • 42. Summary ● The Media/OTT streaming industry is constantly rising and has skyrocketed because of COVID ● Reliability is key to sustaining daily streaming of millions of hours ● Requires constant Quality of Service monitoring ● Requires 24x7 support across the world