Statistics 101 for System Administrators
EuroPython 2014, 22nd July - Berlin
Roberto Polli - roberto.polli@babel.it
Babel Srl P.zza S. Benedetto da Norcia, 33
00040, Pomezia (RM) - www.babel.it
22 July 2014
Who? What? Why?
• Using (and learning) elements of statistics with python.
• Roberto Polli - Community Manager @ Babel.it. Loves writing in C, Java
and Python. Red Hat Certified Engineer and Virtualization Administrator.
• Babel – Proud sponsor of this talk ;) Delivers large mail infrastructures
based on Open Source software for Italian ISPs and public administrations (PA).
Contributes to various FLOSS projects.
Agenda
• A latency issue: what happened?
• Correlation in 30”
• Combining data
• Plotting time
• modules: scipy, matplotlib
A Latency Issue
• Episodic network latency issues
• Log traces: message size, #peers, retransmissions
• Do we need to scale? Was it a peak problem?
Find a rapid answer with python!
Basic statistics
Python's numeric libraries provide basic statistics, like
# scipy.stats no longer exports mean and std; numpy provides both
from numpy import mean # x̄, the average
from numpy import std # σX, the standard deviation
T = {'ts': (1, 2, 3, .., ),
     'late': (0.12, 6.31, 0.43, .. ),
     'peers': (2313, 2313, 2312, ..), ...}
print([(k, max(X), min(X), mean(X), std(X))
       for k, X in T.items()])
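A minimal, runnable sketch of the same summary using only the standard library. The sample values below are hypothetical stand-ins for the elided log data, and `summarize` is an illustrative helper name, not something from the talk. `statistics.pstdev` is the population standard deviation, which is also what `numpy.std` computes by default.

```python
from statistics import mean, pstdev

# Hypothetical sample data, shaped like the T dictionary on the slide.
T = {
    "ts": (1, 2, 3, 4, 5),
    "late": (0.12, 6.31, 0.43, 0.25, 0.19),
    "peers": (2313, 2313, 2312, 2310, 2311),
}


def summarize(table):
    """Return {series name: (max, min, mean, population std deviation)}."""
    return {k: (max(X), min(X), mean(X), pstdev(X)) for k, X in table.items()}


summary = summarize(T)
for name, stats in summary.items():
    print(name, stats)
```

A single outlier such as the 6.31 in 'late' pulls the mean and inflates the deviation, which is a good reason to look at the distribution next.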
Distributions
Data distribution - aka δX - shows event frequency.
# The fastest way to get a
# distribution is
from matplotlib import pyplot as plt
freq, bins, _ = plt.hist(T['late'])
# plt.hist returns the counts and the bin edges
# (one more edge than counts: zip drops the last edge)
distribution = list(zip(bins, freq))
[Figure: a ping RTT distribution. X axis: rtt in ms.]
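When no plot is needed, the same counting can be sketched without matplotlib. This is an illustration under assumptions, not the talk's code: `histogram` is a hypothetical helper, the RTT values are invented, and the binning (equal-width bins, with the maximum value counted in the last bin) follows the convention plt.hist uses by default.

```python
def histogram(values, bins=10):
    """Count values in `bins` equal-width bins; return (counts, edges)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin of its own.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical ping round-trip times, in ms.
rtt = [158.2, 158.9, 159.1, 159.4, 160.0, 160.3, 161.8, 162.0]
freq, edges = histogram(rtt, bins=4)
for left, count in zip(edges, freq):
    print("{:7.2f} {}".format(left, "#" * count))
```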
Correlation I
Are two data series X, Y related?
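Given $\Delta x_i = x_i - \bar{x}$ (and likewise for $\Delta y_i$), Mr. Pearson answered with this formula:
$$\rho(X, Y) = \frac{\sum_i \Delta x_i \, \Delta y_i}{\sqrt{\sum_i \Delta x_i^2 \, \sum_i \Delta y_i^2}} \in [-1, +1] \qquad (1)$$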
ρ identifies if the values of X and Y ‘move’ together on the same line.
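Pearson's formula can be written out term by term in plain Python. This is a sketch for reading the formula, not a replacement for a library routine; the function name `pearson` and the sample series are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's rho: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # the deltas of X
    dy = [y - mean_y for y in ys]  # the deltas of Y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("rho is undefined for a constant series")
    return numerator / denominator


rising = pearson(range(0, 100), range(0, 400, 4))  # Y grows with X
falling = pearson([1, 2, 3, 4], [8, 6, 4, 2])      # Y shrinks as X grows
print(rising, falling)
```

Series that lie on a rising line give a value at the +1 end of the range, series on a falling line give a value at the -1 end.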
You must (scatter) plot
ρ doesn’t find non-linear correlation!
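A small sketch of why the scatter plot matters: Y below is completely determined by X, yet ρ is close to zero because the relation is a parabola, not a line. The helper `pearson` and the data are illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's rho for two equal-length, non-constant series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# A symmetric parabola: the left and right halves cancel each other out.
x_full = list(range(-10, 11))
y_full = [v * v for v in x_full]
rho_full = pearson(x_full, y_full)

# Only the right half of the same curve: now it looks almost linear.
x_half = list(range(0, 11))
y_half = [v * v for v in x_half]
rho_half = pearson(x_half, y_half)

print("full parabola:", round(rho_full, 3))
print("right half:   ", round(rho_half, 3))
```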
Probability Indicator
Python scipy provides a correlation function, returning two values:
• the ρ correlation coefficient ∈ [−1, +1]
• the probability that such datasets are produced by uncorrelated systems
from random import randint
from scipy.stats import pearsonr # our beloved ρ
a, b = list(range(0, 100)), list(range(0, 400, 4))
c, d = [randint(0, 100) for x in a], [randint(0, 100) for x in a]
correlation, probability = pearsonr(a, b) # ρ = 1.000, p = 0.000
correlation, probability = pearsonr(c, d) # e.g. ρ = −0.041, p = 0.683 (random data, varies per run)
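One way to build intuition for the second value is a permutation test: shuffle one series, so that any real link is destroyed, and count how often the shuffled data looks at least as correlated as the original. This is a different technique from the one scipy uses to compute its p-value, and the names `permutation_p_value`, `rounds` and `seed` are assumptions made for this sketch.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson's rho for two equal-length, non-constant series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def permutation_p_value(xs, ys, rounds=2000, seed=0):
    """Estimate how often unrelated (shuffled) data looks as correlated."""
    rng = random.Random(seed)  # fixed seed: repeatable estimate
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


linear_x = list(range(20))
linear_y = [4 * x for x in linear_x]
p_linear = permutation_p_value(linear_x, linear_y)

rng = random.Random(1)
noise_x = [rng.randint(0, 100) for _ in range(20)]
noise_y = [rng.randint(0, 100) for _ in range(20)]
p_noise = permutation_p_value(noise_x, noise_y)

print("linear:", p_linear)
print("noise: ", p_noise)
```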
Combinations
itertools is a pot of gold full of useful tools.
from itertools import combinations
# returns all possible combinations of
# items grouped by N at a time
items = "hearts spades clubs diamonds".split()
combinations(items, 2)
# And now all possible combinations between
# dataset fields!
combinations(T, 2)
Combining 4 suits,
2 at a time.
♥♠
♥♣
♥♦
♠♣
♠♦
♣♦
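A quick check of the six pairs listed above. `combinations` returns a lazy iterator, so it is wrapped in `list` here to look at the result; `math.comb` gives the expected count n! / (k! (n-k)!).

```python
from itertools import combinations
from math import comb

suits = "hearts spades clubs diamonds".split()

# combinations() is lazy: materialize it to inspect or count the pairs.
pairs = list(combinations(suits, 2))
for first, second in pairs:
    print(first, second)

# 4 items taken 2 at a time: 4! / (2! * 2!) pairs, order inside a pair ignored.
print(len(pairs), comb(len(suits), 2))
```

The number of pairs grows roughly with the square of the number of fields, so with many dataset fields the output is better filtered than read by hand.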
Netfishing correlation I
# Now we have all the ingredients for
# net-fishing relations between our data!
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    # Look for correlations between every dataset!
    corr, prob = pearsonr(v1, v2)
    if corr > .6:
        print("Series", k1, k2, "can be correlated", corr)
    elif prob < 0.05:
        print("Series", k1, k2, "probability lower than 5%", prob)
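A variation on the loop above, as a sketch: it compares the absolute value of ρ with the threshold, so strong negative correlations are reported too, and it returns the matches instead of printing them. The function name `find_correlated`, the stdlib `pearson` helper and the sample data are assumptions for illustration; with real logs, scipy's pearsonr would take the helper's place.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson's rho for two equal-length, non-constant series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def find_correlated(table, threshold=0.6):
    """Return (name1, name2, rho) for every pair with |rho| >= threshold."""
    found = []
    for (k1, v1), (k2, v2) in combinations(table.items(), 2):
        rho = pearson(v1, v2)
        if abs(rho) >= threshold:  # abs(): falling lines count as well
            found.append((k1, k2, rho))
    return found


# Hypothetical series: 'late' rises with 'ts', 'peers' falls, 'noise' does neither.
T = {
    "ts": [1, 2, 3, 4, 5, 6],
    "late": [2, 4, 6, 8, 10, 12],
    "peers": [6, 5, 4, 3, 2, 1],
    "noise": [5, 1, 4, 2, 6, 3],
}
matches = find_correlated(T)
for k1, k2, rho in matches:
    print(k1, k2, round(rho, 3))
```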
Netfishing correlation II
Now plot all combinations: there’s more than meets the eye!
# Plot everything, and insert data in plots!
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    corr, prob = pearsonr(v1, v2)
    plt.scatter(v1, v2)
    # 3 digit precision on title
    plt.title("R={:0.3f} P={:0.3f}".format(corr, prob))
    plt.xlabel(k1); plt.ylabel(k2)
    # save and close the plot
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()
Plotting Correlation
Color is the 3rd dimension
from itertools import cycle
colors = cycle("rgb") # use more than 3 colors!
labels = cycle("morning afternoon night".split())
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    datalen = len(v1) # assumes all series have the same length
    size = datalen // 3 # 3 colors, right? (slices need integers)
    for i in range(0, datalen, size):
        plt.scatter(v1[i:i+size], v2[i:i+size],
                    color=next(colors),
                    label=next(labels))
    # set title, save plot & co
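The slicing step can be sketched on its own, without plotting. Stepping through the data with range(0, datalen, size) yields a small fourth chunk whenever the length is not a multiple of 3; the hypothetical helper below gives the remainder to the last period instead. It assumes the series is ordered by time and spans the three periods evenly, and the latency values are invented.

```python
def split_periods(series, labels=("morning", "afternoon", "night")):
    """Split a series into len(labels) consecutive chunks, one per label."""
    n = len(labels)
    size = len(series) // n
    chunks = []
    for idx, label in enumerate(labels):
        start = idx * size
        # The last chunk absorbs the remainder of an uneven split.
        end = (idx + 1) * size if idx < n - 1 else len(series)
        chunks.append((label, series[start:end]))
    return chunks


# Hypothetical latency samples, ordered by time.
latency = [0.12, 6.31, 0.43, 0.25, 0.19, 0.88, 1.02, 0.31, 0.27, 0.45]
periods = split_periods(latency)
for label, chunk in periods:
    print(label, chunk)
```

Each (label, chunk) pair can then be passed to plt.scatter with its own color.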
Example Correlation
Latency Solution
• Latency wasn’t related to packet size or system throughput
• Errors were not related to packet size
• Discovered system throughput
Wrap Up
• Use statistics: it’s easy
• Don’t use ρ to exclude relations
• Plot, Plot, Plot
• Continue collecting results
That’s all folks!
Thank you for your attention!
Roberto Polli - roberto.polli@babel.it