Statistics 101 for System
Administrators
EuroPython 2014, 22nd
July - Berlin
Roberto Polli - roberto.polli@babel.it
Babel Srl P.zza S. Benedetto da Norcia, 33
00040, Pomezia (RM) - www.babel.it
22 July 2014
Who? What? Why?
• Using (and learning) elements of statistics with python.
• Roberto Polli - Community Manager @ Babel.it. Loves writing in C, Java
and Python. Red Hat Certified Engineer and Virtualization Administrator.
• Babel – Proud sponsor of this talk ;) Delivers large mail infrastructures
based on Open Source software for Italian ISPs and PAs. Contributes to
various FLOSS projects.
Agenda
• A latency issue: what happened?
• Correlation in 30”
• Combining data
• Plotting time
• modules: scipy, matplotlib
A Latency Issue
• Episodic network latency issues
• Log traces: message size, #peers, retransmissions
• Do we need to scale? Was it a peak problem?
Find a rapid answer with python!
Basic statistics
Python (via NumPy) provides basic statistics, like
from numpy import mean # ¯x
from numpy import std # σX
T = { 'ts': (1, 2, 3, .., ),
'late': (0.12, 6.31, 0.43, .. ),
'peers': (2313, 2313, 2312, ..),...}
# one (name, max, min, mean, std) tuple per series
print([(k, max(X), min(X), mean(X), std(X))
for k, X in T.items()])
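As a minimal sketch of the same summary using only the standard library: the sample values below are hypothetical stand-ins for the elided log data, and `summarize` is an illustrative helper, not part of the talk. `pstdev` is the population standard deviation, which is also what `numpy.std` computes by default.

```python
from statistics import mean, pstdev

# Hypothetical sample data; the real series come from the log traces.
T = {
    "ts": (1, 2, 3, 4, 5),
    "late": (0.12, 6.31, 0.43, 0.25, 0.30),
    "peers": (2313, 2313, 2312, 2310, 2315),
}

def summarize(series):
    """Return (max, min, mean, population standard deviation) of a series."""
    return max(series), min(series), mean(series), pstdev(series)

summary = {name: summarize(values) for name, values in T.items()}
for name, (hi, lo, avg, sd) in summary.items():
    print(f"{name:>6}: max={hi} min={lo} mean={avg:.3f} std={sd:.3f}")
```

A single outlier such as 6.31 in 'late' pulls both the mean and the standard deviation up, which is one reason to look at the distribution next.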
Distributions
Data distribution - aka δX - shows event frequency.
# The fastest way to get a
# distribution is
from matplotlib import pyplot as plt
freq, bins, _ = plt.hist(T['late'])
# plt.hist returns (counts, bin edges, patches)
distribution = list(zip(bins, freq))
[Figure: Ping RTT distribution (histogram); x-axis: rtt in ms]
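To make the binning behind a histogram explicit, here is a small sketch that counts values per bin without plotting. The `histogram` helper and the RTT samples are hypothetical. Like `plt.hist`, it yields one more bin edge than counts, so zipping edges with counts pairs each count with its left edge and drops the last edge.

```python
def histogram(values, bins=10):
    """Count how many values fall in each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin of its own.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

# Hypothetical ping round-trip times in ms.
rtt = [158.0, 158.6, 159.2, 159.4, 159.5, 159.7, 160.1, 160.3, 162.0]
counts, edges = histogram(rtt, bins=4)
distribution = list(zip(edges, counts))  # (left bin edge, frequency) pairs
print(distribution)
```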
Correlation I
Are two data series X, Y related?
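Given $\Delta x_i = x_i - \bar{x}$, Mr. Pearson answered with this formula:
$$\rho(X, Y) = \frac{\sum_i \Delta x_i \, \Delta y_i}{\sqrt{\sum_i \Delta^2 x_i \, \sum_i \Delta^2 y_i}} \in [-1, +1] \qquad (1)$$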
ρ identifies if the values of X and Y ‘move’ together on the same line.
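Pearson's formula can be sketched directly in plain Python. The `pearson` function below is an illustrative helper, not the scipy implementation: it computes the deviations from the mean, then divides the sum of their products by the square root of the product of the summed squares.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations of X from its mean
    dy = [y - mean_y for y in ys]  # deviations of Y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator  # raises ZeroDivisionError for a constant series

x = [1, 2, 3, 4, 5]
rho_up = pearson(x, [2 * v + 1 for v in x])  # points on a rising line
rho_down = pearson(x, [-3 * v for v in x])   # points on a falling line
print(rho_up, rho_down)
```

Points lying exactly on a rising line give +1 and points on a falling line give −1, whatever the slope.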
You must (scatter) plot
ρ doesn’t find non-linear correlation!
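A quick way to see this, sketched with a hand-written `pearson` helper (an illustration, not the scipy function): y = x² is completely determined by x, yet over a range symmetric around zero the coefficient is essentially zero.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx, dy = [x - mx for x in xs], [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))

x = list(range(-10, 11))   # symmetric around zero
y = [v * v for v in x]     # a perfect, but non-linear, dependency
rho = pearson(x, y)
print("rho = {:0.3f}".format(rho))
```

A scatter plot of the same data shows the parabola immediately, which is why the plot comes before the number.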
Probability Indicator
Python scipy provides a correlation function, returning two values:
• the ρ correlation coefficient ∈ [−1, +1]
• the p-value: the probability that uncorrelated systems would produce datasets with a correlation at least this strong
from random import randint
from scipy.stats import pearsonr # our beloved ρ
a, b = range(0, 100), range(0, 400, 4)
c, d = [randint(0, 100) for x in a], [randint(0, 100) for x in a]
correlation, probability = pearsonr(a, b) # ρ = 1.000, p = 0.000
correlation, probability = pearsonr(c, d) # e.g. ρ = −0.041, p = 0.683 (random data, varies per run)
Combinations
itertools is a pot of gold full of useful tools.
from itertools import combinations
# returns all possible combinations of
# items grouped by N at a time
items = "hearts spades clubs diamonds".split()
combinations(items, 2)
# And now all possible combinations between
# dataset fields!
combinations(T, 2)
Combining 4 suits,
2 at a time.
♥♠
♥♣
♥♦
♠♣
♠♦
♣♦
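A small sketch of what `combinations` yields: it returns a lazy iterator, so it is wrapped in `list` here to show the pairs. The count matches the binomial coefficient "4 choose 2"; `math.comb` needs Python 3.8 or later.

```python
from itertools import combinations
from math import comb

items = "hearts spades clubs diamonds".split()
pairs = list(combinations(items, 2))  # materialize the lazy iterator

for first, second in pairs:
    print(first, second)

# Each unordered pair appears exactly once, in input order.
print(len(pairs), "pairs, expected", comb(len(items), 2))
```

Iterating over a dict yields its keys, so `combinations(T, 2)` pairs up the field names, while `combinations(T.items(), 2)` pairs up (name, values) tuples.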
Netfishing correlation I
# Now we have all the ingredients for
# net-fishing relations between our data!
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    # Look for correlations between every dataset!
    corr, prob = pearsonr(v1, v2)
    if corr > .6:
        print("Series", k1, k2, "can be correlated", corr)
    elif prob < 0.05:
        print("Series", k1, k2, "probability lower than 5%", prob)
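The net-fishing loop can be tried end to end on a toy dataset. This is a sketch under stated assumptions: the data is invented, `pearson` and `strong_pairs` are hypothetical helpers, and no p-value is computed because that needs scipy. Unlike the loop above, it compares the absolute value of ρ, so strong negative correlations are reported too.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx, dy = [x - mx for x in xs], [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))

def strong_pairs(table, threshold=0.6):
    """Return (name1, name2, rho) for every pair with |rho| above threshold."""
    found = []
    for (k1, v1), (k2, v2) in combinations(table.items(), 2):
        rho = pearson(v1, v2)
        if abs(rho) > threshold:
            found.append((k1, k2, rho))
    return found

# Invented data: 'peers' grows linearly with 'ts', 'late' just alternates.
ts = list(range(1, 13))
T = {"ts": ts, "late": [0.1, 0.9] * 6, "peers": [2300 + 2 * t for t in ts]}
result = strong_pairs(T)
for k1, k2, rho in result:
    print("Series", k1, k2, "can be correlated", round(rho, 3))
```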
Netfishing correlation II
Now plot all combinations: there's more than meets the eye!
# Plot everything, and insert data in plots!
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    corr, prob = pearsonr(v1, v2)
    plt.scatter(v1, v2)
    # 3 digit precision on title
    plt.title("R={:0.3f} P={:0.3f}".format(corr, prob))
    plt.xlabel(k1); plt.ylabel(k2)
    # save and close the plot
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()
Plotting Correlation
Color is the 3rd dimension
from itertools import cycle
colors = cycle("rgb") # use more than 3 colors!
labels = cycle("morning afternoon night".split())
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    datalen = len(v1) # assumes all series have the same length
    size = datalen // 3 # 3 colors, right? (integer division for slicing)
    for i in range(0, datalen, size):
        plt.scatter(v1[i:i+size], v2[i:i+size],
                    color=next(colors),
                    label=next(labels))
    # set title, save plot & co
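The slicing behind the colored scatter plot can be checked on its own. In this sketch `split_by_label` is a hypothetical helper, and it assumes the samples are ordered in time and spread evenly over the day, so that consecutive thirds really correspond to morning, afternoon and night.

```python
def split_by_label(values, labels):
    """Split `values` into len(labels) consecutive chunks, one per label."""
    size = -(-len(values) // len(labels))  # ceiling division, so no sample is dropped
    return {label: values[i * size:(i + 1) * size]
            for i, label in enumerate(labels)}

samples = list(range(10))  # stand-in for one time-ordered data series
chunks = split_by_label(samples, ["morning", "afternoon", "night"])
for label, chunk in chunks.items():
    print(label, chunk)
```

Applying the same split to both series of a pair gives one scatter call per label, each with its own color. When the length is not a multiple of three, the last chunk is simply shorter.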
Example Correlation
Latency Solution
• Latency wasn’t related to packet size or system throughput
• Errors were not related to packet size
• Discovered system throughput
Wrap Up
• Use statistics: it’s easy
• Don’t use ρ to exclude relations
• Plot, Plot, Plot
• Continue collecting results
That’s all folks!
Thank you for your attention!
Roberto Polli - roberto.polli@babel.it