SlideShare a Scribd company logo
1 of 20
Statistics in NumPy and SciPy



        February 12, 2009
Enthought Python Distribution (EPD)

 MORE THAN SIXTY INTEGRATED PACKAGES

  • Python 2.6                     • Repository access
  • Science (NumPy, SciPy, etc.)   • Data Storage (HDF, NetCDF, etc.)
  • Plotting (Chaco, Matplotlib)   • Networking (twisted)
  • Visualization (VTK, Mayavi)    • User Interface (wxPython, Traits UI)
  • Multi-language Integration     • Enthought Tool Suite
    (SWIG,Pyrex, f2py, weave)        (Application Development Tools)
Enthought Training Courses




                 Python Basics, NumPy,
                 SciPy, Matplotlib, Chaco,
                 Traits, TraitsUI, …
PyCon


    http://us.pycon.org/2010/tutorials/

        Introduction to Traits




                                 Corran Webster
Upcoming Training Classes
  March 1 – 5, 2009
       Python for Scientists and Engineers
       Austin, Texas, USA

  March 8 – 12, 2009
       Python for Quants
       London, UK




                       http://www.enthought.com/training/
NumPy / SciPy
  Statistics
Statistics overview

 • NumPy methods and functions
   – .mean, .std, .var, .min, .max, .argmax, .argmin
   – median, nanargmax, nanargmin, nanmax,
    nanmin, nansum
 • NumPy random number generators
 • Distribution objects in SciPy (scipy.stats)
 • Many functions in SciPy
   – f_oneway, bayes_mvs
   – nanmedian, nanstd, nanmean
NumPy methods
 • All array objects have some “statistical”
   methods
  – .mean(), .std(), .var(), .max(), .min(), .argmax(),
   .argmin()
  – Take an axis keyword that allows them to work on
   N-d arrays (shown with .sum).


   axis=0               axis=1
NumPy functions

 • median
 • nan-functions (ignore nans)
   –   nanmax
   –   nanmin
   –   nanargmin
   –   nanargmax
   –   nansum
 • Can also use masks and regular functions
NumPy Random Number Generators

 • Based on Mersenne twister algorithm
 • Written using PyRex / Cython
 • Univariate (over 40)
 • Multivariate (only 3)
  – multinomial
  – dirichlet
  – multivariate_normal
 • Convenience functions
  – rand, randn, randint, ranf
Statistics
scipy.stats — CONTINUOUS DISTRIBUTIONS

over 80
continuous
distributions!

METHODS

pdf     entropy
cdf     nnlf
rvs     moment
ppf     freeze
stats
fit
sf
isf
Using stats objects
DISTRIBUTIONS


>>> from scipy.stats import norm
# Sample normal dist. 100 times.
>>> samp = norm.rvs(size=100)

>>> x = linspace(-5, 5, 100)
# Calculate probability dist.
>>> pdf = norm.pdf(x)
# Calculate cummulative Dist.
>>> cdf = norm.cdf(x)
# Calculate Percent Point Function
>>> ppf = norm.ppf(x)
Distribution objects
Every distribution can be modified by loc and scale keywords
(many distributions also have required shape arguments to select from a family)

LOCATION (loc) --- shift left (<0) or right (>0) the distribution




SCALE (scale) --- stretch (>1) or compress (<1) the distribution
Example distributions
NORM (norm) – N(µ,σ)


  Only location and scale       location   mean                    µ
  arguments:
                                scale      standard deviation      σ


LOG NORMAL (lognorm)

log(S) is N(µ, σ)
                                location   offset from zero (rarely used)
        S is lognormal
                                scale      eµ

         one shape parameter!   shape      σ
Setting location and Scale

NORMAL DISTRIBUTION


>>> from scipy.stats import norm
# Normal dist with mean=10 and std=2
>>> dist = norm(loc=10, scale=2)

>>> x = linspace(-5, 15, 100)
# Calculate probability dist.
>>> pdf = dist.pdf(x)
# Calculate cummulative dist.
>>> cdf = dist.cdf(x)

# Get 100 random samples from dist.
>>> samp = dist.rvs(size=100)

# Estimate parameters from data
>>> mu, sigma = norm.fit(samp)           .fit returns best
>>> print “%4.2f, %4.2f” % (mu, sigma)   shape + (loc, scale)
10.07, 1.95                              that explains the data
Statistics
scipy.stats — Discrete Distributions

 10 standard
 discrete
 distributions
 (plus any
 finite RV)

METHODS
pmf     moment
cdf     entropy
rvs     freeze
ppf
stats
sf
isf
Using stats objects
 CREATING NEW DISCRETE DISTRIBUTIONS


# Create loaded dice.
>>> from scipy.stats import rv_discrete
>>> xk = [1,2,3,4,5,6]
>>> pk = [0.3,0.35,0.25,0.05,
          0.025,0.025]
>>> new = rv_discrete(name='loaded',
                   values=(xk,pk))

# Calculate histogram
>>> samples = new.rvs(size=1000)
>>> bins=linspace(0.5,5.5,6)
>>> subplot(211)
>>> hist(samples,bins=bins,normed=True)

# Calculate pmf
>>> x = range(0,8)
>>> subplot(212)
>>> stem(x,new.pmf(x))
Statistics
CONTINUOUS DISTRIBUTION ESTIMATION USING GAUSSIAN KERNELS

# Sample two normal distributions
# and create a bi-modal distribution
>>> rv1 = stats.norm()
>>> rv2 = stats.norm(2.0,0.8)
>>> samples = hstack([rv1.rvs(size=100),
                        rv2.rvs(size=100)])


# Use a Gaussian kernel density to
# estimate the PDF for the samples.
>>> from scipy.stats.kde import gaussian_kde
>>> approximate_pdf = gaussian_kde(samples)
>>> x = linspace(-3,6,200)

# Compare the histogram of the samples to
# the PDF approximation.
>>> hist(samples, bins=25, normed=True)
>>> plot(x, approximate_pdf(x),'r')
Other functions in scipy.stats

 • Statistical Tests (Anderson, Wilcox, etc.)
 • Other calculations (hmean, nanmedian)
 • Work in progress
 • A great place to jump in and help
Other statistical Resources

 • scikits.statsmodels
 • RPy2
 • PyMC

More Related Content

What's hot

Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1Jatin Miglani
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandasPiyush rai
 
pandas - Python Data Analysis
pandas - Python Data Analysispandas - Python Data Analysis
pandas - Python Data AnalysisAndrew Henshaw
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPyDevashish Kumar
 
Scientific Computing with Python - NumPy | WeiYuan
Scientific Computing with Python - NumPy | WeiYuanScientific Computing with Python - NumPy | WeiYuan
Scientific Computing with Python - NumPy | WeiYuanWei-Yuan Chang
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesAndrew Ferlitsch
 
Python: Modules and Packages
Python: Modules and PackagesPython: Modules and Packages
Python: Modules and PackagesDamian T. Gordon
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsPhoenix
 
Arrays in python
Arrays in pythonArrays in python
Arrays in pythonmoazamali28
 
Datatypes in python
Datatypes in pythonDatatypes in python
Datatypes in pythoneShikshak
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 

What's hot (20)

Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandas
 
pandas - Python Data Analysis
pandas - Python Data Analysispandas - Python Data Analysis
pandas - Python Data Analysis
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPy
 
Scientific Computing with Python - NumPy | WeiYuan
Scientific Computing with Python - NumPy | WeiYuanScientific Computing with Python - NumPy | WeiYuan
Scientific Computing with Python - NumPy | WeiYuan
 
NumPy.pptx
NumPy.pptxNumPy.pptx
NumPy.pptx
 
Python pandas Library
Python pandas LibraryPython pandas Library
Python pandas Library
 
NUMPY
NUMPY NUMPY
NUMPY
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning Libraries
 
Numpy tutorial
Numpy tutorialNumpy tutorial
Numpy tutorial
 
Numpy
NumpyNumpy
Numpy
 
Python: Modules and Packages
Python: Modules and PackagesPython: Modules and Packages
Python: Modules and Packages
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data Analytics
 
Data Analysis in Python
Data Analysis in PythonData Analysis in Python
Data Analysis in Python
 
Python programming : Classes objects
Python programming : Classes objectsPython programming : Classes objects
Python programming : Classes objects
 
Pandas
PandasPandas
Pandas
 
Arrays in python
Arrays in pythonArrays in python
Arrays in python
 
Datatypes in python
Datatypes in pythonDatatypes in python
Datatypes in python
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
Python Functions
Python   FunctionsPython   Functions
Python Functions
 

Viewers also liked

Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingEnthought, Inc.
 
The Joy of SciPy
The Joy of SciPyThe Joy of SciPy
The Joy of SciPykammeyer
 
Доклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, СмирноваДоклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, СмирноваDmitry Dushkin
 
Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.Piotr Milanowski
 
Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]Samarth Shah
 
Python для анализа данных
Python для анализа данныхPython для анализа данных
Python для анализа данныхPython Meetup
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and developmentWes McKinney
 
The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]guestea12c43
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonChetan Khatri
 
09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeaster09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeasterOpenCascade
 
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...Ryan Rosario
 

Viewers also liked (14)

Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
 
The Joy of SciPy
The Joy of SciPyThe Joy of SciPy
The Joy of SciPy
 
Доклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, СмирноваДоклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, Смирнова
 
Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.
 
MongoDB Replication Cluster
MongoDB Replication ClusterMongoDB Replication Cluster
MongoDB Replication Cluster
 
Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]
 
Python для анализа данных
Python для анализа данныхPython для анализа данных
Python для анализа данных
 
Introduction to mongoDB
Introduction to mongoDBIntroduction to mongoDB
Introduction to mongoDB
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
 
The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - Python
 
09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeaster09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeaster
 
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
 
Bayesian model averaging
Bayesian model averagingBayesian model averaging
Bayesian model averaging
 

Similar to NumPy/SciPy Statistics

Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Francesco
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
getting started with numpy and pandas.pptx
getting started with numpy and pandas.pptxgetting started with numpy and pandas.pptx
getting started with numpy and pandas.pptxworkvishalkumarmahat
 
Scikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonScikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonZahid Hasan
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonDr. Volkan OBAN
 
Cheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnCheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnKarlijn Willems
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to pythonActiveState
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxAkashgupta517936
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...HendraPurnama31
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptxkarthikks82
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with AnacondaTravis Oliphant
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in RSujaAldrin
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 
Simple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorialSimple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorialJin-Hwa Kim
 

Similar to NumPy/SciPy Statistics (20)

Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
getting started with numpy and pandas.pptx
getting started with numpy and pandas.pptxgetting started with numpy and pandas.pptx
getting started with numpy and pandas.pptx
 
Scikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonScikit learn cheat_sheet_python
Scikit learn cheat_sheet_python
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-Python
 
Cheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnCheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learn
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptx
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptx
 
Aggregate.pptx
Aggregate.pptxAggregate.pptx
Aggregate.pptx
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Simple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorialSimple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorial
 

More from Enthought, Inc.

Talk at NYC Python Meetup Group
Talk at NYC Python Meetup GroupTalk at NYC Python Meetup Group
Talk at NYC Python Meetup GroupEnthought, Inc.
 
Scientific Applications with Python
Scientific Applications with PythonScientific Applications with Python
Scientific Applications with PythonEnthought, Inc.
 
Scientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with MayaviScientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with MayaviEnthought, Inc.
 
February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?Enthought, Inc.
 
Parallel Processing with IPython
Parallel Processing with IPythonParallel Processing with IPython
Parallel Processing with IPythonEnthought, Inc.
 
Scientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: TraitsScientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: TraitsEnthought, Inc.
 
Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Enthought, Inc.
 
Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009Enthought, Inc.
 
Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009Enthought, Inc.
 

More from Enthought, Inc. (13)

Numpy Talk at SIAM
Numpy Talk at SIAMNumpy Talk at SIAM
Numpy Talk at SIAM
 
Talk at NYC Python Meetup Group
Talk at NYC Python Meetup GroupTalk at NYC Python Meetup Group
Talk at NYC Python Meetup Group
 
Scientific Applications with Python
Scientific Applications with PythonScientific Applications with Python
Scientific Applications with Python
 
SciPy 2010 Review
SciPy 2010 ReviewSciPy 2010 Review
SciPy 2010 Review
 
Scientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with MayaviScientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
 
Chaco Step-by-Step
Chaco Step-by-StepChaco Step-by-Step
Chaco Step-by-Step
 
February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?
 
SciPy India 2009
SciPy India 2009SciPy India 2009
SciPy India 2009
 
Parallel Processing with IPython
Parallel Processing with IPythonParallel Processing with IPython
Parallel Processing with IPython
 
Scientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: TraitsScientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: Traits
 
Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009
 
Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009
 
Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009
 

Recently uploaded

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

NumPy/SciPy Statistics

  • 1. Statistics in NumPy and SciPy February 12, 2009
  • 2. Enthought Python Distribution (EPD) MORE THAN SIXTY INTEGRATED PACKAGES • Python 2.6 • Repository access • Science (NumPy, SciPy, etc.) • Data Storage (HDF, NetCDF, etc.) • Plotting (Chaco, Matplotlib) • Networking (twisted) • Visualization (VTK, Mayavi) • User Interface (wxPython, Traits UI) • Multi-language Integration • Enthought Tool Suite (SWIG,Pyrex, f2py, weave) (Application Development Tools)
  • 3. Enthought Training Courses Python Basics, NumPy, SciPy, Matplotlib, Chaco, Traits, TraitsUI, …
  • 4. PyCon http://us.pycon.org/2010/tutorials/ Introduction to Traits Corran Webster
  • 5. Upcoming Training Classes March 1 – 5, 2009 Python for Scientists and Engineers Austin, Texas, USA March 8 – 12, 2009 Python for Quants London, UK http://www.enthought.com/training/
  • 6. NumPy / SciPy Statistics
  • 7. Statistics overview • NumPy methods and functions – .mean, .std, .var, .min, .max, .argmax, .argmin – median, nanargmax, nanargmin, nanmax, nanmin, nansum • NumPy random number generators • Distribution objects in SciPy (scipy.stats) • Many functions in SciPy – f_oneway, bayes_mvs – nanmedian, nanstd, nanmean
  • 8. NumPy methods • All array objects have some “statistical” methods – .mean(), .std(), .var(), .max(), .min(), .argmax(), .argmin() – Take an axis keyword that allows them to work on N-d arrays (shown with .sum). axis=0 axis=1
  • 9. NumPy functions • median • nan-functions (ignore nans) – nanmax – nanmin – nanargmin – nanargmax – nansum • Can also use masks and regular functions
  • 10. NumPy Random Number Generators • Based on Mersenne twister algorithm • Written using PyRex / Cython • Univariate (over 40) • Multivariate (only 3) – multinomial – dirichlet – multivariate_normal • Convenience functions – rand, randn, randint, ranf
  • 11. Statistics scipy.stats — CONTINUOUS DISTRIBUTIONS over 80 continuous distributions! METHODS pdf entropy cdf nnlf rvs moment ppf freeze stats fit sf isf
  • 12. Using stats objects DISTRIBUTIONS >>> from scipy.stats import norm # Sample normal dist. 100 times. >>> samp = norm.rvs(size=100) >>> x = linspace(-5, 5, 100) # Calculate probability dist. >>> pdf = norm.pdf(x) # Calculate cummulative Dist. >>> cdf = norm.cdf(x) # Calculate Percent Point Function >>> ppf = norm.ppf(x)
  • 13. Distribution objects Every distribution can be modified by loc and scale keywords (many distributions also have required shape arguments to select from a family) LOCATION (loc) --- shift left (<0) or right (>0) the distribution SCALE (scale) --- stretch (>1) or compress (<1) the distribution
  • 14. Example distributions NORM (norm) – N(µ,σ) Only location and scale location mean µ arguments: scale standard deviation σ LOG NORMAL (lognorm) log(S) is N(µ, σ) location offset from zero (rarely used) S is lognormal scale eµ one shape parameter! shape σ
  • 15. Setting location and Scale NORMAL DISTRIBUTION >>> from scipy.stats import norm # Normal dist with mean=10 and std=2 >>> dist = norm(loc=10, scale=2) >>> x = linspace(-5, 15, 100) # Calculate probability dist. >>> pdf = dist.pdf(x) # Calculate cummulative dist. >>> cdf = dist.cdf(x) # Get 100 random samples from dist. >>> samp = dist.rvs(size=100) # Estimate parameters from data >>> mu, sigma = norm.fit(samp) .fit returns best >>> print “%4.2f, %4.2f” % (mu, sigma) shape + (loc, scale) 10.07, 1.95 that explains the data
  • 16. Statistics scipy.stats — Discrete Distributions 10 standard discrete distributions (plus any finite RV) METHODS pmf moment cdf entropy rvs freeze ppf stats sf isf
  • 17. Using stats objects CREATING NEW DISCRETE DISTRIBUTIONS # Create loaded dice. >>> from scipy.stats import rv_discrete >>> xk = [1,2,3,4,5,6] >>> pk = [0.3,0.35,0.25,0.05, 0.025,0.025] >>> new = rv_discrete(name='loaded', values=(xk,pk)) # Calculate histogram >>> samples = new.rvs(size=1000) >>> bins=linspace(0.5,5.5,6) >>> subplot(211) >>> hist(samples,bins=bins,normed=True) # Calculate pmf >>> x = range(0,8) >>> subplot(212) >>> stem(x,new.pmf(x))
  • 18. Statistics CONTINUOUS DISTRIBUTION ESTIMATION USING GAUSSIAN KERNELS # Sample two normal distributions # and create a bi-modal distribution >>> rv1 = stats.norm() >>> rv2 = stats.norm(2.0,0.8) >>> samples = hstack([rv1.rvs(size=100), rv2.rvs(size=100)]) # Use a Gaussian kernel density to # estimate the PDF for the samples. >>> from scipy.stats.kde import gaussian_kde >>> approximate_pdf = gaussian_kde(samples) >>> x = linspace(-3,6,200) # Compare the histogram of the samples to # the PDF approximation. >>> hist(samples, bins=25, normed=True) >>> plot(x, approximate_pdf(x),'r')
  • 19. Other functions in scipy.stats • Statistical Tests (Anderson, Wilcox, etc.) • Other calculations (hmean, nanmedian) • Work in progress • A great place to jump in and help
  • 20. Other statistical Resources • scikits.statsmodels • RPy2 • PyMC

Editor's Notes

  1. &lt;&lt;1,Parallel Processing with IPython&gt;&gt;
  2. [toc] level = 2 title = Statistics # end config