SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Statistics for
Data Scientists
Agenda
Revision
Data
Statistics -Descriptive, Central Tendency, Variation, Distributions
Data Mining
Basics of Data Science
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
the culture of academia, which does not reward researchers for understanding technology.
DANGER ZONE- this overlap of skills gives people the ability to create what appears to be
a legitimate analysis without any understanding of how they got there or
what they have created
Being able to manipulate text files at the command-line,
understanding vectorized operations, thinking algorithmically;
these are the hacking skills that make for a successful data hacker.
data plus math and statistics only gets you machine learning,
which is great if that is what you are interested in, but not if you are doing data science
What is Business Analytics
Definition – study of business data using statistical techniques and
programming for creating decision support and insights for achieving
business goals
Predictive- To predict the future.
Descriptive- To describe the past.
Data
Data is a set of values of qualitative or quantitative variables. An example of qualitative
data would be an anthropologist's handwritten notes about her interviews. data is
collected by a huge range of organizations and institutions, including businesses (e.g.,
sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment
rates, literacy rates) and non-governmental organizations (e.g., censuses of the number
of homeless people by non-profit organizations). Data is measured, collected and
reported, and analyzed, whereupon it can be visualized using graphs, images or other
analysis tools.
https://en.wikipedia.org/wiki/Data
Data is distinct pieces of information, usually formatted in a special way. All software is
divided into two general categories: data and programs . Programs are collections of
instructions for manipulating data.Data can exist in a variety of forms -- as numbers or
text on pieces of paper, as bits and bytes stored in electronic memory, or as facts stored
in a person's mind.
http://www.webopedia.com/TERM/D/data.html
Data
https://en.oxforddictionaries.com/definition/data Definition of data in English:
data
noun
[mass noun] Facts and statistics collected together for reference or analysis:
‘there is very little data available’
The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted
in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
Philosophy Things known or assumed as facts, making the basis of reasoning or calculation.
Variable
Something that varies
Variable
Ordinal variables are variables that have two or more categories just like nominal variables only the categories can also be ordered or
ranked eg Excellent- Horrible. Dichotomous variables are nominal variables which have only two categories or levels. Nominal
variables are variables that have two or more categories, but which do not have an intrinsic order.
Interval variables are variables for which their central characteristic is that they can be measured along a continuum and they have a
numerical value (for example, temperature measured in degrees Celsius or Fahrenheit).
Ratio variables are interval variables, but with the added condition that 0 (zero) of the measurement indicates that there is none of that
variable. a distance of ten metres is twice the distance of 5 metres.
https://statistics.laerd.com/statistical-guides/types-of-variable.php
.
Central Tendency
Mean
Arithmetic Mean- the sum of the values divided by the number of values.
The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product and
not their sum (as is the case with the arithmetic mean) e.g. rates of growth.
Median
the median is the number separating the higher half of a data sample, a population, or a probability distribution, from the lower
hal
Mode-
The "mode" is the value that occurs most often.
Dispersion
Range
the range of a set of data is the difference between the largest and smallest values.
Variance
mean of squares of differences of values from mean
Standard Deviation
square root of its variance
Frequency
a frequency distribution is a table that displays the frequency of various outcomes in a sample.
Distribution
The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of
the data and how often they occur. When a distribution of categorical data is organized, you see the number or percentage of
individuals in each group.
http://www.dummies.com/education/math/statistics/what-the-distribution-tells-you-about-a-statistical-data-set/
Distributions
Normal
The simplest case of a normal distribution is known as the standard normal distribution. This is a special case where μ=0 and σ=1,
Skewed Distribution
Skewed Distribution
skewness is a measure of
the asymmetry of the
probability distribution of a
real-valued random variable
about its mean. The
skewness value can be
positive or negative, or even
undefined.
Image
https://en.wikipedia.org/wiki/F
ile:Negative_and_positive_sk
ew_diagrams_(English).svg
Skewed Distribution
kurtosis is a measure of the
"tailedness" of the probability distribution
of a real-valued random variable. kurtosis
is a descriptor of the shape of a probability
distribution
Image
http://www.itl.nist.gov/div898/handbook/eda/
section3/eda35b.htm
Skewed Distribution
skewness
returns value of
skewness,
kurtosis
returns value of kurtosis,
https://cran.r-project.org/
web/packages/moments
/moments.pdf
Image
http://www.janzengroup.
net/stats/lessons/descrip
tive.html
Distributions
Bernoulli
Distribution of a random variable which takes value 1 with success probability and value 0 with failure probability. It
can be used, for example, to represent the toss of a coin
Distributions
Chi Square
the distribution of a sum of the squares of k independent standard normal random variables.
Distributions
Poisson
a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time
and/or space if these events occur with a known average rate and independently of the time since the last event
Probability
Probability Distribution
The probability density function (pdf) of the normal distribution, also called Gaussian or "bell curve", the most important
continuous random distribution. As notated on the figure, the probabilities of intervals of values correspond to the area
under the curve.
Refresher in Statistics
Using RCmdr for Statistics
Using RCmdr for Statistics
Using RCmdr for Statistics
Using RCmdr
Central Limit Theorem
Central Limit Theorem -
In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently
large number of iterates of independentrandom variables, each with a well-defined expected value and well-defined variance, will
be approximately normally distributed, regardless of the underlying distribution.
Hypothesis testing
Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The
usual process of hypothesis testing consists of four steps.
1. Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the
alternative hypothesis (commonly, that the observations show a real effect combined with a component of
chance variation).
2. Identify a test statistic that can be used to assess the truth of the null hypothesis.
3. Compute the P-value, which is the probability that a test statistic at least as significant as the one observed
would be obtained assuming that the null hypothesis were true. The smaller the -value, the stronger the
evidence against the null hypothesis.
4. Compare the -value to an acceptable significance value (sometimes called an alpha value). If , that the
observed effect is statistically significant, the null hypothesis is ruled out, and the alternative hypothesis is
valid.
http://mathworld.wolfram.com/HypothesisTesting.html
Hypothesis testing
http://cmapskm.ihmc.us/rid=1052458963987_678930513_8647/Hypothesis%20testing.cmap
Hypothesis testing
Hypothesis testing
Hypothesis testing
T test
http://statistics.berkeley.edu/computing/r-t-tests
> x = rnorm(10)
> y = rnorm(10)
> t.test(x,y)
> ttest = t.test(x,y)
> names(ttest)
> ttest$statistic
Chi Square Distribution
Problem
Find the 95th
percentile of the Chi-Squared distribution with 7 degrees of freedom.
Solution
We apply the quantile function qchisq of the Chi-Squared distribution against the decimal values 0.95.
> qchisq(.95, df=7) # 7 degrees of freedom
[1] 14.067
http://www.r-tutor.com/elementary-statistics/probability-distributions/chi-squared-distribution
Normal Distribution
we are looking for the percentage of students scoring
higher than 84 , we apply the function pnorm of the normal
distribution with mean 72 and standard deviation 15.2. We
are interested in the upper tail of the normal distribution.
> pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)
[1] 0.21492
Student T Distribution
Problem
Find the 2.5th
and 97.5th
percentiles of the Student t distribution with 5 degrees of freedom.
Solution
We apply the quantile function qt of the Student t distribution against the decimal values 0.025 and 0.975.
> qt(c(.025, .975), df=5) # 5 degrees of freedom
[1] -2.5706 2.5706
Some code
http://rpubs.com/newajay/stats1
Some code
http://rpubs.com/newajay/stats4
Bayes Theorem
https://artax.karlin.mff.cuni.cz/r-help/library/LaplacesDemon/html/BayesTheorem.html
Bayes Theorem
https://en.wikipedia.org/wiki/Bayes'_theorem

Contenu connexe

Tendances

Logistic regression
Logistic regressionLogistic regression
Logistic regression
saba khan
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Simplilearn
 

Tendances (20)

Confusion Matrix Explained
Confusion Matrix ExplainedConfusion Matrix Explained
Confusion Matrix Explained
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
Random forest
Random forestRandom forest
Random forest
 
Bagging.pptx
Bagging.pptxBagging.pptx
Bagging.pptx
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Customer Churn Analysis and Prediction
Customer Churn Analysis and PredictionCustomer Churn Analysis and Prediction
Customer Churn Analysis and Prediction
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With R
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and Statistics
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Statistics and data science
Statistics and data scienceStatistics and data science
Statistics and data science
 

Similaire à Statistics for data scientists

B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2
marshalkalra
 
Review of Basic Statistics and Terminology
Review of Basic Statistics and TerminologyReview of Basic Statistics and Terminology
Review of Basic Statistics and Terminology
aswhite
 
Data Mining StepsProblem Definition Market AnalysisC
Data Mining StepsProblem Definition Market AnalysisCData Mining StepsProblem Definition Market AnalysisC
Data Mining StepsProblem Definition Market AnalysisC
sharondabriggs
 

Similaire à Statistics for data scientists (20)

Data science
Data scienceData science
Data science
 
Data Science 1.pdf
Data Science 1.pdfData Science 1.pdf
Data Science 1.pdf
 
B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2
 
UNIT-4.docx
UNIT-4.docxUNIT-4.docx
UNIT-4.docx
 
Review of Basic Statistics and Terminology
Review of Basic Statistics and TerminologyReview of Basic Statistics and Terminology
Review of Basic Statistics and Terminology
 
UNIT1-2.pptx
UNIT1-2.pptxUNIT1-2.pptx
UNIT1-2.pptx
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1
 
Statistics
StatisticsStatistics
Statistics
 
Statistics
StatisticsStatistics
Statistics
 
Statistics
StatisticsStatistics
Statistics
 
Statistics
StatisticsStatistics
Statistics
 
Data Mining StepsProblem Definition Market AnalysisC
Data Mining StepsProblem Definition Market AnalysisCData Mining StepsProblem Definition Market AnalysisC
Data Mining StepsProblem Definition Market AnalysisC
 
Research methodology-Research Report
Research methodology-Research ReportResearch methodology-Research Report
Research methodology-Research Report
 
Research Methodology-Data Processing
Research Methodology-Data ProcessingResearch Methodology-Data Processing
Research Methodology-Data Processing
 
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHONUNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
 
Descriptive Analysis.pptx
Descriptive Analysis.pptxDescriptive Analysis.pptx
Descriptive Analysis.pptx
 
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxMMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
 
Presentation of BRM.pptx
Presentation of BRM.pptxPresentation of BRM.pptx
Presentation of BRM.pptx
 
Descriptive Analytics: Data Reduction
 Descriptive Analytics: Data Reduction Descriptive Analytics: Data Reduction
Descriptive Analytics: Data Reduction
 
Data mining approaches and methods
Data mining approaches and methodsData mining approaches and methods
Data mining approaches and methods
 

Plus de Ajay Ohri

Plus de Ajay Ohri (20)

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 Election
 
Pyspark
PysparkPyspark
Pyspark
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for free
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri Resume
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
 
Tradecraft
Tradecraft   Tradecraft
Tradecraft
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data Scientists
 
Craps
CrapsCraps
Craps
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen Ooms
 
Using R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsUsing R for Social Media and Sports Analytics
Using R for Social Media and Sports Analytics
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha
 
Analyze this
Analyze thisAnalyze this
Analyze this
 
Summer school python in spanish
Summer school python in spanishSummer school python in spanish
Summer school python in spanish
 

Dernier

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

Statistics for data scientists

  • 2. Agenda Revision Data Statistics -Descriptive, Central Tendency, Variation, Distributions Data Mining
  • 3. Basics of Data Science http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram the culture of academia, which does not reward researchers for understanding technology. DANGER ZONE- this overlap of skills gives people the ability to create what appears to be a legitimate analysis without any understanding of how they got there or what they have created Being able to manipulate text files at the command-line, understanding vectorized operations, thinking algorithmically; these are the hacking skills that make for a successful data hacker. data plus math and statistics only gets you machine learning, which is great if that is what you are interested in, but not if you are doing data science
  • 4. What is Business Analytics Definition – study of business data using statistical techniques and programming for creating decision support and insights for achieving business goals Predictive- To predict the future. Descriptive- To describe the past.
  • 5. Data Data is a set of values of qualitative or quantitative variables. An example of qualitative data would be an anthropologist's handwritten notes about her interviews. data is collected by a huge range of organizations and institutions, including businesses (e.g., sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment rates, literacy rates) and non-governmental organizations (e.g., censuses of the number of homeless people by non-profit organizations). Data is measured, collected and reported, and analyzed, whereupon it can be visualized using graphs, images or other analysis tools. https://en.wikipedia.org/wiki/Data Data is distinct pieces of information, usually formatted in a special way. All software is divided into two general categories: data and programs . Programs are collections of instructions for manipulating data.Data can exist in a variety of forms -- as numbers or text on pieces of paper, as bits and bytes stored in electronic memory, or as facts stored in a person's mind. http://www.webopedia.com/TERM/D/data.html
  • 6. Data https://en.oxforddictionaries.com/definition/data Definition of data in English: data noun [mass noun] Facts and statistics collected together for reference or analysis: ‘there is very little data available’ The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media. Philosophy Things known or assumed as facts, making the basis of reasoning or calculation.
  • 8. Variable Ordinal variables are variables that have two or more categories just like nominal variables only the categories can also be ordered or ranked eg Excellent- Horrible. Dichotomous variables are nominal variables which have only two categories or levels. Nominal variables are variables that have two or more categories, but which do not have an intrinsic order. Interval variables are variables for which their central characteristic is that they can be measured along a continuum and they have a numerical value (for example, temperature measured in degrees Celsius or Fahrenheit). Ratio variables are interval variables, but with the added condition that 0 (zero) of the measurement indicates that there is none of that variable. a distance of ten metres is twice the distance of 5 metres. https://statistics.laerd.com/statistical-guides/types-of-variable.php .
  • 9. Central Tendency Mean Arithmetic Mean- the sum of the values divided by the number of values. The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product and not their sum (as is the case with the arithmetic mean) e.g. rates of growth. Median the median is the number separating the higher half of a data sample, a population, or a probability distribution, from the lower hal Mode- The "mode" is the value that occurs most often.
  • 10. Dispersion Range the range of a set of data is the difference between the largest and smallest values. Variance mean of squares of differences of values from mean Standard Deviation square root of its variance Frequency a frequency distribution is a table that displays the frequency of various outcomes in a sample.
  • 11. Distribution The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. When a distribution of categorical data is organized, you see the number or percentage of individuals in each group. http://www.dummies.com/education/math/statistics/what-the-distribution-tells-you-about-a-statistical-data-set/
  • 12. Distributions Normal The simplest case of a normal distribution is known as the standard normal distribution. This is a special case where μ=0 and σ=1,
  • 14. Skewed Distribution skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined. Image https://en.wikipedia.org/wiki/F ile:Negative_and_positive_sk ew_diagrams_(English).svg
  • 15. Skewed Distribution kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. kurtosis is a descriptor of the shape of a probability distribution Image http://www.itl.nist.gov/div898/handbook/eda/ section3/eda35b.htm
  • 16. Skewed Distribution skewness returns value of skewness, kurtosis returns value of kurtosis, https://cran.r-project.org/ web/packages/moments /moments.pdf Image http://www.janzengroup. net/stats/lessons/descrip tive.html
  • 17. Distributions Bernoulli Distribution of a random variable which takes value 1 with success probability and value 0 with failure probability. It can be used, for example, to represent the toss of a coin
  • 18. Distributions Chi Square the distribution of a sum of the squares of k independent standard normal random variables.
  • 19. Distributions Poisson a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event
  • 20. Probability Probability Distribution The probability density function (pdf) of the normal distribution, also called Gaussian or "bell curve", the most important continuous random distribution. As notated on the figure, the probabilities of intervals of values correspond to the area under the curve.
  • 22. Using RCmdr for Statistics
  • 23. Using RCmdr for Statistics
  • 24. Using RCmdr for Statistics
  • 26. Central Limit Theorem Central Limit Theorem - In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independentrandom variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.
  • 27. Hypothesis testing Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps. 1. Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation). 2. Identify a test statistic that can be used to assess the truth of the null hypothesis. 3. Compute the P-value, which is the probability that a test statistic at least as significant as the one observed would be obtained assuming that the null hypothesis were true. The smaller the -value, the stronger the evidence against the null hypothesis. 4. Compare the -value to an acceptable significance value (sometimes called an alpha value). If , that the observed effect is statistically significant, the null hypothesis is ruled out, and the alternative hypothesis is valid. http://mathworld.wolfram.com/HypothesisTesting.html
  • 33. T test http://statistics.berkeley.edu/computing/r-t-tests > x = rnorm(10) > y = rnorm(10) > t.test(x,y) > ttest = t.test(x,y) > names(ttest) > ttest$statistic
  • 34. Chi Square Distribution Problem Find the 95th percentile of the Chi-Squared distribution with 7 degrees of freedom. Solution We apply the quantile function qchisq of the Chi-Squared distribution against the decimal values 0.95. > qchisq(.95, df=7) # 7 degrees of freedom [1] 14.067 http://www.r-tutor.com/elementary-statistics/probability-distributions/chi-squared-distribution
  • 35. Normal Distribution we are looking for the percentage of students scoring higher than 84 , we apply the function pnorm of the normal distribution with mean 72 and standard deviation 15.2. We are interested in the upper tail of the normal distribution. > pnorm(84, mean=72, sd=15.2, lower.tail=FALSE) [1] 0.21492
  • 36. Student T Distribution Problem Find the 2.5th and 97.5th percentiles of the Student t distribution with 5 degrees of freedom. Solution We apply the quantile function qt of the Student t distribution against the decimal values 0.025 and 0.975. > qt(c(.025, .975), df=5) # 5 degrees of freedom [1] -2.5706 2.5706