Data Analysis Course
Basics & Terminology (Version-1)
 Venkat Reddy
Data Analysis Course
•   Data analysis design document
•   Descriptive statistics
•   Data exploration, validation & sanitization
•   Probability distributions examples and applications
•   Simple correlation and regression analysis
•   Multiple linear regression analysis
•   Logistic regression analysis
•   Testing of hypothesis
•   Clustering and decision trees
•   Time series analysis and forecasting
•   Credit Risk Model building-1
•   Credit Risk Model building-2
Note
• This presentation is just class notes. The course notes for the Data
  Analysis Training were written by me, as an aid for myself.
• The best way to treat this is as a high-level summary; the
  actual session went more in depth and contained other
  information.
• Most of this material was written as informal notes, not
  intended for publication
• Please send questions/comments/corrections to
  venkat@TrendwiseAnalytics.com or 21.venkat@gmail.com
• Please check my website for the latest version of this document
                                         -Venkat Reddy
What is “Statistics”?
• Statistics is the science of data that involves:
  •   Collecting
  •   Classifying
  •   Summarizing
  •   Organizing and
  •   Interpreting
  numerical information.
• Examples:
  •   Cricket batting averages
  •   Stock price
  •   Climatology data such as rainfall amounts, average temperatures
  •   Marketing information
  •   Gambling?
Key Terms
• What is Data?
   • facts or information that is relevant or appropriate to a decision
     maker
• Population?
   • the totality of objects under consideration
• Sample?
   • a portion of the population that is selected for analysis
• Parameter?
   • a summary measure (e.g., mean) that is computed to describe a
     characteristic of the population
• Statistic?
   • a summary measure (e.g., mean) that is computed to describe a
     characteristic of the sample
Variables
• Traits or characteristics that can change values from case to
  case.
• Examples:
  •   Age
  •   Gender
  •   Income
  •   Social class
Types Of Variables
• In causal relationships:
          CAUSE → EFFECT
   independent variable → dependent variable
• Independent variable: a variable that can be controlled or
  manipulated.
• Dependent variable: a variable that cannot be controlled or
  manipulated directly. Its values are predicted from the independent
  variable.
• Discrete variables are measured in units that cannot be
  subdivided. Example: Number of children
• Continuous variables are measured in a unit that can be
  subdivided infinitely. Example: Height
Lab
•   Print product sales data
•   Which are the cause variables, and which are the effect variables?
•   Identify the continuous & discrete variables
•   What is the population?
•   Filter data and pick a sample
•   Calculate a parameter (Mean of the population)
•   Calculate a statistic
•   How close is the statistic to the parameter? Is it a good estimate?
•   Self study: Randomly pick 10 samples, calculate the mean for each
    sample. Find the mean of the means & see whether it is a
    good estimate of the population mean
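
The sampling steps in this lab can be sketched as follows. This is only a rough Python illustration, not the course's own lab code: the product sales data is not reproduced in these notes, so the `population` values are made-up numbers, and every function name here is hypothetical.

```python
import random
import statistics

# Hypothetical "population": made-up sales figures, not the course dataset.
population = [120, 95, 130, 110, 160, 105, 98, 142, 125, 115,
              138, 101, 150, 119, 127, 133, 108, 99, 145, 122]

def population_mean(values):
    """Parameter: the mean computed over the whole population."""
    return statistics.mean(values)

def sample_mean(values, k, rng):
    """Statistic: the mean of a simple random sample of size k (no replacement)."""
    return statistics.mean(rng.sample(values, k))

def mean_of_sample_means(values, k, n_samples, rng):
    """Average the means of n_samples random samples of size k."""
    return statistics.mean(sample_mean(values, k, rng) for _ in range(n_samples))

rng = random.Random(42)  # fixed seed so the run is repeatable
mu = population_mean(population)
x_bar = sample_mean(population, 5, rng)
grand = mean_of_sample_means(population, 5, 10, rng)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (one sample mean): {x_bar:.2f}")
print(f"mean of 10 sample means:     {grand:.2f}")
```

A single sample mean can land some distance from the population mean; averaging several sample means usually lands closer, which is the point of the self-study exercise.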
Descriptive Statistics
•   Gives us the overall picture about data
•   Presents data in the form of tables, charts and graphs
•   Includes summary data
•   Avoids inferences
•   Examples:
    • Measures of central location
       • Mean, median, mode and midrange
    • Measures of Variation
       • Variance, Standard Deviation, z-scores
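
As a rough preview of the measures listed above, here is a small Python sketch. It assumes Python's standard `statistics` module as a stand-in for a statistics package, and the `data` values are made up for illustration.

```python
import statistics

# Made-up illustrative observations; not taken from the course data.
data = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]

# Measures of central location
mean = statistics.mean(data)             # arithmetic average
median = statistics.median(data)         # middle value of the sorted data
mode = statistics.mode(data)             # most frequent value
midrange = (min(data) + max(data)) / 2   # halfway between the extremes

# Measures of variation (population formulas, dividing by n).
# statistics.variance / statistics.stdev give the sample versions (n - 1).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / std_dev for x in data]

print("mean:", mean, "median:", median, "mode:", mode, "midrange:", midrange)
print("variance:", variance, "standard deviation:", std_dev)
print("z-scores:", [round(z, 2) for z in z_scores])
```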



Details later
Lab
•   Download product sales data
•   Run proc means to print the descriptive statistics
•   Run proc univariate to print the descriptive statistics
•   Identify Measures of central location
•   Identify Measures of variation




Inferential Statistics
• Make decisions about the overall population using a sample
• “Sampled” data are incomplete but can still be representative
  of the population
• Permits generalizations (inferences) about the population
  from the sample data
• Probability theory is a major tool used to analyze sampled
  data
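
One common way probability theory is used on sampled data is a confidence interval for the population mean. The sketch below is an illustration under stated assumptions, not a method taken from these notes: the `sample` values are made up, the function name is hypothetical, and it uses the normal approximation (a t critical value would suit small samples better).

```python
import math
import statistics

# Made-up sample of 12 observations; illustrative only.
sample = [52, 48, 55, 50, 47, 53, 51, 49, 54, 50, 46, 57]

def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation (z critical value); this is an
    assumption made to keep the sketch short.
    """
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    std_error = s / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return x_bar - z * std_error, x_bar + z * std_error

low, high = mean_confidence_interval(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"approx. 95% interval for the population mean: ({low:.2f}, {high:.2f})")
```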




-Details later
Predictive Modeling
• The science of predicting future outcomes based on historical
  events.
• Model Building: “Developing set of equations or
  mathematical formulation to forecast future
  behaviors based on current or historical data.”
• Examples: regression, logistic regression, time series analysis, etc.
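
To make the idea of "equations built from historical data" concrete, here is a minimal Python sketch of simple linear regression fitted by ordinary least squares. The `history` values are made up, and `fit_line` and `predict` are hypothetical helper names, not part of any package named in these notes.

```python
# Made-up historical data as (period, sales) pairs; illustrative only.
history = [(1, 10.0), (2, 12.1), (3, 13.9), (4, 16.2), (5, 18.0)]

def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Apply the fitted equation to a new value of the independent variable."""
    intercept, slope = model
    return intercept + slope * x

model = fit_line(history)
forecast = predict(model, 6)  # forecast for the next, unseen period
print(f"intercept={model[0]:.2f}, slope={model[1]:.2f}, forecast={forecast:.2f}")
```

Here the period is the independent variable and sales is the dependent variable whose future value is predicted.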




-Details later
Statistical Computer Packages

 Typical Software
   •   SAS
   •   R
   •   SPSS
   •   MINITAB
   •   Excel
Venkat Reddy Konasani
Manager at Trendwise Analytics
venkat@TrendwiseAnalytics.com
21.venkat@gmail.com
+91 9886 768879

Introduction to Statistical Analysis
