
27 Mar 2023

1. PGDM-102 Business Statistics Prof. Aniruddha Ghosh MBA-Fin., M.A-Eco., UGC-NET, Ph.D. (IIFT) Certified R-Programmer (IIT-Kanpur)
2. What do you mean by Statistics? • Statistics has many definitions, but they fall into two senses: a) statistics in the plural sense; and b) statistics in the singular sense. • As a plural noun, statistics denotes numerical and quantitative information. In the plural sense it therefore means the same thing as data, e.g., statistics of cricket match scores, price statistics, export-import statistics. • In the singular sense, Statistics is the branch of science that deals with scientific methods for the collection, organization, presentation, analysis, and interpretation of data obtained from a survey or an experimental study.
3. Some Basic Concepts a) POPULATION: A well-defined set, group, or aggregate of observations relating to the phenomenon under statistical investigation. A population can be classified into 2 broad classes: a. Finite, e.g., a bag of wheat lying in a warehouse, the number of students in PMLSD B-school. b. Infinite, e.g., the blood cell count in a certain region of the brain. b) TARGET POPULATION: A subset of the population, e.g., the wheat bags lying in the eastern corner of the warehouse, the number of girls in IMS B-school. c) SAMPLE FRAME: The frame from which the actual sample is drawn. For example, if a study concerns India but we have narrowed it to the NCR region, the NCR region becomes our sample frame. In other words, the sample frame in a study coincides with the target population. d) SAMPLE AND SAMPLING: A sample is a fraction or subset of the population drawn through a valid statistical procedure, so that it can be regarded as representative of the entire population. The valid statistical procedure of drawing a sample from the population is called sampling.
4. Some Basic Concepts (contd.) a) SAMPLING UNIT: The individual members that make up the sample, e.g., the girls in the B-school in this case. b) PARAMETER: A descriptive measure of some characteristic of the population, e.g., the mean height of all students in a class. c) FUNDAMENTALS OF MEASUREMENT a. Construct: A wider term covering the broad concept and its underlying variables, e.g., firm performance. b. Variable: A characteristic on which individuals or objects differ among themselves, e.g., to measure firm performance we generally use variables such as RoA, ROCE, and sales growth.
5. Data, Frequency • A collection of meaningful observations is called data. • E.g., stock prices, heights of students, Twitter texts, etc. • The frequency of a value is the number of times it occurs in a series of observations. • For example, consider the series 5, 2, 5, 3, 4, 2, 3, 5, 2, 4, 2. Here 2 occurs 4 times, 3 and 4 occur twice each, and 5 occurs thrice. Hence the frequencies of 2, 3, 4, and 5 are 4, 2, 2, and 3 respectively.
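The frequency count in the example above can be checked with a short Python sketch. It uses the standard library's `collections.Counter`. The variable names `series` and `frequencies` are illustrative and not part of the slides.

```python
from collections import Counter

# The series of observations from the slide.
series = [5, 2, 5, 3, 4, 2, 3, 5, 2, 4, 2]

# Counter maps each distinct value to the number of times it occurs.
frequencies = Counter(series)

# Print a simple frequency table, ordered by value.
for value in sorted(frequencies):
    print(value, frequencies[value])
```

The frequencies always sum to the number of observations, which is a quick sanity check on any frequency table.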
6. Types of Frequency • Simple • Grouped
7. Types of Data • According to statistics • Quantitative • Qualitative • According to decision making • Inferential data • According to time • Time series data • Cross-sectional data • Longitudinal/ panel data • Balanced panel • Unbalanced panel • According to linearity • Linear data • Non-linear data • According to parameters • Parametric • Non-parametric data • According to features • Nominal • Ordinal • Interval • Ratio or continuous data • According to interpretability • Structured • Unstructured • Semi-structured • Highly unstructured data • According to classification • Binary classified data • Grouped data • According to ML /DL algorithms • Supervised learning data • Unsupervised learning data • Reinforcement learning data
8. Data According to Features
9. Descriptive Statistics • Measures of Central Tendency • Measures of Dispersion • Measures of Skewness & Kurtosis
10. Measures of Central Tendency • Mean • Median • Mode • Trimmed Mean • Rolling Mean ≈ Moving Average • Outliers • Outlier detection
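As a rough illustration of the measures listed above, the sketch below computes each of them on a small made-up data set. The `scores` values, the 10% trimming proportion, the window of 3, and the helper names `trimmed_mean` and `rolling_mean` are all assumptions chosen for the example. They do not come from the slides.

```python
from statistics import mean, median, mode

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations to drop per end
    return mean(ordered[k:len(ordered) - k])

def rolling_mean(values, window):
    """Mean of each run of `window` consecutive observations."""
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]

# Hypothetical scores; 95 is a deliberately extreme value.
scores = [12, 15, 15, 18, 20, 22, 25, 28, 30, 95]

print("mean:", mean(scores))
print("median:", median(scores))
print("mode:", mode(scores))
print("10% trimmed mean:", trimmed_mean(scores, 0.10))
print("rolling mean (window 3):", rolling_mean(scores, 3))
```

On this data the mean is pulled upward by the extreme value, while the median and the trimmed mean stay close to the bulk of the observations. This is why outliers appear on the same slide as the averages.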
11. Detecting Outliers • Outliers are extreme data points that fall outside the normal minimum or maximum limits. There are several approaches to detecting outliers: a) Box plots b) Q-Q plot c) P-P plot d) Statistical measures
12. Box-Plots • Where, • Q1 = 1st Quartile / 25th Percentile • Q2 = Me = 2nd Quartile / Median / 50th Percentile • Q3 = 3rd Quartile / 75th Percentile • IQR = Interquartile Range = Q3 – Q1 • Lower extreme / lower fence = Q1 – 1.5*IQR • Upper extreme / upper fence = Q3 + 1.5*IQR
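The fence formulas above can be sketched in Python as follows. The data set and the function name `box_plot_fences` are hypothetical. The quartiles use `statistics.quantiles` with `method="inclusive"`, which is one of several common quartile conventions. Other conventions can give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def box_plot_fences(values):
    """Return (Q1, Q2, Q3, lower fence, upper fence) for a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return q1, q2, q3, lower_fence, upper_fence

# Hypothetical observations; 95 is a deliberately extreme value.
data = [12, 15, 15, 18, 20, 22, 25, 28, 30, 95]

q1, q2, q3, lower_fence, upper_fence = box_plot_fences(data)

# Points outside the fences are flagged as potential outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, Q2, Q3:", q1, q2, q3)
print("fences:", lower_fence, upper_fence)
print("potential outliers:", outliers)
```

A point beyond a fence is only a candidate outlier. Whether to drop, cap, or keep it is a judgment about the data, not something the formula decides.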
13. Violin Plot • Box plots can sometimes be misleading because they do not show the shape of the distribution. • Hence, we suggest the violin plot. • A violin plot shows both the whiskers and the distribution of the data, so we can see directly whether data points lie outside them and identify the potential outliers. [Figure: violin plot labelled with Lower Fence, Q1, Me, Q3, Upper Fence, and the distribution of the data at 95% C.I.]
14. Violin Plot (contd.) [Figure: violin plot with regions labelled A to I]