New Microsoft PowerPoint Presentation.pptx

2 Nov 2022

  1. DATA ANALYTICS ASSIGNMENT NAME - SAMIR KUMAR MTECH (INDUSTRIAL ENGINEERING AND MANAGEMENT)
  2. Data analysis • Data analysis is the technique of analysing data to enhance productivity and business growth. It involves processes such as cleansing, transforming, inspecting and modelling data in order to perform market analysis, uncover the hidden insights in the data, improve business strategy and generate reports from the available data, using data analysis tools such as Tableau, Power BI, R and Python, Apache Spark, etc. • In short, it is the process of inspecting, cleansing, transforming, and modelling data to support better decisions.
  3. Why do we need Data Analysis? We need data analysis basically for the reasons mentioned below: • To gather hidden insights. • To generate reports based on the available data. • To perform market analysis. • To improve business strategy.
  4. Decision Science • Decision Science is the collection of quantitative techniques used to inform decision-making at the individual and population levels. It includes decision analysis, risk analysis, cost-benefit and cost-effectiveness analysis, constrained optimization, simulation modeling, and behavioral decision theory, as well as parts of operations research, microeconomics, statistical inference, management control, cognitive and social psychology, and computer science.
  5. [Diagram: the data analysis pipeline - Data collection → Processing & Modeling → Analysis & Insight, converting Data into Information and then Intelligence (data conversion followed by data analysis).]
  6. [Diagram: from reality to data product - Reality → Raw data collection → Data processing & data cleaning → Data analysis & models → Insight visualization → Data product.]
  7. Data analytics • Data analytics is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision making. • Data analytics is often confused with data analysis. While these are related terms, they aren’t exactly the same. In fact, data analysis is a subcategory of data analytics that deals specifically with extracting meaning from data. Data analytics, as a whole, includes processes beyond analysis, including data science (using data to theorize and forecast) and data engineering (building data systems).
  8. So why Data Analytics? With Data Analytics, businesses can uncover hidden patterns and meaning in customer behavior. For businesses, the benefits include: 1. Informed decision making. 2. More effective marketing. 3. More efficient operations. 4. Cutting costs.
  9. What is Sampling? Sampling is a method that allows us to get information about the population based on the statistics from a subset of the population (sample), without having to investigate every individual.
  10. Why do we need Sampling? Sampling is done to draw conclusions about populations from samples, and it enables us to determine a population’s characteristics by directly observing only a portion (or sample) of the population. • Selecting a sample requires less time than selecting every item in a population • Sample selection is a cost-efficient method • Analysis of the sample is less cumbersome and more practical than an analysis of the entire population
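To make this concrete, here is a minimal Python sketch (illustrative, not part of the original slides): it draws a simple random sample from a synthetic population and shows that the sample mean closely approximates the population mean.

    import numpy as np

    rng = np.random.default_rng(42)
    # Synthetic population of 100,000 individual heights in cm
    population = rng.normal(loc=170, scale=10, size=100_000)

    # Investigate only a 500-person sample instead of every individual
    sample = rng.choice(population, size=500, replace=False)

    print(f"Population mean: {population.mean():.2f}")
    print(f"Sample mean:     {sample.mean():.2f}")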
  11. Population vs sample • The population is the entire group that you want to draw conclusions about. • The sample is the specific group of individuals that you will collect data from. The population can be defined in terms of geographical location, age, income, and many other characteristics.
  12. Learn how to determine sample size. Stage 1: Consider your sample size variables - 1. Population size 2. Margin of error (confidence interval) 3. Confidence level 4. Standard deviation. Stage 2: Calculate sample size - 5. Find your Z-score 6. Use the sample size formula.
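The slide does not name a specific sample size formula, so the Python sketch below assumes the commonly used Cochran formula, n0 = Z² · p · (1 - p) / e², with a finite population correction; the function name and default values are illustrative.

    import math
    from scipy.stats import norm

    def sample_size(population_size, margin_of_error=0.05, confidence_level=0.95, p=0.5):
        """Estimate the required sample size (Cochran's formula plus finite population correction)."""
        z = norm.ppf(1 - (1 - confidence_level) / 2)         # Step 5: find the Z-score
        n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2   # Step 6: apply the sample size formula
        n = n0 / (1 + (n0 - 1) / population_size)            # Adjust for a finite population
        return math.ceil(n)

    # For a population of 10,000 at 95% confidence and a 5% margin of error, roughly 370 responses are needed
    print(sample_size(population_size=10_000))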
  13. Different Types of Sampling Techniques • Probability Sampling: In probability sampling, every element of the population has a known, non-zero chance of being selected. Probability sampling gives us the best chance to create a sample that is truly representative of the population. • Non-Probability Sampling: In non-probability sampling, the elements do not all have a known chance of being selected. Consequently, there is a significant risk of ending up with a non-representative sample which does not produce generalizable results.
  14. Types of Probability Sampling 1. Simple Random Sampling This is a type of sampling technique you must have come across at some point. Here, every individual is chosen entirely by chance and each member of the population has an equal chance of being selected. Simple random sampling reduces selection bias. One big advantage of this technique is that it is the most direct method of probability sampling. But it comes with a caveat – it may not select enough individuals with our characteristics of interest. Monte Carlo methods use repeated random sampling for the estimation of unknown parameters
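As an illustration of both ideas (a sketch, not from the slides), the following Python snippet draws simple random samples without replacement and uses repeated sampling, Monte Carlo style, to estimate an unknown population proportion.

    import numpy as np

    rng = np.random.default_rng(0)
    # Population with an unknown proportion of 1s (the parameter we want to estimate)
    population = rng.integers(0, 2, size=50_000)
    true_p = population.mean()

    # Simple random sampling: every individual has an equal chance of selection;
    # repeating the draw many times is the Monte Carlo part
    estimates = [rng.choice(population, size=200, replace=False).mean() for _ in range(1_000)]

    print(f"True proportion: {true_p:.3f}")
    print(f"Monte Carlo estimate: {np.mean(estimates):.3f}")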
  15. 2. Systematic Sampling In this type of sampling, the first individual is selected randomly and the others are selected using a fixed ‘sampling interval’. Let’s take a simple example to understand this. Say our population size is x and we have to select a sample of size n. Then the sampling interval is x/n: after the first individual, we select every individual who is x/n positions further along, and so on. Systematic sampling is more convenient than simple random sampling. However, it might also lead to bias if there is an underlying pattern in the order in which we are selecting items from the population (though the chances of that happening are quite rare).
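A minimal Python sketch of this rule (the function and variable names are illustrative):

    import numpy as np

    def systematic_sample(population, n, seed=None):
        """Pick a random starting point, then take every k-th element, where k = len(population) // n."""
        rng = np.random.default_rng(seed)
        k = len(population) // n          # fixed sampling interval
        start = rng.integers(0, k)        # first individual chosen at random
        return population[start::k][:n]

    population = np.arange(1, 101)        # individuals numbered 1..100
    print(systematic_sample(population, n=10, seed=1))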
  16. 3. Stratified Sampling In this type of sampling, we divide the population into subgroups (called strata) based on different traits like gender, category, etc., and then we select the sample(s) from these subgroups. We use this type of sampling when we want representation from all the subgroups of the population. However, stratified sampling requires proper knowledge of the characteristics of the population.
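One way to sketch stratified sampling in Python, assuming a pandas DataFrame in which a 'gender' column defines the strata (the column names and sampling fraction are illustrative):

    import pandas as pd

    df = pd.DataFrame({
        "id": range(1, 21),
        "gender": ["M", "F"] * 10,
    })

    # Draw 20% from each stratum so every subgroup is represented
    stratified = df.groupby("gender", group_keys=False).sample(frac=0.2, random_state=1)
    print(stratified)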
  17. 4. Cluster Sampling In a cluster sample, we use subgroups of the population as the sampling unit rather than individuals. The population is divided into subgroups, known as clusters, and a whole cluster is randomly selected to be included in the study. For example, if we divide our population into 5 clusters of 4 individuals each, we might take the whole of the 4th cluster as our sample, and we can include more clusters as per our sample size. This type of sampling is used when we focus on a specific region or area.
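A minimal Python sketch of cluster sampling under the same illustrative setup (5 clusters of 4 individuals each):

    import numpy as np

    rng = np.random.default_rng(3)
    individuals = np.arange(1, 21)            # 20 individuals numbered 1..20
    clusters = individuals.reshape(5, 4)      # 5 clusters of 4 individuals each

    chosen = rng.integers(0, len(clusters))   # randomly select one whole cluster
    print(f"Selected cluster {chosen + 1}: {clusters[chosen]}")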
  18. Types of Non-Probability Sampling 1. Convenience Sampling This is perhaps the easiest method of sampling, because individuals are selected based on their availability and willingness to take part. Here, let’s say individuals numbered 4, 7, 12, 15 and 20 want to be part of our sample, and hence we include them in the sample. Convenience sampling is prone to significant bias, because the sample may not represent specific characteristics of the population, such as religion or gender.
  19. 2. Quota Sampling In this type of sampling, we choose items based on predetermined characteristics of the population. Consider that we have to select individuals whose number is a multiple of four for our sample: the individuals numbered 4, 8, 12, 16, and 20 are therefore already reserved for our sample. In quota sampling, the chosen sample might not be a good representation of those characteristics of the population that were not considered.
  20. 3. Judgment Sampling It is also known as selective sampling. It depends on the judgment of the experts when choosing whom to ask to participate. Suppose our experts believe that the people numbered 1, 7, 10, 15, and 19 should be considered for our sample, as they may help us to infer the population in a better way. As you can imagine, judgment sampling is also prone to bias from the experts and may not necessarily be representative.
  21. 4. Snowball Sampling I quite like this sampling technique. Existing participants are asked to nominate further people known to them, so that the sample increases in size like a rolling snowball. This method of sampling is effective when a sampling frame is difficult to identify. Here, we randomly chose person 1 for our sample, who then recommended person 6, who recommended person 11, and so on: 1 -> 6 -> 11 -> 14 -> 19. There is a significant risk of selection bias in snowball sampling, as the referred individuals will share common traits with the person who recommends them.
  22. Statistics • Statistics simply means numerical data; it is the field of mathematics that deals with the collection, tabulation, and interpretation of numerical data.
  23. 1. Descriptive Statistics: Descriptive statistics uses data to provide a description of the population, either through numerical calculations or graphs or tables. It provides a graphical summary of data and is used for summarizing objects, etc. There are two categories, as described below. (a) Measure of central tendency – A measure of central tendency, also known as a summary statistic, is used to represent the centre point or a typical value of a data set or sample set. In statistics, there are three common measures of central tendency, as shown below: (i) Mean: a measure of the average of all values in a sample set. For example, the mean of 2, 4 and 6 is (2 + 4 + 6) / 3 = 4.
  24. (ii) Median: a measure of the central value of a sample set. The data set is ordered from lowest to highest value and the exact middle value is taken. For example, the median of 1, 3, 7, 9 and 20 is 7. (iii) Mode: the value that appears most frequently in the sample set; the value repeated most often is the mode. For example, the mode of 2, 3, 3, 5 and 7 is 3.
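A minimal Python sketch of these three measures (the data values are illustrative):

    from statistics import mean, median, mode

    data = [2, 3, 3, 5, 7, 9, 20]

    print(f"Mean:   {mean(data):.2f}")   # average of all values
    print(f"Median: {median(data)}")     # middle value of the ordered data
    print(f"Mode:   {mode(data)}")       # most frequently occurring value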
  25. (b) Measure of Variability – A measure of variability, also known as a measure of dispersion, is used to describe the variability in a sample or population. In statistics, there are three common measures of variability, as shown below: (i) Range: a measure of how spread apart the values in a data set are. Range = Maximum value - Minimum value. (ii) Variance: it describes how much a random variable differs from its expected value, and is computed as the average squared deviation: S² = (1/n) Σᵢ₌₁ⁿ (xᵢ - x̄)², where n is the total number of data points, x̄ is the mean of the data points and xᵢ is an individual data point. (iii) Standard deviation: a measure of the dispersion of a set of data from its mean: σ = √[ (1/n) Σᵢ₌₁ⁿ (xᵢ - μ)² ].
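And a matching Python sketch for the three measures of variability (same illustrative data as above):

    import numpy as np

    data = np.array([2, 3, 3, 5, 7, 9, 20])

    data_range = data.max() - data.min()   # Range = maximum value - minimum value
    variance = data.var()                  # population variance: (1/n) * sum((x - mean)**2)
    std_dev = data.std()                   # population standard deviation: square root of the variance

    print(f"Range: {data_range}, Variance: {variance:.2f}, Std dev: {std_dev:.2f}")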
  26. 2. Inferential Statistics: • Inferential statistics makes inferences and predictions about a population based on a sample of data taken from that population. It generalizes from a large dataset and applies probability theory to draw a conclusion. • It is used for explaining the meaning of descriptive statistics and to analyse data, interpret results, and draw conclusions. • Inferential statistics is mainly related to and associated with hypothesis testing, whose main target is to reject the null hypothesis. • Hypothesis testing is a type of inferential procedure that uses sample data to evaluate and assess the credibility of a hypothesis about a population. • Inferential statistics is generally used to determine how strong a relationship is within the sample, because it is very difficult to obtain a full population list and draw a random sample. Inferential statistics can be carried out with the help of the steps given below: • Obtain and start with a theory. • Generate a research hypothesis. • Operationalize the variables. • Identify the population to which the study applies. • Form a null hypothesis for this population. • Collect a sample (for example, of children) from the population and run the study. • Then perform statistical tests to check whether the observed characteristics of the sample are sufficiently different from what would be expected under the null hypothesis, so that the null hypothesis can be rejected.
  27. Types of inferential statistics – Various types of inferential statistics are widely used nowadays and are fairly easy to interpret. These are given below: • One-sample test of difference / one-sample hypothesis test • Confidence interval • Contingency tables and the chi-square statistic • T-test or ANOVA • Pearson correlation • Bivariate regression • Multivariate regression
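As one concrete illustration (a sketch, not from the slides), a one-sample t-test in Python with SciPy, testing whether a sample mean differs from a hypothesized population mean of 50:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    sample = rng.normal(loc=52, scale=5, size=30)     # sample data centred near 52

    t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Reject the null hypothesis: the sample mean differs from 50.")
    else:
        print("Fail to reject the null hypothesis.")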
  28. Prescriptive analytics is a process that analyzes data and provides instant recommendations on how to optimize business practices to suit multiple predicted outcomes. In essence, prescriptive analytics takes the “what we know” (data), comprehensively understands that data to predict what could happen, and suggests the best steps forward based on informed simulations. Predictive analytics: Predictive analytics applies mathematical models to the current data to inform (predict) future behavior. It is the “what could happen."
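A tiny Python illustration of the predictive idea, fitting a linear trend to past monthly sales and extrapolating one month ahead (all figures are made up for illustration):

    import numpy as np

    months = np.arange(1, 7)                          # months 1..6 of historical data
    sales = np.array([100, 108, 115, 123, 130, 138])  # illustrative monthly sales figures

    slope, intercept = np.polyfit(months, sales, deg=1)   # fit a simple linear model
    next_month = 7
    forecast = slope * next_month + intercept

    print(f"Forecast for month {next_month}: {forecast:.1f}")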
  29. Types of Variables in Statistics 1. Quantitative Variables: Sometimes referred to as “numeric” variables, these are variables that represent a measurable quantity. Examples include: • Number of students in a class • Number of square feet in a house • Population size of a city • Age of an individual • Height of an individual 2. Qualitative Variables: Sometimes referred to as “categorical” variables, these are variables that take on names or labels and can fit into categories. Examples include: • Eye color (e.g. “blue”, “green”, “brown”) • Gender (e.g. “male”, “female”) • Breed of dog (e.g. “lab”, “bulldog”, “poodle”) • Level of education (e.g. “high school”, “Associate’s degree”, “Bachelor’s degree”) • Marital status (e.g. “married”, “single”, “divorced”)
  30. Scales of measurement
  31. Nominal Scale A nominal scale is the 1st level of measurement scale, in which the numbers serve as “tags” or “labels” to classify or identify the objects. A nominal scale usually deals with non-numeric variables, or with numbers that carry no quantitative value. Characteristics of Nominal Scale • A nominal scale variable is classified into two or more categories. In this measurement mechanism, the answer should fall into one of the classes. • It is qualitative. The numbers are used here only to identify the objects. • The numbers don’t define the object’s characteristics. The only permissible operation on numbers in the nominal scale is “counting.” Example: An example of a nominal scale measurement is given below: What is your gender? M - Male, F - Female. Here, the variables are used as tags, and the answer to this question should be either M or F.
  32. Ordinal Scale The ordinal scale is the 2nd level of measurement, which reports the ordering and ranking of data without establishing the degree of variation between them. Ordinal represents the “order.” Ordinal data is known as qualitative or categorical data; it can be grouped, named and also ranked. Characteristics of the Ordinal Scale • The ordinal scale shows the relative ranking of the variables • It identifies and describes the magnitude of a variable • Along with the information provided by the nominal scale, ordinal scales give the rankings of those variables • The interval properties are not known • Surveyors can quickly analyse the degree of agreement concerning the identified order of variables Examples: ranking of school students (1st, 2nd, 3rd, etc.); ratings in restaurants; evaluating the frequency of occurrences (very often, often, and so on); assessing the degree of agreement (totally agree, agree, totally disagree).
  33. Interval Scale The interval scale is the 3rd level of measurement scale. It is defined as a quantitative measurement scale in which the difference between two values is meaningful. In other words, the differences between values are measured in an exact manner, but the zero point is arbitrary (for example, 0 on a calendar or Celsius temperature scale does not mean “nothing”). Characteristics of Interval Scale: • The interval scale is quantitative, as it can quantify the difference between values • It allows calculating the mean and median of the variables • To understand the difference between the variables, you can subtract the values of the variables • The interval scale is a preferred scale in statistics, as it helps to assign numerical values to arbitrary assessments such as feelings, calendar types, etc. Example: • Likert Scale • Net Promoter Score (NPS) • Bipolar Matrix Table
  34. Ratio Scale The ratio scale is the 4th level of measurement scale, which is quantitative. It is a type of variable measurement scale that allows researchers to compare differences or intervals. The ratio scale has a unique feature: it possesses a true origin or zero point. Characteristics of Ratio Scale: • The ratio scale has the feature of an absolute zero • It doesn’t have negative numbers, because of its zero-point feature • It affords unique opportunities for statistical analysis: the variables can be meaningfully added, subtracted, multiplied and divided, and the mean, median and mode can all be calculated on a ratio scale • The ratio scale has unique and useful properties; for example, it allows unit conversions, such as kilograms to grams or calories to kilocalories Example: An example of a ratio scale is: What is your weight in kgs? Less than 55 kgs / 55 - 75 kgs / 76 - 85 kgs / 86 - 95 kgs / More than 95 kgs