2. Descriptive Analysis
Descriptive analysis is a type of data analysis that helps describe, show, or
summarize data points in a useful way so that patterns emerge that fit all of
the conditions of the data. It is the technique of identifying patterns and
relationships using both recent and historical data.
3. Descriptive Analytics
Descriptive analytics is the examination of data or content, usually performed
manually, to answer the question “What happened?” (or “What is happening?”). It
is characterized by traditional business intelligence (BI) and visualizations
such as pie charts, bar charts, line graphs, tables, or generated narratives.
4. Data visualization
Data visualization is the representation of data through the use of common
graphics, such as charts, plots, infographics, and even animations.
6. DATA VISUALIZATION TOOLS FOR BUSINESS
Microsoft Excel
Google Charts
Tableau can integrate with hundreds of data sources to import data and output
dozens of visualization types, from charts to maps and more. Owned by
Salesforce, Tableau has millions of users and community members, and it is
widely used at the enterprise level.
Datawrapper is a tool that, like Google Charts, is used to
generate charts, maps, and other graphics for use online.
Infogram is another popular option that can be used to
generate charts, reports, and maps.
7. Data Queries
A query is a specific request for information from a database. In robust
database systems in particular, queries make it easier to see trends at a high
level or to edit large quantities of data at once.
Sorting and filtering data is an essential task for managing and
manipulating large sets of data. Sorting allows you to organize your
data in a specific order, such as alphabetically or numerically, while
filtering allows you to extract specific information from your data
based on certain criteria.
• In SQL, sorting is done with the ORDER BY clause, which specifies the
column or columns to sort the data by. Filtering is done with the WHERE
clause, which specifies the conditions a row must meet to be included in the
result set.
• Sorting and filtering can be combined to build powerful queries that
extract exactly the information you need from your data. More advanced
techniques, such as multiple criteria, wildcards, and subqueries, can further
refine the results.
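The clauses above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, assuming a hypothetical in-memory `sales` table with made-up rows; it combines WHERE and ORDER BY, then shows multiple criteria with a LIKE wildcard and a subquery.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Laptop", 1200.0),
        ("South", "Lamp", 45.0),
        ("North", "Lamp", 40.0),
        ("East", "Laptop", 1100.0),
        ("South", "Laptop", 1300.0),
    ],
)

# Filtering (WHERE) plus sorting (ORDER BY): laptops only, largest amount first.
laptops = conn.execute(
    "SELECT region, amount FROM sales "
    "WHERE product = 'Laptop' ORDER BY amount DESC"
).fetchall()

# Multiple criteria with a wildcard: '%' matches any run of characters,
# so 'N%' selects regions whose name starts with N.
north_big = conn.execute(
    "SELECT product, amount FROM sales "
    "WHERE region LIKE 'N%' AND amount > 100"
).fetchall()

# Subquery: keep only rows whose amount exceeds the overall average amount.
above_avg = conn.execute(
    "SELECT region, product, amount FROM sales "
    "WHERE amount > (SELECT AVG(amount) FROM sales) "
    "ORDER BY region, amount"
).fetchall()

print(laptops)
print(north_big)
print(above_avg)
```

The same SQL strings would work in most relational databases; sqlite3 is used here only because it ships with Python.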
8. Probability can be used for more than calculating the likelihood of a
single event; it can summarize the likelihood of all possible outcomes. A
quantity of interest whose value depends on chance is called a random
variable, and the relationship between each possible outcome of a random
variable and its probability is called a probability distribution.
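As an illustration, the sketch below builds the probability distribution of one random variable: the sum of two fair six-sided dice (an example chosen here, not taken from the slides). Each of the 36 equally likely rolls is enumerated and each possible sum is mapped to its exact probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Random variable X = sum of two fair six-sided dice (illustrative choice).
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
counts = Counter(outcomes)
total = len(outcomes)  # 36 equally likely outcomes

# Probability distribution: each possible value of X mapped to its probability.
distribution = {x: Fraction(c, total) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

The probabilities in any probability distribution over all possible outcomes sum to 1, which is a quick sanity check on the table.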
9. • When you conduct research about a group of people,
it’s rarely possible to collect data from every person in
that group. Instead, you select a sample. The sample is
the group of individuals who will actually participate in
the research.
• To draw valid conclusions from your results, you have
to carefully decide how you will select a sample that is
representative of the group as a whole. This is called
a sampling method. There are two primary types of
sampling methods that you can use in your research:
• Probability sampling involves random selection,
allowing you to make strong statistical inferences about
the whole group.
• Non-probability sampling involves non-random
selection based on convenience or other criteria,
allowing you to easily collect data.
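A minimal sketch of the two sampling methods, assuming a hypothetical population of 1,000 people identified by ID number. Simple random sampling stands in for probability sampling, and "take the first 50 on the list" stands in for convenience sampling.

```python
import random
import statistics

# Hypothetical population of 1,000 people identified by ID number.
population = list(range(1, 1001))

# Probability sampling (simple random sampling): every member has the same
# chance of selection. A fixed seed keeps the example reproducible.
rng = random.Random(42)
random_sample = rng.sample(population, k=50)

# Non-probability (convenience) sampling: take whoever is easiest to reach,
# modeled here as the first 50 people in the list.
convenience_sample = population[:50]

print(statistics.mean(population))          # population mean: 500.5
print(statistics.mean(random_sample))       # varies with the seed, usually near 500
print(statistics.mean(convenience_sample))  # 25.5, far from the population mean
```

The convenience sample is easy to collect but clearly unrepresentative here, which is why probability sampling supports stronger inferences about the whole group.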
10. Measures of Location
A fundamental task in many statistical analyses is to estimate a location parameter for
the distribution; i.e., to find a typical or central value that best describes the data.
Definition of Location: The first step is to define what we mean by a typical value. For
univariate data, there are three common definitions:
Mean - the mean is the sum of the data points divided by the number of data points.
That is,
$\bar{Y} = \sum_{i=1}^{N} Y_i / N$
The mean is that value that is most commonly referred to as the average. We will
use the term average as a synonym for the mean and the term typical value to
refer generically to measures of location.
Median - the median is the value of the point which has half the data smaller than
that point and half the data larger than that point. That is, if $Y_1, Y_2, \ldots, Y_N$ is a
random sample sorted from smallest value to largest value, then the median is
defined as:
$\tilde{Y} = Y_{(N+1)/2}$ if N is odd
$\tilde{Y} = \left(Y_{N/2} + Y_{(N/2)+1}\right)/2$ if N is even
Mode - the mode is the value of the random sample that occurs with the greatest
frequency. It is not necessarily unique. The mode is typically used in a qualitative
fashion. For example, there may be a single dominant hump in the data, or perhaps
two or more smaller humps. This is usually evident from a histogram of the data.
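The three definitions above translate directly into code. This is a minimal sketch on a small made-up data set; the function names are illustrative, and the mode function returns every most-frequent value because the mode is not necessarily unique.

```python
from collections import Counter

def mean(ys):
    # Sum of the data points divided by the number of data points.
    return sum(ys) / len(ys)

def median(ys):
    # Sort, then take the middle value (N odd) or average the two middle values (N even).
    s = sorted(ys)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # Y_((N+1)/2) in 1-based indexing
    return (s[mid - 1] + s[mid]) / 2  # (Y_(N/2) + Y_(N/2+1)) / 2

def modes(ys):
    # The mode need not be unique, so return every most-frequent value.
    counts = Counter(ys)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [2, 3, 3, 5, 7, 7, 9, 12]
print(mean(data), median(data), modes(data))  # → 6.0 6.0 [3, 7]
```

Here N = 8 is even, so the median averages the 4th and 5th sorted values (5 and 7), and the data have two modes.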
11. Measures of Dispersion
Measures of dispersion are non-negative real
numbers that help to gauge the spread of data
about a central value. These measures help to
determine how stretched or squeezed the given
data is.
Measures of dispersion can also be described as
non-negative real numbers that measure how
homogeneous or heterogeneous the given data
is. The value of a measure of dispersion is 0
if all the data points in a data set are the same,
and it increases as the variability of the data
increases.
13. Absolute Measures of Dispersion
• Range: Given a data set, the range is the difference between the
maximum value and the minimum value.
• Variance: The average squared deviation from the mean of the given data set is
known as the variance. This measure of dispersion checks the spread of the data
about the mean.
• Standard Deviation: The square root of the variance gives the standard deviation.
Thus, the standard deviation also measures the variation of the data about the
mean.
• Mean Deviation: The mean deviation is the average of the absolute deviations of the
data about a central point. This central point could be the mean, median, or mode.
• Quartile Deviation: Quartile deviation can be defined as half of the difference
between the third quartile and the first quartile in a given data set.
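A minimal sketch of the five absolute measures on a small made-up data set. It follows the definitions above: the variance divides by N (the sample variance divides by N - 1 instead), and the mean deviation is taken about the mean. Quartiles have several competing conventions; this uses `statistics.quantiles` with its default "exclusive" method, so other tools may report slightly different quartile values.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 6]
n = len(data)

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Variance: average squared deviation from the mean (population form, divide by N).
mu = sum(data) / n
variance = sum((x - mu) ** 2 for x in data) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Mean deviation about the mean (the median or mode could be used instead).
mean_dev = sum(abs(x - mu) for x in data) / n

# Quartile deviation: half the distance between the third and first quartiles.
q1, _, q3 = statistics.quantiles(data, n=4)
quartile_dev = (q3 - q1) / 2

print(data_range, variance, round(std_dev, 4), mean_dev, quartile_dev)
```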
Relative Measures of Dispersion
If data sets with different units need to be compared, relative measures of
dispersion are used. These measures are expressed as ratios or percentages, which
makes them unitless. Some of the relative measures of dispersion are given below:
• Coefficient of Range: It is the ratio of the difference between the highest and
lowest value in a data set to the sum of the highest and lowest value.
• Coefficient of Variation: It is the ratio of the standard deviation to the mean of the
data set. It is expressed in the form of a percentage.
• Coefficient of Mean Deviation: This can be defined as the ratio of the mean
deviation to the value of the central point from which it is calculated.
• Coefficient of Quartile Deviation: It is the ratio of the difference between the
third quartile and the first quartile to the sum of the third and first quartiles.
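The four coefficients can be computed the same way. This is a sketch on the same kind of made-up data, assuming the population standard deviation, mean deviation about the mean, and the default "exclusive" quartile method of `statistics.quantiles`; each result is a pure ratio (or a percentage for the coefficient of variation), so it carries no units.

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 6]
hi, lo = max(data), min(data)
mu = statistics.mean(data)
sigma = statistics.pstdev(data)                         # population standard deviation
mean_dev = sum(abs(x - mu) for x in data) / len(data)   # about the mean
q1, _, q3 = statistics.quantiles(data, n=4)             # default 'exclusive' method

coef_range = (hi - lo) / (hi + lo)
coef_variation = sigma / mu * 100          # expressed as a percentage
coef_mean_dev = mean_dev / mu              # mean deviation divided by its central point
coef_quartile_dev = (q3 - q1) / (q3 + q1)

print(coef_range, round(coef_variation, 2), coef_mean_dev, round(coef_quartile_dev, 4))
```

Because the coefficients are unitless, the same calculation could compare, say, the spread of heights in centimeters with the spread of weights in kilograms.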
14. Hypothesis testing is a statistical tool used to determine whether the
results of an experiment are statistically significant. It involves setting up
a null hypothesis and an alternative hypothesis. The two hypotheses are always
mutually exclusive: if the null hypothesis is true, the alternative hypothesis
is false, and vice versa. An example of hypothesis testing is setting up a test
to check whether a new medicine treats a disease more effectively.
Null Hypothesis
The null hypothesis is a concise mathematical statement that is used to
indicate that there is no difference between two possibilities. In
other words, there is no difference between certain characteristics
of data. This hypothesis assumes that the outcomes of an
experiment are based on chance alone. It is denoted as H0.
Alternative Hypothesis
The alternative hypothesis is an alternative to the null hypothesis. It is
used to show that the observations of an experiment are due to
some real effect. It states that there is a statistically significant difference
between two possible outcomes and is denoted as H1.
15. • Hypothesis Testing Chi Square
• The Chi square test is a hypothesis testing method that is used to check whether the variables in a
population are independent or not. It is used when the test statistic is chi-squared distributed.
• The chi-square statistic is denoted by χ². The chi-square formula is:
  $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
  where $O_i$ is the observed (actual) value and $E_i$ is the expected value.
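For a test of independence, the expected value for each cell of a contingency table is (row total × column total) / grand total. A minimal sketch, assuming a hypothetical 2×2 table of counts and no continuity correction:

```python
# Observed counts in a hypothetical 2x2 contingency table:
# rows = group A / group B, columns = outcome yes / no.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count if the two variables are independent.
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi2, dof)  # → 4.0 1
```

The statistic is then compared with a critical value of the chi-square distribution with `dof` degrees of freedom; with 1 degree of freedom the 5% critical value is about 3.841, so this hypothetical table would lead to rejecting independence at that level.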
• One Tailed Hypothesis Testing
• One-tailed hypothesis testing is done when the rejection region lies in only one direction. It is
also known as directional hypothesis testing because the effect can be tested in one direction only.
This type of testing is further classified into the right-tailed test and the left-tailed test.
• Right Tailed Hypothesis Testing
• The right tail test is also known as the upper tail test. This test is used to check whether the
population parameter is greater than some value. The null and alternative hypotheses for this test
are given as follows:
• H0: The population parameter is ≤ some value
• H1: The population parameter is > some value.
• Left Tailed Hypothesis Testing
• The left tail test is also known as the lower tail test. It is used to check whether the population
parameter is less than some value. The hypotheses for this hypothesis testing can be written as
follows:
• H0: The population parameter is ≥ some value
• H1: The population parameter is < some value.
• The null hypothesis is rejected if the test statistic is less than the critical value.
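To make the right- and left-tailed decision rules concrete, here is a sketch of a one-tailed z-test for a mean. It assumes the population standard deviation is known (so the z statistic applies), uses made-up numbers, and the function name `one_tailed_z_test` is hypothetical. Critical values come from the standard normal distribution via `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def one_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="right"):
    # z statistic for a mean when the population standard deviation is known.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        # H0: mu <= mu0 vs H1: mu > mu0; reject if z exceeds the upper critical value.
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        # H0: mu >= mu0 vs H1: mu < mu0; reject if z is below the lower critical value.
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return z, critical, reject

# Hypothetical samples of size 64 with sigma = 8, testing against mu0 = 50.
right = one_tailed_z_test(52, 50, sigma=8, n=64, tail="right")
left = one_tailed_z_test(48.5, 50, sigma=8, n=64, tail="left")
print(right)
print(left)
```

At alpha = 0.05 the right-tailed critical value is about 1.645, so z = 2.0 rejects H0; the left-tailed critical value is about -1.645, so z = -1.5 does not.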
16. Analysis of variance (ANOVA)
The hypothesis is based on available information
and the investigator's belief about the
population parameters. The specific test
considered here is called Analysis of
Variance (ANOVA), a test of hypothesis for
comparing the means of a continuous variable
across two or more independent comparison
groups.
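The core of one-way ANOVA is the F statistic: the variance between group means divided by the variance within groups. A minimal sketch, assuming made-up groups and a hypothetical helper name `one_way_anova`; it computes only the statistic and its degrees of freedom, not a p-value.

```python
def one_way_anova(*groups):
    # Returns the F statistic and its degrees of freedom for one-way ANOVA.
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova([4, 5, 6], [6, 7, 8], [8, 9, 10]))  # → (12.0, 2, 6)
```

The F statistic is compared with a critical value of the F distribution with (df_between, df_within) degrees of freedom. A large F suggests that at least one group mean differs, though the test alone does not say which one.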