STATISTICAL HYDROLOGY
MAL1303/MKAG1273
Summarizing Data
Dr. Shamsuddin Shahid
Associate Professor
Department of Hydraulics and Hydrology
Faculty of Civil Engineering
Room No.: M46-332;
Phone: 07-5531624; Mobile: 0182051586
Email: sshahid@utm.my
11/23/2015 Shamsuddin Shahid, FKA, UTM
Office Location: C09-Room No. 219
What is Statistics
Statistics is the science of Learning from Data.
Statistics is the science of collecting data and analyzing
(modeling) data for the purpose of decision making and
scientific discovery when the available information is both
limited and variable.
Descriptive Statistics and Inferential Statistics
Descriptive statistics refers to statistical techniques used to
summarize and describe a data set. Measures of central tendency,
such as mean and median, and dispersion, such as range and
standard deviation, are the main descriptive statistics. Displays of
data, such as histograms and box-plots, are also considered
techniques of descriptive statistics.
Inferential statistics, or statistical induction, means the use of
statistics to make inferences concerning some unknown aspect of a population.
Some examples of inferential statistics methods include hypothesis
testing, linear regression, and prediction.
Statistical Hydrology
In statistical hydrology, we generally use a four-step process for learning
from data:
(1) Defining the problem
(2) Collecting the data
(3) Summarizing the data
(4) Analyzing data, interpreting the analyses, and communicating results.
Defining the Problem
Defining the problem or the question to be addressed:
• What is the condition of water quality in the river?
• Is there a relationship between nitrogen fertilizer use and
groundwater quality?
• What is the effect of rainfall on river discharge in a particular river
basin?
• What is the main factor responsible for water quality
deterioration in a river?
• Has rainfall changed over the last decade?
Collecting the Data
Proposing a study or experiment to collect meaningful data to answer
the problem.
• Which variables have to be measured?
• How many data points/samples have to be collected?
• How should the samples be selected?
• Etc.
The most crucial element in data collection is the manner in
which the sample is selected.
Population and Samples
Population
The population is the total set of measurements which could
hypothetically be taken from the entity being studied. It is the set of all
measurements of interest. For example, the trace element composition of all
stream sediments in an area, groundwater depth at any point of a
catchment, etc.
The Statistical Sample
A sample is any subset of measurements selected from the population.
The word "sample" is used differently in water resources and in
statistics. A water sample usually means one bottle of water. In statistics,
a sample is the set of data actually available for analysis.
Population and Samples
A sample is a subset of the population.
A sample should be collected in such a way that it represents the whole
population.
Sample Size
• No general rule for sample size can be given.
• It depends on the degree of intricacy of the problem being
addressed.
• In many cases, the sample size can be known in advance.
• In many cases, hydrologists collect a reasonable number of
samples and analyze them to see how accurately they represent the
population. Accordingly, they go for the next set of sample
collection.
• However, in many cases samples cannot be collected according
to need.
• If the sample size is not adequate, different (non-parametric)
methods are used to analyze the data.
• Whatever the case may be, the sample size should never be less than 6.
Sampling Technique
In hydrology, one of the main decisions about sampling a
population is where to sample. Sampling techniques can be
classified as below:
1. Random Sampling
2. Stratified Sampling
3. Uniform Sampling
4. Regular Sampling
5. Clustered Sampling
6. Traverse Sampling
Random Sampling
Consider a sample of n measurements selected from a population
containing N measurements (N > n). The sample is said to be a random
sample if every different sample of size n from the population has an
equal probability of being selected.
Sample data selected in a nonrandom fashion are frequently distorted by
a selection bias. A selection bias exists whenever there is a systematic
tendency to overrepresent or underrepresent some part of the population.
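A minimal Python sketch of drawing a random sample in the sense defined above. The population of 20 numbered sampling sites and the helper name `draw_random_sample` are hypothetical; `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random

# Hypothetical population: N = 20 candidate sampling sites, identified by number.
population = list(range(1, 21))

def draw_random_sample(population, n, seed=None):
    """Draw n distinct items so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw reproducible
    return rng.sample(population, n)

sample = draw_random_sample(population, n=5, seed=42)
print(sample)
```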
Stratified Sampling
The whole population is divided into a number of groups or strata
according to a property, and each group is sampled differently. This
type of sampling is called stratified sampling.
Suppose we want to study the transmissivity of groundwater in a
catchment. Transmissivity depends heavily on local geology. Therefore,
we divide the catchment according to geological units and sample each
unit, so that the impact of geology on transmissivity can be studied.
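The idea can be sketched in Python as follows. The geological units, well identifiers and the helper `stratified_sample` are hypothetical, and drawing the same number of wells from every stratum is only one possible allocation rule.

```python
import random

# Hypothetical catchment: well IDs grouped by geological unit (the strata).
wells_by_unit = {
    "alluvium": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "sandstone": ["S1", "S2", "S3", "S4"],
    "granite": ["G1", "G2", "G3"],
}

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, min(per_stratum, len(units)))
        for name, units in strata.items()
    }

chosen = stratified_sample(wells_by_unit, per_stratum=2, seed=1)
print(chosen)
```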
Systematic Sampling
Uniform Sampling: Planned by
randomization within grid squares.
Regular or gridded: Planned on
rectangular or triangular grid.
Systematic Sampling
1. Clustered: It is focused on patchy distribution.
2. Traverse: Often forced by access and exposure constraints or
logistics.
Data Quality
Garbage in – garbage out.
Hydrologists must be confident about data quality before
processing the data.
Selection of the data analysis technique depends on the quality of the data.
Use of any measurement device must be accompanied by
awareness of precision and accuracy.
Data Quality
Precision - A measurement is precise if repeated measurements of the
same entity are similar.
Accuracy – A measurement is accurate if it is close to the true value. In
water resources, the true value is usually unknown, although there are
standards that can be used for calibrating analytical equipment.
Data Type
There are a variety of categories of data which may be encountered in
Hydrology. It is important to know the data type before selecting the
appropriate data analysis technique.
Ratio scale data
Ordinary measurements such as amount of rainfall, depth of
groundwater level, etc. This is the best quality and most versatile data
type.
Interval scale data
Interval scale data differ from ratio scale data in that the zero point is
not a fundamental termination of the scale. The classical example of
interval scale data is temperature measured in Centigrade.
Data Type
Ordinal Scale Data
This category is of considerably lower quality than the ratio or
interval scale data. The only purpose of the scale is to place observations
in relative order. Consequently, it is not valid to apply addition,
subtraction, or division to ordinal scale data. Non-parametric
methods are used to analyze ordinal scale data.
Nominal or Categorical Data
Information is sometimes presented in the form of names. For example,
floods are sometimes recorded as normal flood, severe flood, very severe
flood, etc. Sometimes, drought is recorded according to its
occurrence, such as drought occurring in 1974, 1981, 1993, 1990, 2008, etc.
Data Type
Directional Data
Data that are expressed as angles. Examples of directional data: direction
of a cyclone, direction of surface runoff, etc. Directional data require
special methods of analysis as the numerical values cycle around
through 360 degrees.
Closed Data
There are lower and upper limits of this type of data. These are data
in the form of percentage, parts per million (ppm), etc. Such data
require cautious treatment, especially in bivariate and multivariate
methods, because variables are fundamentally interdependent.
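To illustrate why directional data need special treatment, the sketch below compares the ordinary arithmetic mean of two directions with a mean computed from unit vectors. This vector-based circular mean is one common approach, not necessarily the method intended in these slides; the two directions are hypothetical.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees (0-360), computed from summed unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

# Hypothetical runoff directions just either side of north (0 degrees).
directions = [350.0, 10.0]

arithmetic = sum(directions) / len(directions)  # 180.0, i.e. due south
circular = circular_mean_deg(directions)        # very close to 0 (or 360), i.e. north
print(arithmetic, circular)
```

The arithmetic mean points in the opposite direction because it ignores the wrap-around at 360 degrees.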
Continuous and Discrete Types of Variable
Discrete Variable
When observations on a quantitative random variable can assume only a
countable number of values, the variable is called a discrete random
variable. It can take on only a countable number of values.
Example: Number of days in a year with rainfall > 20 mm, number of
floods in a year, etc.
Continuous Variable
When observations on a quantitative random variable can assume any one
of the uncountable number of values, the variable is called a continuous
random variable. A continuous variable can take on any value over some
range.
Example: Daily rainfall, River Discharge, Groundwater Depth, etc.
Summarization of Data
To analyze any collection of water resources data appropriately,
the first consideration must be the characteristics of the data
themselves. Therefore, first we need to know the common
characteristics of water resources data we want to study.
Summarization of data means making a summary of the data in
the forms which convey their important characteristics.
For example, if we want to analyze the rainfall data of a station,
first we need to know:
1. What is the average rainfall at that station?
2. How does the rainfall vary at the station?
3. How is the rainfall distributed over the year?
4. How often does heavy rainfall occur?
Summarization of Data
Summarization of data means making a summary of the data
in the forms which convey their important characteristics such
as:
• A measure of the center of the data,
• A measure of spread or variability,
• A measure of the symmetry of the data distribution,
• Percentiles of the data (estimates of extremes)
• Etc…
Measurement of the center of the data
• The mean and the median are the two most commonly used
measures of the center of the data.
• Besides these, the mode, geometric mean and trimmed
mean are also sometimes used to summarize water
resources data.
• We need to understand:
  - What are the properties of these measures?
  - When should one be employed over the other?
The Mean - classical measurement of data
Mean of a set of measurements is defined to be the sum of the
measurements divided by the total number of measurements.
The sample mean is denoted by the symbol $\bar{X}$ (x-bar), and
the population mean is denoted by $\mu$ (mu).
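Written as formulas, the definition above reads as follows. This is only a restatement of the verbal definition, using n for the number of measurements in the sample and N for the number in the population, as in the random sampling definition.

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\qquad\text{and}\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} X_i
```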
Computation of Mean - Example
Discharge in a river was measured every three days during the month of
January, as below. What is the mean value of river discharge in
January?
The Mean
The influence of an observation on the overall mean
is the distance between the observation and the
mean excluding the observation.
Therefore, the influence of observation 4.0 on the
overall mean is (4.0 – 3.3) = 0.7
Influence of observation on the Mean
Mean Discharge = 3.4 cumec
• Not all observations have the same influence on the mean.
• Observations closer to the mean have less influence on it
than observations farther from the mean.
Influence of observation on the Mean
Mean discharge in the month of January is 6.1 cumec.
If we remove the extreme value 34.8 (outlier) from the
observations, the mean discharge = 3.2 cumec.
The Mean: Summary
1. It is the arithmetic average of the measurements in a data
set.
2. There is only one mean for a data set.
3. Its value is influenced by extreme measurements; trimming
can help to reduce the degree of influence.
4. Means of subsets can be combined to determine the mean
of the complete data set.
5. It is applicable to quantitative data only.
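Point 4 can be sketched as a weighted average: each subset mean is weighted by the number of measurements behind it. The helper `combined_mean` and the two subsets are hypothetical.

```python
def combined_mean(subsets):
    """Combine (mean, count) pairs into the mean of the pooled data."""
    total_count = sum(count for _, count in subsets)
    return sum(mean * count for mean, count in subsets) / total_count

# Hypothetical: mean discharge 3.0 cumec from 10 measurements
# and 4.0 cumec from 30 measurements.
overall = combined_mean([(3.0, 10), (4.0, 30)])
print(overall)  # (3.0*10 + 4.0*30) / 40 = 3.75
```

Note that simply averaging the two subset means (3.5) would be wrong here because the subsets have different sizes.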
The Median
The median of a set of measurements is defined to be the middle value
when the measurements are arranged from lowest to highest.
The Median
No change in Median
The median is only minimally affected by the magnitude of a single
observation. Therefore, the median is often called a resistant measure
of central value.
This resistance to the effect of a change in value or the presence of
outlying observations is often a desirable property in water
resources data analysis.
The Median: Summary
1. It is the central value; 50% of the measurements lie
above it and 50% fall below it.
2. There is only one median for a data set.
3. It is less influenced by extreme measurements.
4. Medians of subsets cannot be combined to determine
the median of the complete data set.
5. For grouped data, its value is rather stable even when
the data are organized into different categories.
6. It is applicable to quantitative data only.
Preference of Mean or Median
When a summary value is desired that is not strongly
influenced by a few extreme observations, the median is
preferable to the mean.
One such example is the chemical concentration in water one
might expect to find over many streams in a given region. Using
the median, one sample with unusually high concentration has
no greater effect on the estimate than one with low
concentration. The mean concentration may be pulled towards
the outlier, and be higher than concentrations found in most of
the samples.
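A small Python sketch of this behaviour, using the standard `statistics` module. The discharge values are hypothetical (they are not the January data from the earlier slides); only the outlier value 34.8 is borrowed from the text.

```python
import statistics

# Hypothetical discharge values in cumec.
discharge = [2.5, 2.8, 3.0, 3.1, 3.3, 3.4, 3.6, 4.0, 4.8]
with_outlier = discharge + [34.8]

mean_before = statistics.mean(discharge)        # about 3.39
mean_after = statistics.mean(with_outlier)      # 6.53
median_before = statistics.median(discharge)    # 3.3
median_after = statistics.median(with_outlier)  # 3.35, average of 3.3 and 3.4

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

One extreme value roughly doubles the mean, while the median barely moves.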
Measurement of the center of the data: Mode
• The mode is the most frequently observed value.
• It is far more applicable for
  - Discrete data
  - Grouped data
  - Data which are recorded only as falling into a finite
    number of categories
• It is very easy to obtain.
• It is less applicable for continuous data.
• When applied to continuous data, it gives a poor measure
of center.
• Its value often depends on the arbitrary grouping of those
data.
Measurement of the center of the data: Mode
Suppose that, at a given location, a day with more than 100 mm of rainfall is
considered a heavy rainfall day. The data show the number of heavy rainfall
days in each year over the period 2000-2010. How frequently does heavy
rainfall occur at that location?
Mean = 3.1 days
Median = 3 days
Mode = 4 days
It is more meaningful to say that heavy rainfall usually happens
4 times in a year.
Measurement of the center of the data: Mode
Mode is not influenced by extreme measurements.
Heavy rainfall days in different years over the time period 2000-2010
Measurement of the center of the data: Mode
Heavy rainfall days in different years over the time period 2000-2010
There can be more than one mode for a data set. This type of data
set is called multi-modal data.
The Mode: Summary
1. It is the most frequent or probable measurement in the data
set.
2. There can be more than one mode for a data set.
3. It is not influenced by extreme measurements.
4. Modes of subsets cannot be combined to determine the
mode of the complete data set.
5. For grouped data its value can change depending on the
categories used.
6. It is applicable for both qualitative and quantitative data.
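The properties above can be checked with Python's `statistics` module (`multimode` requires Python 3.8 or later). The yearly counts below are hypothetical and are not the 2000-2010 data from the slides.

```python
import statistics

# Hypothetical number of heavy rainfall days in 11 consecutive years.
heavy_days = [4, 2, 4, 3, 1, 4, 2, 5, 3, 4, 2]

mode_value = statistics.mode(heavy_days)                  # 4 occurs most often
mode_with_extreme = statistics.mode(heavy_days + [15])    # still 4
modes_bimodal = statistics.multimode([1, 2, 2, 3, 3])     # two modes: 2 and 3

print(mode_value, mode_with_extreme, modes_bimodal)
```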
Geometric Mean
The geometric mean is a type of mean that indicates the central
tendency of a set of n positive numbers as the n-th root of their product.
• The geometric mean applies only to positive numbers.
• It is less influenced by extreme values than the arithmetic mean.
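A minimal sketch of the computation, using the standard definition (the n-th root of the product, evaluated through logarithms to avoid overflow). The concentration values are hypothetical, and the result is cross-checked against `statistics.geometric_mean` (Python 3.8 or later).

```python
import math
import statistics

def geometric_mean(values):
    """n-th root of the product of n positive numbers, computed via logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean applies only to positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical concentrations spanning two orders of magnitude.
concentrations = [1.0, 10.0, 100.0]

gm = geometric_mean(concentrations)   # approximately 10
am = statistics.mean(concentrations)  # 37.0, pulled up by the largest value
print(gm, am)
```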
Trimmed Mean
Trimmed mean drops the highest and lowest extreme values and
averages the rest. For example, a 5% trimmed mean drops the highest 5%
and the lowest 5% of the measurements and averages the rest. Similarly,
a 25% trimmed mean drops the highest and the lowest 25% of the
measurements and averages the rest.
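A minimal sketch of a trimmed mean in Python. The helper `trimmed_mean` and the nitrate values are hypothetical. It assumes that the number of values dropped at each end is n times the proportion, rounded down; other rounding conventions exist and give slightly different results for small samples.

```python
def trimmed_mean(values, proportion):
    """Drop int(n * proportion) values from each end of the sorted data, average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # assumed convention: round down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical nitrate concentrations with one extreme value.
nitrate = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 40.0]

plain = sum(nitrate) / len(nitrate)   # 7.4
trimmed = trimmed_mean(nitrate, 0.10) # drops 1.0 and 40.0, giving 4.125
print(plain, trimmed)
```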
Trimmed Mean
14 samples are collected to monitor the nitrate concentration in
groundwater as below:
20% trimmed mean of the data:
Lower limit of window = 14 × 0.20 = 2.8 ≈ 3
Upper limit of window = 14 × 0.80 = 11.2 ≈ 11
Trimmed mean = 2.8
Measurement of Spread
The degree of variation of data values can be diagnostic in water
resources. For example, variation in annual rainfall tells us how
reliable rainfall is in an area.
It is also important to obtain a measure of the variability of values in
a population in order to assess the number of observations we need
to draw useful conclusions and to assess the reliability of our
estimates of parameters.
Measures used to show the data variability are:
1. Range
2. p-th percentile
3. Interquartile range
4. Deviation
5. Variance
6. Standard deviation
7. Coefficient of variation
Measurement of Spread: Range
The range is defined as the difference between the largest and the
smallest measurements of the set.
Range is the simplest but least useful measure of data variation.
For grouped data, because we do not
know the individual measurements,
the range is taken to be the
difference between the upper limit
of the last interval and the lower
limit of the first interval.
Range = 4.8 – 2.5
= 2.3
Measurement of Spread: p-th percentile
The p-th percentile of a set of n measurements (arranged in order of
magnitude) is that value that has at most p% of the measurements
below it and at most (100 - p)% above it.
Measurement of Spread: Interquartile Range
The interquartile range (IQR) of a set of measurements is defined to be
the difference between the upper and lower quartiles; that is,
IQR = 75th percentile - 25th percentile
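A minimal sketch of the IQR in Python using `statistics.quantiles` (Python 3.8 or later). The discharge values are hypothetical. Several interpolation conventions exist for percentiles of small samples; the `"inclusive"` method used here is an assumption, and other methods can give slightly different quartiles.

```python
import statistics

# Hypothetical discharge values in cumec (nine measurements).
discharge = [3.4, 2.5, 4.8, 3.0, 3.3, 2.8, 3.6, 3.1, 4.0]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(discharge, n=4, method="inclusive")
iqr = q3 - q1

print(f"25th percentile = {q1}, median = {q2}, 75th percentile = {q3}")
print(f"IQR = {iqr:.1f}")
```

With nine values this method places the quartiles on the 3rd, 5th and 7th ordered values (3.0, 3.3 and 3.6), so the IQR is 0.6.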
Measurement of Spread: Deviation
Deviation describes how a particular measurement deviates from the mean
of the set of measurements.
The deviation of the i-th measurement is computed by using the formula
$d_i = X_i - \bar{X}$
Measurement of Spread: Variance
A more easily interpreted function of the deviations involves the
sum of the squared deviations of the measurements from their
mean. This measure is called the variance.
The variance of a set of n measurements X1, X2, …, Xn with mean $\bar{X}$ is
the sum of the squared deviations divided by n – 1:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
We have special symbols to denote the sample and population
variances. The symbol $s^2$ represents the sample variance, and the
corresponding population variance is denoted by the symbol $\sigma^2$.
Measurement of Spread: Standard Deviation
The standard deviation of a set of measurements is defined to be the
square root of the variance.
One reason for defining the standard deviation is that it yields a
measure of variability having the same units of measurement as the
original data, whereas the units for variance are the square of the
measurement units.
s denotes the sample standard deviation and $\sigma$ denotes the
corresponding population standard deviation.
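The sample variance and standard deviation defined above can be sketched in Python as follows. The data values are hypothetical; the hand-written function follows the n – 1 definition and is cross-checked against `statistics.variance`, which uses the same divisor.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical measurements; their mean is 5.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s2 = sample_variance(data)  # squared deviations sum to 32, so s2 = 32/7
s = math.sqrt(s2)           # same units as the original data
print(f"s^2 = {s2:.3f}, s = {s:.3f}")
```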
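Measurement of Spread: Grouped Data
We need a simple modification of the formula to calculate the
sample variance when only grouped data are available.
If Xi and fi denote the midpoint and frequency, respectively, for
the i-th class interval, then the variance of the grouped data is:
$s^2 \approx \frac{1}{n-1}\sum_{i} f_i (X_i - \bar{X})^2$, where $n = \sum_i f_i$ is the total number of measurements and $\bar{X}$ is the sample mean.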
Distribution of Data
• Means and variances are ways to describe a distribution of
data.
• But all the details of the data cannot be understood with only
the mean and variance.
• A histogram is one of the most effective ways to depict a
frequency distribution.
• To know the details of the data distribution, we need to draw the
frequency histogram and the relative frequency histogram.
• Frequency is the number of times a variable takes on a value in a
particular group.
• Note that any variable has a frequency distribution.
Distribution of Data
Total annual rainfall (in millimeters) at a station situated in the
sub-tropical region is given below.
940 2118 1684 1415 1111 1303 1526 1667
1182 1301 1995 1692 2293 1903 870 1678
1647 1236 1180 1515 1650 1614 1384 1845
1315 1879 2326 1539 2110 2038 1976 1117
1710 2066 1295 1814 1265 1419 1751 1582
1446 1468 1810 1471 2102 1895 2474 1645
1769 2119
Minimum Rainfall = 870 mm
Maximum Rainfall = 2474 mm
Range = 2474 – 870 = 1604 mm
Group Interval = Range / (number of groups – 1)
= 1604 / (9 – 1)
≈ 200 mm
Groups: 801 – 1000; 1001 – 1200; 1201 – 1400; 1401 – 1600;
1601 – 1800; 1801 – 2000; 2001 – 2200; 2201 – 2400;
2401 – 2600
Distribution of Data: Histogram
• A histogram is used to graphically summarize the distribution of a data set
• A histogram divides the range of values in a data set into intervals
• Over each interval is placed a bar whose height represents the frequency of
data values in the interval.
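As a sketch of how the bar heights are obtained, the Python snippet below counts the 50 annual rainfall values listed earlier into the nine 200 mm groups starting at 801 mm, and prints a simple text histogram. The variable names are illustrative only.

```python
# Total annual rainfall (mm) at the station, as listed in the slides.
rainfall = [
    940, 2118, 1684, 1415, 1111, 1303, 1526, 1667,
    1182, 1301, 1995, 1692, 2293, 1903, 870, 1678,
    1647, 1236, 1180, 1515, 1650, 1614, 1384, 1845,
    1315, 1879, 2326, 1539, 2110, 2038, 1976, 1117,
    1710, 2066, 1295, 1814, 1265, 1419, 1751, 1582,
    1446, 1468, 1810, 1471, 2102, 1895, 2474, 1645,
    1769, 2119,
]

lower_edge, width, n_groups = 801, 200, 9  # groups 801-1000, ..., 2401-2600

counts = [0] * n_groups
for value in rainfall:
    counts[(value - lower_edge) // width] += 1  # index of the group containing value

for i, count in enumerate(counts):
    low = lower_edge + i * width
    print(f"{low:4d}-{low + width - 1:4d} | {'#' * count} {count}")
```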
Distribution of Data
Number of groups is 17
Because of the arbitrariness in the choice of number of intervals, starting
value, and length of intervals, histograms can be made to take on different
shapes for the same set of data, especially for small data sets. Histograms
are most useful for describing data sets when the number of data points is
fairly large, say 50 or more.
Distribution of Data
Number of groups is 5
When the number of data points is relatively small and the number of intervals
is large, the histogram fluctuates too much, that is, it responds to a very few
data values. This results in a graph that is not a realistic depiction of the
histogram for the whole population. When the number of class intervals is too
small, most of the patterns or trends in the data are not displayed.
Absolute and Relative Frequency Distribution
• Frequencies can be absolute (when the frequency
provided is the actual count of the occurrences) or relative
(when they are normalized by dividing the absolute
frequency by the total number of observations, so that they lie in [0, 1]).
• Relative frequencies are particularly useful if you want to
compare distributions drawn from two different sources
(i.e., when the numbers of observations from each source may
be different).
Distribution of Data
We can compare two different samples (or
populations) by examining their relative
frequency histograms even if the samples
(populations) are of different sizes, because
we use proportions rather than frequencies
in a relative frequency histogram.
Distribution of Data
This rule applies to data sets with a roughly "mound-shaped"
histogram, that is, a histogram that has a single peak, is
symmetrical, and tapers off gradually in the tails.
Distribution of Data
M0 – Sample Mean
Md – Sample Median
µ - Population mean
TM – Trimmed mean
Coefficient of Variation
Coefficient of variation measures the
variability in the values in a population
relative to the magnitude of the
population mean. In a process or
population with mean $\mu$ and standard
deviation $\sigma$, the coefficient of variation
is defined as
$CV = \frac{\sigma}{|\mu|}$
Coefficient of Variation
The coefficient of variation shows how much a hydrological variable varies.
For example, if we calculate the coefficient of variation of an annual
rainfall time series, it indicates the reliability of rainfall. If the value
is very high, rainfall varies widely from year to year and is less reliable.
In both cases the annual average rainfall is 580 mm. However, rainfall in the
second area is more reliable than in the first.
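The comparison can be sketched in Python. The two rainfall series below are hypothetical, invented so that both average 580 mm; they are not the data behind the slide's figure. The sketch also assumes the sample standard deviation (n − 1 divisor) in place of the population σ.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample coefficient of variation, s / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Assumption: the sample standard deviation stands in for sigma.
    return stdev(values) / abs(m)


# Hypothetical annual rainfall totals (mm); both series average 580 mm.
area_1 = [300, 900, 450, 800, 450]
area_2 = [560, 600, 570, 590, 580]

cv_1 = coefficient_of_variation(area_1)
cv_2 = coefficient_of_variation(area_2)
print(f"Area 1: mean = {mean(area_1):.0f} mm, CV = {cv_1:.2f}")
print(f"Area 2: mean = {mean(area_2):.0f} mm, CV = {cv_2:.2f}")
```

The area with the smaller coefficient of variation is the one with the more reliable rainfall, even though the means are identical.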
Measures of Skewness and Kurtosis
• A fundamental task in many statistical analyses is to characterize
the location and variability of a data set (Measures of central
tendency vs. measures of dispersion)
• Both measures tell us nothing about the shape of the distribution
• A further characterization of the data includes skewness and
kurtosis
• The histogram is an effective graphical technique for showing both
the skewness and kurtosis of a data set
Skewness
• The term skewness refers to the lack of symmetry.
• Skewness measures the degree of asymmetry exhibited by the data.
• If skewness equals zero, the histogram is symmetric about the mean.
$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$$
Skewness
Consider the two distributions in the figure just below. Within
each graph, the bars on the right side of the distribution taper
differently than the bars on the left side. These tapering sides
are called tails, and they provide a visual means for determining
which of the two kinds of skewness a distribution has:
1. negative skew: The left tail is longer; the mass of the
distribution is concentrated on the right of the figure. The
distribution is said to be left-skewed, left-tailed, or skewed to the
left.
2. positive skew: The right tail is longer; the mass of the
distribution is concentrated on the left of the figure. The
distribution is said to be right-skewed, right-tailed, or skewed to
the right.
Skewness
Skewness in a data series may be observed not only
graphically but also by simple inspection of the values.
For instance, consider the numeric sequence (49, 50,
51), whose values are evenly distributed around a
central value of 50. We can transform this sequence
into a negatively skewed distribution by adding a value
far below the mean, e.g. (40, 49, 50, 51).
Similarly, we can make the sequence positively
skewed by adding a value far above the mean,
e.g. (49, 50, 51, 60).
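The three sequences above can be checked numerically. The sketch below assumes the moment formula skewness = Σ(x_i − x̄)³ / (n s³) and takes s to be the sample standard deviation (n − 1 divisor). Other conventions rescale the value but do not change its sign.

```python
from statistics import mean, stdev


def skewness(values):
    """Moment-based skewness: sum((x - mean)^3) / (n * s^3)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 divisor)
    return sum((x - m) ** 3 for x in values) / (n * s ** 3)


# Sequences taken from the text: symmetric, negatively skewed, positively skewed.
for series in [(49, 50, 51), (40, 49, 50, 51), (49, 50, 51, 60)]:
    print(series, round(skewness(series), 3))
```

The sign of the result follows the longer tail: a value far below the mean gives a negative result and a value far above the mean gives a positive one.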
Skewness
Positive skewness
There are more observations below the mean than above it; the
mean is greater than the median.
Negative skewness
There are a small number of low observations and a large number of high
ones; the median is greater than the mean.
Skewness
Skewness of a distribution cannot always be determined simply by
inspection. Comparing the mean and the median gives a quick check:
If Mean > Median, the skew is positive.
If Mean < Median, the skew is negative.
If Mean = Median, the skew is zero.
Skewness
• One measure of absolute skewness is the difference between the
mean and the mode. Such a measure is not truly meaningful on its
own because it depends on the units of measurement.
• The simplest measure of skewness is Pearson’s
coefficient of skewness:
$$\text{Pearson's coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
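A minimal sketch of Pearson's coefficient of skewness, (mean − mode) / standard deviation, using only the Python standard library. The rainfall depths are hypothetical, and the sample standard deviation is an assumption. Note that `statistics.mode` returns the first mode encountered when the data have several (Python 3.8+), so the result is only well defined for unimodal data.

```python
from statistics import mean, mode, stdev


def pearson_mode_skewness(values):
    """Pearson's coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean(values) - mode(values)) / stdev(values)


# Hypothetical daily rainfall depths (mm) with one large storm value.
rainfall_mm = [2, 3, 3, 3, 4, 5, 15]
# The same depths in tenths of a millimetre, to show unit independence.
rainfall_tenths = [x * 10 for x in rainfall_mm]

print(round(pearson_mode_skewness(rainfall_mm), 3))
print(round(pearson_mode_skewness(rainfall_tenths), 3))
```

Dividing by the standard deviation removes the units, so both calls print the same value, which is the reason the coefficient is preferred over the raw difference between mean and mode.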
Skewness
• The skewness coefficient varies between -3.0 and +3.0.
• There is no universally accepted range of skewness for judging the
distribution of data.
• Some people say that, as a rule of thumb, values between -1 and +1
are acceptable for a normal distribution (-2 to +2 is also often
used).
Kurtosis
• Kurtosis measures how peaked the histogram is.
• The kurtosis of a normal distribution is 0 (zero).
• Kurtosis characterizes the relative peakedness or flatness of a
distribution compared to the normal distribution.
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$$
Kurtosis
• Platykurtic – When the kurtosis < 0, the frequencies throughout the
curve are closer to equal (i.e., the curve is flatter and wider).
Thus, negative kurtosis indicates a relatively flat distribution.
• Leptokurtic – When the kurtosis > 0, there are high frequencies in
only a small part of the curve (i.e., the curve is more peaked). Thus,
positive kurtosis indicates a relatively peaked distribution.
• Kurtosis is based on the size of a distribution's tails. Negative
kurtosis (platykurtic) – distributions with short tails. Positive
kurtosis (leptokurtic) – distributions with relatively long tails.
[Figure: leptokurtic and platykurtic curves]
Kurtosis
The coefficient of kurtosis is the most important measure of kurtosis.
It is based on the second and fourth moments:
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
where
$$\mu_2 = \frac{\sum f (x - \bar{x})^2}{N}$$ (second moment)
$$\mu_4 = \frac{\sum f (x - \bar{x})^4}{N}$$ (fourth moment)
• If β2 − 3 > 0, the distribution is leptokurtic.
• If β2 − 3 < 0, the distribution is platykurtic.
• If β2 − 3 = 0, the distribution is mesokurtic (normal).
A kurtosis value of +/-1 is considered very good for most uses, but
+/-2 is also usually acceptable.
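The moment-based coefficient can be sketched for grouped data as follows, assuming β2 = μ4 / μ2² with frequency-weighted central moments divided by the total count N. The class mid-points and counts are hypothetical, and the function names are illustrative only.

```python
def central_moment(values, freqs, order):
    """Frequency-weighted central moment: sum(f * (x - mean)^order) / N."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n_total
    return sum(f * (x - mean) ** order for x, f in zip(values, freqs)) / n_total


def excess_kurtosis(values, freqs):
    """Return beta_2 - 3, where beta_2 = mu_4 / mu_2^2."""
    mu2 = central_moment(values, freqs, 2)
    mu4 = central_moment(values, freqs, 4)
    return mu4 / mu2 ** 2 - 3


def classify(excess, tol=1e-9):
    """Map the sign of beta_2 - 3 to the usual kurtosis class."""
    if excess > tol:
        return "leptokurtic"
    if excess < -tol:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical frequency tables: class mid-points and observation counts.
midpoints = [10, 20, 30, 40, 50]
flat_counts = [5, 5, 5, 5, 5]       # equal frequencies in every class
peaked_counts = [1, 2, 30, 2, 1]    # most observations in the central class

for label, counts in [("flat", flat_counts), ("peaked", peaked_counts)]:
    excess = excess_kurtosis(midpoints, counts)
    print(label, round(excess, 2), classify(excess))
```

The tolerance in `classify` is a design choice: floating-point moments rarely give exactly zero, so a small band around zero is treated as mesokurtic.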
Chebyshev’s Rule
Chebyshev’s theorem
According to Chebyshev’s theorem, at least (1 − 1/k²) of the measurements
will fall within [Mean − k*SD] to [Mean + k*SD], for any k > 1. For example,
with k = 2, at least 3/4 of the measurements lie within two standard
deviations of the mean.
Empirical rule
Given a set of n measurements possessing a mound-shaped histogram,
the interval x̄ ± s contains approximately 68% of the measurements
the interval x̄ ± 2s contains approximately 95% of the measurements
the interval x̄ ± 3s contains approximately 99.7% of the measurements.
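A short sketch comparing the observed fraction of data within k standard deviations of the mean against the Chebyshev lower bound 1 − 1/k². The rainfall series is hypothetical and deliberately skewed, so the empirical-rule percentages are not expected to hold for it, while the Chebyshev bound still does. The population standard deviation is assumed here.

```python
from statistics import mean, pstdev


def fraction_within(values, k):
    """Fraction of values lying within mean ± k standard deviations."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    inside = [x for x in values if abs(x - m) <= k * s]
    return len(inside) / len(values)


def chebyshev_bound(k):
    """Minimum fraction guaranteed by Chebyshev's theorem for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


# Hypothetical annual rainfall totals (mm), skewed by one very wet year.
rainfall = [410, 450, 480, 500, 520, 530, 560, 600, 640, 1500]

for k in (1.5, 2, 3):
    observed = fraction_within(rainfall, k)
    print(f"k = {k}: observed {observed:.2f}, Chebyshev minimum {chebyshev_bound(k):.2f}")
```

Chebyshev's bound makes no assumption about the shape of the histogram, which is why it is weaker than the empirical rule but applies to skewed hydrological data as well.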
Outlier
An outlier is an observation that lies an abnormal distance from other
values in a random sample from a population.
Outlier or an outlying observation, is one that appears to deviate
markedly from other members of the sample in which it occurs.
Outliers can have many anomalous causes:
• A physical apparatus for taking measurements may have suffered a
transient malfunction.
• There may have been an error in data transmission or transcription.
• Outliers may arise from changes in system behaviour, fraudulent
behaviour, human error, instrument error, or simply through natural
deviations in populations.
• A sample may have been contaminated with elements from outside
the population being examined.
Identification of Outliers
There is no rigid mathematical definition of what constitutes an outlier.
Determining whether or not an observation is an outlier is ultimately a
subjective exercise.
Type 1 - Determine the outliers with no prior knowledge of the data. This is
essentially a learning approach. The approach processes the data as a
static distribution, pinpoints the most remote points, and flags them as
potential outliers.
Type 2 – Using model-based methods which assume that the data are from
a normal distribution, and identify observations which are deemed
"unlikely" based on mean and standard deviation.
• Chauvenet's criterion
• Grubbs' test
• Dixon's Q test
Chauvenet's criterion
• A value is measured experimentally in several trials as 9, 10, 10, 10,
11, and 50.
• The mean is 16.7 and the standard deviation 16.34.
• Value 50 differs from 16.7 by 33.3, slightly more than two standard
deviations.
• The probability of taking data more than two standard deviations from
the mean is roughly 0.05.
• Six measurements were taken, so the statistic value (data size
multiplied by the probability) is 0.05×6 = 0.3.
• Because 0.3 < 0.5, according to Chauvenet's criterion, the measured
value of 50 should be discarded (leaving a new mean of 10, with
standard deviation 0.7).
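The worked example above can be reproduced with a few lines of Python. This is a sketch under two assumptions: the sample standard deviation is used (it matches the 16.34 quoted above), and the tail probability is computed from the normal distribution with `math.erfc` rather than rounded to 0.05, so the product comes out near 0.25 instead of 0.3. The conclusion is the same.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def chauvenet_rejects(values, candidate):
    """Return True if Chauvenet's criterion flags `candidate` as an outlier."""
    n = len(values)
    z = abs(candidate - mean(values)) / stdev(values)
    # Two-sided probability of a normal deviate lying more than z SDs away.
    p = erfc(z / sqrt(2))
    # Criterion: reject when the expected count n * p falls below 0.5.
    return n * p < 0.5


trials = [9, 10, 10, 10, 11, 50]
kept = [x for x in trials if not chauvenet_rejects(trials, x)]

print("rejected:", [x for x in trials if chauvenet_rejects(trials, x)])
print(f"new mean = {mean(kept):.1f}, new SD = {stdev(kept):.2f}")
```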
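Grubbs' test
Grubbs' test detects one outlier at a time. The test statistic is the
largest absolute deviation from the sample mean, expressed in units of the
sample standard deviation:
$$G = \frac{\max_i |x_i - \bar{x}|}{s}$$
If G_calculated > G_table, reject the questionable point.
Example: 9, 10, 10, 10, 11, and 50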
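For the example data, the statistic can be computed as sketched below, assuming the usual form G = max|x_i − x̄| / s with s the sample standard deviation. The critical value G_table is not computed here; it has to be read from a published table for the sample size and significance level in use, so `is_outlier` simply takes it as an argument.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, suspect): the Grubbs statistic and the value producing it."""
    m = mean(values)
    s = stdev(values)
    # The suspect point is the observation farthest from the mean.
    suspect = max(values, key=lambda x: abs(x - m))
    return abs(suspect - m) / s, suspect


def is_outlier(g_calculated, g_table):
    """Decision rule: reject the point when G_calculated > G_table."""
    return g_calculated > g_table


g, suspect = grubbs_statistic([9, 10, 10, 10, 11, 50])
print(f"suspect = {suspect}, G = {g:.3f}")
```

Because the test handles one point at a time, a second suspect value would be tested only after removing the first and recomputing the mean and standard deviation.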
Dixon's Q test, or simply the Q test
To apply a Q test, arrange the data in order of increasing values
and calculate Q as defined:
$$Q = \frac{\text{gap}}{\text{range}}$$
where gap is the absolute difference between the outlier in
question and the closest number to it, and range is the difference
between the largest and smallest values.
If Q_calculated > Q_table, reject the questionable point.
Example: 9, 10, 10, 10, 11, and 50
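A minimal sketch of the Q calculation for the example data, assuming Q = gap / range with range taken as the largest value minus the smallest. The function tests whichever end of the sorted data has the larger gap, which is a simplifying choice made for this illustration. Q_table still has to be looked up for the sample size and confidence level in use.

```python
def dixon_q(values):
    """Return (Q, suspect) for the more isolated end of the sorted data."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("the Q test needs at least three observations")
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = data[1] - data[0]       # gap for the smallest value
    gap_high = data[-1] - data[-2]    # gap for the largest value
    if gap_high >= gap_low:
        return gap_high / data_range, data[-1]
    return gap_low / data_range, data[0]


q, suspect = dixon_q([9, 10, 10, 10, 11, 50])
print(f"suspect = {suspect}, Q = {q:.3f}")
```

Here the gap is 50 − 11 = 39 and the range is 50 − 9 = 41, so Q = 39/41.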
Common Characteristics of Water Resources Data
1. A lower bound of zero. No negative values are possible.
2. Outliers occur regularly; outliers on the high side are especially
common in water resources.
3. Non-normal distribution of data
4. Positive skewness is common.
5. Data reported only as below or above some threshold (censored
data). Examples include concentrations below one or more
detection limits, annual flood above a level, etc.
6. Seasonal patterns. Values tend to be higher or lower in certain
seasons of the year.
7. Positive autocorrelation. Consecutive observations tend to be
strongly correlated with each other. High values tend to follow
high values and low values tend to follow low values.
8. Dependence on other uncontrolled variables. For example, water
discharge from a well depends strongly on hydraulic conductivity,
sediment grain size, or some other variable.
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).ppt
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Community
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 

Shahid Lecture-1- MKAG1273

  • 1. STATISTICAL HYDROLOGY (MAL1303/MKAG1273): Summarizing Data. Dr. Shamsuddin Shahid, Associate Professor, Department of Hydraulics and Hydrology, Faculty of Civil Engineering. Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586; Email: sshahid@utm.my. 11/23/2015, Shamsuddin Shahid, FKA, UTM.
  • 2. Office Location: C09-Room No. 219
  • 3. What is Statistics? Statistics is the science of learning from data. It is the science of collecting and analyzing (modeling) data for the purpose of decision making and scientific discovery when the available information is both limited and variable.
  • 4. Descriptive Statistics and Inferential Statistics. Descriptive statistics refers to statistical techniques used to summarize and describe a data set. Measures of central tendency, such as the mean and median, and of dispersion, such as the range and standard deviation, are the main descriptive statistics. Displays of data, such as histograms and box plots, are also considered techniques of descriptive statistics. Inferential statistics, or statistical induction, means the use of statistics to make inferences concerning some unknown aspect. Examples of inferential methods include hypothesis testing, linear regression, and prediction.
  • 5. Statistical Hydrology. In statistical hydrology, we generally use a four-step process in learning from data: (1) defining the problem; (2) collecting the data; (3) summarizing the data; (4) analyzing the data, interpreting the analyses, and communicating the results.
  • 6. Defining the Problem. Define the problem or the question to be addressed, for example: • What is the condition of the water quality of the river? • Is there a relationship between nitrogen fertilizer use and groundwater quality? • What is the effect of rainfall on river discharge in a particular river basin? • What is the main factor responsible for water quality deterioration in a river? • Has rainfall changed in the last decade?
  • 7. Collecting the Data. Propose a study or experiment to collect meaningful data to answer the problem. • Which variables have to be measured? • How many data points or samples have to be collected? • How should the samples be selected? • Etc. The most crucial element in data collection is the manner in which the sample is selected.
  • 8. Population and Samples. Population: The population is the total set of measurements which could hypothetically be taken from the entity being studied. It is the set of all measurements of interest. Examples are the trace element composition of all stream sediments in an area, or the groundwater depth at any point of a catchment. The Statistical Sample: A sample is any subset of measurements selected from the population. The word "sample" can cause confusion because it is used differently in water resources and in statistics. A water sample usually means one bottle of water. In statistics, a sample is the data actually available for analysis.
  • 9. Population and Samples. A sample is a subset of the population. A sample should be collected in such a way that it represents the whole population.
  • 10. Sample Size. • No general rule for sample size can be given. • It depends on the degree of intricacy of the problem being addressed. • In many cases, the sample size can be known in advance. • In many cases, hydrologists collect a reasonable number of samples and analyze them to see how accurately they represent the population. Accordingly, they decide on the next set of samples to collect. • However, in many cases samples cannot be collected according to need. • If the sample size is not adequate, different (non-parametric) methods are used to analyze the data. • In any case, the sample size should never be less than 6.
  • 11. Sampling Techniques. In hydrology, one of the main decisions about sampling a population is where to sample. Sampling techniques can be classified as follows: 1. Random Sampling 2. Stratified Sampling 3. Uniform Sampling 4. Regular Sampling 5. Clustered Sampling 6. Traverse Sampling
  • 12. Random Sampling. Consider a sample of n measurements selected from a population containing N measurements (N > n). The sample is said to be a random sample if every different sample of size n from the population has an equal probability of being selected. Sample data selected in a nonrandom fashion are frequently distorted by a selection bias. A selection bias exists whenever there is a systematic tendency to overrepresent or underrepresent some part of the population.
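The definition of a random sample can be illustrated with a minimal sketch using Python's standard library. The site IDs, the sizes N = 20 and n = 5, and the seed are illustrative assumptions, not values from the lecture.

```python
import random

# Hypothetical population: 20 candidate monitoring sites (IDs are illustrative).
population = list(range(1, 21))   # N = 20
n = 5                             # sample size, with n < N

# A fixed seed is used only so that the sketch is repeatable.
rng = random.Random(42)

# sample() draws without replacement; every subset of size n is equally likely.
sample = rng.sample(population, n)

print(sample)
```

Choosing sites by convenience (for example, only those next to a road) would instead risk the selection bias described above.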
  • 13. Stratified Sampling. The whole population is divided into a number of groups, or strata, according to a property, and each group is sampled separately. This type of sampling is called stratified sampling. Suppose we want to study the transmissivity of groundwater in a catchment. Transmissivity depends heavily on local geology. Therefore, we can divide the catchment according to geological units and sample each unit to study the impact of geology on transmissivity.
  • 14. Systematic Sampling. Uniform sampling: planned by randomization within grid squares. Regular or gridded sampling: planned on a rectangular or triangular grid.
  • 15. Systematic Sampling. 1. Clustered: focused on patchy distributions. 2. Traverse: often forced by access and exposure constraints or logistics.
  • 16. Sample Technique
  • 17. Data Quality. Garbage in, garbage out. A hydrologist must be confident about data quality before processing the data. The selection of a data analysis technique depends on the quality of the data. The use of any measurement device must be accompanied by awareness of its precision and accuracy.
  • 18. Data Quality. Precision: a measurement is precise if repeated measurements of the same entity are similar. Accuracy: a measurement is accurate if it is close to the true value. In water resources, the true value is usually unknown, although there are standards that can be used for calibrating analytical equipment.
  • 19. Data Type. A variety of categories of data may be encountered in hydrology. It is important to know the data type before selecting the appropriate data analysis technique. Ratio scale data: ordinary measurements such as the amount of rainfall, the depth of the groundwater level, etc. This is the best-quality and most versatile data type. Interval scale data: interval scale data differ from ratio scale data in that the zero point is not a fundamental termination of the scale. The classical example of interval scale data is temperature measured in degrees Centigrade.
  • 20. Data Type. Ordinal scale data: this category is of considerably lower quality than ratio or interval scale data. The only purpose of the scale is to place observations in relative order. Consequently, it is not valid to apply addition, subtraction or division to ordinal scale data. Non-parametric methods are used to analyze ordinal scale data. Nominal or categorical data: information is sometimes recorded in the form of names. For example, floods are sometimes recorded as normal flood, severe flood, very severe flood, etc. Sometimes drought is recorded according to its occurrence, such as droughts occurring in 1974, 1981, 1993, 1990, 2008, etc.
  • 21. Data Type. Directional data: data that are expressed as angles. Examples of directional data are the direction of a cyclone, the direction of surface runoff, etc. Directional data require special methods of analysis because the numerical values cycle around through 360 degrees. Closed data: this type of data has lower and upper limits. These are data in the form of percentages, parts per million (ppm), etc. Such data require cautious treatment, especially in bivariate and multivariate methods, because the variables are fundamentally interdependent.
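To see why directional data need special treatment, consider averaging two directions that lie on either side of north. The sketch below compares the ordinary arithmetic mean with a mean direction computed on the unit circle. The two angles are hypothetical, and the circular mean is one common choice of "special method", not necessarily the one used in the lecture.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction (degrees, 0-360) from summed unit-vector components."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

# Hypothetical flow directions just west and just east of north.
directions = [350.0, 10.0]

arithmetic = sum(directions) / len(directions)   # points south, which is misleading
circular = circular_mean_deg(directions)         # points north (0, equivalently 360)

print(arithmetic, circular)
```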
  • 22. Continuous and Discrete Types of Variable. Discrete variable: when observations on a quantitative random variable can assume only a countable number of values, the variable is called a discrete random variable. Examples: the number of days in a year with rainfall > 20 mm, the number of floods in a year, etc. Continuous variable: when observations on a quantitative random variable can assume any one of an uncountable number of values, the variable is called a continuous random variable. A continuous variable can take on any value over some range. Examples: daily rainfall, river discharge, groundwater depth, etc.
  • 23. Summarization of Data. To analyze any collection of water resources data appropriately, the first consideration must be the characteristics of the data themselves. Therefore, we first need to know the common characteristics of the water resources data we want to study. Summarization of data means making a summary of the data in forms which convey their important characteristics. For example, if we want to analyze the rainfall data of a station, we first need to know: 1. What is the average rainfall at that station? 2. How does the rainfall vary at the station? 3. How is the rainfall distributed over the year? 4. How often does heavy rainfall occur?
  • 24. Summarization of Data. Summarization of data means making a summary of the data in forms which convey their important characteristics, such as: • A measure of the center of the data, • A measure of spread or variability, • A measure of the symmetry of the data distribution, • Percentiles of the data (estimates of extremes), • Etc.
  • 25. Measurement of the center of the data. • The mean and median are the two most commonly used measures to describe the center of the data. • Besides these, the mode, geometric mean and trimmed mean are also sometimes used to summarize water resources data. • We need to understand: what are the properties of these measures, and when should one be employed over the other?
  • 26. The Mean - classical measurement of data. The mean of a set of measurements is defined to be the sum of the measurements divided by the total number of measurements, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. The sample mean is denoted by the symbol $\bar{X}$ (x-bar) and the population mean is denoted by $\mu$ (mu).
  • 27. Computation of Mean - Example. Discharge in a river is measured every three days in the month of January (table on the slide). What is the mean value of river discharge in January?
  • 28. Measurement of the center of the data: Mean
  • 29. The Mean. The influence of an observation on the overall mean is the distance between the observation and the mean computed excluding that observation. Therefore, the influence of the observation 4.0 on the overall mean is (4.0 - 3.3) = 0.7.
  • 30. Influence of an observation on the mean. Mean discharge = 3.4 cumec. • Not all observations have the same influence on the mean. • Observations closer to the mean have less influence on it than observations far from the mean.
  • 31. Influence of an observation on the mean. The mean discharge in the month of January is 6.1 cumec. If we remove the extreme value 34.8 (an outlier) from the observations, the mean discharge becomes 3.2 cumec.
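The effect of a single outlier on the mean can be reproduced with a short sketch. The slide's data table is not part of this transcript, so the ten ordinary discharge values below are hypothetical, chosen only so that the two means round to the figures quoted above; only the outlier 34.8 comes from the slide.

```python
# Hypothetical January discharges in cumec (not the slide's actual table).
ordinary = [2.5, 2.8, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.6]
outlier = 34.8  # the extreme value quoted on the slide

def mean(values):
    """Arithmetic mean: sum of the measurements divided by their number."""
    return sum(values) / len(values)

mean_with = mean(ordinary + [outlier])   # outlier included
mean_without = mean(ordinary)            # outlier removed

print(round(mean_with, 1), round(mean_without, 1))
```

A single value out of eleven nearly doubles the mean here, which is why the mean is described as sensitive to extreme measurements.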
  • 32. The Mean: Summary 1. It is the arithmetic average of the measurements in a data set. 2. There is only one mean for a data set. 3. Its value is influenced by extreme measurements; trimming can help to reduce the degree of influence. 4. Means of subsets can be combined to determine the mean of the complete data set. 5. It is applicable to quantitative data only.
  • 33. The Median. The median of a set of measurements is defined to be the middle value when the measurements are arranged from lowest to highest. For an even number of measurements, it is the average of the two middle values.
  • 34. The Median
  • 35. The Median. No change in the median. The median is only minimally affected by the magnitude of a single observation. Therefore, the median is often known as a resistant measure of central value. This resistance to the effect of a change in value or the presence of outlying observations is often a desirable property in water resources data analysis.
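The resistance of the median can be checked with a small sketch. The discharge values are hypothetical; the second list differs from the first only in its largest value, which is replaced by an extreme one.

```python
import statistics

# Hypothetical discharges in cumec; only the largest value differs between lists.
base = [2.5, 2.8, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.6, 4.8]
altered = base[:-1] + [34.8]   # largest value replaced by an extreme one

median_base = statistics.median(base)
median_altered = statistics.median(altered)
mean_base = statistics.mean(base)
mean_altered = statistics.mean(altered)

# The median stays at the middle (6th) value; the mean is pulled upward.
print(median_base, median_altered)
print(round(mean_base, 2), round(mean_altered, 2))
```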
  • 36. The Median: Summary 1. It is the central value; 50% of the measurements lie above it and 50% fall below it. 2. There is only one median for a data set. 3. It is less influenced by extreme measurements. 4. Medians of subsets cannot be combined to determine the median of the complete data set. 5. For grouped data, its value is rather stable even when the data are organized into different categories. 6. It is applicable to quantitative data only.
  • 37. Preference of Mean or Median. When a summary value is desired that is not strongly influenced by a few extreme observations, the median is preferable to the mean. One such example is the chemical concentration in water one might expect to find over many streams in a given region. Using the median, one sample with an unusually high concentration has no greater effect on the estimate than one with a low concentration. The mean concentration may be pulled towards the outlier, and be higher than the concentrations found in most of the samples.
  • 38. Measurement of the center of the data: Mode. • The mode is the most frequently observed value. • It is far more applicable to discrete data, grouped data, and data which are recorded only as falling into a finite number of categories. • It is very easy to obtain. • It is less applicable to continuous data. • When applied to continuous data, it gives a poor measure of center. • Its value often depends on the arbitrary grouping of those data.
  • 39. Measurement of the center of the data: Mode. Suppose that, at a given location, a day with more than 100 mm of rainfall is considered a heavy rainfall day. The data show the number of heavy rainfall days in each year over the period 2000-2010. How frequently does heavy rainfall occur at that location? Mean = 3.1 days; Median = 3 days; Mode = 4 days. It is more meaningful to say that heavy rainfall usually happens 4 times a year.
  • 40. Measurement of the center of the data: Mode. Heavy rainfall days in different years over the period 2000-2010. The mode is not influenced by extreme measurements.
  • 41. Measurement of the center of the data: Mode. Heavy rainfall days in different years over the period 2000-2010. There can be more than one mode for a data set. This type of data set is called multi-modal data.
  • 42. The Mode: Summary 1. It is the most frequent or probable measurement in the data set. 2. There can be more than one mode for a data set. 3. It is not influenced by extreme measurements. 4. Modes of subsets cannot be combined to determine the mode of the complete data set. 5. For grouped data its value can change depending on the categories used. 6. It is applicable for both qualitative and quantitative data.
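The mode, and the multi-modal case, can be sketched with Python's `statistics` module. The yearly counts below are hypothetical (the slide's table is not in this transcript); they were chosen so that the mean, median and mode round to the 3.1, 3 and 4 days quoted for the heavy-rainfall example. `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Hypothetical number of heavy-rainfall days per year, 2000-2010 (11 years).
heavy_days = [2, 4, 3, 4, 1, 4, 3, 4, 4, 2, 3]

mean_days = statistics.mean(heavy_days)
median_days = statistics.median(heavy_days)
mode_days = statistics.mode(heavy_days)       # single most frequent value

# A hypothetical data set with two equally frequent values is multi-modal.
bimodal = [1, 2, 2, 3, 3, 4]
modes = statistics.multimode(bimodal)         # all most-frequent values

print(round(mean_days, 1), median_days, mode_days, modes)
```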
  • 43. Geometric Mean. The geometric mean is a type of mean which indicates the central tendency of a set of numbers; for n measurements it is the n-th root of their product, $GM = (X_1 X_2 \cdots X_n)^{1/n}$. • The geometric mean applies only to positive numbers. • It is less influenced by extreme values than the arithmetic mean.
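As a small illustration of the geometric mean, the sketch below computes it as the exponential of the mean of the logarithms, which is equivalent to the n-th root of the product, and compares it with the arithmetic mean. The three concentration values are hypothetical.

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed via logarithms; positive values only."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is defined for positive numbers only")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical concentrations spanning two orders of magnitude.
concentrations = [1.0, 10.0, 100.0]

gm = geometric_mean(concentrations)                  # about 10
am = sum(concentrations) / len(concentrations)       # 37, pulled up by the 100

print(gm, am)
```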
  • 44. Trimmed Mean Trimmed mean drops the highest and lowest extreme values and averages the rest. For example, a 5% trimmed mean drops the highest 5% and the lowest 5% of the measurements and averages the rest. Similarly, a 25% trimmed mean drops the highest and the lowest 25% of the measurements and averages the rest.
  • 45. Trimmed Mean. 14 samples are collected to monitor the nitrate concentration in groundwater (data on the slide). 20% trimmed mean of the data: Lower limit of window = 14 × 0.20 = 2.8 ≈ 3; Upper limit of window = 14 × 0.80 = 11.2 ≈ 11; Trimmed mean = 2.8
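A trimmed mean can be sketched as follows. This is a minimal version under stated assumptions: the number of values dropped from each end is n × proportion truncated to an integer, which is a common convention but differs from the rounding of the window limits shown on the slide, and the nitrate values are hypothetical because the slide's 14 samples are not in this transcript.

```python
def trimmed_mean(values, proportion):
    """Drop the lowest and highest `proportion` of the values, average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # count dropped from EACH end (truncated)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical nitrate concentrations (mg/L) with one unusually high value.
nitrate = [1.2, 2.1, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3, 3.9, 12.5]

plain = sum(nitrate) / len(nitrate)         # ordinary mean, pulled up by 12.5
trimmed = trimmed_mean(nitrate, 0.20)       # drops 2 lowest and 2 highest values

print(round(plain, 2), round(trimmed, 2))
```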
  • 46. Measurement of Spread. The degree of variation of data values can be diagnostic in water resources. For example, the variation in annual rainfall tells us how reliable rainfall is in an area. It is also important to obtain a measure of the variability of values in a population in order to assess the number of observations we need to draw useful conclusions and to assess the reliability of our estimates of parameters. Measures used to show the data variability are: 1. Range 2. p-th percentile 3. Interquartile range 4. Deviation 5. Variance 6. Standard deviation 7. Coefficient of variation
  • 47. Measurement of Spread: Range The range is defined as the difference between the largest and the smallest measurements of the set. Range is the simplest but least useful measure of data variation. For grouped data, because we do not know the individual measurements, the range is taken to be the difference between the upper limit of the last interval and the lower limit of the first interval. Range = 4.8 – 2.5 = 2.3
  • 48. Measurement of Spread: p-th percentile The p-th percentile of a set of n measurements (arranged in order of magnitude) is that value that has at most p% of the measurements below it and at most (100 - p)% above it.
  • 49. Measurement of Spread: Interquartile Range The interquartile range (IQR) of a set of measurements is defined to be the difference between the upper and lower quartiles; that is, IQR = 75th percentile - 25th percentile
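The range, quartiles and interquartile range can be sketched on the 50 annual rainfall totals listed later in this lecture (slide 55). The quartiles are computed with `statistics.quantiles` and its "inclusive" interpolation; this choice is an assumption, since several percentile conventions exist and they can give slightly different values for the same data.

```python
import statistics

# Total annual rainfall (mm), the 50 values from the lecture's slide 55.
rainfall = [
    940, 2118, 1684, 1415, 1111, 1303, 1526, 1667, 1182, 1301,
    1995, 1692, 2293, 1903, 870, 1678, 1647, 1236, 1180, 1515,
    1650, 1614, 1384, 1845, 1315, 1879, 2326, 1539, 2110, 2038,
    1976, 1117, 1710, 2066, 1295, 1814, 1265, 1419, 1751, 1582,
    1446, 1468, 1810, 1471, 2102, 1895, 2474, 1645, 1769, 2119,
]

data_range = max(rainfall) - min(rainfall)

# Three cut points dividing the data into four parts: 25th, 50th, 75th percentiles.
q1, q2, q3 = statistics.quantiles(rainfall, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so it is not affected by a single extreme year.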
  • 50. Measurement of Spread: Deviation. The deviation describes how far a particular measurement deviates from the mean of the set of measurements. The deviations of the measurements are computed using the formula: deviation of $X_i$ = $X_i - \bar{X}$
  • 51. Measurement of Spread: Variance. A more easily interpreted function of the deviations involves the sum of the squared deviations of the measurements from their mean. This measure is called the variance. The variance of a set of n measurements $X_1, X_2, \ldots, X_n$ with mean $\bar{X}$ is the sum of the squared deviations divided by n - 1: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$. We have special symbols to denote the sample and population variances. The symbol $s^2$ represents the sample variance, and the corresponding population variance is denoted by the symbol $\sigma^2$.
  • 52. Measurement of Spread: Standard Deviation. The standard deviation of a set of measurements is defined to be the square root of the variance. One reason for defining the standard deviation is that it yields a measure of variability having the same units of measurement as the original data, whereas the units of the variance are the square of the measurement units. $s$ denotes the sample standard deviation and $\sigma$ denotes the corresponding population standard deviation.
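The steps from deviations to variance to standard deviation can be written out directly. The eight measurements below are hypothetical, chosen so that the arithmetic is easy to follow by hand (mean 5, sum of squared deviations 32); the result is compared with `statistics.variance`, which also uses the n - 1 divisor.

```python
import math
import statistics

# Hypothetical measurements chosen for easy hand calculation.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)

x_bar = sum(x) / n                                # mean = 5.0
deviations = [xi - x_bar for xi in x]             # X_i - X-bar; these sum to zero
sum_sq = sum(d ** 2 for d in deviations)          # 32.0

s2 = sum_sq / (n - 1)                             # sample variance (n - 1 divisor)
s = math.sqrt(s2)                                 # standard deviation, same units as x

print(x_bar, sum_sq, round(s2, 3), round(s, 3))
print(statistics.variance(x), statistics.stdev(x))
```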
  • 53. Measurement of Spread: Grouped Data. We need a simple modification of the formula to calculate the sample variance when only grouped data are available. If $X_i$ and $f_i$ denote the midpoint and frequency, respectively, of the i-th class interval, and $n = \sum_i f_i$, then the variance of grouped data is approximately $s^2 \approx \frac{\sum_i f_i (X_i - \bar{X})^2}{n - 1}$.
  • 54. Distribution of Data. • Means and variances are ways to describe a distribution of data. • But not all the details of the data can be understood from the mean and variance alone. • A histogram is one of the most effective ways to depict a frequency distribution. • To know the details of the data distribution, we need to draw the frequency histogram and the relative frequency histogram. • Frequency is the number of times a variable falls in a particular group. • Note that any variable has a frequency distribution.
  • 55. Distribution of Data Total annual rainfall (in millimetres) at a station situated in the sub-tropical region is given below. 940 2118 1684 1415 1111 1303 1526 1667 1182 1301 1995 1692 2293 1903 870 1678 1647 1236 1180 1515 1650 1614 1384 1845 1315 1879 2326 1539 2110 2038 1976 1117 1710 2066 1295 1814 1265 1419 1751 1582 1446 1468 1810 1471 2102 1895 2474 1645 1769 2119
  • 56. Distribution of Data 940 2118 1684 1415 1111 1303 1526 1667 1182 1301 1995 1692 2293 1903 870 1678 1647 1236 1180 1515 1650 1614 1384 1845 1315 1879 2326 1539 2110 2038 1976 1117 1710 2066 1295 1814 1265 1419 1751 1582 1446 1468 1810 1471 2102 1895 2474 1645 1769 2119 Minimum Rainfall = 870 mm; Maximum Rainfall = 2474 mm; Range = 1604 mm; Group Interval = Range / (number of groups − 1) = 1604 / (9 − 1) ≈ 200 mm. Groups: 801–1000; 1001–1200; 1201–1400; 1401–1600; 1601–1800; 1801–2000; 2001–2200; 2201–2400; 2401–2600.
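The grouping step can be sketched in Python as follows. The rainfall values and the nine class limits are taken from the slide; the function name, the constants, and the integer-index trick are assumptions made for this illustration and rely on the values being whole millimetres.

```python
rainfall_mm = [
    940, 2118, 1684, 1415, 1111, 1303, 1526, 1667, 1182, 1301,
    1995, 1692, 2293, 1903, 870, 1678, 1647, 1236, 1180, 1515,
    1650, 1614, 1384, 1845, 1315, 1879, 2326, 1539, 2110, 2038,
    1976, 1117, 1710, 2066, 1295, 1814, 1265, 1419, 1751, 1582,
    1446, 1468, 1810, 1471, 2102, 1895, 2474, 1645, 1769, 2119,
]

FIRST_LOWER = 801   # lower limit of the first class (801-1000)
CLASS_WIDTH = 200   # group interval from the slide
N_CLASSES = 9

def class_counts(values, first_lower, width, n_classes):
    """Count how many integer values fall in each class."""
    counts = [0] * n_classes
    for v in values:
        idx = (v - first_lower) // width  # 801..1000 -> 0, 1001..1200 -> 1, ...
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} falls outside the class limits")
        counts[idx] += 1
    return counts

data_range = max(rainfall_mm) - min(rainfall_mm)
counts = class_counts(rainfall_mm, FIRST_LOWER, CLASS_WIDTH, N_CLASSES)
for i, count in enumerate(counts):
    lower = FIRST_LOWER + i * CLASS_WIDTH
    print(f"{lower}-{lower + CLASS_WIDTH - 1}: {count}")
```

The printed counts are the bar heights of the frequency histogram for this choice of classes; a different starting value or class width would give a different picture of the same data.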
  • 57. Distribution of Data: Histogram • A histogram is used to graphically summarize the distribution of a data set. • A histogram divides the range of values in a data set into intervals. • Over each interval is placed a bar whose height represents the frequency of data values in the interval.
  • 58. Distribution of Data Number of groups is 17. Because of the arbitrariness in the choice of number of intervals, starting value, and length of intervals, histograms can be made to take on different shapes for the same set of data, especially for small data sets. Histograms are most useful for describing data sets when the number of data points is fairly large, say 50 or more.
  • 59. Distribution of Data Number of groups is 5. When the number of data points is relatively small and the number of intervals is large, the histogram fluctuates too much; that is, it responds to very few data values. This results in a graph that is not a realistic depiction of the histogram for the whole population. When the number of class intervals is too small, most of the patterns or trends in the data are not displayed.
  • 60. Absolute and Relative Frequency Distribution • Frequencies can be absolute (the actual count of occurrences) or relative (the absolute frequency divided by the total number of observations, so that each value lies in [0, 1]). • Relative frequencies are particularly useful if you want to compare distributions drawn from two different sources, since the number of observations from each source may be different.
  • 61. Distribution of Data We can compare two different samples (or populations) by examining their relative frequency histograms even if the samples (populations) are of different sizes, because we use proportions rather than frequencies in a relative frequency histogram.
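A small sketch of why proportions make samples of different sizes comparable. The two sets of class counts are hypothetical, and the function name is an assumption for illustration.

```python
def relative_frequencies(counts):
    """Divide each absolute frequency by the total number of observations."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    return [c / total for c in counts]

# Hypothetical class counts from two sources with different sample sizes.
station_a = [5, 10, 5]      # 20 observations
station_b = [25, 50, 25]    # 100 observations

rel_a = relative_frequencies(station_a)
rel_b = relative_frequencies(station_b)
print(rel_a, rel_b)
```

The absolute counts differ by a factor of five, yet the relative frequencies are identical, so the two relative frequency histograms would have the same shape.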
  • 62. Distribution of Data
  • 63. Distribution of Data
  • 64. Distribution of Data This rule applies to data sets with roughly a “mound-shaped” histogram, that is, a histogram that has a single peak, is symmetrical, and tapers off gradually in the tails.
  • 65. Distribution of Data M0: sample mean; Md: sample median; µ: population mean; TM: trimmed mean.
  • 66. Coefficient of Variation The coefficient of variation measures the variability in the values in a population relative to the magnitude of the population mean. In a process or population with mean $\mu$ and standard deviation $\sigma$, the coefficient of variation is defined as $CV = \sigma / \mu$.
  • 67. Coefficient of Variation The coefficient of variation shows how much a hydrological variable varies. For example, if we calculate the coefficient of variation of an annual rainfall time series, it will indicate the reliability of rainfall. If the value is very high, rainfall varies widely from year to year and is less reliable. In both cases annual average rainfall is 580 mm. However, rainfall in the second area is more reliable than in the first.
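To illustrate the comparison, here is a minimal sketch that estimates the coefficient of variation as the sample standard deviation divided by the sample mean. The two five-year rainfall series are invented for this example (they are not the data behind the slide's figure), chosen only so that both average 580 mm.

```python
import statistics

def coefficient_of_variation(values):
    """Sample estimate of CV: standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical annual rainfall (mm) for two areas with the same mean.
area_1 = [300, 860, 450, 710, 580]
area_2 = [560, 600, 570, 590, 580]

cv_1 = coefficient_of_variation(area_1)
cv_2 = coefficient_of_variation(area_2)
print(f"Area 1: mean={statistics.mean(area_1)} mm, CV={cv_1:.2f}")
print(f"Area 2: mean={statistics.mean(area_2)} mm, CV={cv_2:.2f}")
```

Both areas have the same mean, but the smaller CV of the second series marks its rainfall as the more reliable of the two.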
  • 68. Measures of Skewness and Kurtosis • A fundamental task in many statistical analyses is to characterize the location and variability of a data set (measures of central tendency vs. measures of dispersion). • Both measures tell us nothing about the shape of the distribution. • A further characterization of the data includes skewness and kurtosis. • The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.
  • 69. Skewness • The term skewness refers to the lack of symmetry. • Skewness measures the degree of asymmetry exhibited by the data. • If skewness equals zero, the histogram is symmetric about the mean. $\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
  • 70. Skewness Consider the two distributions in the figure. Within each graph, the bars on the right side of the distribution taper differently than the bars on the left side. These tapering sides are called tails, and they provide a visual means for determining which of the two kinds of skewness a distribution has: 1. Negative skew: the left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed, left-tailed, or skewed to the left. 2. Positive skew: the right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed, right-tailed, or skewed to the right.
  • 71. Skewness
  • 72. Skewness Skewness in a data series may be observed not only graphically but also by simple inspection of the values. For instance, consider the numeric sequence (49, 50, 51), whose values are evenly distributed around a central value of 50. We can transform this sequence into a negatively skewed distribution by adding a value far below the mean, e.g. (40, 49, 50, 51). Similarly, we can make the sequence positively skewed by adding a value far above the mean, e.g. (49, 50, 51, 60).
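The three sequences above can be checked numerically. This sketch uses the moment form of the skewness coefficient, the sum of cubed deviations divided by n·s³, and assumes that s is the sample standard deviation (n − 1 divisor); the function name is illustrative.

```python
import math

def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: s is the sample standard deviation (n - 1 divisor).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)

symmetric = skewness([49, 50, 51])          # evenly spread around 50
left_skewed = skewness([40, 49, 50, 51])    # extra value far below the mean
right_skewed = skewness([49, 50, 51, 60])   # extra value far above the mean
print(symmetric, left_skewed, right_skewed)
```

The first value is zero, the second is negative and the third is positive, matching the verbal description of the three sequences.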
  • 73. Skewness Positive skewness: there are more observations below the mean than above it; the mean is greater than the median. Negative skewness: there are a small number of low observations and a large number of high ones; the median is greater than the mean.
  • 74. Skewness Skewness of a distribution cannot always be determined simply by inspection. As a rule of thumb: if Mean > Median, the skew is positive; if Mean < Median, the skew is negative; if Mean = Median, the skew is zero.
  • 75. Skewness • One measure of absolute skewness is the difference between the mean and the mode. Such a measure would not be truly meaningful because it depends on the units of measurement. • The simplest measure of skewness is Pearson’s coefficient of skewness: $\text{Pearson's coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
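A brief sketch of Pearson's coefficient, (mean − mode) / standard deviation, together with the mean-versus-median check. The six values are the example series used later in the lecture for outlier tests; the function name and the use of the sample standard deviation are assumptions.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: (mean - mode) / standard deviation."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)   # most frequent value
    sd = statistics.stdev(values)    # assumption: sample standard deviation
    return (mean - mode) / sd

data = [9, 10, 10, 10, 11, 50]
coefficient = pearson_skewness(data)
mean_exceeds_median = statistics.mean(data) > statistics.median(data)
print(round(coefficient, 3), mean_exceeds_median)
```

Dividing by the standard deviation makes the result unitless, which is the reason given above for preferring it to the raw difference between mean and mode. Both indicators point to positive skew for this series.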
  • 76. Skewness • The skewness coefficient varies between −3.0 and +3.0. • There is no universally accepted range of skewness for judging the distribution of data. • As a rule of thumb, some consider −1 to +1 acceptable for a normal distribution (−2 to +2 is also often used).
  • 77. Kurtosis • Kurtosis measures how peaked the histogram is. • The kurtosis of a normal distribution is 0 (zero). • Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution. $\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$
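A minimal sketch of this kurtosis measure, the sum of fourth-power deviations divided by n·s⁴, minus 3. It assumes s is the sample standard deviation, and both data sets are invented for illustration.

```python
import math

def excess_kurtosis(values):
    """Sum of fourth-power deviations / (n * s**4), minus 3."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: s is the sample standard deviation (n - 1 divisor).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return sum((x - mean) ** 4 for x in values) / (n * s ** 4) - 3

# Hypothetical flat data: values spread evenly with no pronounced peak.
flat = excess_kurtosis([1, 2, 3, 4, 5])
# Hypothetical peaked data: most values at the centre, two far in the tails.
peaked = excess_kurtosis([-10] + [0] * 18 + [10])
print(flat, peaked)
```

The evenly spread series gives a negative value and the centre-heavy series with long tails gives a positive one. For very small samples the sign can be sensitive to the choice of divisor in s, so the result is best read as a rough indicator.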
  • 78. Kurtosis • Platykurtic: when the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution. • Leptokurtic: when the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked distribution. • Kurtosis is based on the size of a distribution's tails. Negative kurtosis (platykurtic): distributions with short tails. Positive kurtosis (leptokurtic): distributions with relatively long tails. (Figure: leptokurtic and platykurtic curves.)
  • 79. Kurtosis The coefficient of kurtosis is the most important measure of kurtosis. It is based on the second and fourth moments: $\beta_2 = \frac{\mu_4}{\mu_2^2}$, where the second moment is $\mu_2 = \frac{\sum f (x - \bar{x})^2}{N}$ and the fourth moment is $\mu_4 = \frac{\sum f (x - \bar{x})^4}{N}$. • If $\beta_2 - 3 > 0$, the distribution is leptokurtic. • If $\beta_2 - 3 < 0$, the distribution is platykurtic. • If $\beta_2 - 3 = 0$, the distribution is mesokurtic (normal). A kurtosis value within ±1 is considered very good for most uses, but ±2 is also usually acceptable.
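The moment-based coefficient can be sketched as follows, taking the second and fourth moments as frequency-weighted central moments divided by N and forming their ratio μ4 / μ2². The function names, the tolerance, and the five-class example are assumptions for illustration.

```python
def central_moment(values, freqs, order):
    """Frequency-weighted central moment: sum f * (x - mean)**order / N."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n_total
    return sum(f * (x - mean) ** order for x, f in zip(values, freqs)) / n_total

def beta2(values, freqs):
    """Coefficient of kurtosis: fourth moment over squared second moment."""
    return central_moment(values, freqs, 4) / central_moment(values, freqs, 2) ** 2

def classify(b2, tol=1e-9):
    """Label the distribution from the sign of beta2 - 3."""
    if b2 - 3 > tol:
        return "leptokurtic"
    if b2 - 3 < -tol:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical grouped data: five classes with equal frequencies.
b2 = beta2([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
label = classify(b2)
print(b2, label)
```

Equal frequencies across all classes describe a flat histogram, so the coefficient falls below 3 and the data are labelled platykurtic.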
  • 80. Chebyshev’s Rule Chebyshev’s theorem: for any k > 1, at least $1 - 1/k^2$ of the measurements will fall within [Mean − k·SD] to [Mean + k·SD]. For example, with k = 2, at least 75% of the measurements fall within two standard deviations of the mean. Empirical rule: given a set of n measurements possessing a mound-shaped histogram, the interval $\bar{X} \pm s$ contains approximately 68% of the measurements, the interval $\bar{X} \pm 2s$ contains approximately 95% of the measurements, and the interval $\bar{X} \pm 3s$ contains approximately 99.7% of the measurements.
  • 81. Chebyshev’s Rule Empirical rule: given a set of n measurements possessing a mound-shaped histogram, the interval $\bar{X} \pm s$ contains approximately 68% of the measurements, the interval $\bar{X} \pm 2s$ contains approximately 95% of the measurements, and the interval $\bar{X} \pm 3s$ contains approximately 99.7% of the measurements.
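As a hedged illustration, the sketch below compares the observed fraction of values within k standard deviations of the mean with the lower bound 1 − 1/k² that Chebyshev's theorem guarantees for any data set when k > 1. The function names are assumptions; the six values are the example series used later in the lecture, which is strongly skewed and therefore not a candidate for the empirical rule.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values inside mean +/- k * s."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)

data = [9, 10, 10, 10, 11, 50]
guaranteed = chebyshev_lower_bound(2)
observed = fraction_within(data, 2)
print(guaranteed, observed)
```

Five of the six values lie within two standard deviations, which is above the guaranteed 75% but well short of the roughly 95% the empirical rule would suggest for mound-shaped data.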
  • 82. Outlier An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. An outlier, or outlying observation, is one that appears to deviate markedly from other members of the sample in which it occurs. Outliers can have many anomalous causes: • A physical apparatus for taking measurements may have suffered a transient malfunction. • There may have been an error in data transmission or transcription. • Outliers may arise from changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply through natural deviations in populations. • A sample may have been contaminated with elements from outside the population being examined.
  • 83. Identification of Outliers There is no rigid mathematical definition of what constitutes an outlier. Determining whether or not an observation is an outlier is ultimately a subjective exercise. Type 1: determine the outliers with no prior knowledge of the data. This is essentially a learning approach. The approach processes the data as a static distribution, pinpoints the most remote points, and flags them as potential outliers. Type 2: use model-based methods, which assume that the data are from a normal distribution and identify observations that are deemed "unlikely" based on the mean and standard deviation. • Chauvenet's criterion • Grubbs' test • Dixon's Q test
  • 84. Chauvenet's criterion • A value is measured experimentally in several trials as 9, 10, 10, 10, 11, and 50. • The mean is 16.7 and the sample standard deviation is 16.34. • The value 50 differs from 16.7 by 33.3, slightly more than two standard deviations. • The probability of obtaining a value more than two standard deviations from the mean is roughly 0.05. • Six measurements were taken, so the statistic value (data size multiplied by the probability) is 0.05 × 6 = 0.3. • Because 0.3 < 0.5, according to Chauvenet's criterion, the measured value of 50 should be discarded (leaving a new mean of 10, with standard deviation 0.7).
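The worked example can be reproduced with a short sketch. It assumes a normal model, the sample standard deviation, and the two-sided tail probability P(|Z| > z) computed as erfc(z / √2); the function name and the `threshold` parameter are illustrative.

```python
import math
import statistics

def chauvenet_flags(values, threshold=0.5):
    """Return values for which n * P(|Z| > z) falls below the threshold."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    flagged = []
    for x in values:
        z = abs(x - mean) / sd
        tail_prob = math.erfc(z / math.sqrt(2))  # two-sided normal tail
        if n * tail_prob < threshold:
            flagged.append(x)
    return flagged

trials = [9, 10, 10, 10, 11, 50]
flagged = chauvenet_flags(trials)
kept = [x for x in trials if x not in flagged]
print(flagged, statistics.mean(kept), round(statistics.stdev(kept), 1))
```

Using the exact tail probability rather than the rounded 0.05 gives a statistic of about 0.25 instead of 0.3, which leads to the same decision to discard the value 50.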
  • 85. Grubbs' test Grubbs' test detects one outlier at a time. If G_calculated > G_table, reject the questionable point. Example: 9, 10, 10, 10, 11, and 50.
  • 86. Grubbs' test
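A sketch of the calculated statistic for the example series, assuming the common definition of the Grubbs statistic as the largest absolute deviation from the mean divided by the sample standard deviation. The critical value is left as a user-supplied argument (`g_table`, a hypothetical parameter name) to be read from a Grubbs table for the sample size and significance level in use.

```python
import statistics

def grubbs_statistic(values):
    """Return (G, suspect): largest |x - mean| / s and the value that gives it."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    suspect = max(values, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / sd, suspect

def reject_by_grubbs(values, g_table):
    """Apply the decision rule: reject the suspect if G_calculated > G_table."""
    g_calculated, suspect = grubbs_statistic(values)
    return g_calculated > g_table, suspect

g, suspect = grubbs_statistic([9, 10, 10, 10, 11, 50])
print(round(g, 2), suspect)
```

Because the test handles one outlier at a time, it would be rerun on the remaining values after each rejection.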
  • 87. Dixon's Q test, or simply the Q test To apply a Q test, arrange the data in order of increasing values and calculate Q as $Q = \frac{\text{gap}}{\text{range}}$, where gap is the absolute difference between the outlier in question and the closest number to it, and range is the difference between the largest and smallest values. If Q_calculated > Q_table, reject the questionable point. Example: 9, 10, 10, 10, 11, and 50.
  • 88. Dixon's Q test
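For the example series, the calculated Q can be sketched as below, taking Q as the gap between the suspect value and its nearest neighbour divided by the range of the data. The function name and the `suspect` argument are assumptions; the table value against which Q is compared must still be looked up for the sample size and confidence level.

```python
def dixon_q(values, suspect="max"):
    """Q = gap / range for the largest ('max') or smallest ('min') value."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("the Q test needs at least three values")
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical")
    if suspect == "max":
        gap = data[-1] - data[-2]
    elif suspect == "min":
        gap = data[1] - data[0]
    else:
        raise ValueError("suspect must be 'max' or 'min'")
    return gap / data_range

q = dixon_q([9, 10, 10, 10, 11, 50], suspect="max")
print(round(q, 3))
```

Here the gap is 50 − 11 = 39 and the range is 50 − 9 = 41, so Q is about 0.951.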
  • 89. Common Characteristics of Water Resources Data 1. A lower bound of zero. No negative values are possible. 2. Outliers occur regularly; outliers on the high side are especially common in water resources data. 3. Non-normal distribution of data. 4. Positive skewness is common. 5. Data reported only as below or above some threshold (censored data). Examples include concentrations below one or more detection limits, annual floods above a level, etc. 6. Seasonal patterns. Values tend to be higher or lower in certain seasons of the year. 7. Positive autocorrelation. Consecutive observations tend to be strongly correlated with each other. High values tend to follow high values and low values tend to follow low values. 8. Dependence on other uncontrolled variables. Water discharge from a well depends strongly on hydraulic conductivity, sediment grain size, or some other variable.
  • 90. STATISTICAL HYDROLOGY MAL1303/MKAG1273 Summarizing Data Dr. Shamsuddin Shahid Associate Professor Department of Hydraulics and Hydrology Faculty of Civil Engineering Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: sshahid@utm.my