Isaaks & Co
Specialists in Spatial Statistics
1042 Wilmington Way Emerald Hills CA 94062 Phone 650-369-7069 ed@isaaks.com
Detection of Analytical Bias
Isaaks & Co
October 17, 2011
Detection and Correction of Analytical Bias – An Example
Consider Figure 1 on your right. The figure is meant to
represent one level of an ore deposit. For the sake of ease,
the shape is a square divided into 4 equal quadrants. Each
quadrant contains 1 sample with values 4, 8, 12, and 40.
The naïve global mean is given by:
(4+8+12+40) / 4 = 16
The weighted average grade of the shape using nearest
neighbor (NN) weights is given by:
¼ * 4 + ¼ * 8 + ¼ * 12 + ¼ * 40 = 16.
where the NN weights are all equal to the relative area of each quadrant or ¼. Thus, the
unbiased estimate of the global mean is 16.
Next, we collect another sample from the high grade
quadrant as shown in Figure 2. So now we have a
clustered set of samples. If we calculate the naïve global
mean we get:
(4+8+12+40+40)/5 = 20.8
which is considerably higher than the true global mean 16.
This bias is the result of the clustered samples in the high
grade quadrant and is sometimes known as a “selection
bias”. Often NN weights are used to correct for selection
bias and estimate the true global mean. For example:
¼ * 4 + ¼ * 8 + ¼ * 12 + ⅛ * 40 + ⅛ * 40 = 16
where the NN weights are ¼, ¼, ¼, ⅛, and ⅛. Note that the univariate distribution or
histogram of sample values is given by the set (4, 8, 12, 40, 40).
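The weighted calculation above can be sketched in Python. This is only an illustration: the quadrant labels and function names are hypothetical, and the weights rest on the example's simplifying assumption that each quadrant's ¼ share of the area is split equally among the samples inside it (every quadrant must hold at least one sample).

```python
from collections import Counter

N_QUADRANTS = 4  # the square is divided into 4 equal quadrants


def quadrant_weights(quadrants):
    """Split each quadrant's area share (1/4) equally among its samples."""
    counts = Counter(quadrants)
    return [1.0 / N_QUADRANTS / counts[q] for q in quadrants]


def naive_mean(values):
    """Arithmetic average that ignores sample locations."""
    return sum(values) / len(values)


def declustered_mean(values, weights):
    """Weighted average using the declustering weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Sample set of Figure 2; the quadrant labels are hypothetical names.
values = [4, 8, 12, 40, 40]
quadrants = ["SW", "SE", "NW", "NE", "NE"]

weights = quadrant_weights(quadrants)
print(weights)                            # [0.25, 0.25, 0.25, 0.125, 0.125]
print(naive_mean(values))                 # 20.8
print(declustered_mean(values, weights))  # 16.0
```

With real data the NN weights would come from the area of influence of each sample rather than from a quadrant count.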
The clustering of samples or selection bias can also occur
in low grade areas as shown in Figure 3. The arithmetic
average or naïve mean is given by:
(4+4+8+12+40) / 5 = 13.6
which is considerably less than the true global mean 16.
However, the weighted NN mean is given by:
⅛ * 4 + ⅛ * 4 + ¼ * 8 + ¼ * 12 + ¼ * 40 = 16
Note the univariate distribution or histogram of sample
values is given by the set (4, 4, 8, 12, 40).
Next, consider the case where the sample data suffer from
analytical bias and selection bias as shown in Figure 4.
The low grade samples (4) and (4) have been over-
estimated as (8) and (8) while the high grade sample (40)
has been under-estimated as (36). The naïve global mean
is given by:
(8+8+8+12+36)/5 = 14.4
while the NN declustered mean is given by:
⅛ * 8 + ⅛ * 8 + ¼ * 8 + ¼ * 12 + ¼ * 36 = 16
Surprisingly, the NN declustered mean of the analytically biased sample set is 16, which
equals the unbiased estimate of the global mean!
So what can we conclude from these examples so far?
1. Although the (NN) declustered means from different sets of clustered samples
may be identical, the univariate distribution or naïve histogram of the sample data
may be very different. For example, sample sets (4, 8, 12, 40, 40), (4, 4, 8, 12,
40), and (8, 8, 8, 12, 36) provide identical declustered global means, but their
naïve or sample statistics are completely different.
2. Thus, the equality of NN spatial averages does not guarantee the absence of
analytical bias. For example, the declustering weights applied to the analytically
biased sample data set shown in Figure 4 yield an unbiased estimate of the global
mean!
3. However, differences between NN spatial averages may indicate analytical bias,
but there is a better way to detect analytical bias in the presence of selection bias.
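The first two conclusions can be checked numerically. The short sketch below recomputes the naïve statistics and the NN declustered mean for the three sample sets, using the weights quoted in the text; the dictionary layout and function name are illustrative choices only.

```python
from statistics import mean, pstdev

# (sample values, NN declustering weights) as given in the text
sample_sets = {
    "Figure 2": ([4, 8, 12, 40, 40], [1/4, 1/4, 1/4, 1/8, 1/8]),
    "Figure 3": ([4, 4, 8, 12, 40], [1/8, 1/8, 1/4, 1/4, 1/4]),
    "Figure 4": ([8, 8, 8, 12, 36], [1/8, 1/8, 1/4, 1/4, 1/4]),
}


def summarize(values, weights):
    """Return (naive mean, naive standard deviation, NN declustered mean)."""
    declustered = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean(values), pstdev(values), declustered


summary = {name: summarize(v, w) for name, (v, w) in sample_sets.items()}

for name, (naive, spread, declustered) in summary.items():
    print(f"{name}: naive mean {naive:.1f}, naive std {spread:.1f}, "
          f"NN mean {declustered:.1f}")
```

All three sets return an NN mean of 16 even though their naïve means (20.8, 13.6, 14.4) and spreads differ, so agreement of the declustered means alone says nothing about analytical bias.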
One method is to spatially pair the two data sets. For example, one may center a NN
search on each sample of the first sample data set and locate the closest neighboring
sample from the second sample data set. This provides paired data where each of the
paired values is from the opposite sample data set. Figure 5 shows 5 pairs of samples
where one member of each pair comes from the samples shown in Figure 3 and the other
member from the samples shown in Figure 4.
Figure 5: NN Paired samples from Figures 4 and 3. The declustering weights only correct selection bias.
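The spatial pairing described above might be sketched as follows. The coordinates are invented purely for illustration (the note gives none), `pair_nearest` and `max_distance` are hypothetical names, and a brute-force search is used for clarity rather than speed.

```python
import math


def pair_nearest(first, second, max_distance=math.inf):
    """For each (x, y, grade) sample in `first`, find the closest sample in
    `second`. Returns (grade_first, grade_second, distance) tuples and drops
    pairs farther apart than `max_distance`."""
    pairs = []
    for x1, y1, grade1 in first:
        nearest = min(second, key=lambda s: math.hypot(s[0] - x1, s[1] - y1))
        distance = math.hypot(nearest[0] - x1, nearest[1] - y1)
        if distance <= max_distance:
            pairs.append((grade1, nearest[2], distance))
    return pairs


# Hypothetical coordinates; grades follow Figure 4 (biased) and Figure 3.
biased = [(20, 20, 8), (30, 30, 8), (75, 25, 8), (25, 75, 12), (75, 75, 36)]
unbiased = [(22, 21, 4), (31, 29, 4), (74, 26, 8), (26, 74, 12), (76, 76, 40)]

pairs = pair_nearest(biased, unbiased, max_distance=10.0)
for grade_biased, grade_unbiased, distance in pairs:
    print(grade_biased, grade_unbiased, round(distance, 2))
```

Note that this simple search allows one sample of the second set to be paired with more than one sample of the first set; whether that is acceptable depends on the data.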
Note that both end members of the paired samples suffer from selection bias. However,
only one end member of the pairs suffers from analytical bias. We wish to adjust the
analytically biased samples so as to remove the analytical bias if possible. We note the
following assumptions:
1. Since the paired data are spatially “close” or near to each other, it’s probably
reasonable to assume that their metal concentrations are similar (on average) and
thus we can expect their laboratory analyses to yield similar results.
2. Thus, we should be able to “adjust” the analytically biased samples by some sort
of regression equation.
3. But, classical regression is based on a theoretical model where the values on the
“X-axis” are known with certainty. In other words, there is no measurement error
associated with the values on the “X-axis”. However, since both of our sample
data sets originated from a laboratory analysis, neither set of samples is known
with certainty. In other words, each sample value in each sample data set is
associated with some measurement error. Thus classical regression techniques are
generally not appropriate (see Ripley, Brian D., 1987).
4. Fortunately, JMP provides the correct regression method for this problem under
the name “Orthogonal regression”. The paper by Ripley, “Regression Techniques
for the Detection of Analytical Bias” provides a thorough discussion of the
problem as well as solutions. JMP documentation also clearly discusses
“Orthogonal Regression”. I’ve attached a copy of Ripley’s paper for your
convenience.
5. But what about the declustering weights? What should we do with them?
For the moment, we will ignore the declustering weights and proceed with orthogonal
regression. Figure 6 shows the orthogonal regression of the paired sample data given in
Figure 5.
Figure 6: An example of orthogonal regression applied to the paired sample data given in Figure 5.
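For readers without JMP, a straight-line fit that allows measurement error in both variables can be sketched with the closed-form Deming regression slope, of which orthogonal regression is the equal-error-variance case. This is not JMP's implementation: the function name is hypothetical, the convention that `delta` is the error variance of y divided by the error variance of x is an assumption, and so is the choice of the biased samples as x.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Fit y = intercept + slope * x when both x and y carry measurement error.

    delta is the assumed ratio (error variance of y) / (error variance of x);
    delta = 1 gives orthogonal regression."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return mean_y - slope * mean_x, slope


# Paired values of Figure 5: biased samples as x, unbiased samples as y.
biased = [8, 8, 8, 12, 36]
unbiased = [4, 4, 8, 12, 40]

intercept, slope = deming_fit(biased, unbiased)
corrected = [intercept + slope * value for value in biased]
print(intercept, slope)
print(corrected)
```

The fitted line always passes through the two naïve means (14.4, 13.6), and its slope lies between the ordinary regression of y on x and the inverse of the regression of x on y.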
The JMP analysis shown in Figure 6 calculates the naïve means as 14.4 and 13.6, which
agree with our previous calculations. But just in case you haven’t read Ripley’s paper yet,
there are a few things you should know about orthogonal regression.
1. Orthogonal regression assumes you know something about the variance of the
measurement errors for each of the sample data sets (you should be able to get
this from your laboratory). There are several options, each of which yields a
unique solution:
a. The variances of the measurement errors have the same ratio to one
another as the ratio of sample variances. Generally, this is not the case.
b. The measurement error variances are equal.
c. The measurement error variance of one of the sample data sets is zero.
d. You know the measurement error variances for each of the sample data
sets and can specify the ratio.
2. Each of these options yields a predicted value for each of the sample data sets. We
are only interested in the predicted values for the analytically biased data set.
Figure 7 shows the results for 3 of the 4 options.
a. “Predicted Sample with Selection Bias 1” is the result of option (c).
b. “Predicted Sample with Selection Bias 2” is the result of option (a).
c. “Predicted Sample with Selection Bias 3” is the result of option (b).
Figure 7: Results of orthogonal regression for 3 of 4 options.
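The four options differ only in the error-variance ratio handed to the fit. The sketch below maps each option to such a ratio and shows the slope that results, under stated assumptions: the ratio is taken as (error variance of y) / (error variance of x), the biased samples play the role of x, and all function and parameter names are hypothetical. It is not a description of JMP's internals.

```python
import math


def _moments(x, y):
    """Sample variances of x and y and their covariance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return sxx, syy, sxy


def error_variance_ratio(option, x, y, error_free="x",
                         var_err_x=None, var_err_y=None):
    """Ratio (error variance of y) / (error variance of x) for options a-d."""
    sxx, syy, _ = _moments(x, y)
    if option == "a":   # same ratio as the sample variances
        return syy / sxx
    if option == "b":   # equal measurement error variances
        return 1.0
    if option == "c":   # one data set is free of measurement error
        return math.inf if error_free == "x" else 0.0
    if option == "d":   # ratio specified from known error variances
        return var_err_y / var_err_x
    raise ValueError(f"unknown option: {option!r}")


def slope_for_ratio(x, y, delta):
    """Slope of the fitted line for a given error-variance ratio."""
    sxx, syy, sxy = _moments(x, y)
    if math.isinf(delta):            # x error-free: ordinary regression of y on x
        return sxy / sxx
    diff = syy - delta * sxx
    return (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)


biased = [8, 8, 8, 12, 36]     # x, paired values of Figure 5
unbiased = [4, 4, 8, 12, 40]   # y

for option in ("a", "b", "c"):
    delta = error_variance_ratio(option, biased, unbiased)
    print(option, delta, slope_for_ratio(biased, unbiased, delta))
```

For option (d) the laboratory's error variances are passed in directly, for example `error_variance_ratio("d", biased, unbiased, var_err_x=2.0, var_err_y=4.0)`.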
Some observations:
1. Interestingly, option (c) provides analytically corrected sample values exactly
equal to the analytically unbiased sample values.
2. Options (a) and (b) yield analytically corrected sample values close to the
analytically unbiased sample values.
3. The declustering weights do not impact the analytical bias correction. Thus, one
can apply the orthogonal regression technique to clustered data and ignore
declustering weights.
4. One should make an effort to obtain information about the variance of the
measurement errors associated with the laboratory estimates of the sample values
and use this information to obtain the “best” orthogonal results.
The discussion so far has dealt with the case where two NN declustered spatial averages
compare closely. But what about the case where the NN declustered spatial averages do
not agree? The answer is that this may well indicate analytical bias. But the problem is
how to adjust or correct the suspected analytical bias. The two NN models can be seen as
“paired data” since each NN grid point contains two NN sample values. However, the
distances between the paired samples will be greater (given the same search radius) than
if we pair the samples directly as described earlier. Thus, the actual spatial correlation
between the NN paired samples will be poorer than that between samples paired directly.
This in turn suggests that a correction computed by regression on the NN pairs will
not be as accurate as one computed from directly paired samples.
It’s probably worth mentioning that one should not compute correction factors by
constructing a regression line through the quantiles of a NN quantile-quantile plot. To do
so is very bad practice. For example, consider the following 9 pairs of data (actual cucn
data from the ***** deposit). Note the short distance between pairs. The correlation
coefficient between the pairs is 0.8, so we can expect to compute reasonably accurate
corrections for any analytical bias.
Figure 8: Nine pairs of grades obtained from two different sampling campaigns.
We can compute corrections for the analytical bias using JMP and orthogonal regression
(Figure 9). The results and residuals are shown in Figure 10. The regression residuals
have a mean of 0.0 and a standard deviation of 52.0.
Figure 9: Orthogonal Regression of analytically biased samples shown in Figure 8.
Figure 10: Results of the orthogonal regression of analytically biased samples shown in Figure 8. Note the
relatively small residuals.
Next, we compute corrections for the analytical bias using standard regression applied to
the sample quantiles or QQ plot. Figure 11 is the same as Figure 8 but with the addition
of two columns containing the sample quantiles. Notice the difference in the pairing of
the data between the original pairs and the quantile pairs. Figure 12 shows the ordinary
linear regression applied to the quantile-quantile (QQ) plot.
Figure 11: Same table as Figure 8 but with the addition of two columns of sample quantiles.
Figure 12: Ordinary Linear Regression of the quantiles (QQ plot) of the analytically biased samples shown
in Figure 11.
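To see how quantile pairing rearranges the data, the sketch below sorts each column independently, as a QQ plot does, and compares the result with the original spatial pairs. The grades are invented for illustration and are not the deposit data; the function names are hypothetical.

```python
import math


def quantile_pairs(pairs):
    """Pair the k-th smallest value of one column with the k-th smallest of
    the other, discarding the original (spatial) pairing."""
    first = sorted(p[0] for p in pairs)
    second = sorted(p[1] for p in pairs)
    return list(zip(first, second))


def correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    syy = sum((p[1] - mean_y) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical spatially paired grades (biased, unbiased).
original = [(10, 14), (25, 20), (18, 30), (40, 33), (31, 45)]

qq = quantile_pairs(original)
survivors = [p for p in qq if p in original]

print(qq)          # [(10, 14), (18, 20), (25, 30), (31, 33), (40, 45)]
print(survivors)   # [(10, 14)]
print(correlation(original), correlation(qq))
```

Only one of the five spatial pairs survives the sort in this example. Sorting both columns also can never lower the correlation, so a QQ plot tends to look tighter than the spatially paired data justify.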
The results of the Ordinary Linear Regression of the quantiles (QQ plot) are shown in
Figure 13. Note the slightly larger residuals (mean = 0, Std = 59) suggesting the results of
the QQ plot regression are inferior to those obtained by orthogonal regression of the
paired data (mean residual = 0, std = 52).
Now if you are sharp you will have noticed that we have re-ordered the results of the QQ
plot regression. For example, the original paired data have sample 4104.0 (biased) paired
with sample 364.3 (unbiased), and the distance between the pair is 8.4 m.
However, as quantiles, sample 4104.0 (biased) is paired with sample 564.5 (unbiased). The
distance between these two samples is 732 m, which is beyond the variogram range. Thus,
there is no reason why these two samples should have similar assay results as predicted
by the QQ plot regression. Thus, the predicted values have been re-ordered so as to agree
with the original pairing. For example, the predicted QQ value associated with
364.3 (unbiased) is 436.2. Thus, the re-ordered regression prediction for the original pair
(4104.0, 364.3) is 436.2 with a residual of 72 rather than -6.8 (193.4 – 200.2).
So the problem with correcting analytical bias through the regression of NN QQ plots has
to do with the pairing of quantiles. Quantile pairing ignores the spatial relationship
between paired values. Members of quantile pairs may be separated by distances well
beyond the variogram range, which invalidates the original assumption that, because the
paired data are spatially “close” or near to each other, their metal concentrations are
similar (on average) and their laboratory analyses can be expected to yield similar
results.
Figure 13: The results of the Ordinary Linear Regression of the quantiles (QQ plot).
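A simple check for this problem is to measure the separation of each quantile pair and compare it with the variogram range. In the sketch below the coordinates, grades, and the 200 m range are all hypothetical, as are the function names; both data sets are assumed to hold the same number of samples.

```python
import math


def quantile_pair_distances(first, second):
    """Pair (x, y, grade) samples by grade rank and return
    (grade_first, grade_second, separation distance) for each quantile pair."""
    a = sorted(first, key=lambda s: s[2])
    b = sorted(second, key=lambda s: s[2])
    return [(sa[2], sb[2], math.hypot(sa[0] - sb[0], sa[1] - sb[1]))
            for sa, sb in zip(a, b)]


def beyond_range(pairs, variogram_range):
    """Quantile pairs whose members are farther apart than the variogram range."""
    return [p for p in pairs if p[2] > variogram_range]


# Hypothetical samples as (x, y, grade); distances in metres.
first = [(0, 0, 10), (500, 0, 25), (0, 500, 18), (500, 500, 40)]
second = [(5, 5, 30), (505, 0, 12), (0, 505, 20), (505, 505, 45)]

pairs = quantile_pair_distances(first, second)
suspect = beyond_range(pairs, variogram_range=200.0)

for grade_first, grade_second, distance in pairs:
    print(grade_first, grade_second, round(distance, 1))
print(len(suspect), "of", len(pairs), "quantile pairs exceed the range")
```

In this made-up layout two of the four quantile pairs join samples roughly 500 m apart, even though every sample has a neighbour from the other set within about 7 m.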