ISE-5013
Statistical Analysis for System Design
Graduate Project
Rock Typing based on Petrophysical Properties
Submitted by:
Karan Bathla
OU ID: 113072138
Abstract
Shale is a sedimentary rock and the most abundant sedimentary rock in the Earth's crust. It formed from silt, mud and organic debris deposited on sea bottoms millions of years ago; over time, geothermal heat converted the organic matter into hydrocarbons. Shale has drastically changed oil and gas production in the United States of America, contributing 33 trillion cubic feet of natural gas (source: Energy Information Administration). Because shale is composed of mud, silt, quartz, calcite and other minerals, it is important for reservoir estimation to identify the properties that have the greatest impact on production. Experiments were performed at the Integrated Core Characterization Center at the University of Oklahoma to measure the main petrophysical properties at various depths of the same shale formation. From these measurements, we identify the properties with the most variation and use them for clustering to perform rock typing.
In this paper, principal component analysis (PCA) is performed in XLSTAT on the shale composition, namely mineralogy (mainly clays, carbonates and quartz), porosity and total organic carbon, to find the parameters with the most variation. k-means clustering is then performed in XLSTAT on the essential parameters identified by PCA to classify the rocks into clusters.
Principal Component Analysis:
In their book Statistical Methods: A Geometric Approach, Saville and Wood give the following definition of Principal Component Analysis:
Definition: Given n points in $\mathbb{R}^p$, principal components analysis consists of choosing a dimension $k < p$ and then finding the affine space of dimension $k$ with the property that the squared distance of the points to their orthogonal projection onto the space is minimized.
Principal components are uncorrelated linear combinations of the original variables, ordered by the amount of variability they capture; they help reveal the internal structure of a data set. They are used to identify patterns and to reduce the dimensionality of the data set with minimal loss of information. The number of components extracted in PCA equals the number of observed variables. If the data set has n variables, there will be n principal components:
• The first component has the largest variance and is a linear combination of the original variables.
• Each subsequent component is uncorrelated with the previously defined components and is the linear combination of variables with the greatest remaining variance.
Because the transformation can be truncated to the first few components, we can focus on the essential dimensions of the data and analyze them in detail. This makes PCA an important step in pattern recognition.
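The procedure can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the XLSTAT implementation used in this paper: it standardizes each column, eigendecomposes the resulting correlation matrix, sorts the components by eigenvalue and reports the variability (%) of each. The three-variable data set is hypothetical and only shows that two strongly related variables collapse onto one dominant component.

```python
import numpy as np

def pca(X):
    """Minimal PCA sketch: standardize columns, eigendecompose the
    correlation matrix, and sort components by decreasing variance."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores
    C = np.cov(Z, rowvar=False)                         # (n-1) covariance of z-scores = correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)                # eigh: C is symmetric; returns ascending order
    order = np.argsort(eigvals)[::-1]                   # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                                # data expressed on the principal components
    explained = 100 * eigvals / eigvals.sum()           # variability (%) per component
    return eigvals, eigvecs, scores, explained

# Hypothetical data: two variables driven by t, one independent variable.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t + 0.1 * rng.normal(size=200),
                     2 * t + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
eigvals, eigvecs, scores, explained = pca(X)
print(np.round(explained, 1))
```

The component scores are uncorrelated and their variances equal the eigenvalues, which is the property the bullets above describe.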
Eigenvalue and Eigenvector:
The eigenvalues of the covariance matrix measure the variance of the data along the corresponding eigenvectors. The first principal component is therefore the eigenvector with the highest eigenvalue. The number of eigenvectors equals the number of dimensions of the data.
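In symbols (standard notation, not taken from the paper), with Σ the covariance matrix of the scaled data matrix Z and p variables, each unit eigenvector v_i and eigenvalue λ_i satisfy the relations below; the last line applies them to the first eigenvalue reported in Table 2.

```latex
\Sigma\,\mathbf{v}_i = \lambda_i\,\mathbf{v}_i, \qquad \lVert \mathbf{v}_i \rVert = 1,
\qquad \operatorname{Var}(\mathbf{Z}\mathbf{v}_i) = \mathbf{v}_i^{\top}\Sigma\,\mathbf{v}_i = \lambda_i

\text{share of variance of component } i
  = \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}
  = \frac{\lambda_i}{p} \quad \text{(standardized data, } \operatorname{tr}\Sigma = p\text{)}

\text{e.g. } \frac{3.496}{6} \approx 0.583 \;(58.3\%)
```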
Dimension reduction using eigenvalues
Principal Component Analysis is used to reduce the dimensions of a data set. For example, consider the three-dimensional data set shown in Figure 1, plotted along the x-, y- and z-axes. Because all points share the same z value, the variance in the z direction is 0, and the eigenvalue along that direction is zero, as shown in Figure 2. The data can therefore be represented in two dimensions, since the z-axis carries no information.
Figure 1: Three-dimensional data set plotted along the x-, y- and z-axes
Figure 2: Eigenvectors calculated for the data set in Figure 1
Source: Dallas, George. “Principal Component Analysis 4 Dummies: Eigenvectors, Eigenvalues and Dimension Reduction.” Web blog post, accessed 11/25/2014.
Weblink: https://georgemdallas.wordpress.com/2013/10/30/principal-component-analysis-4-dummies-eigenvectors-eigenvalues-and-dimension-reduction/
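The Figure 1 example can be reproduced numerically. In this sketch (hypothetical random points, NumPy assumed), every point has z = 5, so the covariance matrix has a zero eigenvalue whose eigenvector points along z; projecting onto the remaining two eigenvectors keeps all of the information. The covariance matrix is used unscaled here, because standardizing a constant column would divide by zero.

```python
import numpy as np

# Hypothetical 3-D points that all share the same z value (z = 5).
rng = np.random.default_rng(1)
pts = np.column_stack([rng.normal(size=50),
                       rng.normal(size=50),
                       np.full(50, 5.0)])

# Eigen-decomposition of the (unscaled) covariance matrix; eigh sorts ascending.
cov = np.cov(pts, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)          # the smallest eigenvalue is 0 (up to rounding)
print(eigvecs[:, 0])    # its eigenvector points along the z-axis

# Drop the zero-variance direction: project onto the other two eigenvectors.
centered = pts - pts.mean(axis=0)
pts_2d = centered @ eigvecs[:, 1:]
```

Mapping `pts_2d` back with the same two eigenvectors recovers the centered points exactly, which is what "no information in the z-axis" means.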
Petroleum Engineering Definitions
Some petroleum engineering terms used in this paper are defined below:
Porosity: the ratio of pore volume to bulk volume.
TOC: Total Organic Carbon, composed of kerogen (hydrocarbon-forming material) and hydrocarbons (found in the pore space).
Mineralogy: shale minerals are either a) clastics (quartz- or clay-rich) or b) carbonates (rich in carbonate minerals such as calcite). Shales contain both clastic minerals and carbonates, so FTIR (Fourier Transform Infrared Spectroscopy) is used to obtain the specific mineralogy.
The data come from 168 Barnett shale core samples taken at various depths and measured at the Integrated Core Characterization Center; they are tabulated in the Appendix. PCA is performed on the petrophysical properties porosity, Total Organic Carbon, quartz, carbonates, illite + chlorite and mixed clays to identify the properties with the most variation, which can then be used for rock typing. The data were scaled before performing the PCA by subtracting the mean and dividing by the standard deviation. Table 1 gives the summary statistics (number of observations, minimum, maximum, mean and standard deviation) of the scaled properties; their correlations with each other are given in Table 3.
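A minimal sketch of the scaling step in Python (NumPy assumed). It uses porosity and TOC for the first five samples in the Appendix purely for illustration, so the scaled values will not match Table 1, which was computed from all 168 samples. The sample standard deviation (ddof=1) is an assumption, chosen because it makes the (n-1) covariance matrix of the scaled data equal to the correlation matrix, as in Table 3.

```python
import numpy as np

def zscore(X):
    """Subtract each column's mean and divide by its standard deviation.
    ddof=1 (sample standard deviation) is an assumption; the paper does not
    state which divisor XLSTAT used."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Porosity and TOC of the first five Appendix samples (illustration only).
raw = np.array([[6.67, 4.15],
                [6.07, 4.25],
                [4.91, 3.40],
                [6.00, 0.39],
                [5.63, 3.90]])
Z = zscore(raw)
print(Z.mean(axis=0).round(3), Z.std(axis=0, ddof=1).round(3))

# After scaling, the (n-1) covariance matrix equals the correlation matrix.
print(np.allclose(np.cov(Z, rowvar=False), np.corrcoef(raw, rowvar=False)))
```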
The eigenvalues of the 6 principal components measure how much of the total variation each component captures. The first principal component alone captures 58.3% of the variation, the first three components capture 87.5%, and all 6 principal components (one per petrophysical property) capture 100%. Table 2 summarizes the variation captured by each principal component and the cumulative variance.
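The percentages in Table 2 follow directly from the eigenvalues: with six standardized variables the total variance is 6, so each component's share is its eigenvalue divided by 6. The short Python check below (NumPy assumed) recomputes the variability and cumulative columns from the reported eigenvalues; small differences come from the eigenvalues being rounded to three decimals (they sum to 5.999).

```python
import numpy as np

# Eigenvalues F1..F6 as reported in Table 2.
eigenvalues = np.array([3.496, 1.161, 0.592, 0.491, 0.219, 0.040])

# Six standardized variables: total variance = trace of the correlation matrix = 6.
p = 6
variability = 100 * eigenvalues / p
cumulative = np.cumsum(variability)

for i, (v, c) in enumerate(zip(variability, cumulative), start=1):
    print(f"F{i}: {v:6.2f}%  cumulative {c:6.2f}%")
```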
Table 1: Summary statistics of the scaled petrophysical properties
Variable                    Obs.   Minimum   Maximum   Mean    Std. deviation
Scaled porosity             168    -2.506    2.787     0.000   1.000
Scaled TOC                  168    -2.212    2.268     0.000   1.000
Scaled quartz               168    -1.985    2.257     0.000   1.000
Scaled carbonates           168    -0.913    2.796     0.000   1.000
Scaled illite + chlorite    168    -2.591    2.610     0.000   1.000
Scaled mixed clays          168    -1.336    2.264     0.000   1.000
Table 2: Principal Component Analysis using eigenvalues
Eigenvalues:
                  F1       F2       F3       F4       F5       F6
Eigenvalue        3.496    1.161    0.592    0.491    0.219    0.040
Variability (%)   58.269   19.352   9.869    8.187    3.656    0.667
Cumulative %      58.269   77.621   87.490   95.677   99.333   100.000
Figure 3: Scree plot of axes F1 to F6: eigenvalue of each component (blue) and cumulative variability in % (red line)
Table 3 gives the correlations between the parameters and shows how they are related. Because the data were standardized before the covariance matrix was calculated, the covariance matrix is identical to the correlation matrix. The table shows that porosity, TOC, quartz and the clays (illite + chlorite and mixed clays) are all negatively correlated with carbonates, while they are positively correlated with each other; the strongest relationship is the negative correlation between TOC and carbonates (-0.796). The interrelations between the other properties can be read from the table in the same way.
Table 3: Correlation between the 6 shale composition parameters
Covariance matrix (Covariance (n-1)):
Variables                  Porosity   TOC      Quartz   Carbonates   Illite + chlorite   Mixed clays
Scaled porosity            1          0.309    0.572    -0.551       0.236               0.223
Scaled TOC                 0.309      1        0.579    -0.796       0.564               0.507
Scaled quartz              0.572      0.579    1        -0.685       0.153               0.182
Scaled carbonates          -0.551     -0.796   -0.685   1            -0.744              -0.615
Scaled illite + chlorite   0.236      0.564    0.153    -0.744       1                   0.522
Scaled mixed clays         0.223      0.507    0.182    -0.615       0.522               1
Table 4 shows the contribution of each petrophysical property to the first 3 principal components, which together capture 87.5% of the variability. The properties that contribute most to the first principal component are carbonates (27.3%), Total Organic Carbon (20.5%) and illite + chlorite (14.9%); these are also the variables with the highest squared cosines on F1 (0.953, 0.718 and 0.522). Porosity, by contrast, makes the smallest contribution to F1 (10.6%). We therefore select the 3 petrophysical properties Total Organic Carbon, carbonates and illite + chlorite to perform the k-means clustering for rock typing.
Table 4: Correlation between the shale composition parameters and the principal components
Contribution of the variables (%):
                           F1       F2       F3
Scaled porosity            10.584   26.691   48.632
Scaled TOC                 20.539   0.535    27.877
Scaled quartz              13.900   31.838   12.518
Scaled carbonates          27.250   0.027    0.140
Scaled illite + chlorite   14.928   21.498   2.413
Scaled mixed clays         12.799   19.411   8.421
Squared cosines of the variables:
                           F1      F2      F3
Scaled porosity            0.370   0.310   0.288
Scaled TOC                 0.718   0.006   0.165
Scaled quartz              0.486   0.370   0.074
Scaled carbonates          0.953   0.000   0.001
Scaled illite + chlorite   0.522   0.250   0.014
Scaled mixed clays         0.447   0.225   0.050
*For every variable, the largest squared cosine is on F1.
k-means
The second step is k-means clustering, based on Lloyd's algorithm, of the variables selected by PCA. The algorithm minimizes the sum of squares within the clusters, grouping similar samples together so that the rocks can be classified.
Algorithm:
We define k centroids, one for each of the k clusters. The centroids are initially placed as far from each other as possible, and each data point is assigned to its nearest centroid. Each centroid is then recomputed as the mean of the data points assigned to it, and the data are regrouped according to the nearest new centroid. These steps are repeated until convergence, when no data point moves from one cluster to another. The algorithm minimizes the within-cluster sum of squared distances:
$$J = \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - c_j \rVert^2 \qquad (1)$$
where $c_j$ = centroid of cluster $j$ (recomputed after every iteration), $x_i$ = data point, and $S_j$ = set of data points assigned to cluster $j$.
Steps for k-means clustering:
1. Place k points in the space of the objects being clustered. These points represent the initial group centroids.
2. Assign each data point to the closest centroid according to equation 1.
3. After all the data points have been assigned, recompute each centroid as the mean of the data points in its cluster.
4. Repeat Steps 2 and 3 until the centroids no longer change and no data point moves between clusters.
5. The result of this process is a grouping of the data into k clusters.
Figure 4: Representation of the k-means clustering algorithm
Because k-means clustering is sensitive to the initial centroids, it does not always produce the optimal configuration. It is therefore applied multiple times to the data set, keeping the run with the lowest error. The algorithm can group any number of data points into any specified number of clusters.
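The steps above, including repeated runs, can be sketched in Python with NumPy. This is a minimal illustration, not XLSTAT's implementation: the farthest-point initialization is one hypothetical reading of "place k centroids very far from each other", the helper names are illustrative, and each run keeps the solution with the lowest J, the within-cluster sum of squared distances.

```python
import numpy as np

def farthest_first_init(X, k, rng):
    """Hypothetical initialization: a random first point, then repeatedly
    the point farthest from all centroids chosen so far."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].copy()

def lloyd_kmeans(X, k, n_init=10, max_iter=500, seed=0):
    """Lloyd's k-means restarted n_init times; returns (labels, centroids, J)
    for the run with the lowest within-cluster sum of squares J."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = farthest_first_init(X, k, rng)            # step 1
        labels = None
        for _ in range(max_iter):
            # Step 2: assign every point to its nearest centroid.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            new_labels = d2.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break                                         # step 4: no point moved
            labels = new_labels
            # Step 3: move each centroid to the mean of its points
            # (an empty cluster keeps its previous centroid).
            centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        J = ((X - centroids[labels]) ** 2).sum()              # within-cluster sum of squares
        if best is None or J < best[2]:
            best = (labels, centroids, J)
    return best

# Usage on three hypothetical, well-separated 2-D groups of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [5, 0], [0, 5])])
labels, centroids, J = lloyd_kmeans(X, k=3)
print(np.bincount(labels), round(J, 2))
```

Farthest-point starts are sensitive to outliers, which is one more reason to compare several runs rather than trust a single one.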
PCA identifies the directions of maximum variance. Since TOC, carbonates and illite + chlorite were identified by PCA as the most significant properties, they were used for k-means clustering. XLSTAT was used to perform k-means with 1 to 6 classes in order to observe how the within-class variance changes, and 500 iterations were performed to make the results more precise. The summary statistics are given in Table 5.
Table 5: Summary statistics for k-means for 3 petrophysical properties.
Variable                    Observations   Minimum   Maximum   Mean    Std. deviation
Scaled TOC                  168            -2.212    2.268     0.000   1.000
Scaled carbonates           168            -0.913    2.796     0.000   1.000
Scaled illite + chlorite    168            -2.591    2.610     0.000   1.000
Table 6 shows that the within-class variance is at its maximum (3.000, the total variance) when all the data form a single class. Splitting into 2 classes moves 1.876 of the variance between classes, and a third class adds a further 0.269; beyond 3 classes, each additional class changes the between-class variance (red line in Figure 5) by less than 0.2, and the within-class variance no longer changes significantly. We therefore cluster the data into 3 classes.
Table 6: Evolution of variance within classes and between classes
Variance / Classes   1       2       3       4       5       6
Within-class         3.000   1.124   0.855   0.661   0.584   0.477
Between-classes      0.000   1.876   2.145   2.339   2.416   2.523
Total                3.000   3.000   3.000   3.000   3.000   3.000
Figure 5: The between-class variance (red line) and within-class variance (blue line) plotted versus the number of classes.
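The choice of 3 classes can be checked against the numbers in Table 6. This Python sketch (NumPy assumed) confirms that within-class and between-class variance always add up to the total of 3 (three standardized variables) and computes how much between-class variance each additional class adds.

```python
import numpy as np

# Table 6: variance for 1 to 6 classes.
within = np.array([3.000, 1.124, 0.855, 0.661, 0.584, 0.477])
between = np.array([0.000, 1.876, 2.145, 2.339, 2.416, 2.523])

# Total variance is fixed: three standardized variables give a total of 3.
assert np.allclose(within + between, 3.0)

# Between-class variance gained by adding one more class.
gain = np.diff(between)
for k, g in zip(range(2, 7), gain):
    print(f"{k - 1} -> {k} classes: +{g:.3f}")
```

The first split accounts for most of the gain (1.876), the third class adds 0.269, and each class beyond three adds less than 0.2.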
Results:
The k-means clustering gave the following results. Table 8 gives the scaled and true (unscaled) centroid value of every petrophysical property in each of the three clusters. Table 9 gives, for each class, the central object, i.e. the core sample (identified by its depth) closest to the class centroid, with its scaled and true values.
Table 8: Class Centroids and their true values
Class   Scaled TOC   Scaled carbonates   Scaled illite + chlorite   Within-class variance   True TOC   True carbonates   True illite + chlorite
1       -0.19        -0.369              0.382                      0.904                   3.43637    12.7953           28.9343
2       -1.411       1.717               -1.38                      0.9                     1.42739    61.83870          278.597
3       0.978        -0.57               0.37                       0.779                   5.35814    8.0697            28.8283
Table 9: Central object of each class (the core sample closest to the class centroid, depth in parentheses) and its scaled and true values
Class (depth)   Scaled TOC   Scaled carbonates   Scaled illite + chlorite   True TOC     True carbonates   True illite + chlorite
1 (6444.5)      -0.151       -0.394              0.593                      3.50053974   12.20761685       30.79640611
2 (6474.5)      -1.403       1.941               -1.231                     1.44055736   67.10510541       14.69934199
3 (6756.4)      0.754        -0.628              0.299                      4.98958450   6.706112641       28.20181353
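The "true" values in Tables 8 and 9 are presumably obtained by reversing the scaling applied before PCA, i.e. true value = mean + scaled value × standard deviation of the raw property. The paper does not report the raw means and standard deviations, so the numbers in this Python sketch (NumPy assumed) are hypothetical placeholders that only illustrate the conversion.

```python
import numpy as np

def scale(x, mean, std):
    """Scaled value = (true value - mean) / standard deviation."""
    return (np.asarray(x, dtype=float) - mean) / std

def unscale(z, mean, std):
    """True value = mean + scaled value * standard deviation."""
    return mean + np.asarray(z, dtype=float) * std

# Hypothetical raw-data mean and standard deviation (NOT reported in the paper).
mean, std = 3.0, 1.5
scaled_centroids = np.array([-0.5, 0.0, 1.0])
true_centroids = unscale(scaled_centroids, mean, std)
print(true_centroids)
```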
Table 10 gives the final clustering of the cores on the basis of these petrophysical properties.
Table 10: Final clustering of the core samples by k-means on the properties selected by PCA (TOC, illite + chlorite and carbonates)
Class                          1       2       3
Objects                        67      36      65
Within-class variance          0.904   0.900   0.779
Minimum distance to centroid   0.216   0.269   0.242
Average distance to centroid   0.835   0.851   0.786
Maximum distance to centroid   2.276   1.620   1.882

Core depths in each class:
Class 1 (67): 6432.5, 6434.5, 6436.5, 6440.6, 6442.5, 6444.5, 6446.5, 6448.4, 6452.5, 6460.5, 6491, 6493, 6495.2, 6496.8, 6503.6, 6505.6, 6507.4, 6515, 6517.3, 6526.1, 6529.2, 6538, 6543, 6546.2, 6550.6, 6552.5, 6556.8, 6560.6, 6563.6, 6567, 6568, 6587, 6593, 6594.6, 6596.05, 6598.1, 6606, 6608.7, 6611, 6613.2, 6615.1, 6621, 6623.2, 6639.8, 6641.7, 6667.6, 6669.1, 6686, 6690, 6704.2, 6708, 6719, 6731, 6734, 6736.1, 6742.1, 6750.2, 6754.5, 6760.15, 6765, 6769.1, 6771.8, 6773.9, 6775.1, 6780.6, 6782.2, 6784.3
Class 2 (36): 6438.5, 6456.5, 6458.5, 6463.2, 6464.5, 6466.5, 6468.5, 6470.5, 6472.7, 6474.5, 6476.5, 6478.2, 6480.2, 6482.2, 6484.2, 6487, 6489, 6498.6, 6501.2, 6509.5, 6511.2, 6513.6, 6520.8, 6540, 6554.6, 6561.8, 6565.35, 6583.1, 6591.4, 6604, 6633.3, 6700, 6722.6, 6745.9, 6752.6, 6794
Class 3 (65): 6450.5, 6454.5, 6519.4, 6522.3, 6524.2, 6527.7, 6531.7, 6534, 6535.8, 6542.1, 6544.1, 6548.3, 6558.8, 6570.3, 6572, 6573.8, 6575.65, 6578.8, 6580.9, 6585.3, 6589, 6600, 6601.4, 6617, 6619, 6627.5, 6629.3, 6631.1, 6635.2, 6637.1, 6643.9, 6645.9, 6647.8, 6650, 6652, 6654, 6657.7, 6660, 6662, 6665.6, 6669.7, 6671.7, 6673.8, 6675.6, 6678.7, 6680, 6682, 6684, 6688, 6691.9, 6693.9, 6696, 6702.1, 6706, 6709.9, 6712.1, 6714, 6716.3, 6724.4, 6727, 6729, 6738.1, 6744.1, 6756.4, 6762.3
Conclusions:
PCA can be used to identify the most varying components and to reduce the dimensions of the data set before k-means analysis. We were able to classify the rocks using PCA and k-means. The PCA shows that TOC, carbonates and illite + chlorite are the properties that contribute most to the principal components capturing the maximum variability. k-means can cluster the data into any number of classes; however, the within-class variance decreases as the number of clusters increases. The data were clustered into 3 groups, beyond which additional classes reduce the within-class variance only marginally. Most of the rocks (67) belong to class 1, whose central sample has TOC = 3.5, carbonates = 12.2 and illite + chlorite = 30.8, and the fewest rocks (36) belong to class 2, whose central sample has TOC = 1.44, carbonates = 67.105 and illite + chlorite = 14.699. Class 1 has the maximum within-class variance (0.904) and class 3 the least (0.779).
Acknowledgement
Support for this work was provided by Professor Charles Nicholson, University of Oklahoma. Appreciation is extended to the Integrated Core Characterization Center, Department of Petroleum Engineering, The University of Oklahoma, for providing the petrophysical property data for the various cores.
Appendix
The sample data for which rock typing was performed.
Depth    Corrected porosity   TOC    Quartz   Carbonates   Illite + Chlorite   Mixed clays
6432.5   6.67                 4.15   38.6     4.7          29.1                14.7
6434.5   6.07                 4.25   48.6     14.1         25.4                3.9
6436.5   4.91                 3.4    41       8.4          27.7                12.4
6438.5   6                    0.39   4.6      75.5         6.9                 0
6440.6   5.63                 3.9    37       4.9          30.2                8.9
6442.5   5.85                 3.95   40.6     6.5          26.5                7.7
6444.5   0.88                 3.5    29.3     12.2         30.8                14.9
6446.5   6.14                 4.12   43.1     16.4         23.8                4.4
6448.4   5.27                 3.61   33.7     27.8         27.6                1
6450.5   6.02                 5.92   54.8     3.3          18.6                9.8
6452.5   4.46                 3.15   50       13.3         21.9                0
6454.5   5.27                 4.68   42.1     18.2         16                  11.4
6456.5   4.42                 2.49   27.3     42.7         17.9                1.5
6458.5   5.55                 1.7    35.6     32.5         13.6                6.4
6460.5   5.09                 3.49   40.9     6.7          27.4                12.9