Interpreting yield variation in commercial production of crops / Como interpretar la variación de la productividad a partir de información comercial de cultivos
1. www.ciat.cgiar.org Eco-Efficient Agriculture to Reduce Poverty
Interpreting yield variation in commercial
production of crops
DAPA
(Decision and Policy Analysis Program)
2. What we do
Farmers’ production experiences / commercial production of crops
Principles of operational research
Modern information technology
Environmental characterization of the production system
Analysis of the observations to optimize the system
Observations made by farmers according to their particular circumstances
Figure labels: kg/tree, temperature, age
Interpreting yield variation in commercial production of crops
4. Introduction
The challenges !
Parametric, non-parametric?... It depends on the distribution of the residuals
PARAMETRIC
• Models rely on assumptions of:
• Normality
• Homogeneity of variance
• Independence
• Mostly based on linear relationships
NON-PARAMETRIC
• Models do not rely on these assumptions
• Linear / non-linear relationships
5. As Sharon quoted, “the wisdom of the internet”:
“I have never come across a situation where a normal test is the right thing to do. When the sample size is small, even big departures from normality are not detected, and when your sample size is large, even the smallest deviation from normality will lead to a rejected null.”
http://stackoverflow.com/questions/7781798/seeing-if-data-is-normally-distributed-in-r
The challenges !
Parametric, non-parametric?
Introduction
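The sample-size effect described in the quote can be sketched with a hand-computed Jarque-Bera statistic. This is only an illustration: the gamma-distributed data, the seed and the sample sizes are assumptions, not values from the presentation, and the chi-square approximation behind the 5% threshold is itself rough for small samples.

```python
import random

def jarque_bera(sample):
    """Jarque-Bera statistic: n/6 * (skewness^2 + excess_kurtosis^2 / 4)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)

# 5% critical value of a chi-square distribution with 2 degrees of freedom.
CHI2_2DF_5PCT = 5.991

# Hypothetical, mildly skewed data: gamma with shape 50 is close to normal.
rng = random.Random(0)

def mildly_skewed(n):
    return [rng.gammavariate(50.0, 1.0) for _ in range(n)]

jb_small = jarque_bera(mildly_skewed(20))
jb_large = jarque_bera(mildly_skewed(50000))

print("n=20    JB =", round(jb_small, 2), "reject:", jb_small > CHI2_2DF_5PCT)
print("n=50000 JB =", round(jb_large, 2), "reject:", jb_large > CHI2_2DF_5PCT)
```

Because the statistic scales with n, the same slight skewness that usually goes unnoticed in 20 observations is rejected decisively in 50,000.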
6. “The wisdom of” Nassim Nicholas Taleb, a “superhero of the mind”
(The Black Swan, Fooled by Randomness, Antifragile)
The statistical regress argument
“We need the data to tell us what the probability distribution is, and a probability distribution to tell us how much data we need”
The challenges !
Parametric, non-parametric?
Introduction
7. The challenges !
Parametric, non-parametric?
Introduction
In terms of Big Data
• Approaching “N=All”
• The first is to collect and use a lot of data rather than settle for small amounts
or samples, as researchers have done for well over a century
• We can learn from a large body of information things that we could not
comprehend when we used only smaller amounts
• Sometimes informing is better than explaining: looking for patterns
Doctors in Canada save lives by knowing that something is likely to occur; this can be far more important than understanding exactly why
Big Data (Foreign Affairs magazine / McKinsey's High Tech)
8. What people think it is…
What it actually is…
It was clear to Antoine de Saint-Exupéry (The Little Prince)
What people think it is…
What it actually is… Some of our
findings !
The challenges !
Parametric, non-parametric? Not always normal distribution !
Introduction
11. 1st case study- Andean blackberry based on ANNs
Figure: Scatter plot of MLP-predicted yield versus real Andean blackberry yield (kg/plant/week), using only the validation dataset. R² = 0.892
Supervised models - Non-linear regression
Coefficient of determination = 0.89
Figure: Histogram of the Andean blackberry yield distribution (kg/plant/week); y-axis: number of observations
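A coefficient of determination like the one reported for the validation set can be computed as sketched below. The slide does not say which definition was used; this sketch assumes the common 1 - SS_res / SS_tot form, and the five yield values are hypothetical, not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical validation values in kg/plant/week (illustration only).
real_yield = [0.2, 0.5, 0.9, 1.3, 1.6]
predicted_yield = [0.3, 0.4, 1.0, 1.2, 1.7]

print(round(r_squared(real_yield, predicted_yield), 3))
```

Evaluating only on validation data, as the slide does, keeps the score from being inflated by observations the model was trained on.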
13. Results - Andean blackberry
(a) Kohonen map displaying the resulting 6 clusters and their labels according to yield values; (b) component plane of Andean blackberry yield. The scale bar (right) indicates the range of productivity in kg/plant/week: the upper end shows high yield values and the lower end shows low values
Unsupervised model - Visualization – component planes - SOM
Panels: (a) Kohonen map – 6 clusters; (b) Andean blackberry yield
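How a component plane is read off a trained Kohonen map can be sketched as follows. This is a minimal self-organizing map, not the configuration used in the study: the 3x3 grid, the decay schedules, the seed and the two-variable data (scaled yield and soil depth) are all assumptions for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position (row, col) of the prototype closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; weights[r][c] is a prototype vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.3   # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights

def component_plane(weights, k):
    """Value of variable k at every map unit: one component plane."""
    return [[w[k] for w in row] for row in weights]

# Hypothetical observations: [scaled yield, scaled effective soil depth].
data = [[0.15, 0.20], [0.20, 0.25], [0.25, 0.15],
        [0.80, 0.85], [0.85, 0.75], [0.75, 0.80]]

som = train_som(data, rows=3, cols=3)
yield_plane = component_plane(som, 0)
for row in yield_plane:
    print([round(v, 2) for v in row])
```

Comparing the plane of one variable with the plane of another, unit by unit, is what reveals the predictor-response dependencies that the slides describe visually.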
14. Results - Andean blackberry
Component plane of effective soil depth. The scale bar (right) indicates the range value in cm of soil depth:
the upper side of the scale exhibits high values, whereas the lower displays low values
Effective soil depth
Unsupervised model - Visualization – component planes - SOM
15. Results - Andean blackberry
Component planes of the temperature averages. In all figures, the scale bar (right) indicates the temperature range in °C. The upper end shows high values, whereas the lower end shows low values
Unsupervised model - Visualization – component planes - SOM
16. Results - Andean blackberry
Component planes of the specific geographic areas Nariño–La Union–Chical Alto (left) and Nariño–La Union–Cusillo Bajo (right). As these are categorical variables, the highest values indicate presence and the lowest indicate absence
Visualization – component planes - SOM
17. Drawbacks
• Crop management factors not included (only variety)
• Only non-parametric approaches (based on ANNs)
• Limited spatial variation (two locations, two departments)
Advantages
• Predictor-predictor and predictor-response dependencies through Kohonen maps
• Combination of factors
• Non-linear approach
18. 2nd case study- Lulo
Distribution of R² obtained with each model
Regression          R² (mean)   Confidence interval (95%)
Robust (linear)     0.65        0.63 - 0.66
MLP (non-linear)    0.69        0.67 - 0.70
Both models explained more than 60% of the variability in lulo production
Figure: Histogram of the lulo yield distribution (g/plant/week); y-axis: number of observations
Figure: Histograms of the R² provided by each approach (MLP and robust regression); x-axis: R² from 0.2877 to 0.8227; y-axis: number of observations
Supervised modelling
19. Results - Lulo
The Sensitivity Matrix
Figure: Sensitivity distribution of the model with respect to the inputs/predictors; y-axis: % sensitivity (0 to 0.18). Labelled predictors: effective soil depth, temperature averages, slope
Jiménez, D., Cock, J., Jarvis, A., Garcia, J., Satizábal, H.F., Van Damme, Pérez-Uribe, A., and Barreto, M., 2010. Interpretation of Commercial Production Information: A case study of lulo, an under-researched Andean fruit. Agricultural Systems. 104 (3): 258-270
20. (a) U-matrix displaying the distance among prototypes. The scale bar (right) indicates the values of
distance. The upper side exhibits high distances, whilst the lower displays low distances; (b) Kohonen
map displaying the 3 clusters obtained after using the K-means algorithm and the Davies–Bouldin index
The three most relevant variables were used to train a Kohonen map and identify clusters of
Homogeneous Environmental Conditions (HECs)
Results - Lulo
Unsupervised model - Clustering – component planes - SOM
Panels: (a) U-Matrix; (b) Kohonen map – 3 clusters
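The Davies–Bouldin index mentioned in the caption scores a partition by comparing within-cluster scatter with between-centroid distance; lower is better. The sketch below only evaluates the index for partitions that are already given (the K-means step is omitted), and the two-dimensional prototype coordinates are hypothetical.

```python
import math

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]

def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of points. Lower is better."""
    cents = [centroid(c) for c in clusters]
    # Mean distance of each cluster's points to its own centroid.
    scatter = [sum(euclid(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    total = 0.0
    for i in range(len(clusters)):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / euclid(cents[i], cents[j])
                     for j in range(len(clusters)) if j != i)
    return total / len(clusters)

# Hypothetical map prototypes grouped in two different ways.
compact = [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0), (10, 1)]]
mixed = [[(0, 0), (5, 5)], [(0, 1), (10, 0)], [(5, 6), (10, 1)]]

print("compact partition:", round(davies_bouldin(compact), 3))
print("mixed partition:  ", round(davies_bouldin(mixed), 3))
```

In a workflow like the one on the slide, the index would be computed for the K-means result at each candidate number of clusters, and the number with the lowest value retained.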
21. Results - Lulo
Clustering – component planes - SOM
A mixed model with the categorical variables of three HECs, location and farmer
explained more than 80% of variation in lulo yield
Variance components of the mixed model estimations
Model including categorical variables of 3 HECs, location and farm
Parameter    Estimate (g/plant/week)   Standard error   % of total variance
HEC          1.85                      2.01             61.2%
Location     0.07                      0.20             2.5%
Site-Farm    0.57                      0.21             19.0%
Error        0.52                      0.04             17.3%
Total                                                   100.0%
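The "more than 80%" figure follows directly from the variance components: everything that is not residual error is attributed to HEC, location or farm. The sketch below uses only the numbers in the table; the dictionary layout and variable names are illustrative.

```python
# (estimate, % of total variance) as reported in the variance components table.
components = {
    "HEC": (1.85, 61.2),
    "Location": (0.07, 2.5),
    "Site-Farm": (0.57, 19.0),
    "Error": (0.52, 17.3),
}

# Share of variance attributed to the model terms (everything except Error).
explained = sum(pct for name, (_, pct) in components.items() if name != "Error")

# Shares recomputed from the rounded estimates, for comparison with the table.
total_estimate = sum(est for est, _ in components.values())
recomputed = {name: 100.0 * est / total_estimate
              for name, (est, _) in components.items()}

print("explained by HEC + location + farm:", round(explained, 1), "%")
for name, share in recomputed.items():
    print(name, round(share, 1), "% (recomputed from rounded estimates)")
```

The recomputed shares differ from the tabulated ones by a few tenths of a percentage point, which is what one would expect when the estimates are rounded to two decimals.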
22. Results - Lulo
Variable ranges by HEC
HEC   Slope (degrees)   EffDepth (cm)   TempAvg_0 (°C)
1     5-14              21-40           15-16.5
2     8-15              32-69           15-18.9
3     13-24             40-67           15.8-19
HEC 3 yielded 41 g/plant/week more fruit than average
Figure: Effects of clusters of environmental conditions (HECs 1 to 3); y-axis: lulo yield (g/plant/week)
23. Results - Lulo
Farms 7 and 9 are both in HEC 3. Farm 7 produced 68 g/plant/week less than average, whilst farm 9 produced 51 g/plant/week more than average
Figure: Effects of farms across clusters of environmental conditions (farms grouped by HEC 1, 2 and 3); y-axis: lulo yield (g/plant/week)
Jiménez, D., Cock, J., Jarvis, A., Garcia, J., Satizábal, H.F., Van Damme, Pérez-Uribe, A., and Barreto, M., 2010. Interpretation of Commercial Production
Information: A case study of lulo, an under-researched Andean fruit. Agricultural Systems. 104 (3): 258-270
24. Drawbacks
• Crop management factors not included (only variety)
• Compared with the Andean blackberry study, even more limited spatial variation (locations within one department)
Advantages
• Iterative procedure (combination of parametric and non-parametric, linear and non-linear)
• Combination of factors
• The study is the first formal research study to show evidence of the yield gap between farmers under similar climatic conditions in Colombia, and it provided the basis for the site-specific analytical approaches
• Successfully identified farms that have superior management practices for given environmental conditions
25. 3rd case study - Plantain
FactoClass (climate clusters)
Figure: Variables factor map (PCA) of the bioclimatic variables bio_1 to bio_19; Dim 1 (44.64%), Dim 2 (27.62%)
Figure: Factor map of the observations with clusters 1 to 8; Dim 1 (43.43%), Dim 2 (29.83%)
28. C4S5
3rd case study - Plantain
Generalized Linear Model (GLM)
The model - dependencies between the predictors and the response variable
Log(Y) = B0 + B1 X + E
Log(Yield) = 1.22 + 0.0008 (planting density) + E
Significance level: 5%
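A model of the form Log(Y) = B0 + B1 X + E can be estimated, in its simplest version, by ordinary least squares on the logarithm of yield. The slides do not state the GLM family, link or software used, so the sketch below is an assumption-laden stand-in: the planting densities are hypothetical and the yields are generated without noise from the coefficients on the slide, purely to show that the fit recovers them.

```python
import math

def fit_log_linear(x, y):
    """Least-squares fit of log(y) = b0 + b1 * x; returns (b0, b1)."""
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_log_y = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_log_y) for xi, li in zip(x, log_y))
    b1 = sxy / sxx
    b0 = mean_log_y - b1 * mean_x
    return b0, b1

# Hypothetical planting densities (trees/ha); yields are synthetic and
# noise-free, generated from the slide's coefficients 1.22 and 0.0008.
densities = [500, 750, 1000, 1250, 1500]
yields = [math.exp(1.22 + 0.0008 * d) for d in densities]

b0, b1 = fit_log_linear(densities, yields)
print("B0 =", round(b0, 4), "B1 =", round(b1, 6))
```

With real farm records the error term E would not be zero, and the significance level quoted on the slide would be checked against the standard error of B1.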
29. C5S5
3rd case study - Plantain
Generalized Linear Model (GLM)
Log(Y) = B0 + B1 X1 + B2 X2 + E
Log(Yield) = 0.80 + 0.00101 (planting density) + 0.324154 (MezcVar, mixed varieties) + E
Significance level: 5%
30. 3rd case study - Plantain
Generalized Linear Model (GLM)
• Interpretation of the parameters
$\log(\text{Yield}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon$
$e^{\log(\text{Yield})} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon}$ (non-linear)
$\text{Yield} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon}$ (back in the original unit, tons/ha)
$\text{Yield} = e^{\beta_0}\, e^{\beta_1 X_1}\, e^{\beta_2 X_2} \cdots e^{\varepsilon}$ (dependencies between the predictors and tons/ha)
With the model it is possible to calculate by what factor yield increases or decreases when a specific practice is changed
31. C4S5 (planting density)
3rd case study - Plantain
Generalized Linear Model (GLM)
• Interpretation of the parameters
$\log(\text{Yield}) = 1.22 + 0.0008 \cdot \text{planting density} + E$
$\text{Yield} = e^{1.22}\, e^{0.0008 \cdot \text{planting density}}\, e^{E}$
Planting density = 100: $e^{100 \cdot 0.0008}$
At a 90% confidence level, for every additional 100 trees/ha, annual yield in tons/ha can be expected to increase by 3.2% to 14.2%.
32. C5S5 (mixed varieties)
3rd case study - Plantain
Generalized Linear Model (GLM)
• Interpretation of the parameters
$\log(\text{Yield}) = 0.80 + 0.00101 \cdot \text{planting density} + 0.324154 \cdot \text{MezcVar} + E$
$\text{Yield} = e^{0.80}\, e^{0.00101 \cdot \text{planting density}}\, e^{0.324154 \cdot \text{MezcVar}}\, e^{E}$
MezcVar = presence: $e^{\text{presence} \cdot 0.324154}$
At a 90% confidence level, planting mixed varieties can be expected to increase production by more than 10.46%.
33. C4S5 (planting density)
4th case study - Avocado
Generalized Linear Model (GLM)
• Interpretation of the parameters
$\text{Yield} = e^{-2.078}\, e^{0.0077 \cdot \text{planting density}}\, e^{0.2079 \cdot \text{planting layout}}\, e^{E}$
Planting density = 10: $e^{10 \cdot 0.0077}$
At a 90% confidence level, for every 10 trees/ha added to the planting density, annual yield in tons per hectare can be expected to increase by 2.3% to 13.2%
34. C2S4 (planting layout)
4th case study - Avocado
Generalized Linear Model (GLM)
• Interpretation of the parameters
$\text{Yield} = e^{3.6}\, e^{-0.006 \cdot \text{planting density}}\, e^{0.434 \cdot \text{variety}}\, e^{0.7946 \cdot \text{planting layout}}\, e^{E}$
Planting layout = presence: $e^{\text{presence} \cdot 0.7946}$
At a 90% confidence level, a producer in this area who plants in a triangular (tresbolillo) layout instead of a square layout can be expected to increase production by more than 30.21%
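The parameter interpretation used in the plantain and avocado slides comes down to one operation: exponentiate the coefficient times the change in the predictor, and read the result as a multiplicative effect on yield. The sketch below applies it to the coefficients quoted on the slides; the function name and labels are illustrative, and the results are point estimates only, since the standard errors behind the 90% intervals are not given in the slides.

```python
import math

def percent_change(beta, delta=1.0):
    """Percent change in yield when a predictor changes by `delta`,
    under a model of the form log(Yield) = ... + beta * X + ..."""
    return (math.exp(beta * delta) - 1.0) * 100.0

effects = {
    "plantain, +100 trees/ha": percent_change(0.0008, 100),
    "plantain, mixed varieties present": percent_change(0.324154),
    "avocado, +10 trees/ha": percent_change(0.0077, 10),
    "avocado, tresbolillo instead of square": percent_change(0.7946),
}

for label, pct in effects.items():
    print(f"{label}: {pct:.1f}%")
```

For both planting-density cases the point estimate is about 8%, inside the intervals quoted on the slides (3.2% to 14.2% and 2.3% to 13.2%); for the two presence/absence practices the point estimates lie above the lower bounds the slides report (10.46% and 30.21%).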
35. Drawbacks
• Not enough crop management factors to apply a hierarchical approach such as mixed models
• Limited temporal variation
Advantages
• Iterative procedure (combination of parametric and semi-parametric)
• Crop management factors included (Farmer can control them)
• Predictor-response dependencies through GLM
• Large spatial variation
• Soil information included
• Linear & non-linear approach