2. Flow of Seminar
4/6/2018 IABM Bikaner 2
Introduction
What is Data Mining and Stages of Data Mining
Steps Involved in Data Mining
Applications of Data Mining in Agriculture and Agribusiness
Case Study
Agribusiness Startups Employing Data Mining Techniques
References
3. Introduction
• Agriculture is a business with risk
• Depends on climate, geography, political and economic factors
• Some of these risks can be quantified using mathematical and statistical methods and
advanced computing
• Challenge is to extract information from large agri databases
• Data mining is one such technology: it can bring knowledge to agricultural
development and help predict future trends in agricultural processes
5. What is Data Mining
• Data mining is the process of extracting implicit, previously unknown, and
potentially useful information from large data sets and transforming it into an
understandable structure for further use
• Data mining uses sophisticated mathematical algorithms, techniques of artificial
intelligence, machine learning and statistics to segment the data and evaluate the
probability of future events
6. Evolution of Database Technology
Evolutionary step | Enabling technologies | Product providers
Data collection (1960s) | Computers, tapes | IBM
Data access (1980s) | Relational databases | Oracle, Informix, IBM, Microsoft
Data warehousing (1990s) | Multidimensional databases, data warehouses | Pilot, Microstrategy
Data mining (emerging today) | Advanced algorithms, multiprocessor computers | Pilot, IBM, others
7. Rapid Growth of Data
• Databases today are huge:
- More than 1,000,000 entities/ records/ rows
- From 10 to 10,000 fields/ attributes/ variables
- Gigabytes, terabytes, and now petabytes and exabytes
• Databases are now growing at an unprecedented rate
• The corporate world is a cut-throat world
- Decisions must be made rapidly
- Decisions must be made with maximum knowledge
8. Data Mining Stages
• Selection: Get a clear understanding of the problem to solve, how it impacts the
organization, and the goals for addressing it
• Data Understanding: Review the data, document it, and identify data management and
data quality issues
• Data Preparation: Get the data ready to use for modeling
• Modelling: Use mathematical techniques to identify patterns within the data
• Evaluation: Review the patterns discovered and assess their potential for use
• Deployment: Put your discoveries to work in everyday business
Source: http://www.crisp-dm.org/
10. Steps Involved in Data Mining (KDD Process)
Source: Fayyad et al. (1996)
11. Data Mining Techniques
• Classification and Prediction
example – Classifying crops based on climate requirement
• Cluster Analysis
example – Optimizing pesticide use by data mining
• Outlier Analysis
example – Fault diagnosis and Quality Control
• Association Analysis
example – Market Basket Analysis
• Evolution Analysis
example – Forecasting of changing patterns in data over time
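As an illustration of the association analysis item in the list above, the following is a minimal sketch of the support and confidence measures behind market basket analysis. The transactions, item names, and function names are hypothetical and chosen only for this example; they do not come from the seminar.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: items bought together at an agri-input shop.
transactions = [
    {"seed", "fertilizer", "pesticide"},
    {"seed", "fertilizer"},
    {"seed", "pesticide"},
    {"fertilizer", "pesticide"},
    {"seed", "fertilizer", "sprayer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """All item pairs whose support is at least min_support."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

For instance, `support({"seed", "fertilizer"}, transactions)` is the share of baskets holding both items, and `confidence({"seed"}, {"fertilizer"}, transactions)` estimates how often a seed purchase comes with a fertilizer purchase.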
12. Data Mining Tools
• Oracle Data Miner
o http://www.oracle.com
• Data To Knowledge
o http://alg.nsca.uluc.edu
• SAS
o http://www.sas.com
• Clementine
o http://spss.com/clemetine
• Intelligent Miner
o http://www.306.ibm.com/software
• Rapid Miner
o https://rapidminer.com/data-mining-tools-try-rapidminer/
• Microsoft SQL Server
o https://www.microsoft.com/en-in/sql-server/sql-server-downloads
13. Applications of Data Mining in Agriculture (1/4)
• Crop yield estimation
• Estimation of damage caused by pest
• Mushroom grading
• Spatial data mining reveals interesting patterns related to agriculture
14. Applications of Data Mining in Agriculture (2/4)
• Detecting weeds
• Characterize agricultural soil profiles
• Studying problematic wine fermentations
• Recognizing and grading of fruits
15. Applications of Data Mining in Agriculture (3/4)
• Prediction of foodborne disease outbreaks
• Sorting apples by water cores
• Integrating agricultural data (pest scouting, pesticide usage, and meteorological
records) has been found useful for optimizing pesticide use
• Precise forecasting and forewarning models of plant diseases
16. Applications of Data Mining in Agriculture (4/4)
• Forecasting atmospheric pollution
• Predicting flowering and maturity dates
• Forecasting of water resources parameters
• Detecting diseases from sounds made by animals
17. Applications of Data Mining in Agribusiness
• Division of market and customers into segments
• Identification of valuable clients and potential customers in the future
• Investigation of the causes of customer behavior
• Defining different prices for individual customer segments
• Identification of poor payers
• Creating profiles of the customers the organization wants to acquire and keep
• Identification of successful tactics for keeping and acquiring customers
18. Role in Agricultural Domain
Data mining methodology | Applications
Neural networks | Weather forecasting, prediction of rainfall
K-means | Classifying soil in combination with GPS, wine fermentation problem, yield prediction
Fuzzy logic | Detecting weeds in precision agriculture
K-nearest neighbor | Simulating daily precipitation and other weather conditions
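To make the K-means entry above concrete, here is a minimal sketch of the algorithm (Lloyd's iteration) in plain Python. The soil samples, their two features (pH and organic carbon percentage), and the choice of the first k points as starting centroids are all assumptions made for illustration; real studies would use measured profiles and a more careful initialization.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    centroids = [points[i] for i in range(k)]  # deterministic start: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical soil samples: (pH, organic carbon %).
samples = [(5.1, 0.4), (8.0, 1.2), (5.3, 0.5), (7.8, 1.1), (5.2, 0.6), (8.1, 1.0)]
centroids, clusters = kmeans(samples, k=2)
```

On this toy data the acidic and alkaline samples end up in separate clusters, which is the kind of grouping a soil classification would look for.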
19. Case Study: Analysis of Crop Production Using Data Mining Techniques
• Data mining techniques were adopted to predict crop production
• Values estimated by density-based clustering were compared with values
estimated by multiple linear regression
Hyderabad, Ramesh et al., 2013
20. Overview of Data
• The data cover the years 1955 to 2009 for the East Godavari district of Andhra
Pradesh, India
• The data were gathered from three government bodies: the Indian Meteorological
Department, the Statistical Institution, and the Agricultural Department
• Each area in this collection is identified by the longitude and latitude of
the region
• Crop production is estimated with respect to eight parameters, including Year,
Rainfall, Area of Sowing, Yield, and Fertilizers
21. Contd…
• The Year attribute specifies the year for which the data are available
• The Rainfall attribute specifies the rainfall in East Godavari in the specified year, in
centimeters
• The Area of Sowing attribute specifies the total area sown in East Godavari district in
the specified year, in hectares
• The Production attribute specifies the crop production in East Godavari district in the
specified year, in metric tons
• Yield is specified in kilograms per hectare
• Fertilizers are specified in tons for the specified year
22. Methodology (1/2)
• Two techniques were used to estimate crop production: a statistical method,
multiple linear regression, and a data mining method, density-based
clustering
• Multiple Linear Regression
• Multiple linear regression (MLR) models the linear relationship between a
dependent variable and one or more independent variables. The dependent
variable is sometimes termed the predictand (here, Rainfall) and the
independent variables are called predictors (here, Year, Area of Sowing, and
Production)
• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
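The regression model above can be fitted by ordinary least squares. The sketch below is one way to do that in plain Python, solving the normal equations by Gauss-Jordan elimination; it is an illustrative assumption, not the procedure used in the case study, and the function names and toy numbers are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp.

    X is a list of rows (one list of predictor values per observation),
    y is the list of observed responses. Returns [b0, b1, ..., bp].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    m = len(rows[0])
    # Build the normal equations (A^T A) b = A^T y as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def predict(coeffs, x):
    """Evaluate the fitted model at one row of predictor values."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))

# Toy data generated from y = 2 + 3*x1 - 1*x2 (no noise), not case-study data.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [3, 7, 6, 11, 9]
coeffs = fit_mlr(X, y)
```

Because the toy responses contain no noise, the fit recovers the generating coefficients up to floating-point error; with real data the residual term would absorb the unexplained variation.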
23. Methodology (2/2)
Density-based clustering technique
• Density is usually defined as the number of objects in a particular
neighborhood of data objects
• The key idea of density-based clustering is that, for each point of a cluster,
the neighborhood within a given distance must contain at least a minimum number
of points
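The neighborhood rule above is the core of DBSCAN, a widely used density-based clustering algorithm. The source does not say which density-based algorithm the case study used, so the following is only a sketch of the general idea; the parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of points) and the sample points are assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels

# Hypothetical 2-D points: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Points whose neighborhood is too sparse and that are not reachable from any dense point keep the label -1, which is how this family of methods separates clusters from noise.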
24. Table 1: Exact Production and Estimated Values Using Multiple Linear
Regression Technique
The estimates obtained with multiple linear regression differ from the exact values by
between -14% and +13% for the 40-year interval
Observation year | Production (exact) | Production (estimated, 40-year interval) | Difference (%)
2000 | 683423 | 592461 | 13
2001 | 579850 | 566050 | 2
2002 | 551115 | 579433 | -5
2003 | 762453 | 722638 | 5
2004 | 743614 | 742752 | 0
2005 | 348727 | 399062 | -14
2006 | 547716 | 551541 | -1
2007 | 691069 | 691069 | 3
2008 | 716609 | 697227 | 3
2009 | 616567 | 633494 | -3
Source: Ramesh, D., & Vardhan, B. V. (2013). Data Mining Techniques and Applications to Agricultural Yield Data. International Journal of Advanced
Research in Computer and Communication Engineering, 2(9), 1-4
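The percentage-of-difference column in Table 1 appears to be the gap between exact and estimated production, expressed as a percentage of the exact value and rounded to a whole number. That formula is inferred from the tabulated figures rather than stated in the source, so the sketch below should be read as an assumption; the function name is hypothetical.

```python
def pct_difference(exact, estimate):
    """Signed gap between exact and estimated values, as a percentage of the exact value."""
    return (exact - estimate) / exact * 100

# Rows copied from Table 1: (year, exact production, MLR estimate).
table1_rows = [
    (2000, 683423, 592461),
    (2005, 348727, 399062),
    (2008, 716609, 697227),
]
rounded = {year: round(pct_difference(exact, est)) for year, exact, est in table1_rows}
# rounded == {2000: 13, 2005: -14, 2008: 3}, matching the table
```

Under the same formula, the 2007 row as printed (exact and estimated production both 691069) would give 0 rather than the tabulated 3, so one of the 2007 figures may have been mistranscribed.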
25. Table 2: Exact Production and Estimated Values Using Density-Based
Clustering Technique
The estimates obtained with density-based clustering differ from the exact values by
between -13% and +8% for the 6-cluster approximation
Observation year | Production (exact) | Production (estimated, 6 clusters) | Difference (%)
2000 | 683423 | 666011 | 3
2001 | 579850 | 651103 | -12
2002 | 551115 | 566972 | -3
2003 | 762453 | 703914 | 8
2004 | 743614 | 737897 | 1
2005 | 348727 | 392770 | -13
2006 | 547716 | 534709 | 2
2007 | 691069 | 791589 | -11
2008 | 716609 | 676321 | 6
2009 | 616567 | 695574 | -13
Source: Ramesh, D., & Vardhan, B. V. (2013). Data Mining Techniques and Applications to Agricultural Yield Data. International Journal of Advanced
Research in Computer and Communication Engineering, 2(9), 1-4
26. Result
Table 3: Comparison between exact production and values estimated using the
multiple linear regression and density-based clustering techniques
Observation year | Production (exact) | Production estimated by multiple linear regression | Production estimated by density-based clustering
2000 | 683423 | 592461 | 666011
2001 | 579850 | 566050 | 651103
2002 | 551115 | 579433 | 566972
2003 | 762453 | 722638 | 703914
2004 | 743614 | 742752 | 737897
2005 | 348727 | 399062 | 392770
2006 | 547716 | 551541 | 534709
2007 | 691069 | 691069 | 791589
2008 | 716609 | 697227 | 676321
2009 | 616567 | 633494 | 695574
Source: Ramesh, D., & Vardhan, B. V. (2013). Data Mining Techniques and Applications to Agricultural Yield Data. International Journal of Advanced
Research in Computer and Communication Engineering, 2(9), 1-4
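One way to summarize the comparison in Table 3 with a single number per method is the mean absolute percentage error (MAPE). MAPE is not used in the source; it is introduced here only as an illustrative summary, computed on the figures exactly as printed in the table.

```python
# Rows copied from Table 3: (year, exact, MLR estimate, density-based estimate).
table3 = [
    (2000, 683423, 592461, 666011),
    (2001, 579850, 566050, 651103),
    (2002, 551115, 579433, 566972),
    (2003, 762453, 722638, 703914),
    (2004, 743614, 742752, 737897),
    (2005, 348727, 399062, 392770),
    (2006, 547716, 551541, 534709),
    (2007, 691069, 691069, 791589),
    (2008, 716609, 697227, 676321),
    (2009, 616567, 633494, 695574),
]

def mape(pairs):
    """Mean absolute percentage error over (exact, estimate) pairs."""
    return sum(abs(exact - est) / exact for exact, est in pairs) / len(pairs) * 100

mape_mlr = mape([(row[1], row[2]) for row in table3])
mape_dbc = mape([(row[1], row[3]) for row in table3])
```

On these ten rows the multiple linear regression estimates come out closer on average (about 4.7%) than the density-based clustering estimates (about 7.4%), although ten observations are too few to rank the methods in general.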
27. Result
Source: Ramesh, D., & Vardhan, B. V. (2013). Data Mining Techniques and Applications to Agricultural Yield Data. International Journal of Advanced
Research in Computer and Communication Engineering, 2(9), 1-4
28. Conclusion (Case study)
• Initially, the statistical model (multiple linear regression) was applied to the
existing data. The results were then verified and analyzed using a data
mining technique, density-based clustering
• The results of the two methods were compared for a specific region, the
East Godavari district of Andhra Pradesh, India. A similar process was
adopted for all the districts of Andhra Pradesh to improve and validate
yield prediction, which is useful to the farmers of Andhra Pradesh for
predicting the production of a specific crop
31. Applications of Data Mining in Agribusiness by Cropin
Private Limited
• Agri-input Companies
o Maximizing sales efficiency per acre
o Demand projection
o Inventory module
• Farming Companies
o Real time visibility of farm activities
o Weather analytics and crop health monitoring
o Crop stage management
• Seed Companies
o Precision tracking of outcome quality & quantity
o Crop health monitoring & yield prediction
o Micro to macro optimization of inputs: weather advisory, remote sensing,
insights strengthened by machine learning and AI
32. Applications of Data Mining in Agribusiness by Cropin
Private Limited
• Financial Lending Institutions
o Weather risk prediction, yield prediction using satellite imagery
o Machine learning and AI built algorithms for plot based monitoring
• Govt. and Advisories
o Impact more and more farmers through a systematic and archived database
o Plot based advisory module for pest and disease resolution, best agri practices
• Insurance Companies
o Reduced cost of operations
o Effective region/plot level crop risk assessment
33. Key Features of Services Provided by Cropin (1/2)
• Farm to Fork traceability
• Quality control
• Flexible Inventory management
• SKU-tagging & traceability to
the source
• Better marketability
• Farm management
• Alert log & management (pest
infestation, diseases, etc.)
• Satellite & weather input based
advisory
• Geo-tagging for accurate
predictability
34. Key Features of Services Provided by Cropin (2/2)
• Agri alternate data for accurate
decision making
• Reduced cost of operations
• Plot level monitoring system
• Sales team performance management
• Demo plots performance monitoring
• Order booking & tracking
• Order & dealer management
• CRM solution
• Achieve unrealized sales
36. Conclusion
• Data mining, through better management and data analysis, can assist agricultural
organizations to achieve greater profit
• Data mining technology provides user-oriented access to new and hidden patterns
in data, from which knowledge can be generated to support decision making
in agricultural organizations
• Data mining techniques and tools have a growing role in agriculture and in
upcoming startups, ranging from classification of soils to prediction of crop
yield
37. References (1/3)
• Bhojani, S. H. (2011). Geospatial Data Mining Techniques: Knowledge Discovery
in Agricultural. Indian Journal of Applied Research, 3(1), 22-24.
doi:10.15373/2249555x/jan2013/10
• Ramesh, D., & Vardhan, B. V. (2015). Analysis of Crop Yield Prediction Using
Data Mining Techniques. International Journal of Research in Engineering and
Technology, 04(01), 470-473. doi:10.15623/ijret.2015.0401071
• Geetha, M. (2015). A Survey on Data Mining Techniques in Agriculture.
International Journal of Innovative Research in Computer and Communication
Engineering, 3(2), 1-6. Retrieved March 27, 2018.
• Armstrong, L. J., Diepeveen, D., & Maddern, R. (2007, December). The
application of data mining techniques to characterize agricultural soil profiles. In
Proceedings of the Sixth Australasian Conference on Data Mining and Analytics,
Volume 70 (pp. 85-100). Australian Computer Society, Inc.
38. References (2/3)
• Cunningham, S. J., & Holmes, G. (1999). Developing innovative applications in
agriculture using data mining. In Proceedings of the Southeast Asia Regional
Computer Confederation Conference (pp. 25-29).
• Mirjankar, N., & Hiremath, S. (2016). Application of Data Mining in Agriculture
Field. International Journal of Computer Engineering and Applications,
iCCSTAR-2016, Special Issue.
• Solanki, J., & Mulge, D. Y. (2015). Different Techniques Used in Data Mining in
Agriculture. International Journal of Advanced Research in Computer Science and
Software Engineering, 5(5), 1-5. Retrieved March 26, 2018, from
http://ijarcsse.com/Before_August_2017/docs/papers/Volume_5/5_May2015/V5I5-0184.pdf
39. References (3/3)
• Ramesh, D., & Vardhan, B. V. (2013). Data Mining Techniques and Applications
to Agricultural Yield Data. International Journal of Advanced Research in
Computer and Communication Engineering, 2(9), 1-4. Retrieved March 25, 2013
• Brownlee, J. (2017, November 29). What is Data Mining and KDD. Retrieved
March 27, 2018, from https://machinelearningmastery.com/what-is-data-mining-and-kdd
• Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to
knowledge discovery in databases. AI Magazine, 17(3), 37.
• Mucherino, A., Papajorgji, P. J., & Pardalos, P. M. (2009). Data Mining in
Agriculture. New York: Springer, 1-19. Retrieved March 24, 2018.