International Journal of Information Technology & Management Information System (IJITMIS), ISSN 0976 –
6405(Print), ISSN 0976 – 6413(Online), Volume 5, Issue 2, May - August (2014), pp. 40-50 © IAEME
KNOWLEDGE DISCOVERY FROM VEHICLE E-GOVERNANCE
DATA USING DATA WAREHOUSING AND DATA MINING
Pushpal Desai
(M.Sc. (I.T.) Programme, VNSGU, Surat, India)
ABSTRACT
This paper discusses multidimensional schema design, data cubes and OLAP operations on
Vehicle e-governance data, and then presents a proposed data mining model and its
implementation on the same data. In the first phase, a clustering algorithm is applied to
identify important clusters in the Vehicle e-governance data. In the second phase, an
association rules mining algorithm is applied to explore novel relationships within the
important data clusters identified in the first phase. The results indicate that novel
relationships can be found using the proposed model.
Keywords: Clustering, Association Rules Analysis, Microsoft SQL Server Analysis
Services.
I. INTRODUCTION
Inmon, who is known as the father of data warehousing, defines a data warehouse as “a
subject oriented, integrated, nonvolatile, and time variant collection of data in support of
management decisions” [7] [8]. Han and Kamber define data mining as “extracting or
mining knowledge from large amounts of data” [2]. Data warehousing and data mining
algorithms are applied for knowledge discovery in many domains, and have been used
successfully in Banking, Insurance, Finance, Marketing, Education, Telecommunication,
Medical Science, Power Industry, Weather Forecasting, Product Design and Customer
Relationship Management (CRM). In our earlier research work, we tried to find association
rules from Birth Registration, Deceased Registration, Property and Vehicle e-governance
data [4] [5] [6]. In this article, the Association Rules algorithm is applied to find
interesting patterns and relationships in the different clusters of Vehicle e-governance
data.
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY &
MANAGEMENT INFORMATION SYSTEM (IJITMIS)
ISSN 0976 – 6405(Print)
ISSN 0976 – 6413(Online)
Volume 5, Issue 2, May - August (2014), pp. 40-50
© IAEME: http://www.iaeme.com/IJITMIS.asp
Journal Impact Factor (2014): 6.2217 (Calculated by GISI)
www.jifactor.com
II. METHODOLOGY
To provide a better understanding of the proposed knowledge discovery model, a flowchart
of the model is shown in Figure 1. The proposed model for knowledge discovery involves
three major phases.
In the first phase, various data preprocessing tasks are performed on the source data to
convert it into clean and consistent data.
Fig 1: Proposed Model for Knowledge discovery for e-governance data
In the second phase, the data warehouse is designed from the preprocessed data, taking
the analytical needs of the organization into account. In the first task, the dimensions,
facts and measures are identified, keeping the organization’s analytical purpose in mind.
In the next task, the multidimensional schema is designed from the dimension tables and
fact tables. In the last task, data cubes are created and OLAP operations such as
drill-down, slice and dice are performed on them.
In the third phase, clustering and association rules mining algorithms are used to
discover knowledge from the data warehouse. In the first task, a clustering algorithm is
applied to the data cube to identify its major clusters or groups. In the second task, an
association rules mining algorithm is applied to find novel and interesting relationships
in the data clusters identified in the first task.
III. MULTIDIMENSIONAL SCHEMA DESIGN
The multidimensional schema is designed for Vehicle e-governance data, and OLAP
operations are performed on data cubes for data analysis. Automobile companies keep
adding new models, so the data warehouse requires frequent updates. The Snowflake
Schema design is proposed because it stores data in normalized form, which makes it easy
to update vehicle models in the data warehouse. In contrast, the Star Schema design stores
data in de-normalized form, which makes such updates more difficult. In the proposed
Snowflake Schema design, the VehicleRegistrationbase table is used as the Fact Table and
Modaelmasterbase, Companynamemaster, Vehicletypebase and Sitemaster are used as
Dimension Tables. The measures include Vehicle Registration Count, Vehicle Amount and
Tax Amount. Figure 2 shows the proposed Snowflake Schema design for the Vehicle data.
Fig 2: Proposed Snowflake Schema Design for Vehicle Data
After implementing the Snowflake schema, Data Cubes are created and OLAP operations
such as Slice, Dice, Drill-down and Roll-up are performed on the Vehicle Data Cube
using Microsoft SQL Server Analysis Services [1] [3].
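The schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not the SQL Server Analysis Services implementation used in this work: the table names follow the text, while every column name, the way the dimension tables reference each other, and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical column layout; only the table names come from the paper.
SCHEMA = """
CREATE TABLE Companynamemaster (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE Vehicletypebase   (type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE Sitemaster        (site_id INTEGER PRIMARY KEY, site_name TEXT);
-- Snowflaking: the model dimension points at further dimension tables.
CREATE TABLE Modaelmasterbase (
    model_id   INTEGER PRIMARY KEY,
    model_name TEXT,
    company_id INTEGER REFERENCES Companynamemaster(company_id),
    type_id    INTEGER REFERENCES Vehicletypebase(type_id)
);
CREATE TABLE VehicleRegistrationbase (
    reg_id         INTEGER PRIMARY KEY,
    model_id       INTEGER REFERENCES Modaelmasterbase(model_id),
    site_id        INTEGER REFERENCES Sitemaster(site_id),
    reg_year       INTEGER,
    vehicle_amount REAL,
    tax_amount     REAL
);
"""


def build_warehouse():
    """Create an in-memory warehouse and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Companynamemaster VALUES (?, ?)",
                     [(1, "Hero Honda"), (2, "Honda")])
    conn.executemany("INSERT INTO Vehicletypebase VALUES (?, ?)",
                     [(1, "MOTORCYCLE"), (2, "MOPED_SCOOTER")])
    conn.executemany("INSERT INTO Sitemaster VALUES (?, ?)", [(1, "Site A")])
    conn.executemany("INSERT INTO Modaelmasterbase VALUES (?, ?, ?, ?)",
                     [(1, "Splender", 1, 1), (2, "HONDA ACTIVA", 2, 2)])
    conn.executemany("INSERT INTO VehicleRegistrationbase VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 2003, 40000.0, 2000.0),
                      (2, 1, 1, 2004, 41000.0, 2050.0),
                      (3, 2, 1, 2004, 38000.0, 1900.0)])
    return conn


def add_model(conn, model_id, model_name, company_id, type_id):
    """Adding a new model touches only the model dimension table."""
    conn.execute("INSERT INTO Modaelmasterbase VALUES (?, ?, ?, ?)",
                 (model_id, model_name, company_id, type_id))


def registration_count_by_company(conn):
    """Roll the registration count measure up to the company level."""
    return conn.execute("""
        SELECT c.company_name, COUNT(*)
        FROM VehicleRegistrationbase f
        JOIN Modaelmasterbase m ON m.model_id = f.model_id
        JOIN Companynamemaster c ON c.company_id = m.company_id
        GROUP BY c.company_name
        ORDER BY c.company_name
    """).fetchall()
```

Because the company and vehicle type are stored once in their own tables, registering a new model is a single insert into the model dimension, and existing fact rows are left untouched.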
IV. CLUSTERING
Owner Surname, Vehicle Model, Vehicle Company, Vehicle Type, Vehicle Price, Vehicle Tax
and Registration Year are used as input parameters, and Registration Date is used as the
key column, to generate the Clustering model for the Vehicle Data Cube. The Cluster model
is used to identify important groups in the source data. Clustering is performed with the
K-means algorithm in Microsoft SQL Server Analysis Services [1] [3].
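The idea behind the clustering step can be illustrated with a small, self-contained K-means sketch. It is not the Analysis Services implementation: the one-hot encoding of categorical fields, the deterministic choice of initial centroids, the field names and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def one_hot(records, fields):
    """Encode categorical records as 0/1 vectors, one slot per (field, value)."""
    slots = sorted({(f, r[f]) for r in records for f in fields})
    index = {slot: i for i, slot in enumerate(slots)}
    vectors = []
    for r in records:
        v = [0.0] * len(slots)
        for f in fields:
            v[index[(f, r[f])]] = 1.0
        vectors.append(v)
    return vectors


def kmeans(vectors, k, iterations=20):
    """Plain K-means; centroids start at the first k distinct vectors."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_profile(records, labels, field):
    """For each cluster, the share of its records taking each value of `field`."""
    profile = {}
    for lab in set(labels):
        values = Counter(r[field] for r, l in zip(records, labels) if l == lab)
        total = sum(values.values())
        profile[lab] = {value: n / total for value, n in values.items()}
    return profile


# Made-up records using value names that appear in the paper.
RECORDS = [
    {"vehicle_type": "MOTORCYCLE", "company": "Hero Honda"},
    {"vehicle_type": "MOTORCYCLE", "company": "Hero Honda"},
    {"vehicle_type": "AUTORICKSHAW", "company": "Bajaj"},
    {"vehicle_type": "AUTORICKSHAW", "company": "Bajaj"},
    {"vehicle_type": "MOTORCYCLE", "company": "TVS"},
]
LABELS = kmeans(one_hot(RECORDS, ["vehicle_type", "company"]), k=2)
```

Calling `cluster_profile(RECORDS, LABELS, "vehicle_type")` gives, per cluster, the fraction of records in each vehicle type, which is the kind of per-cluster population share that a cluster profile view reports.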
Fig 3: Proposed Clustering Data Mining Model for Vehicle Data Cube.
V. ASSOCIATION RULES MINING
The Association Rules algorithm was applied to the Vehicle Cluster data. For example,
to find interesting relationships in the vehicle data, the ‘Car’, ‘Motorcycle’,
‘Autorickshaw’ and ‘Moped and Scooter’ cluster data are used. Owner Surname and Vehicle
Type were used as input fields, and Vehicle Company and Vehicle Model Name were used as
predict-only attributes. The Apriori algorithm was used to find Association Rules in the
important data clusters [1] [3].
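A compact sketch of Apriori-style rule mining is given here for readers unfamiliar with the technique. It is a textbook-style illustration rather than the Analysis Services algorithm: the `attribute=value` item encoding, the support and confidence thresholds, and the sample transactions (including the model name marked as invented) are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets and
        # keep only those whose k-subsets are all frequent (Apriori pruning).
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in level for s in combinations(union, len(a))
            ):
                current.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) rules with one consequent item."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = count / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, frozenset([item]), confidence))
    return found


# Made-up registrations; "Model=OTHER MODEL" is an invented placeholder value.
TRANSACTIONS = [
    {"Company=Ford", "Type=CAR", "Model=FORD IKON"},
    {"Company=Ford", "Type=CAR", "Model=FORD IKON"},
    {"Company=Ford", "Type=CAR", "Model=OTHER MODEL"},
    {"Company=Honda", "Type=CAR", "Model=HONDA CITY"},
]
FREQUENT = apriori(TRANSACTIONS, min_support=2)
RULES = rules(FREQUENT, min_confidence=0.6)
```

In this toy data the rule `Company=Ford, Type=CAR -> Model=FORD IKON` holds in two of the three transactions that match its left-hand side, so its confidence is 2/3. Restricting consequents to model attributes would mimic the input and predict-only roles described above.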
Fig 4: Proposed Association Rules Data Mining Model for Vehicle Data Clusters.
VI. RESULTS
The data cube was created considering “Vehicle Registration Count” and “VAT”
measures. The “Model Masterbase”, “Vehicle Typebase”, “Year master”, “Site master” and
“Company master” tables were selected as dimension tables.
Fig 5: Vehicle Data cube’s Dimensions and Measures
In the drill-down and dice operation on the “Vehicle Data Cube”, two dimensions, “Year”
and “Company Code”, were selected. The “Registration Year” values “2003” to “2005” and
the “Company Code” value 1 (“Hero Honda”) were selected. The result shown in Figure 6
indicates that 48,189 “Hero Honda” vehicles were registered from the year 2003 to the
year 2005.
Fig 6: Drill-down and Dice operation on Vehicle Data Cube with Two Dimensions
In a further drill-down operation on the “Vehicle Data Cube”, the “Model Id” dimension
with value 1 (“Splender”) was added. Figure 7 shows that 18,015 vehicles of this
particular model were registered. The roll-up operation can be performed on any of the
above data cubes by removing dimensions used in the drill-down operations.
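The dice, drill-down and roll-up operations described above can be mimicked on a list of fact rows in plain Python. This is a minimal sketch under assumptions: the field names and the sample rows are invented, and the counts it produces are unrelated to the figures reported in this section.

```python
from collections import defaultdict


def aggregate(facts, dimensions, where=None):
    """Count registrations grouped by `dimensions`, after an optional dice filter.

    `where` maps a dimension name to the set of values to keep.
    """
    where = where or {}
    cube = defaultdict(int)
    for row in facts:
        if all(row[d] in allowed for d, allowed in where.items()):
            cube[tuple(row[d] for d in dimensions)] += 1
    return dict(cube)


# Made-up fact rows; value names are borrowed from the paper.
FACTS = [
    {"year": 2003, "company": "Hero Honda", "model": "Splender"},
    {"year": 2004, "company": "Hero Honda", "model": "Splender"},
    {"year": 2004, "company": "Hero Honda", "model": "PLEASURE"},
    {"year": 2005, "company": "TVS", "model": "PEP"},
    {"year": 2006, "company": "Hero Honda", "model": "Splender"},
]

# Dice: restrict two dimensions to a sub-range of values.
DICE = {"year": {2003, 2004, 2005}, "company": {"Hero Honda"}}

by_year_company = aggregate(FACTS, ["year", "company"], DICE)
# Drill-down: add the model dimension to see finer-grained counts.
by_year_company_model = aggregate(FACTS, ["year", "company", "model"], DICE)
# Roll-up: remove the year dimension to return to a coarser view.
by_company = aggregate(FACTS, ["company"], DICE)
```

Drill-down and roll-up differ only in the list of grouping dimensions, which is why removing a dimension from a drilled-down view yields the rolled-up view.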
Fig 7: Drill-down and Dice operations on Vehicle Data Cube with Three Dimensions
The Clustering data mining algorithm created 10 Clusters from the Vehicle data. The
“Cluster Diagram” of the same is shown in the Figure 8.
Fig 8: Cluster Diagram for Vehicle data
The Cluster profile of the same model shows variables such as “Company Name”, “Owner
Surname” and “Vehicle Type Name”. The variable “Company Name” has many states, such as
"Maruti Suzuki", "Hero Honda", "Bajaj", "Honda", "Huyndai", "Tata Motors", "TVS" and
Others. The variable “Owner Surname” has states such as "Patel", "Wala", "Shah", "Desai",
"SINGH", "SHAIKH", "KHAN", "PATIL" and Others. The “Vehicle Type Name” variable has the
states "CAR", "MOTORCYCLE", "AUTORICKSHAW", "MOPED_SCOOTER" and "COMMERCIAL".
Fig 9: Cluster Profile for Vehicle data Clustering Model
To properly understand the data in each cluster, questions such as the following need to be answered:
• Which clusters contain data of “AUTORICKSHAW”? What are the names of the
Companies that manufactured the “AUTORICKSHAW”? What are the Surnames of citizens
who purchased “AUTORICKSHAW”?
• Which clusters contain data of “MOTORCYCLE”? What are the names of the
companies which manufactured the “MOTORCYCLE”? What are the Surnames of citizens
who purchased “MOTORCYCLE”?
To answer such questions, the cluster diagram’s shading variables are used to understand
the impact of different variables and their states. For example, the result for the
“Vehicle Type Name” variable with the “AUTORICKSHAW” state is shown in Figure 10. The
result indicates that “Cluster 7” has 100% population for the “AUTORICKSHAW” state.
Figure 10: Cluster Diagram for Vehicle data (Vehicle Type Name =
“AUTORICKSHAW”)
Further analysis can be performed by viewing the characteristics of “Cluster 7”, which
are shown in Figure 11.
Figure 11: “Cluster 7” Characteristic for Vehicle data
The cluster diagram for the “Vehicle Type Name” variable with the “Motorcycle” value is
shown in Figure 12. The result indicates that Cluster 3, Cluster 4 and Cluster 9 have
population for this state. Cluster 3 has 100% population for the value “Motorcycle”.
Figure 12: Cluster Diagram for Vehicle data (Vehicle Type Name = “Motorcycle”)
The characteristics of “Cluster 3” shown in Figure 13 indicate that the “Company Name”
variable is present with two values, “Hero Honda” and “TVS”. The probability is 97.09%
for the “Hero Honda” value, whereas it is 1.23% for the “TVS” value. For the “Owner
Surname” field, the value “Patel” is present with 92.72% probability.
Figure 13: “Cluster 3” Characteristic for Vehicle data
In the “Vehicle Data Cluster”, the “Company Name”, “Vehicle Type” and “Model Name”
variables were used to find novel relationships. “Company Name” and “Vehicle Type” were
used as input fields and “Model Name” was used as the predict-only attribute. Many
interesting relationships were found by the association rules mining model.
For example, the results provided in Table 1 indicate that a car from the “Ford” company
registered in the city of Surat is most likely to be the “Ford Ikon” model. Similarly,
the most likely car models are “Qualis” for “Toyota”, “Honda City” for “Honda”, “Scorpio
Turbo” for “Mahindra”, “Palio” for “Fiat”, “Indica” for “Tata Motors” and “Santro” for
“Huyndai”.
Table 1: Association Rules for Company Name, Vehicle Type = “Car” and Model Name attributes
Rule Confidence   Rule Importance   Association Rule
0.866             4.476980868       Company Name = Ford, Vehicle Type Name = CAR -> Model Name = FORD IKON
0.7               4.381475564       Company Name = Toyota, Vehicle Type Name = CAR -> Model Name = QUALIS
0.941             4.209951446       Company Name = Honda, Vehicle Type Name = CAR -> Model Name = HONDA CITY
0.694             4.08037921        Company Name = Mahindra, Vehicle Type Name = CAR -> Model Name = SCORPIO TURBO
0.68              4.072957414       Company Name = Fiat, Vehicle Type Name = CAR -> Model Name = PALIO
0.719             3.680400153       Company Name = Tata Motors, Vehicle Type Name = CAR -> Model Name = INDICA
0.749             3.613327965       Company Name = Huyndai, Vehicle Type Name = CAR -> Model Name = SANTRO
Similarly, interesting relationships between moped / scooter manufacturers and their
models were found. The results provided in Table 2 suggest that the most likely models
are “Access 125” for “Suzuki”, “Honda Activa” for “Honda”, “Pep” for “TVS” and
“Pleasure” for “Hero Honda”.
Table 2: Association Rules for Company Name, Vehicle Type = “Moped_Scooter” and Model Name attributes
Rule Confidence   Rule Importance   Association Rule
0.828             4.458154689       Company Name = SUZUKI, Vehicle Type Name = MOPED_SCOOTER -> Model Name = ACCESS 125
0.83              3.362508672       Company Name = Honda, Vehicle Type Name = MOPED_SCOOTER -> Model Name = HONDA ACTIVA
0.434             3.060536766       Vehicle Type Name = MOPED_SCOOTER -> Model Name = HONDA ACTIVA
0.49              2.973270492       Company Name = TVS, Vehicle Type Name = MOPED_SCOOTER -> Model Name = PEP
0.711             2.7995445         Vehicle Type Name = MOPED_SCOOTER, Company Name = Hero Honda -> Model Name = PLEASURE
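To help read the two score columns in the tables above, the sketch below computes a confidence and an importance value for a rule A -> B from raw counts. Confidence is the standard conditional probability P(B | A). The importance formula used here, the base-10 logarithm of P(B | A) divided by P(B | not A), is an assumption about how such a score may be defined; the exact formula, logarithm base and any smoothing applied by the tool are not stated in this paper. The counts in the example are invented.

```python
import math


def rule_scores(n_a_and_b, n_a, n_b, n_total):
    """Return (confidence, importance) for a rule A -> B from raw counts.

    n_a_and_b: cases matching both sides, n_a: cases matching A,
    n_b: cases matching B, n_total: all cases.
    """
    confidence = n_a_and_b / n_a  # P(B | A)
    p_b_given_not_a = (n_b - n_a_and_b) / (n_total - n_a)
    # Assumed definition: log10 of P(B|A) / P(B|not A).
    # It is undefined when either probability is zero.
    importance = math.log10(confidence / p_b_given_not_a)
    return confidence, importance


# Invented counts: 100 cases match A, 80 of them also match B,
# and B occurs in only 10 of the remaining 1000 cases.
CONFIDENCE, IMPORTANCE = rule_scores(n_a_and_b=80, n_a=100, n_b=90, n_total=1100)
```

Under this assumed definition, a positive importance means the right-hand side is more likely when the left-hand side holds than when it does not, which is one way to rank rules that have similar confidence.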
VII. CONCLUSION
This practical research demonstrates that data cube operations such as drill-down,
roll-up, slice and dice can be extremely useful to administrators working at the
municipal corporation, because they can query data along several dimensions. The cube
operations also give administrators a lot of freedom, because queries are not fixed in
nature as they normally are in OLTP systems. Data cube operations allow administrators
to execute ad hoc queries, which is not possible in OLTP systems. These results can be
utilized by automobile companies to increase sales of their products by focusing on
specific communities residing in the city of Surat. The results are unique in the sense
that e-governance data can be utilized by private companies to increase their sales,
improve the marketing of their products and analyze the vehicle purchase trends of
citizens. Furthermore, the results of Clustering and Association Rules data mining give
a better understanding of the data and reveal hidden trends and new relationships in
e-governance data.
VIII. LIMITATIONS
All results are based on data provided by the municipal corporation for research
purposes only. Hence, the results may change if data warehousing and data mining are
applied to the actual data sets.
IX. REFERENCES
[1] Brian Larson 2008. Delivering Business Intelligence with Microsoft SQL Server 2008,
McGrawHill.
[2] Han and Kamber 2011. Data Mining Concepts and Techniques, Morgan Kaufmann
Publishers.
[3] Jamie MacLennan et al. 2008. Data Mining with SQL Server® 2008, Wiley.
[4] Pushpal Desai and Dr. Apurva Desai 2011, The Study on Data Warehouse and
Data Mining for Birth Registration System of the Surat City, International Journal
of Computer Applications, Number 4 - Article 2, 2011, pp. 1-5, ISBN: 978-93-80746-
63-0.
[5] Pushpal Desai and Dr. Apurva Desai 2012, An empirical analysis using data mining on
property tax - e-governance data, In the proceedings of National Seminar on Natural
language Processing and Data Mining, Department of Computer Science, Surat, India.
[6] Pushpal Desai and Dr. Apurva Desai 2012, An empirical analysis based on association
rules mining on E-Governance system, In the proceedings of International Conference
& Workshop on Recent Trends in Technology 2012, TCET, Mumbai, India.
[7] W. H. Inmon 2005. Building the Data Warehouse, Wiley.
[8] W. H. Inmon et al. 2001. Corporate Information Factory, Wiley.
[9] Kuldeep Deshpande and Dr. Bhimappa Desai, “A Critical Study of Requirement
Gathering and Testing Techniques for Datawarehousing”, International Journal of
Information Technology and Management Information Systems (IJITMIS), Volume 5,
Issue 1, 2014, pp. 60 - 71, ISSN Print: 0976 – 6405, ISSN Online: 0976 – 6413.
[10] Pushpal Desai, “Building Aggregates in the Data Warehouse: A Case Study of Birth,
Deceased and Property Registration E-Governance Data”, International Journal of
Advanced Research in Engineering & Technology (IJARET), Volume 5, Issue 6, 2014,
pp. 8 - 14, ISSN Print: 0976-6480, ISSN Online: 0976-6499.