Thesis summary knowledge discovery from academic data using association rule mining
Knowledge Discovery from Academic Data
using Association Rule Mining
SUBMITTED BY
Rajshakhar Paul
Student ID: 0805020
Shibbir Ahmed
Student ID: 0805097
Summary of the Thesis
Department of Computer Science and Engineering
BANGLADESH UNIVERSITY OF ENGINEERING AND TECHNOLOGY
i. Introduction
Students are one of the fundamental elements of any academic institution. Indeed, the prime concern of
an educational institution is to ensure a sound technical foundation, scholarly guidance and a high
standard of education for all of its students. A large educational institute such as a public university
generates large volumes of data, so it needs an efficient way to apply data mining techniques to obtain
knowledge for developing and improving its academic activities. The knowledge acquired from the
institutional database can be used to look for answers to questions such as: Which factors determine
better or worse academic performance of students? What causes student retention, i.e., the extended
continuation of studies at the university? Why do students drop out before graduation, i.e., abandon the
institute? Concepts and techniques of data mining are essential to discover the hidden knowledge in
large datasets.
Bangladesh University of Engineering and Technology (BUET) is the topmost technological university of
Bangladesh. It enrolls the 1000 most brilliant students, selected by a competitive examination from among
one million students completing higher secondary education. Among these 1000 students, the top-ranked
students can get admission into the different departments under different faculties. Although this
university has most of the brightest students of Bangladesh, statistics demonstrate that the performance
of some students degrades noticeably. Some students perform outstandingly at the initial stage of their
undergraduate studies but cannot maintain the same level of excellence until graduation. Other students
do not perform well initially but finish with a fairly good academic record. There are also students who
have to continue their studies year after year and take a very long time to graduate. Unfortunately,
there are also some meritorious students who drop out before graduation. Statistical analysis alone is
not sufficient to find the reasons for these problems in an academic institution. The hidden knowledge
inside the institutional academic and personal data of students is necessary to find the possible causes
of these problems and to take suitable precautions against them. That is why knowledge discovery and data
mining from academic data is essential for an educational institution like BUET, to improve the academic
performance of students, refine the standard of teaching methodologies and reshape decision making for
the betterment of the institution.
Discovering the hidden knowledge in educational data and applying it properly to decision making is
essential for ensuring high quality education in any academic institution. Data mining techniques are
very effective for this. However, data mining techniques cannot all be applied directly to academic data
because of its complex structure, so rigorous preprocessing is required. The choice of support and
confidence thresholds and the selection of important association rules from the huge number of generated
rules are other significant problems of knowledge discovery from academic data.
ii. Motivation
In a developing country like Bangladesh, a great many students from rural areas come to the city for
higher education. They usually leave their families behind and have to adapt to a completely new
environment. They start their new educational life in the institution's hall, with a new living place,
new types of food, new companions and a new atmosphere. They usually need some time to cope, both
physically and mentally, with all of these new things, which may hamper their educational activities at
the very beginning. The scenario is somewhat more difficult for girls than for boys. So sometimes these
students lag behind at the beginning of their higher studies, which may have an adverse effect on them in
the long run. On the other hand, city students are more likely to be familiar with the environment, to
live with their families and to have more educational, technological and psychological opportunities,
which may give them some advantages on the track of higher education. The scenario can also be different:
the greater opportunities may drive them away from the track and demoralize them in their studies.
In a higher education system like that of BUET, the performance in one course depends on different
aspects such as class attendance, class tests, quizzes, assignments and term final examinations, some of
which start from the very beginning of the class. So if a student gets poor marks in any of these, it may
affect the final result. Later courses are sometimes dependent on previous courses, so a poor result in
one course may also affect the performance in other related courses.
It is therefore essential to discover all possible knowledge from academic data in order to know the
relevant rules behind students' performance, whether they are doing well or badly. If they cannot perform
well, the reason behind it can also be discovered.
iii. Goal and Objectives
The department of Computer Science and Engineering (CSE) is one of the prestigious departments of
BUET. Although this department has most of the brightest students of Bangladesh, statistical data
demonstrate that the performance of some students degrades noticeably. Moreover, the problems of
retention and abandonment are also prevalent among the students. The main objectives of this research
study are:
To discover knowledge of students' academic progress from academic performance and personal statistics
through the impact of the different assessments of courses, e.g., class test, attendance, term final
examination, etc.
To find out the reasons behind the degradation of students' merit, i.e., decay in their potentiality.
To discover the causes behind extended continuation for graduation, i.e., retention of students.
To find out why some meritorious students drop out before graduation, i.e., abandonment of students.
iv. Key Techniques used to achieve the Goal
A. Data Analysis
1) Personal and Academic Data
In this research, we have considered the academic data structure of BUET. The student data of the BIIS
(BUET Institutional Information System) contain several pieces of personal and academic information about
each student. We have collected them anonymously for data preprocessing and data analysis. We have
considered the personal and academic data stated in Table 4.1 for knowledge discovery regarding the
academic performance, abandonment and retention of students, as illustrated in Figure 4.1.
Table 4.1: Selected Data from BIIS database
Academic Information
    Department
    Admission Year / Batch
    Overall CGPA
    Marks of Class Test, Attendance, Two Answer Scripts, Total Marks and Grades of all Theory Courses
    Total Marks and Grades of all Sessional Courses
    Total Completed Credit Hour
Personal Information
    Gender
    Hall Resident/Non-resident
Figure 4.1: Factors related to Academic Performance, Abandonment and Retention of students
2) Course and Curriculum
As we have experimented with the students' data of the department of Computer Science and Engineering
(CSE) in BUET, we have analyzed all the courses in the curriculum that have to be taken to complete the
BSc degree. A student has to take 68 departmental and non-departmental courses in total. All the
courses along with their credit hours are shown in Table 4.2.
Table 4.2: All Undergraduate Courses for department of CSE
Course Type                         Credit Hour  Course Numbers
Departmental Theory Courses         4.0          CSE307, CSE321
Departmental Theory Courses         3.0          CSE103, CSE105, CSE201, CSE203, CSE205, CSE207, CSE209,
                                                 CSE303, CSE305, CSE309, CSE311, CSE301, CSE313, CSE315,
                                                 CSE317, CSE401, CSE403, CSE423, CSE409, CSE461, CSE463
Departmental Theory Courses         2.0          CSE100, CSE211
Departmental Sessional Courses      1.5          CSE106, CSE202, CSE206, CSE210, CSE214, CSE304, CSE308,
                                                 CSE314, CSE316, CSE404
Departmental Sessional Courses      0.75         CSE204, CSE208, CSE300, CSE310, CSE322, CSE324, CSE402,
                                                 CSE410, CSE462, CSE464
Non-Departmental Theory Courses     4.0          PHY109, MATH143, EEE263, MATH243
Non-Departmental Theory Courses     3.0          EEE163, MATH141, ME165, CHEM101, HUM175, MATH241, EEE269,
                                                 IPE493
Non-Departmental Theory Courses     2.0          HUM211, HUM275, HUM371
Non-Departmental Sessional Courses  1.5          PHY102, EEE164, ME160, HUM272, CHEM114, EEE264, EEE270
Thesis                              6.0          CSE400
Among them there are 40 theory courses (25 departmental and 15 non-departmental) and 28 sessional
courses (20 departmental and 7 non-departmental, plus the thesis). We determine academic performance
and the impact of other factors on the basis of these courses' final grades and the marks of attendance,
class tests, term final answer scripts, total marks, etc.
[Figure 4.1 diagram: Gender, Residence, Records of all Continuous Assessments, Records of Departmental
Courses and Records of Non-Departmental Courses are linked to Academic Performance, Student Retention and
Student Abandonment.]
B. Preprocessing for Mining Academic Database
1) Relational Database
Students take courses by registering through their BIIS accounts. The relational database illustrated in
Figure 4.2 stores all the personal information of a student as well as the results of the courses taken.
From it we can obtain a relational table containing a student's gender, hall status, performance in all
courses, CGPA, etc.
Figure 4.2: Relational database
2) Universal Database
A universal database is created in which the records of all taken courses, along with personal
information such as gender and hall status of the corresponding student id, are stored in a single row of
the table. For a specific course, the row stores the grade, attendance, marks of class tests, marks of
each section (section A and section B) of the term final answer scripts and total marks. The similar
records of all other taken courses are stored in the same row. The records of the other students are
stored in the database one after another in the same way, each following the Gender and Hall Status of
that student. Another attribute, Student Type, records whether the student is regular, retentive or
abandoned. To apply the Apriori algorithm of Association Rule Mining, the attribute values must be in
discrete form, so identifying records such as the student id have been omitted from the universal table.
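The construction of one wide row per student can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the student ids "S1" and "S2", the tuple layout of `results` and the function name `build_universal_table` are assumptions made for the example, while the sample values are the two rows shown in Table 4.3.

```python
# Hypothetical relational data: personal information keyed by student id,
# and one result tuple per (student, course).
students = {
    "S1": {"Gender": "Male", "Hall_Status": "Resident", "Student_Type": "Regular"},
    "S2": {"Gender": "Female", "Hall_Status": "Non-Resident", "Student_Type": "Regular"},
}
results = [
    # (student id, course, grade, attendance, class test, section A, section B, total)
    ("S1", "CSE103", "A+", 30, 55, 90, 75, 250),
    ("S2", "CSE103", "A", 25, 45, 85, 70, 225),
]
FIELDS = ("Grade", "Attendance", "CT", "SectionA", "SectionB", "Total")


def build_universal_table(students, results):
    """Merge all course records of a student into a single wide row."""
    rows = {sid: dict(info) for sid, info in students.items()}
    for sid, course, *values in results:
        for field, value in zip(FIELDS, values):
            rows[sid][f"{course}_{field}"] = value
    # The student id is used only for grouping and is dropped from the output.
    return [rows[sid] for sid in sorted(rows)]


universal = build_universal_table(students, results)
```

Each additional course of a student adds six more columns to that student's row, which matches the column pattern `CSE103_Grade`, `CSE103_Attendance`, and so on.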
Table 4.3: Partial portion of universal database
Gender  Hall_Status   Student_Type  CSE103_Grade  CSE103_Attendance  CSE103_CT  CSE103_SectionA  CSE103_SectionB  CSE103_Total  …
Male    Resident      Regular       A+            30                 55         90               75               250
Female  Non-Resident  Regular       A             25                 45         85               70               225
…       …             …             …             …                  …          …                …                …
3) Data Transformation
The universal database of Table 4.3 has been transformed into an equivalent transformation table by
converting each continuous-valued attribute into a discrete-valued attribute that represents some
knowledge, which makes the data suitable for the Apriori algorithm of Association Rule Mining. For
example, CGPA is a continuous attribute and it has been transformed into five classes: excellent, very
good, good, average and poor. We have used one algorithm to transform all continuous numbers for
attendance, class tests, both sections of the term final answer scripts and total marks of a course. We
have used another algorithm to transform all grades or grade points of courses, or the overall CGPA, into
those five classes.
[Figure 4.2 diagram: entities Student, Grade Sheet and Course, with relationships labeled "achieves" and
"represents".]
To transform the numbers of the universal table, i.e., attendance, class tests, section A, section B and
total marks of each course, Algorithm1 has been developed; it populates the transformed table in such a
way that no entry holds a continuous value.
Similarly, the grades of the universal table are transformed by Algorithm2. As the real data set contains
CGPA in grade points, we similarly consider another variable, grade point, and transform the continuous
value of CGPA into the same five classes.
As there are theory courses of 4.0, 3.0 and 2.0 credit hours and sessional courses of 1.5 and 0.75 credit
hours, we need different transformation rule tables for these different courses. The transformation rules
are given for 3.0 credit hour (Table 4.4), 4.0 credit hour (Table 4.5) and 2.0 credit hour (Table 4.6)
theory courses and for all sessional courses (Table 4.7).
Algorithm1: Marks_Transformation ( )
Input: marks of Attendance, CT, Section A, Section B and Total Marks of each course from the Universal
Table of Studentlist
Output: discrete level of marks for the Transformation Table
for i = 1 to |Studentlist|
    for each marks entry of student i (as a percentage of the full marks)
        if (marks >= 80%)
            level = "Excellent"
        else if (marks < 80% && marks >= 75%)
            level = "Very Good"
        else if (marks < 75% && marks >= 60%)
            level = "Good"
        else if (marks < 60% && marks >= 50%)
            level = "Average"
        else if (marks < 50%)
            level = "Poor"
    end for
end for
Algorithm2: Grade_Transformation ( )
Input: all acquired Grades of each course in the Courselist of the universal table
Output: transformed_grade for the Transformation Table
for i = 1 to |Courselist|
    if grade = A+
        transformed_grade = "Excellent"
    else if grade = A
        transformed_grade = "Very Good"
    else if grade = A- or B+
        transformed_grade = "Good"
    else if grade = B
        transformed_grade = "Average"
    else if grade = B- or C+ or C or D
        transformed_grade = "Poor"
end for
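The two algorithms above can be sketched in Python roughly as follows. The function names and the `full_marks` parameter are assumptions made for this illustration. The pseudocode does not say what happens to an F grade; since the generated rules contain items such as `CSE100_Grade=F`, this sketch assumes that grades not listed in Algorithm2 are passed through unchanged.

```python
def marks_level(marks, full_marks):
    """Algorithm1: map raw marks to a discrete level via percentage cut-offs."""
    percent = 100.0 * marks / full_marks
    if percent >= 80:
        return "Excellent"
    if percent >= 75:
        return "Very Good"
    if percent >= 60:
        return "Good"
    if percent >= 50:
        return "Average"
    return "Poor"


# Algorithm2: letter grade to discrete level.
GRADE_LEVEL = {
    "A+": "Excellent",
    "A": "Very Good",
    "A-": "Good", "B+": "Good",
    "B": "Average",
    "B-": "Poor", "C+": "Poor", "C": "Poor", "D": "Poor",
}


def grade_level(grade):
    # Assumption: grades outside Algorithm2 (for example F) stay as they are.
    return GRADE_LEVEL.get(grade, grade)


# Total marks 250 out of 300 and grade A+ (first sample row of the universal table).
sample = (marks_level(250, 300), grade_level("A+"))
```

Keeping the grade mapping in a dictionary makes the five classes easy to audit against the pseudocode.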
Table 4.4: Transformation rule table for 3.0 credit theory course
                 Range of Marks (M)
Classified Name  Attendance  Class Test  SecA/SecB   Total
Excellent        27≤M≤30     48≤M≤60     84≤M≤105    240≤M≤300
Very Good        24≤M≤26     45≤M≤47     78≤M≤83     225≤M≤239
Good             21≤M≤23     36≤M≤44     63≤M≤77     180≤M≤224
Average          18≤M≤20     30≤M≤35     52≤M≤62     150≤M≤179
Poor             0≤M≤17      0≤M≤29      0≤M≤51      0≤M≤149
Table 4.5: Transformation rule table for 4.0 credit theory course
                 Range of Marks (M)
Classified Name  Attendance  Class Test  SecA/SecB   Total
Excellent        36≤M≤40     64≤M≤80     112≤M≤140   320≤M≤400
Very Good        32≤M≤35     60≤M≤63     105≤M≤111   300≤M≤319
Good             28≤M≤31     48≤M≤59     84≤M≤104    240≤M≤299
Average          24≤M≤27     40≤M≤47     70≤M≤83     200≤M≤239
Poor             0≤M≤23      0≤M≤39      0≤M≤69      0≤M≤199
Table 4.6: Transformation rule table for 2.0 credit theory course
                 Range of Marks (M)
Classified Name  Attendance  Class Test  SecA/SecB   Total
Excellent        18≤M≤20     32≤M≤40     56≤M≤70     160≤M≤200
Very Good        16≤M≤17     30≤M≤31     52≤M≤55     150≤M≤159
Good             14≤M≤15     24≤M≤29     42≤M≤51     120≤M≤149
Average          12≤M≤13     20≤M≤23     35≤M≤41     100≤M≤119
Poor             0≤M≤11      0≤M≤19      0≤M≤34      0≤M≤99
Table 4.7: Transformation rule table for all sessional courses
                 Range of Marks (M)
Classified Name  Sessional Credit Hour=1.5  Sessional Credit Hour=0.75
Excellent        120≤M≤150                  60≤M≤75
Very Good        112≤M≤119                  56≤M≤59
Good             90≤M≤111                   45≤M≤55
Average          75≤M≤89                    37≤M≤44
Poor             0≤M≤74                     0≤M≤36
To construct the entire transformed table given in Table 4.8, we have used the universal table and the
above transformation rules.
Table 4.8: Transformed table from universal table
Gender  Hall_Status   Student_Type  CSE103_Grade  CSE103_Attendance  CSE103_CT  CSE103_SectionA  CSE103_SectionB  CSE103_Total  …
Male    Resident      Regular       Excellent     Excellent          Excellent  Excellent        Good             Excellent
Female  Non-Resident  Regular       Very Good     Very Good          Very Good  Excellent        Good             Very Good
…       …             …             …             …                  …          …                …                …
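As a check on the transformation rules, the lookup for a 3.0 credit theory course can be sketched as a table of lower bounds. The bounds are copied from Table 4.4; the names `BOUNDS_3_CREDIT` and `level_from_table` are assumptions made for this illustration. A table lookup is used here instead of percentage cut-offs because the attendance ranges in Table 4.4 do not follow the same percentages as the other components.

```python
# Levels from best to worst, with the lower bound of each range in Table 4.4.
LEVELS = ("Excellent", "Very Good", "Good", "Average", "Poor")
BOUNDS_3_CREDIT = {
    "Attendance": (27, 24, 21, 18, 0),
    "CT": (48, 45, 36, 30, 0),
    "SectionA": (84, 78, 63, 52, 0),
    "SectionB": (84, 78, 63, 52, 0),
    "Total": (240, 225, 180, 150, 0),
}


def level_from_table(component, marks, bounds=BOUNDS_3_CREDIT):
    """Return the first level whose lower bound the marks reach."""
    for level, lower in zip(LEVELS, bounds[component]):
        if marks >= lower:
            return level
    raise ValueError("marks must be non-negative")


# Marks of CSE103 (a 3.0 credit course) from the second row of Table 4.3.
row = {"Attendance": 25, "CT": 45, "SectionA": 85, "SectionB": 70, "Total": 225}
transformed = {name: level_from_table(name, marks) for name, marks in row.items()}
```

For this row the lookup gives Very Good, Very Good, Excellent, Good and Very Good, which agrees with the second row of Table 4.8.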
4) Dataset and Application Environment
In this experiment, we have considered the data up to the last five graduated batches of the department
of CSE, BUET. The institutional dataset of BUET consists of academic and personal data of 9210 students
over the last 10 years. We have extracted the relevant academic and personal information of those
students, namely gender, hall status, admission year, completed credit hours, all records of theory and
sessional courses, overall CGPA, etc., from the relational BIIS database and transformed it into the
universal table structure. Finally, we transformed it into a transformed table structure for applying
association rule mining. The entire experimental setup is illustrated in Figure 5.1.
Figure 5.1: Experimental Setup for applying Apriori Algorithm using Weka Explorer to generate
Association Rules
After the preprocessing step, we have obtained a transformed table of 582 students of the department of
CSE who have already graduated. The universal table also contains one additional attribute, the student
type: retentive, regular or abandoned. The student type is obtained by analyzing the completed credit
hours and the admission year. We have prepared the transformation table, in which all continuous data are
transformed into five discrete values: Excellent, Very Good, Good, Average and Poor. Finally, we have
applied Weka Explorer to the transformation table (in .csv file format) to generate interesting
Association Rules.
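Exporting the transformation table to a .csv file can be sketched with Python's standard `csv` module. The function name `to_csv` and the four columns used here are assumptions made for this illustration; the two rows are the first columns of the sample rows of Table 4.8.

```python
import csv
import io

# First columns of the two sample rows of the transformed table.
rows = [
    {"Gender": "Male", "Hall_Status": "Resident",
     "Student_Type": "Regular", "CSE103_Grade": "Excellent"},
    {"Gender": "Female", "Hall_Status": "Non-Resident",
     "Student_Type": "Regular", "CSE103_Grade": "Very Good"},
]


def to_csv(rows):
    """Serialize the rows as CSV text with one header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
```

Because every value is already a discrete label, each column can be treated as a nominal attribute by the mining tool.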
[Experimental setup diagram: the BUET institutional dataset of 9210 students of all departments in the
last 10 years (Gender, Hall Status, Admission Year, Completed Credit Hour, All Records of Theory &
Sessional Courses, Overall CGPA) is converted to the Universal Table Structure and then to the
Transformation Table Structure. Both structures cover 582 students: Student Type (Regular 552,
Retentive 26, Abandoned 4), Gender (Male 473, Female 109) and Hall Status (Resident 348, Non Resident
234). The universal table holds Attendance, Classtest, Section A, Section B, Total and Grade for 40
theory courses and Total Marks and Grade for 28 sessional courses. The transformation table holds all
marks and grades of the 68 theory and sessional courses, including overall CGPA, as Poor, Average, Good,
Very Good or Excellent.]
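The thesis generates its rules with the Apriori implementation in Weka Explorer. To show what that step computes, the following is a small pure-Python sketch of Apriori over "attribute=value" items. It is not Weka's implementation; the function names, the four toy records and the thresholds are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    candidates = {frozenset([item]) for t in tsets for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in tsets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, support, confidence))
    return found


# Hypothetical toy records, not taken from the BUET dataset.
records = [
    {"Gender": "Male", "Hall_Status": "Resident", "CGPA": "Poor"},
    {"Gender": "Male", "Hall_Status": "Resident", "CGPA": "Poor"},
    {"Gender": "Male", "Hall_Status": "Non-Resident", "CGPA": "Good"},
    {"Gender": "Female", "Hall_Status": "Resident", "CGPA": "Good"},
]
transactions = [[f"{key}={value}" for key, value in r.items()] for r in records]
frequent = apriori(transactions, min_support=0.5)
found = generate_rules(frequent, min_confidence=0.9)
```

On these toy records the rule CGPA=Poor ⇒ Gender=Male has support 0.5 and confidence 1.0, which is the same kind of statement as the rules reported in the result tables.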
v. Main Results and Discussions
1) Impact of Gender
We have found an impact of gender on overall academic performance. This indication is very important in
terms of the socio-economic condition of the country. In BUET, the majority of the students are male and
live in the university dormitories. Multiple factors affect the academic environment and students'
academic performance. Rule 01 of Table 5.1 shows that, among students with poor CGPA, the share of male
students is very high (87% confidence). The reason is that male students are generally affected by
various societal problems of a third world country like Bangladesh. All other rules support that the
academic performance of female students is better than that of male students.
Table 5.1: Impact of Gender
No. Generated Interesting Rules Minimum Support Confidence
01 CGPA=Poor ⇒ Gender=male 10% 87%
02 CGPA=Average ⇒ Gender=male 10% 79%
03 CGPA=Very Good ⇒ Gender=male 10% 83%
04 Gender=male ⇒CGPA=Good 10% 26%
05 Gender=male ⇒ CGPA=Average 10% 21%
06 CGPA=Good ⇒ Gender=female 5% 22%
07 CGPA=Average ⇒ Gender=female 5% 21%
08 CGPA=Excellent ⇒ Gender=female 5% 20%
2) Impact of Residence
In BUET, most of the students live in the institution's halls, but the number of students living at home
is also significant. Analyzing the rules, we have found that both hall residents and students residing at
home get good CGPA with decent minimum support and confidence (Table 5.2). So a student who wants to do
well academically can do so from anywhere.
Table 5.2: Impact of Hall Status
No Generated Interesting Rules Minimum Support Confidence
01 CGPA=Average ⇒ Hall_Status=Resident 10% 65%
02 CGPA=Very Good ⇒ Hall_Status=Resident 10% 63%
03 CGPA=Good ⇒ Hall_Status=Non-Resident 10% 43%
04 CGPA=Good, Hall_Status=Resident ⇒ Gender=male 10% 82%
However, it is found that the percentage of students getting poor CGPA is high in the halls. In a hall
there is very little restriction, and sometimes there is no one to take care of a student as family
members do. So a student can become demoralized and get a very poor grade due to lack of study. As shown
in rule number 1 of Table 5.3, the percentage of male resident students is higher in this regard. In
slightly more than half of the cases, the poor CGPA holders are hall residents (rule numbers 1 and 5 of
Table 5.3).
Table 5.3: Impact of Hall Status and Gender
No Generated Interesting Rules Minimum Support Confidence
01 CGPA=Poor, Gender=male ⇒ Hall_Status=Resident 5% 51%
02 CGPA=Very Good, Gender=male ⇒ Hall_Status=Non-Resident 5% 40%
03 Hall_Status=Non-Resident, Gender=female ⇒ CGPA=Average 5% 24%
04 Hall_Status=Resident, Gender=female ⇒ CGPA=Good 5% 21%
05 CGPA=Poor ⇒ Hall_Status=Resident 5% 52%
3) Correlation between Courses
The analyzed Association Rules show that the grade of one course may depend on its prerequisite courses.
In rule number 1 we find that if a student gets an excellent grade in CSE105, he or she also gets an
excellent grade in CSE201 with a confidence of 0.48, where CSE105 is the Structured Programming Language
course and CSE201 is the Object Oriented Programming Language course. We also discover the interrelation
of the courses CSE311 (Data Communication-I) and CSE321 (Networking) in rule numbers 6, 7 and 8. We also
find the impact of the courses CSE205 (Digital Logic Design) and CSE209 (Digital Electronics and Pulse
Technique) on the course CSE403 (Digital System Design) in rule number 9 of Table 5.4.
Table 5.4: Correlation between Courses
No Generated Interesting Rules Minimum Support Confidence
01 CSE105_Grade=Excellent ⇒ CSE201_Grade=Excellent 10% 48%
02 CSE201_Grade=Very Good ⇒ CSE105_Grade=Very Good 5% 30%
03 EEE163_Grade=Excellent ⇒ EEE263_Grade=Very Good 5% 27%
04 CSE205_Grade=Excellent ⇒ CSE403_Grade=Excellent 10% 50%
05 CSE403_Grade=Poor ⇒ CSE205_Grade=Average 5% 28%
06 CSE321_Grade=Average ⇒ CSE311_Grade=Average 5% 36%
07 CSE321_Grade=F ⇒ CSE311_Grade=Poor 3% 13%
08 CSE321_Grade=Poor ⇒ CSE311_Grade=Poor 3% 16%
09 CSE205_Grade=Very Good, CSE209_Grade=Excellent ⇒ CSE403_Grade=Excellent 5% 53%
4) Impact on Retention
If a student fails a course, he or she becomes retentive, because that course has to be taken again later
to complete graduation. We find in rule numbers 2, 3, 4, 5 and 6 that retentive students usually struggle
with their grades. If a student has not passed CSE100, which is the first fundamental course of CSE, he
or she is retentive, i.e., he or she has not passed the later departmental courses either. This is
illustrated by generated rule number 1 in Table 5.5. Moreover, we have discovered that most retentive
students are hall residents and male, as illustrated with high confidence by rule numbers 7 and 8
respectively in Table 5.5.
Table 5.5: Impact on Retention
5) Impact on Abandonment
The students who have given up their academic studies without completing all the required courses are
typed as "abandoned". Analyzing the rules in Table 5.6, we discover with high confidence that the
abandoned students are male and hall residents. However, the minimum support value is very low. Thus it
is found that the rate of abandonment is very low in the CSE department of this university.
Table 5.6: Impact on Abandonment
No Generated Interesting Rules Minimum Support Confidence
01 Student Type=Abandoned ⇒ Gender=male 0.5% 100%
02 Student Type=Abandoned ⇒ Hall_Status=Resident 0.5% 75%
03 Student Type=Abandoned ⇒ Gender=male, Hall_Status=Resident 0.5% 75%
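The figures in Table 5.6 can be checked by hand from the counts reported in the experimental setup (582 students, 4 of them abandoned). The sketch below uses the usual definitions: support is the fraction of all records that contain every item of the rule, and confidence is that count divided by the count of records matching the antecedent. The constant and function names are assumptions made for this illustration, and the counts of 4 male and 3 resident abandoned students are inferred from the 100% and 75% confidence values rather than stated directly.

```python
TOTAL_STUDENTS = 582      # graduated CSE students in the transformed table
ABANDONED = 4             # students typed as abandoned
ABANDONED_MALE = 4        # inferred: rule 01 has 100% confidence
ABANDONED_RESIDENT = 3    # inferred: rule 02 has 75% confidence


def support(count_rule, total):
    """Fraction of all records that contain every item of the rule."""
    return count_rule / total


def confidence(count_rule, count_antecedent):
    """Fraction of antecedent records that also contain the consequent."""
    return count_rule / count_antecedent


rule_01 = (support(ABANDONED_MALE, TOTAL_STUDENTS), confidence(ABANDONED_MALE, ABANDONED))
rule_02 = (support(ABANDONED_RESIDENT, TOTAL_STUDENTS), confidence(ABANDONED_RESIDENT, ABANDONED))
```

Under these assumptions both rules have a support of roughly 0.5% to 0.7%, so they can only be generated when the minimum support is lowered to 0.5%, far below the 5% to 10% used in the other tables.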
6) Impact of Continuous Assessment
The grade of a course depends on various aspects such as the marks for attendance, class tests and both
sections of the term final examination. From rule number 7 of Table 5.7, which has the maximum confidence
value of 1.00, we have discovered that an excellent grade in a course goes together with excellent
performance in all the other aspects of continuous assessment. Again, the performance in class tests
depends on attendance, as illustrated by rule number 5 in Table 5.7 with a very high confidence of 0.95.
Rules of Table 5.5 (Impact on Retention):
No Generated Interesting Rules Minimum Support Confidence
01 CSE100_Grade=F ⇒ Student Type=Retentive 5% 42%
02 Student Type=Retentive ⇒ MATH243_Grade=Poor 5% 35%
03 Student Type=Retentive ⇒ CSE205_Grade=Average 5% 35%
04 Student Type=Retentive ⇒ CSE311_Grade=Average 5% 27%
05 Student Type=Retentive ⇒ EEE263_Grade=Poor 5% 33%
06 Student Type=Retentive ⇒ CSE409_Grade=Average 5% 43%
07 Student Type=Retentive ⇒ Hall_Status=Resident 5% 65%
08 Student Type=Retentive ⇒ Gender=male 5% 81%
Table 5.7: Impact of Continuous Assessment
No Generated Interesting Rules Minimum Support Confidence
01 CSE103_Attendance=Excellent, CSE103_SectionB=Poor ⇒ CSE103_Grade=Average 10% 63%
02 CSE103_Grade=Very Good, CSE103_CT=Good ⇒ CSE103_Attendance=Excellent 10% 97%
03 EEE163_Grade=Average ⇒ EEE163_SectionB=Poor 10% 57%
04 EEE163_Grade=Very Good ⇒ EEE163_Attendance=Excellent, EEE163_CT=Excellent 10% 67%
05 HUM275_CT=Excellent ⇒ HUM275_Attendance=Excellent 10% 95%
06 HUM275_CT=Excellent, HUM275_SectionA=Good ⇒ HUM275_Grade=Very Good, HUM275_Attendance=Excellent 10% 75%
07 CSE401_Grade=Excellent, CSE401_CT=Excellent, CSE401_SectionA=Excellent ⇒ CSE401_Attendance=Excellent 10% 100%
08 CSE401_SectionB=Excellent ⇒ CSE401_Grade=Good 10% 75%
7) Impact of Non Departmental Courses
After analyzing the generated Association Rules (Table 5.8), we observed various impacts of
non-departmental courses on academic performance. According to the curriculum, students must take some
non-departmental courses, and the performance in these courses is added to the final result. So it may
happen that some students get poor grades in those non-departmental courses. According to the generated
rules, good performance in the non-departmental courses brings a good grade, but a poor grade in
non-departmental courses causes less harm to the final CGPA, because those courses are fewer in number
and most of them are studied at the beginning of the undergraduate level. So students get enough
opportunities to improve their CGPA later.
Table 5.8: Impact of Non Departmental Courses
No Generated Interesting Rules Minimum Support Confidence
01 CGPA=Very Good ⇒ HUM272_Grade=Very Good 10% 73%
02 CGPA=Very Good ⇒ MATH143_Grade=Average 5% 37%
03 CGPA=Good ⇒EEE163_Grade=Average 5% 36%
04 CGPA=Very Good ⇒ CHEM101_Grade=Average 10% 52%
05 CGPA=Average ⇒ IPE493_Grade=Very Good 5% 29%
06 CGPA=Good ⇒ ME165_Grade=Average 10% 43%
07 CGPA=Average ⇒ MATH243_Grade=Poor 5% 27%
8) Impact of Departmental Courses
As many departmental courses are studied and some of them are interconnected through prerequisites, the
results of the departmental courses affect the final CGPA very much. From the analyzed rules, it is found
that good grades in departmental courses bring a good CGPA. On the other hand, poor grades in
departmental courses result in a poor overall CGPA. This significant knowledge is discovered from the
rules on the impact of departmental courses in Table 5.9.
Table 5.9: Impact of Departmental Courses
No Generated Interesting Rules Minimum Support Confidence
01 CGPA=Very Good ⇒ CSE100_Grade=Very Good 5% 42%
02 CGPA=Very Good ⇒ CSE105_Grade=Average 5% 31%
03 CGPA=Very Good ⇒ CSE206_Grade=Very Good 10% 44%
04 CGPA=Good ⇒ CSE303_Grade=Average 5% 31%
05 CGPA=Poor ⇒ CSE321_Grade=Poor 5% 29%
06 CGPA=Excellent ⇒ CSE401_Grade=Excellent 5% 50%
07 CGPA=Average ⇒ CSE401_Grade=Average 5% 29%
08 CGPA=Average ⇒ CSE409_Grade=Average 5% 42%
vi. Conclusions
Knowledge discovery from academic data is very important for improving the academic performance of any
higher educational institution. In this research, we study the academic system, the existing problems and
the performance data of the most renowned engineering university of Bangladesh. We have found problems
like abandonment, retention and potentiality decay among the most brilliant students. We have applied the
Association Rule Mining technique to explore the root causes of these problems. Before applying the data
mining algorithm, the existing academic data has been preprocessed to make it suitable for data mining.
We have developed a data transformation technique that transforms the relational database into an
equivalent universal relational format. In this format, we have also transformed the continuous data into
discrete-valued qualitative data. We have found interesting Association Rules by applying the Apriori
Association Rule generator of the WEKA tool to the transformed data. From the large number of association
rules, we have extracted the interesting rules regarding the impacts of gender, residence and continuous
assessment on academic performance. We have also found associations among the courses, retention and
abandonment. The obtained results are significant for decision makers who want to improve the overall
academic condition of the institution.
According to the results found, 10% of the 582 already-graduated students of the CSE department are male
and have CGPA below 3.00, and the probability of being male among poor CGPA holders is 0.87. Again, we
have discovered that 5% of the total students have poor CGPA and are hall residents, and the probability
of being a hall resident among poor CGPA holders is 0.52. We have also discovered significant correlation
between courses. For example, more than 58 students have excellent grades in both CSE105 (Structured
Programming Language) and CSE201 (Object Oriented Programming Language). The probability of having an
excellent grade in CSE201 among students having an excellent grade in CSE105 is 0.48. We have found that
about 30 students have to retake the MATH243 course. We found that 5% of total male students are both
retentive and hall resident and that 65% of all retentive students are hall residents. The abandonment
rate is very low in the CSE department of BUET, as we found that only 4 students, all male, dropped out
before completing graduation, and 75% of the abandoned students were hall residents. We have also
determined the impact of several non-departmental courses. For example, more than 60 students have a very
good grade in HUM272 as well as a CGPA over 3.50. We have also determined the impact of several
departmental courses. For example, 5% of the 582 students have a CGPA over 3.75 and have got A+ in
CSE401, and 50% of the students having a CGPA over 3.75 have obtained A+ in CSE401.
We hope all these quantitative findings will help decision makers improve the quality of education
provided in this department. We have applied the technique only to the CSE department of BUET, but it is
applicable to any department of any higher educational institute.