Analyzing undergraduate students’ performance in various perspectives using d...
Karen Tran - ENGE 4994 Paper
1. Tran 1
Karen Tran
Dr. Jacob R. Grohs
ENGE 4994
8 May 2016
The Effects of Statics Credit on Future Mechanics Courses
Objective
The goal for this course (ENGE 4994) during the Spring 2016 semester was to code raw data,
provided by Virginia Polytechnic Institute and State University, covering a large subset of
students who took statics and mechanics courses over a period of a few years. The data
consisted of 165,543 rows of de-identified transcripts, with each student identified by a
unique number. Dr. Jacob R. Grohs then converted and coded the raw data into a Statistical
Package for the Social Sciences (SPSS) file. The true objective was to run different statistical
tests on the data to observe and quantify how transferring statics credit may affect a student's
performance in future mechanics courses. However, because the data set was quite large, SPSS
crashed multiple times while trying to run certain tests. This led to restructuring the data in
a program that could handle these statistical tests: R.
Restructuring the data in another programming language (R) became the initial and most
prominent objective of the course. The end goal remained to investigate and understand how
transferring statics credit affected performance in future mechanics courses such as deforms
and dynamics. Other factors to be considered included the number of times a student took a
course, how many other credits they were taking during the same semester, their GPA, and their
major, among others. There was a large number of questions to be posed and answered using
statistical analysis.
Overarching Challenge
The real overarching challenge was learning how to code in R. For a typical engineering
student at Virginia Polytechnic Institute and State University, the programming knowledge
gained through degree courses is limited to standard MATLAB (or Mathematica) and basic Java.
Students who are not Computer Science or Computer Engineering majors are therefore at a slight
disadvantage. Restructuring the data in R became a task that constantly required research:
Google searches, manuals, online guides, and YouTube tutorials. The goal was to give each
unique student number a single row of data in the following format:
{student_id, student_admit_type, degree1, degree2, degree3, degree4, class1, class2, etc.}¹
Ideally, it would be possible to call out a unique student or a specific category and
retrieve all of the information needed for statistical analysis. The desired statistical tests,
such as group comparisons and t-tests with hypothesis testing (p-values), were going to be the
stepping stones to understanding performance in future courses after taking statics (at
Virginia Tech or somewhere else).
¹ There were 26 different classes.
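As a rough illustration of the kind of comparison described above, the sketch below runs a two-sample t-test in R. The data frame grades, its column names, and every value in it are invented for illustration only; they are not taken from the transcript data and say nothing about the real result. By default, R's t.test performs Welch's version of the test, which does not assume equal variances.

```r
# Invented grades for illustration only; NOT results from the real data set.
# "statics_credit" marks where the (hypothetical) student earned statics credit.
grades <- data.frame(
  statics_credit = c("VT", "VT", "VT", "VT",
                     "transfer", "transfer", "transfer", "transfer"),
  dynamics_grade = c(3.0, 2.7, 3.3, 2.3, 2.0, 3.0, 2.7, 2.3)
)

# Two-sample t-test (Welch by default): does the mean dynamics grade differ
# between the two groups?
result <- t.test(dynamics_grade ~ statics_credit, data = grades)

# The p-value used for the hypothesis test.
result$p.value
```

With the real data, the grouping column would come from the admit type or transfer-credit information, and the grade column from the relevant mechanics course.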
Key Progress
While there were many roadblocks and struggles, there were also many moments of progress
during the semester. I was able to sort and understand the raw data in a short amount of time
(a few weeks), as well as organize and manipulate the data into a cleaner structure. The
original data frame (mydata), at 165,543 rows, was condensed to one row per unique student
identifier in a new data frame (myfinaldata) at 23,364 rows. The next few columns of
myfinaldata were built from left to right with the headers student_admit_type, degree1,
degree2, degree3, and degree4, respectively. Each entry under “student_admit_type” contained a
list of three elements (freshman or transfer, first term attended, last term attended). Each
entry under the degree headers contained a list of three as well (major, GPA, graduating year).
Creating lists within lists was necessary to be able to call out a certain element from a
specific header for a distinct student.
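The condensing step can be sketched as follows. This is a minimal example on a toy table, not the code used for the project: the raw column names (admit_type, first_term, last_term) and all values are assumptions made for illustration, since the real column names of mydata are not listed here.

```r
# Toy stand-in for the raw transcript table. Column names and values are
# hypothetical, chosen only to mirror the structure described above.
mydata <- data.frame(
  student_id = c(1, 1, 2, 2, 2, 3),
  admit_type = c("Freshman", "Freshman", "Transfer", "Transfer", "Transfer", "Freshman"),
  first_term = c(199909, 199909, 200101, 200101, 200101, 199809),
  last_term  = c(200305, 200305, 200405, 200405, 200405, 200201),
  stringsAsFactors = FALSE
)

# One row per unique student identifier.
ids <- unique(mydata$student_id)
myfinaldata <- data.frame(student_id = ids)

# Take each student's first row in the raw data and build the list of three
# (admit type, first term, last term) for that student.
first_rows <- mydata[match(ids, mydata$student_id), ]
myfinaldata$student_admit_type <- lapply(
  seq_len(nrow(first_rows)),
  function(i) list(first_rows$admit_type[i],
                   first_rows$first_term[i],
                   first_rows$last_term[i])
)

nrow(myfinaldata)                          # 3 rows, one per student
myfinaldata$student_admit_type[[2]][[1]]   # "Transfer"
```

Assigning the result of lapply to a column with $ stores a list in each row, which is what allows one element to be called out for one student.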
R: Programming Language
Although the raw data was not restructured completely enough to run statistical tests, I
learned a great deal about R coding. I was able to code and restructure a good amount of raw
data using built-in functions and logic. I found R similar to MATLAB, although the syntax was
very different and building a data frame was much more involved than solving a mathematics
problem. By learning how to name and assign variables and how to use functions such as
“unique,” “str,” and “as.list,” I was able to make lists within each column and could call out
pieces (elements) of data for a certain student. This was helpful in the sense that it was
possible to isolate certain variables (students, degrees, freshman/transfer) and manipulate
them for future analyses.²
Remaining Challenges
Still, there are many hurdles to overcome. For students who had only one degree, the
placeholders for degree2, degree3, and degree4 were filled with “NA” for each element (9
NA’s). This created an immense amount of unnecessary space in the data frame, which could be
avoided by using a list that can hold any number of elements within the same column (appending
to a list).
For example, consider two students taken from myfinaldata (Student 1 and Student 17) who have
different numbers of degrees. The data currently looks like this:
• [Student1], [Freshman,199909,2015012], [MATH,2.18174603,199907,NA,NA,NA,NA,NA,NA,NA,NA,NA]
• [Student17], [Freshman,199809,201401], [CE,2.89947644,200301,ME,2.89947644,201201,NA,NA,NA,NA,NA,NA]
Ideally, the rows should look like this:
• [Student1], [Freshman,199909,2015012], [MATH,2.18174603,199907]
• [Student17], [Freshman,199809,201401], [CE,2.89947644,200301,ME,2.89947644,201201].
The list should end when there is no more information applicable to the unique student. If a
student had only one degree, the third header should contain a single list of three. If a
student had two degrees, the third header should contain two lists of three (a total of six
elements in the degree category). The pattern should continue for students who had three and
four degrees (nine and twelve elements, respectively).
² The code for myfinaldata and a screenshot demonstrating how to call out a unique student are
provided at the end of this document.
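One possible way to get degree lists that stop when a student's information runs out is sketched below. It assumes the degree records are available in long form, with one row per degree; the data frame degrees and its column names are hypothetical, and the values are the ones shown for Student 1 and Student 17 above.

```r
# Hypothetical long-form degree records: one row per degree earned.
# Values for students 1 and 17 are copied from the example above.
degrees <- data.frame(
  student_id = c(1, 17, 17),
  major      = c("MATH", "CE", "ME"),
  gpa        = c(2.18174603, 2.89947644, 2.89947644),
  term       = c(199907, 200301, 201201),
  stringsAsFactors = FALSE
)

# split() groups the row numbers by student, so each student receives exactly
# as many degree entries as there are rows for that student. No NA padding.
rows_by_student <- split(seq_len(nrow(degrees)), degrees$student_id)
degree_lists <- lapply(rows_by_student, function(rows) {
  lapply(rows, function(i) list(degrees$major[i], degrees$gpa[i], degrees$term[i]))
})

length(degree_lists[["1"]])      # 1 list of three for Student 1
length(degree_lists[["17"]])     # 2 lists of three for Student 17
degree_lists[["17"]][[2]][[1]]   # "ME"
```

Because each student's entry is only as long as it needs to be, the four fixed degree columns and their NA placeholders are no longer required.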
Another major challenge that needs to be tackled is finding a way to enter the data so that
R can read it line by line. There needs to be a method (a while or for loop) that enters class
information for a unique student only if that student has taken the class. Because there are 26
classes, it would be wasteful to store NA for 25 classes if a student took only one class.
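A for loop of the kind described above might look like the following sketch. The data frame courses, its column names, the course labels, and the grades are all invented for illustration; the real class records may be organized differently.

```r
# Hypothetical long-form course records: one row per class attempt.
# Course names and grades are invented for illustration.
courses <- data.frame(
  student_id = c(1, 1, 2, 3, 3, 3),
  course     = c("statics", "dynamics", "statics", "statics", "deforms", "dynamics"),
  grade      = c(3.0, 2.7, 3.3, 2.0, 2.3, 3.7),
  stringsAsFactors = FALSE
)

# Read the table line by line and append each class to its student's list.
# A student only accumulates entries for classes actually taken.
classes_by_student <- list()
for (i in seq_len(nrow(courses))) {
  key   <- as.character(courses$student_id[i])
  entry <- list(courses$course[i], courses$grade[i])
  classes_by_student[[key]] <- c(classes_by_student[[key]], list(entry))
}

length(classes_by_student[["2"]])      # 1 class for student 2
classes_by_student[["3"]][[2]][[1]]    # "deforms"
```

Growing a list inside a loop is simple to follow but can be slow on large inputs; grouping the rows with split(), as R provides, is a common alternative for the same result.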
Figure 2. mydata - Raw Data Output
Figure 3. myfinaldata - Manipulated Data Output
Figure 4. myfinaldata - Calling Out Select Students and Categories
Analysis of Figure 4
• x = student_admit_type
• x2 = degree1
• x3 = degree2
• x4 = degree3
• x5 = degree4
Interpretation:
• x2[[2]] = retrieve all information of degree1 for student number two
• x2[[2]][[1]] = retrieve the first element of degree1 for student number two
• x2[[3]] = retrieve all information of degree1 for student number three
• x2[[3]][[2]] = retrieve the second element of degree1 for student number three
• x3[[3]] = retrieve all information of degree2 for student number three
• x[[5]] = retrieve all information of the student_admit_type for student number five
• x[[5]][[1]] = retrieve the first element of the student_admit_type for student number five
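The indexing pattern interpreted above can be reproduced on a small stand-in. The two lists below are hypothetical and only mimic the shape of x (student_admit_type) and x2 (degree1); the values are invented.

```r
# Hypothetical stand-ins shaped like x (student_admit_type) and x2 (degree1).
# Each outer element is one student; each inner list has three elements.
x  <- list(list("Freshman", 199909, 200305),
           list("Transfer", 200101, 200405))
x2 <- list(list("MATH", 3.1, 200305),
           list("CE", 2.9, 200405))

x2[[2]]        # all information of degree1 for student number two
x2[[2]][[1]]   # first element of degree1 for student number two: "CE"
x[[1]][[3]]    # third element of student_admit_type for student one: 200305
```

The first [[ ]] selects the student and the second [[ ]] selects the element within that student's list.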