University of Exeter
College of Engineering,
Mathematics and Physical Sciences
ECM3735 Mathematics Group Project
Computer Assessment - The
Challenges and Potential
Solutions
Authors:
Candidate Numbers
003440, 035429, 006702,
000997, 019169, 008339,
011812, 006667.
Advisor:
Dr. Barrie COOPER
College of Engineering, Mathematics and
Physical Sciences Harrison Building
Streatham Campus
University of Exeter
North Park Road
Exeter
UK
EX4 4QF
Tel: +44 (0)1392 723628
Fax: +44 (0)1392 217965
Email: emps@exeter.ac.uk
December 7, 2015
Abstract
The purpose of this report is to explore the challenges
and potential solutions of current computer-based assessments. With
increasing numbers of applications for graduate jobs, there exists a
growing pressure among applicants to succeed at online assessments
set by employers. The vast number of applications received relative
to available positions creates an even greater need for employers to
develop effective and fair assessments. These can then identify the
most appropriate candidates: those best able to demonstrate their
abilities in numerical reasoning, which has been shown to be a reliable
predictor of job performance. In our report we approach four questions:
How do people learn through computer-based assessment? Why is it
important to study mathematics, and how does the Numeracy Vs.
Mathematics debate bear on this? Why do certain employers use
numerical testing? Are certain types of learners better at numerical
reasoning tests? By creating our own numerical reasoning test, we
hoped to explore the factors that affect participants' performance. The
team carried out extensive statistical analysis, relating our findings
back to our hypotheses, and found significant results for all four of
our proposed hypotheses. The overall findings of this report demonstrate
that current numerical reasoning assessments and practice tests are
potentially flawed. Our findings suggest that they fail to accommodate
all types of learners, and in most cases fail to provide comprehensive
feedback. Based on our research and test findings, we encourage companies
and educational institutions to take on board our recommendations, such
as improving both the feedback and the preparation they offer to candidates.
Contents
1 Introduction 3
1.1 Aims and Objectives 3
1.1.1 Aims 3
1.1.2 Objectives 4
1.2 Preliminary Findings 5
1.2.1 The 'Mathematics Vs Numeracy' debate 5
1.2.2 Why is mathematics important? 7
1.2.3 Do people forget mathematics skills as they get older? 8
1.2.4 Why do certain employers use numerical reasoning assessments? What skills do they think it will show? 9
1.2.5 How do people learn through computer-based assessment? What works and what does not? 14
1.2.6 Will different types of learners (kinaesthetic, visual etc.) have different levels of numeracy? 15
2 Methodology 16
2.1 Group Organisation 16
2.1.1 Meeting Times 16
2.1.2 Communication 17
2.1.3 Combatting Risk 18
2.1.4 Subgroups 19
2.2 Data Collection 20
2.2.1 Preliminary data collection 20
2.2.2 Survey 21
2.3 Test design, creation and analysis 23
2.3.1 Producing the Questions 23
2.3.2 Programming the Test 25
2.3.3 Test distribution 31
2.3.4 Test Analysis 32
2.4 Report Feedback 35
2.4.1 Skill development - Graduate Skills 36
3 Findings 38
3.1 Survey 38
3.2 Test 45
3.2.1 The Maths Vs. Numeracy Debate. Why is mathematics important? 47
3.2.2 Why do employers use numerical reasoning testing? 50
3.2.3 Do different learners perform better on numerical reasoning tests? 52
3.2.4 How do people learn through computer-based assessments? 53
3.2.5 Regression Modelling 56
3.3 Feedback Findings 61
4 Conclusion 63
5 Evaluation 66
5.1 SWOT Analysis 66
5.1.1 Strengths 66
5.1.2 Weaknesses 67
5.1.3 Opportunities 71
5.1.4 Threats 73
5.2 Improvements 74
5.3 Further Research 75
6 Bibliography 77
7 Appendix 81
1 Introduction
In recent history our world has borne witness to some of the most
revolutionary and exciting technological advances of all time. We live in an
age where computers seem to hold a role in society on par with that of the
basic necessities such as food or water. Our planet no longer revolves merely
around the sun, but around all things computer related. These advancements
have caused a great evolution in human society; we have gone from a very
much physical world to a more paperless and virtual one. This is even true
for assessments. Nowadays many companies require candidates to be assessed
via online tests as opposed to the traditional pen-and-paper exam. This report
endeavoured to explore such computer-based assessments. In particular, we
have looked at numeracy tests exploring both the challenges they face and
the potential solutions.
1.1 Aims and Objectives
1.1.1 Aims
Computer-based assessments are widely used by employers, government
departments and educational organisations, among many others. The
question is why? First and foremost, such assessments examine the aptitude of
a candidate in a particular subject. Two common subjects examined via
online assessments are numeracy and literacy, two key skills for
employability and the applied counterparts of the academic subjects
Mathematics and English respectively. There is a definite need for candidates to both learn
and improve from these tests. Motivated by this fundamental necessity we
have focused on creating our own form of computer-based assessment as part
of our project. We have used this as a vehicle with which to answer several
questions that target the way in which a person learns. This has been done
by analysing data from a sample of people who had taken our test. We have
carried out research around this subject area of computer-based assessments
and have used our data to compare our findings with the current literature
available and thus provided insight into online testing that can be useful to
both universities and employers.
1.1.2 Objectives
Our objectives were clear. Firstly, we have researched extensively
into literature relating to education and current learning methods. This lit-
erature has enabled us to understand the theory behind the ways in which
people learn and improve. Our research on this was both general and specific
to computer-based assessments. Furthermore, we have explored the types of
learners that exist. Once these were clearly classified we were able to explore
effective learning methods catered specifically to them. This acted as an im-
portant step since the aim had been to create an assessment which provided
effective and comprehensive feedback.
We planned to develop a computerised numerical reasoning
test with which to gather the necessary data, in order to support or
contradict the literature and the current research hypotheses that exist
within the community. We also planned to incorporate an adaptive feature
into our assessment, which has been crucial in equipping it with the
ability to tailor questions to an individual's ability based on their
performance on previous questions. This has not only aided our statistical
analysis but has also provided participants with the relevant practice
and training that they require. Finally, we planned to provide real-time
feedback, allowing participants to instantly identify where they went
wrong and, more importantly, how to correct it.
The test was primarily a numerical reasoning test and so incorpo-
rated numerical skills similar to those tested in graduate job applications.
We therefore felt it would be beneficial to research the question “Why test
mathematics?” These findings shed light on exactly why employers incorpo-
rate such assessments into their application process and what they hope to
discover through doing so.
Age is a factor which uniquely defines a person. In our project, we
have explored whether there is evidence to support the idea that
mathematics skills deteriorate if you stop using them; in other words,
that mathematics skills are not permanent and require regular revision
in order to remain in one's memory. Such a period of disuse could be the
years after GCSE or A Level. We have been able to examine whether people
who study mathematics have an advantage compared to others. All of these
findings have been compared against the current literature available, and
thus, after analysing the results of our test, we have been able to
highlight any supporting or contradicting trends.
There is a widespread and controversial debate, not only across the
academic community but also across the world, as to whether mathematics
and numeracy are essentially the same thing. It poses the question of
whether numeracy skills essentially rely on mathematical skills and vice
versa, or whether they are in fact completely separate disciplines. We
felt this is a relevant area to explore, since there has been talk that
the UK Government plans to change the current mathematics GCSE by removing
numeracy from mathematics and treating them as independent subjects, as
mentioned above. We hope to discover whether mathematics students actually
have an advantage in numeracy tests, given that numeracy is a skill neither
especially relevant to, nor practised at, degree level.
We have modelled the data collected from our test results with
relevant graphs, and have further analysed it using appropriate
statistical techniques. Presenting our data in appropriate forms has
enabled us to efficiently compare our results to those found in the literature.
Finally, we have aimed to assess how useful our findings are relative
to our broader stakeholders. We have set out to measure to what extent,
if any, we had been able to contribute to the current problem of learning
using computer-based assessments. It has been an objective of ours both to
highlight the problems with what is currently available and to improve on
it, by providing solutions based on our findings. We have planned to approach
professionals and experts in this field with our results in order to get reliable
feedback.
1.2 Preliminary Findings
1.2.1 The ’Mathematics Vs Numeracy’ debate.
An ever greater need for both mathematical and numerical skills
is constantly emerging. However, there is considerable debate within
society as to whether mathematics and numeracy should be considered the
same thing, or whether numeracy should be a subject in its own right.
There are current plans for GCSE Mathematics in the UK to be split into
two separate, independent GCSEs: Mathematics and Numeracy [60].
Mathematics is defined by the Oxford Dictionary as "the abstract
science of number, quantity and space, either as abstract concepts (pure
mathematics), or as applied to other disciplines such as physics and
engineering (applied mathematics)" [18]. Numeracy, meanwhile, is defined
by the Oxford Dictionary as "the ability to understand and work with
numbers" [19]. It may be concluded from these two definitions that
numeracy is a subset of mathematics. However, it can also be argued that
numeracy is a subject in its own right and should be separated from
mathematics, as it is more applicable in society and the workplace.
Interestingly, a paper on Numeracy and Mathematics from the Uni-
versity of Limerick, Ireland, contained no universally accepted definition of
numeracy [42]. This is backed up by research from the University of Arizona,
which found that the difference between numeracy and elementary mathematics
is analogous to the difference between quantitative literacy and
mathematical literacy [29]. More importantly, no universal definition for numeracy
was agreed upon, although there was much overlap between current working
definitions. The most important difference between the two forms of literacy
is that quantitative literacy puts more emphasis on context, whilst mathe-
matical literacy focuses on abstraction [29].
In a paper produced by Stony Brook University's Applied
Mathematics and Statistics Department, it is stated that all mathematics
instruction should be devoted to developing "deeper mastery" of core topics,
through computation, problem-solving and logical reasoning - which is effec-
tively what a numerical reasoning test examines. Simple proportion problems
can be integrated into fraction calculations early on. In addition, the devel-
opment of arithmetic skills in working with integers, fractions and decimals,
should be matched with increasingly challenging applied problems, many in
the context of measurement. Solving problems in different ways ought to
be an important aspect of mathematical reasoning with both arithmetic and
applied problems in order to ensure a sufficient level of numerical skills for
further progression in society [56].
The Guardian published an article exploring a worldwide
problem associated with the difference between mathematics in education
and mathematics in the real world. The article states that all over the
world we are mostly teaching the wrong type of mathematics [28]. It goes
on to describe how calculation is now primarily carried out by computers,
yet we still largely train people to perform calculations themselves. This
is true almost universally [28]. We can relate this to the context of the
'Mathematics Vs Numeracy' debate since, generally, the mathematics taught
in education is too pure and distant from the real world, while on the
whole the mathematics used in everyday life is numeracy.
Many companies require potential employees to sit a numeracy test
before commencing employment, despite the fact they already hold nation-
ally recognised qualifications in mathematics. An article from an electronic
journal for leaders in education explores this. Findings show that although
the term “numeracy” is not widely used across the world, there does ex-
ist a strong consensus that all young people are required to become more
competent and confident in using the mathematics they have been taught.
Furthermore, numeracy is found to bridge the gap between school-learned
mathematics and its applications in everyday life [1]. These findings sup-
port companies in their efforts to use numerical reasoning testing as a way of
seeing whether candidates can efficiently use their formal qualifications in a
practical environment. A candidate may have achieved high results in their
school exams, but this does not necessarily mean that they will be able
to use those qualifications for practical problem solving, which is
recognised as the main use of mathematics [28].
An insufficient level of numeracy skills has been found to lead to
unemployment, low wages and poor health, further highlighting the
importance of numeracy [43]. The need for mathematics exists in all
aspects of everyday life: within the workplace, in practical settings
such as schools and hospitals, and in understanding the news and
statistics [14].
1.2.2 Why is mathematics important?
The study of mathematics can lead to a variety of professional ca-
reers such as research, engineering, finance, business and government services
[14]. This is supported by the University of Arizona's Department of
Mathematics, which also added the social sciences to the above fields [15].
It should be noted that these careers are fundamental to the world economy.
Therefore, it is important to ensure that people working within those fields
have sufficient skills to guarantee correct and efficient problem solving
and to prevent any detrimental consequences.
Finally, it has been suggested that poor numeracy leads to
depression, low confidence and low self-esteem, which in turn lead to
social, emotional and behavioural difficulties and to increased social
exclusion, truancy and crime rates [41]. In the digital age, 90% of new
graduate jobs require a high level of digital skills, which are built on
numeracy. Although computers are able to reduce the need for human
involvement in certain calculations, sufficient numeracy skills are still
required to use them efficiently [41].
1.2.3 Do people forget mathematics skills as they get older?
Research has found that a severe loss of both numeracy and
literacy skills often occurs in adulthood, with 20% of adults experiencing
difficulties with the basic skills needed to function in modern-day society [6]
[20] [38]. Simple numerical calculations, such as percentages and powers, are
found difficult despite being taught and tested to the government’s standard
throughout education.
The effect of being unemployed has been explored for both men
and women and it has been found that numeracy skills get steadily worse
the longer a person is without a job [6]. Interestingly, women experience a
lesser effect than men due to their role in society being more diverse, hence
requiring them to use their numeracy and literacy skills more frequently. It
has also been found that the loss of skill largely depends on the starting level
of knowledge and understanding, and that those who have poor skills to
begin with experience a more severe deterioration. Furthermore, numeracy
skills have a smaller presence in everyday life as more people find themselves
reading than they do performing calculations. However, a decrease in liter-
acy skills leads to an even further loss of numeracy skills as it increases the
difficulty of understanding the posed question.
Important findings have been made amongst a group of nursing stu-
dents, who were asked to sit a numeracy test containing questions similar to
those that they would have to answer as part of their future job [20]. The
average score was 56%, with the most common errors being arithmetic.
A significant difference in results was found between students who entered
higher education immediately and those who took a year out beforehand:
those who started immediately scored on average 70%, while those who didn't
averaged only 47%. This shows that being in an environment that doesn't
require the use of numeracy skills has a deteriorating effect, not only on
the ability to perform simple calculations but also on the ability to
extract the relevant information needed to set up an equation. This means
that even with the use of a
calculator, these students are still likely to make mistakes. Students have
also been found unable to identify errors in their work, even when the result
found is unreasonable and unrealistic. Such results are potentially
dangerous: for example, nurses must perform calculations such as drug
dosages, and mistakes here cost both the public, in harm suffered, and
the employer, in having to provide additional training.
Due to the ever increasing importance of skills in the world of work,
especially early on in a career, a lack of numerical competence has an
undesirable effect on the employment of these individuals, which in turn
affects their standard of living [6] [46] [38]. Such requirements are
brought about by recent changes to the labour market, with fewer
semi-skilled or unskilled manual jobs available due to technological
developments [46]. Unskilled workers have difficulty in both gaining and
retaining employment, and so are the first to suffer in the case of
downsizing or a crisis [6]. A low-level skill set also limits individuals
to lower and middle range jobs (the bottom 10% to 20%), preventing them
from experiencing career growth and leading to severe social exclusion
[46] [6] [7]. This causes a downward cycle, as a low skill level is passed
on from parents to children, thereby perpetuating unemployment through
the generations [7].
The government has recognised this problem and created a ’Skills
for Life’ programme, which aims to provide basic skills to adults in order to
help them gain employment [38]. Other solutions include on-the-job train-
ing, or, as research suggests, we can even prevent such severe skill loss by
ensuring pupils reach a certain skill level whilst still in education [6].
There are, of course, other factors which lead to a low level of nu-
meracy skills, such as family background, learning environment and quality
of education [6] [3]. However, in this report we concentrate on how the low
level of demand for numeracy in everyday life affects a student’s performance
in an online test.
1.2.4 Why do certain employers use numerical reasoning assess-
ments? What skills do they think it will show?
In a constantly changing and advancing business world, the way
in which people are hired may be a natural result of shifts in the
business environment and modern workforces. A number of studies mentioned
in A. Jenkins' 2001 paper speculated that the increase in numerical tests
is due to the greater professionalism of the human resource sector of
many businesses, as well as the inclusion of standard selection
procedures in their recruitment [33].
has evolved massively, and is now an integral part of most organisations [58].
All these factors may have led to the rise of assessment centres, due to a con-
tinuous desire amongst companies to gain a professional edge. They do this
by searching for alternatives to traditional methods of recruitment, much
of which is done through HR. This greater reliance on HR as a business
sector has led to the adoption of much stronger recruitment methods, which
(for reasons that will follow) enable companies to meet legislative
requirements and promote fair practice.
In many workforces, it has become clear in recent years that
employability tests are used for purposes other than just performance
testing. They provide a platform that assesses candidates on merit rather
than on personal criteria, reducing the impact of discriminatory practices
[4]. Due to equal opportunity legislation in many countries, which is most
commonly related to the differing proportions of ethnic groups hired, many
employers could be vulnerable to prosecution [58]. These types of
psychometric tests can therefore be used as a way to reduce bias and
discrimination [33]. One factor explaining the increase in the use of
these tests may therefore be that they are a prudent response to changes
in hiring attitudes and legislation. On the other hand, the opposite has
also been said - that companies need to keep legal compliance in mind when
they use psychometric tests [12], so as not to offend candidates by using
irrelevant tests. In addition, when using these tests, the role of bias
has been explored, as many psychologists and companies note that testing
is an intrinsically culturally biased procedure that can cause
discrimination against ethnic minorities. This is a result of cultural
differences leading to consistently different answers across several
different social groups. It can be noted, though, that this applies more
specifically to judgement and situational tests, and not to the numerical
and verbal reasoning tests on which we are planning to base our research [30].
The rise of these tests could also be attributed to the workplace's
lower regard for formal qualifications as a method of streaming candidates
and predicting their future abilities [33]. This may be because young labour
force entrants across the EU have much higher attainments than they previ-
ously did, and hence it is harder to sort applicants out at the top end of the
spectrum based on attainment than in the past. This may lead employers to
screen applicants much more carefully [33]. Potentially this was caused by
the previous decade of education being criticised as 'too easy' [27], which
caused achievements to be very high. Periods like this can have knock-on effects on
recruitment methods, as a reaction to these ’more qualified’ applicants fil-
tering through the recruitment system and into the business environment.
However, this may be subject to change, given that recent education reforms
claiming to 'toughen up' the curriculum have yet to see their full effect -
particularly in terms of employment. Examples of a lack of belief in the
education system can be seen in the actions of top employers. An example of
this would be one of the ’big four’ professional services firms, Ernst & Young
[26], who have recently changed their application rules so that educational
attainments, such as degree class, are no longer taken into account. Instead
they believe that their own in-house testing and assessment centres are a
reliable enough indicator of whether candidates will succeed [26]. Another
example of this was with the introduction of the Army’s own mathematics
test for applicants. The reason for its development was the increasingly chal-
lenging task of using GCSE mathematics results as a discriminator amongst
applicants for technician roles [33]. If formal qualifications continue to be an
insufficient indicator of applicants' abilities, then companies will have to find
new methods to screen them, as is happening already with the increase in
psychometric testing.
When beginning our research, we went down many different routes
to get a broad range of information. Through emails and other means of
correspondence, we identified a few problems that employers encounter with
these psychometric tests. Firstly, they are not always sat in test centres, and
many are done online. This always leaves the possibility that people may
try to cheat on these tests and get other people to sit them on their behalf
[4]. This is unfair on other candidates, as well as misrepresentative, causing
people who may not be suited for a role to progress further in applications
than they otherwise would. Having said this, most of these tests have been
designed in such a way that they are fairly difficult to cheat on - for instance
having time restraints [53]. We have also found that these tests are mostly
used as a means of filtering candidates, so passing them doesn’t necessar-
ily guarantee any further success. Secondly, some companies have said that
tests may potentially be unrepresentative since people only get one chance
to take them [53]. Due to many different circumstances, an applicant may well
underperform on the test, and so not demonstrate their full potential. This
could cause companies to miss out on hiring perfectly well-suited candidates,
in which case the tests would be causing a misallocation of their resources.
Some companies have a validation test in place that allows people who got
unexpected results to retake the test. However, obviously not all companies
will guard against inconsistencies in this way [53]. On the contrary, many
recruiters we spoke to stated that these tests and their scores are used only
to assist in the recruitment process, and are not the sole factor for employ-
ing people [51]. Instead they are used as a guidance to help make informed
decisions on applicants, so a well rounded application is essential in addition
to these tests [4].
“Numerical reasoning is the ability to understand, interpret and
logically evaluate numerical information. Numerical reasoning is a major
facet of general cognitive ability, the strongest overall predictor of job per-
formance” [44].
Because of the numerical reasoning skills exhibited when taking a
numerical reasoning assessment, such tests are seen as the 'best overall
predictor of job performance' [44]. Numerical and verbal reasoning tests
are combined into an overall aptitude assessment that highlights the most
well-rounded, suitable people for the job. Aptitude tests show employers
skills that cannot be replicated in interviews, nor observed by reading
CVs and looking at past references. They are a true, accurate and quick
assessment of how candidates perform on the spot in a pressured
environment. The 'government mathematical report' [25], alongside careers
websites such as Assessment Day Ltd [37] and Inside Careers [31], agrees
that the only mathemati-
cal abilities being tested on numerical assessments are addition/subtraction,
multiplication, percentages, currency conversions, fractions and ratios. In
addition, they are testing the ability to “interpret the tables and graphs
correctly in order to find the right numbers to work with” [31]. Numerical
reasoning tests are normally timed, in order to measure applicants’ ability to
think on their feet and problem solve under time pressure.
Prospects [45], a website designed to help people looking for jobs,
stated that employers in most industries are looking for applicants with plan-
ning and research skills, i.e. those applicants with the ability to find relevant
information from a variety of different sources. Information can be presented
in a variety of ways, such as with numbers, statistics or text in tables, graphs
and reports. Employees need to be able to understand, analyse and interpret
research and appropriately use it. Numerical assessments are testing these
exact skills.
In addition, tests can have varied levels of difficulty, to represent
the levels of numerical skill that will be needed for the specific job. SHL
Talent Measurement Assessments create a wide range of tests, ranging from
aptitude and personality tests, to customised tests for individual companies
[8]. They create a variety of tests appropriate for different job levels and
industries. Numerical reasoning tests can be adapted to have more complex
questions, requiring a more advanced level of numerical knowledge and skill.
Another way of making them more challenging is to shorten the time avail-
able to complete the test. SHL state that their tests represent the
'level of work and academic experience' [8] required for a specific job role.
For example, SHL released an ’Aptitude: Identify the Best Talent Faster and
at Less Cost' brochure [9] stating that a semi-skilled staff job will require
a VERIFY Calculation Test, whereas a Director or senior managerial job
will need to be tested using the VERIFY Numerical Test, which is far more
advanced.
Furthermore, as numerical reasoning is just one aspect of an
aptitude assessment, applicants applying for highly numerical jobs may
also be asked to take a verbal reasoning test. In all jobs, an ability to com-
municate with colleagues is essential. This reiterates the fact that aptitude
tests are used to find the overall highest-calibre applicant.
A job application process is not a simple task. For many job appli-
cations candidates must spend hours researching the company, before writing
the application form and preparing for interview. Practising the skills ex-
amined in numerical tests is just another aspect of a job application that
requires preparation. Does an applicant’s mark improve with practice? If so,
then applicants can practise in order to achieve high results, no matter what
degree they study or how long it’s been since they last studied mathemat-
ics. For example, even an applicant that stopped studying mathematics at
GCSE level can use the numerous online resources available to practise and
prepare for numerical tests, and hence could easily 'revise' for such a test
and potentially perform very well.
The overall consensus from our sources is that numerical tests used
by large companies (especially those with large numbers of applicants) are
generally a candidate streaming process. With UK education standards rising
and a larger number of students entering higher education (in January 2015,
592,000 people had applied to university, up 2% from the year before) [11],
more people are eligible to apply for graduate scheme jobs. High Fliers
Research presented their findings in a report, 'The Graduate Market in
2014', covered by the Telegraph [49], which stated that graduate schemes
now receive approximately 39 applications for every available job. With
the number of students applying to such schemes high and rising, it is
extremely hard to differentiate between candidates who have all achieved
high grades and well-regarded university degrees. How do you select the
'best' candidate from thousands of similar applications? Due to this
difficulty, companies use these tests to reduce the number of applicants
they consider in the next application step. According to Personnel Today
[47], 80% of companies use standard off-the-shelf numerical tests provided
by companies such as SHL. Only 18% use a test which they have tailored to
measure the unique, customised skills that they are looking for. Some
would argue that since off-the-shelf tests aren't unique to a company,
such a numerical test will not truly assess competency for a specific
job role.
1.2.5 How do people learn through computer-based assessment?
What works and what does not?
Another topic we explored was how people learn through computer-
based assessment. There are many methods that aid learning on a com-
puter. The most popular and commonly used forms of these are multiple
choice or true/false questions, labelling images, rank ordering and gap fill-
ing. Computer-based assessments can be very popular with both students and
teachers. They increase student confidence and are liked by students
because they provide rapid, if not immediate, results. They can even be com-
pleted in a student’s own time when they are ready to do so. A teacher is
also likely to use these methods as a way of administering frequent formative
or summative assessments, since less time is spent marking. Then not only
can they spend more time adapting their teaching methods (depending on
the results of these assessments), but they can do so reasonably soon after
the test is taken [39].
Feedback is crucial to the learning process and, as mentioned, one
of the advantages of immediate feedback is that the student receives their
result straight away, rather than after they’ve moved on from a particular
topic. A study conducted at the University of Plymouth [36] compared two
groups of students; one using several online materials with two levels of feed-
back and another using none of them, to see how they performed in an end
of module summative assessment. The group using the available study ma-
terials performed significantly better than the other group.
Although computer-based assessments can greatly benefit a stu-
dent's learning, there are concerns that online tasks, especially
multiple-choice questions, don't encourage deep thinking about a topic,
and so don't aid learning [34]. In order to be as beneficial as possible,
these assessments
need to both engage and motivate students.
1.2.6 Will different types of learners (kinaesthetic, visual etc.)
have different levels of numeracy?
Our final area of research was different learner types, and whether
some of them would be better at numeracy than others. According to ESL
kid stuff, there are many different types of learners, such as Tactile, Global
and Analytic. However most people fall into at least one of the following
three categories: Kinaesthetic, Visual and Auditory [52]. Katie Lepi [35]
describes these types of learners in her article, “The 7 Styles of Learning:
Which Works For You?”. She describes kinaesthetic (or physical) learners
as people who prefer using their bodies, hands and sense of touch. Writing
and drawing diagrams are physical activities, so this sort of activity really
helps them learn. Role-play is another commonly used activity for these
types of learners. They often have a ’hands-on’ approach, so learn best from
partaking in physical activities. On the other hand, visual learners do better
by looking at graphs, watching someone do a demonstration or simply by
reading. Finally, auditory learners are the kind of people who would rather
listen to something being explained to them than read about it themselves.
A common way for them to study is to recite information aloud, or to listen
to recordings. They also usually like to listen to music while they study [57].
There are many different learning styles, and even though
most people use a combination of all three techniques, they usually have an
idea of how they learn best. If you know what type of learner you are from
a young age, then it puts you at an advantage. However, it is also important
to adapt your learning techniques whilst you are young so that you are able
to use each learning technique effectively [48]. Our aim is to see if there is
a correlation between numerical ability (based on our test results) and type
of learner. We understand that online computer-based assessments mainly
cater for visual types of learner, and so we do not aim to change the online
test in order to reflect this, but instead hope to test this theory as part of
our analysis.
2 Methodology
2.1 Group Organisation
In this section, we discuss how we took full advantage of the time given to
complete this project, by organising the group members efficiently.
2.1.1 Meeting Times
In order to make the most of our meetings, it was important to
choose a suitable time for everyone. We decided it would be best to meet 2-3
times a week, including a weekly meeting with our project advisor. We ini-
tially discovered that there were not many slots in the week that we could
all make, due to timetable clashes. To make things clearer we used the widely
acknowledged online scheduling tool Doodle (see Figure 1), to pick a conve-
nient time for all group members. The Doodle poll worked well as it was
quick and efficient to carry out, and prevented the confusion we had found
when suggesting times among ourselves. In the first few weeks of the
project, we met a considerable amount; however, as term progressed we had
set times in which to meet every week: 15:30-16:30 on a Monday and 10:00-12:00 on
a Wednesday. To make sure we had a private space for every meeting, we
assigned one person to be responsible for booking rooms. During these meet-
ings we would discuss development of the project, by updating each other on
the progress of our individual responsibilities, and we would delegate future
tasks.
Figure 1: An example of us using Doodle to decide on suitable times for our
group meetings.
2.1.2 Communication
One in seven people now use Facebook to connect to their family
and friends [32]. It is the most popular form of social media. As a result,
we decided the best form of communication between group members would
be through Facebook. We created a closed group (see Figure 2) so that we
could share files containing any work we had completed. We also exchanged
numbers and created a ’Group Chat’ on Whatsapp, an instant messaging
application. The team looked into using Google Documents to keep and edit
our work. We found we were limited by this as the site required a Google
account, which not all group members had. It was also more difficult to
facilitate comments and project-related discussions. In contrast, our Facebook group
allowed all these things and it was quickly decided that this site would be our
main form of communication, as no other platform worked more efficiently.
Figure 2: Evidence that we created a closed Facebook group with all mem-
bers.
2.1.3 Combatting Risk
The decision to use Facebook as our main method of communication
was ideal for our project. It minimised the possibility of losing files and data,
which would have had a huge impact on our project. The use of a closed
group meant every member of the group could access and upload documents
quickly and efficiently throughout the project, so that the rest of the group
could edit key information or findings if necessary. We also decided to split
into subgroups which combatted the risk of absence. If one member of a
subgroup was not able to complete a certain piece of research, for example
due to illness, the other members of the subgroup would be able to finish
it, since they would also have a good understanding of the task, having been
studying the same topic.
Initially we went about identifying all the tasks and activities we
wanted to complete throughout our group project. We were then able to
create a critical path (see Figure 3) to see if we would be able to finish
all these tasks within the time available. The critical path also allowed us
to recognise what needed to be prioritised and what could be completed in
parallel to one another.
Figure 3: Our Critical Path Analysis.
2.1.4 Subgroups
Once we had highlighted the key parts of our project, we decided
that we would split into subgroups to spread the workload. This enabled
us to undertake multiple tasks at once so that we could collaborate to meet
our timeframe. The four groups were: writing the questions, programming,
statistics and writing up of the report. When deciding whom to put in which
subgroup, we asked each individual what their strengths and weaknesses
were, in order to best utilise our skills; for instance, some members of
the group preferred programming to statistics.
Deciding who would be in each subgroup was not difficult. Some
members of the team were interested in the creative nature of writing the
questions, while others had enjoyed computer programming modules taken
in previous years. We decided to put more people into the programming
subgroup, having highlighted early on that this was probably going to be
the most time consuming part of the project, and that there was not a lot
of previous programming knowledge within the group. Some members had
statistically analysed models in the past, so they formed the statistics
subgroup. Finally, another subgroup put themselves forward for editing and
compiling the final report, as they had experience of working with LaTeX
and enjoyed editing written material. Even though the final version of the
report would be passed through this subgroup, everyone has taken a very
active role in the write-up of the report.
2.2 Data Collection
2.2.1 Preliminary data collection
The next stage for our team was to gather preliminary data to aid
our project - in particular with the development of our own online test. We
started by doing some initial research around our topic, in order to find areas
that we could look into further. After discussing our initial findings, we came
up with four main topics that we would research further, as stated in our
introduction. As a result, we had to forgo many other interesting areas, but
we decided that these were the four most relevant areas on which to focus
our objectives. We also felt that including any more areas of study would
cause us to not have enough time to complete the project, nor would we be
able to write about them in sufficient depth. We split up our team into four
two-person groups and assigned a different area of research to each one, so
as to manage our time and resources more efficiently. The only downside of
this was that not everyone in the group was fully informed on every topic.
However, this was easily overcome by compiling our research into one docu-
ment, and making it available on every social platform that we were using.
We went about our research in a variety of different ways. Firstly,
we used available literature such as papers, articles, books and websites
to find evidence for or against our initial thoughts on each topic. This
demanded considerable reading and analytical skill on the part of the
researchers, who had to work through huge amounts of information and
extract the necessary details in an articulate way. In addition, we car-
ried out primary data collection by emailing and contacting relevant sources,
such as employers, online test providers and academics. For some of these
we established individual contact, asking them specifically for advice or more
information on our project, but for the bulk of employers and career websites,
we generated a questionnaire to distribute to them. We decided to do it in
bulk after the quick realisation that not many companies were responding
to our emails. This could have been due to the fact that they were not in-
terested in our group project, or some companies might have been too large
to assign a contact or specific department to contact us. Using inputs from
the separate research groups, so as to make the questionnaire as relevant and
useful as possible, we asked a range of questions. This questionnaire was also
in a far easier format for companies to respond to, as it saved them time
and effort formulating unassisted responses. Bulk distribution ensured that
we got as many responses as we could in the limited time frame we had to
complete our research.
Once the research stage of the project had been completed and we
had all our necessary sources, we began to write it up. Within our subgroups,
we compiled our best findings and formalised them for our report. We each
wrote up our sections, complete with references, ready to be passed along to
the editing team. With this, we also included a full write up of our reference
information to go into our bibliography.
2.2.2 Survey
Now that the research stage of our project had been completed, it
was time to move forward with the creation of our own online resource to
test our findings. After discussing it as a group we decided that one of the
easiest and quickest ways to gather information was by creating an online
survey. We felt that this was far quicker to distribute and analyse results
with than other methods, such as focus groups, meaning we would have less
of a time constraint. The aims of the survey were firstly, to test some of
the conclusions and theories formed from our research and discuss what this
showed and secondly, to help us create our computer-based assessment by
finding out what students find most useful when they are learning. To do
this, we asked several questions about learning techniques, types of learners
and effective testing methods. We then passed this information on to the
subgroup in charge of writing the questions for our online test. They used the
survey feedback to help us create a test in response to what people preferred.
We felt this would give us a more tailored test written in the most helpful
way to students. The fact that the test was designed with student input
in mind meant that we could try to benefit test participants, and hopefully
improve on currently available tests.
Figure 4: The first page of our survey.
We created the survey using Google documents (see Figure 4) and
set it up as a form. We looked into other online survey distributors but
found Google documents to be the best platform as most required a payment
to release the survey if it contained more than 10 questions. Google forms
allowed us to create an unlimited number of questions, was quick and easy
to use, and exported our data straight into Excel for us to analyse. Using
contributions from all research groups we generated a draft survey. The sur-
vey was then checked by the group to ensure it was appropriate before it
was released. This allowed us to make a few necessary changes to the word-
ing and remove overlapping questions in order to shorten the test. We were
aware that people might be put off taking our survey if it was too lengthy
and therefore time consuming. For this reason, we tried to ensure that most
questions involved answering either with multiple choice or with a scale of
agreement. We also tried to make sure that the survey took no longer than
15 minutes to complete.
We then distributed the survey to the public so that we could anal-
yse our results as quickly as possible, given that we were on a tight schedule.
To take the survey all that was needed was the web link. We spread this link
across as many social media platforms as we could, including Facebook and
Whatsapp. We felt that this would be the quickest way to distribute our sur-
vey as it would target our main audience, students, in a way that was easily
accessible for them. The fact that the form was created online made analysis
far easier as we could see responses as they came in, and so by keeping an
eye on the data, we were able to start analysing the feedback as soon as we
had a sufficient number of responses. After approximately a week, we had
gathered a large number of responses, and when numbers began to plateau
we decided to start reviewing the data.
2.3 Test design, creation and analysis
2.3.1 Producing the Questions
While the programming group focused on the technical aspects of
creating a computer based assessment, those tasked with writing questions
for the test had to make sure they referred back to the information we had
already collected about online assessments. We started off by looking at re-
sults from our survey in order to determine what types of questions we ought
to be asking. As found in our initial research, multiple choice questions were
the preferred method of answering. The survey showed us that gap filling
was the least popular method; however, we decided that we would still
include questions of this form in our test for two reasons. Firstly, it is the most
accurate way of seeing if a student has really understood a question, since
they can’t guess the answer, and secondly, we thought it would be interesting
to include so that we could see if students tended to do worse in these types
of questions, as we had hypothesised.
The next stage was to decide what topics to base our questions on.
We wanted to focus on the numerical reasoning style of questions just like
on the currently available employability tests. We did this by researching
these numerical reasoning tests and replicating their style of questions. This
ensured our test was relevant and had the potential of preparing people for
such testing.
Some initial points raised focused on the types of questions we would
have to ask, what topics we would focus on, and how many levels of difficulty
we should have. It was also noted that our questions would have to be both
realistic to program and relevant to research, in order for the results to pro-
vide us with useful information that the statistics group would then be able
to analyse. Each member of the subgroup was then tasked with a different
research assignment. One member focused on how to effectively test different
learner types, while the other two members focused on looking up example
questions at different levels of difficulty. Having done this, it emerged that
online tests naturally cater more for visual learners and not for the other two
learner types [10] [40] [50]. We took the decision not to focus on this aspect
when writing our questions, as we would not be able to created different
types of questions for a specific learning style, other than visual.
Having established that a variety of levels was essential to fulfil our
aim of creating an adaptive test, it remained to decide which difficulties we
would pick. Since we knew that all participants would have a minimum of
GCSE-level mathematics or an equivalent qualification, but not necessarily
any further qualifications, we decided to make this our top level of difficulty.
However, after their preliminary discussion with the statistician Dr Ben
Youngman, the statistics subgroup informed us that having any more than three
levels of difficulty in our test would significantly hinder statistical analysis
of data later on in the study. This is due to the fact we would be unable to
create an effective model. On the other hand, we were concerned that this
would reduce the range of results: if we had six similar questions at the
same level, it was likely that if a participant could answer one question
correctly, then they could complete them all. For this reason, we decided to
incorporate KS2, KS3 and GCSE-level mathematics.
The final element of the decision process involved reading through
the curriculum for Key Stages 2 and 3, as well as GCSE-level mathematics,
in order to single out the recurring, most important topics so that we could
base our test questions around them [22] [24] [23]. The final decision we made
was to write two questions for each of 'percentages', 'ratios' and 'algebra'
at Key Stages 2 and 3, and then to write six GCSE statistics questions,
which would incorporate these topics. In this way, we would have three
multiple-choice and three gap-fill questions at each level of difficulty.
Once we'd made all the relevant choices, it was time to write the
questions. We found examples of questions on the topics we were focusing
on by looking at teaching resources websites, such as TES [2] [54]. We
then adapted these to suit our own needs: not only did we want to model
questions to resemble currently available online assessments, we also had
to generate wrong answers for every question that was to be multiple
choice. This was the hardest element of the process, as it involved
deliberately making common mistakes with the aim of generating plausible
wrong answers. Luckily, this was achievable, and because we diligently
wrote down our thought processes, we were able to relate how we'd created
these wrong answers to the programming team, so that they had an algorithm
to use in the randomisation of questions later in the process.
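To illustrate the idea, the sketch below shows how one such common-mistake algorithm might look in PHP, the language we ultimately chose for the test (see Section 2.3.2). It is a minimal, hypothetical example rather than a question from our actual test: the scenario, values and variable names are invented purely for illustration.

    <?php
    // Minimal sketch (hypothetical names and values): generate a randomised
    // percentage question and build multiple-choice distractors by making
    // the common mistakes we recorded while solving questions by hand.
    $price    = rand(20, 200);      // random base price in pounds
    $discount = 5 * rand(1, 9);     // random discount, a multiple of 5%

    $correct = round($price * (1 - $discount / 100), 2);

    // Each distractor reproduces one deliberate common mistake.
    // (A full implementation would also check for coincidental duplicates.)
    $options = array(
        $correct,
        round($price * ($discount / 100), 2),     // gives the discount, not the price
        round($price - $discount, 2),             // treats the percentage as pounds
        round($price * (1 + $discount / 100), 2), // adds the discount instead
    );
    shuffle($options);              // randomise the order of the answers

    echo "A coat costs £{$price} and is reduced by {$discount}%. ";
    echo "What is the sale price?\n";
    foreach ($options as $i => $opt) {
        echo chr(65 + $i) . ") £" . number_format($opt, 2) . "\n";
    }
    ?>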
2.3.2 Programming the Test
In this section, we discuss the writing of our online test.
Our test acted as a vehicle to provide relevant data in order to help
test the theories we had posed from our research. This meant it was an
integral part of our project outcome and was therefore very important to us.
We began meetings regarding the creation of the test very early on as we
were aware that it would be a very time consuming part of our project. In
these, we discussed how we were going to approach the programming aspect.
Firstly, we had to choose the programming languages that we would
use. We looked into a few different methods. Our first idea involved using the
Exeter Learning Environment so that all Exeter students would be able to
easily access the test. We thought this would help with distribution, as this
website is used by all students at the university; however, the programming
behind the website was far too restrictive in terms of what we had planned
with regard to coding. It also presented the problem that our results would
be restricted to one university. Another language we could have used was
a version of Maple that would both code and present our questions. However,
it became apparent that it wouldn't facilitate certain aspects of our test, such
as feedback and randomisation. After exploring these different options with
our project advisor, we decided it was best to use the popular server-side
language PHP, together with HTML, to code the questions, and to store data
in a MySQL database. We chose PHP as it is a relatively simple language that integrates
easily with HTML, which is the main language used in the appearance of web
pages and was what our questions would need to be written in. It was also the
most flexible language so would not restrict us in the design of our test and
would enable us to create dynamic webpages involving randomised variables.
This was very important as many of our test aims involved randomisation
and forms, something that PHP would facilitate and so it would enable us
to move information on and off our database effectively. The only limitation
of this was that before we had access to an online server we would find it
difficult to practice running our code. This was overcome by using XAMPP,
a free software that replicates the process of using a server but can be done
offline. This meant that we could run our test as it was developed, in order
to check its appearance at every stage of developing the test.
Figure 5: Above is an example of PHP code which we used to generate
Question 1 in our Numerical Reasoning Assessment.
Figure 6: Above is an example of HTML code being echoed in PHP which
we used to submit Question 1 in our Numerical Reasoning Assessment.
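To illustrate the pattern in Figures 5 and 6, the sketch below shows HTML echoed from PHP to display a question and its answer form; the field names and the name of the submission page are assumptions, and the variables continue from the sketch above.

<?php
// Hypothetical sketch: rendering a multiple choice question by echoing
// HTML from PHP, then posting the answer to a marking page.
// $percent, $whole, $options and $correct continue from the earlier sketch.
echo "<h2>Question 1</h2>";
echo "<p>What is {$percent}% of {$whole}?</p>";

echo "<form action='submit_answer.php' method='post'>";
foreach ($options as $option) {
    echo "<label><input type='radio' name='answer' value='{$option}'> {$option}</label><br>";
}
// Carry the correct answer forward so the marking page can score 1 or 0.
echo "<input type='hidden' name='correct' value='{$correct}'>";
echo "<input type='submit' value='Submit'>";
echo "</form>";
?>

We note that carrying the correct answer in a hidden field exposes it in the page source; storing it server-side, as in the session sketch later in this section, is the safer choice.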
The subgroups had researched the current curricula and decided on the layout and contents of the test, so the next step was for the programming team to create it. Firstly, we familiarised ourselves with both PHP and HTML and got used to writing functions, using a variety of resources from the library [13] and the internet [59], as well as our own previously acquired skills. We aimed to understand how to print text, show images and generate tables using HTML, so that we could write a well-presented and professional-looking test. We also had to learn how to interact with our online database, move data on and off it and store our results. Following this, we split the workload between five people, each person being in charge of certain questions and aspects of the test. The main limitation we came across was the time constraint on programming, due to the short ten-week period. Because of our initially low level of programming skill, a significant amount of time was spent familiarising ourselves with the chosen languages and understanding their capabilities.
The starting page of our test provided some preparatory information on the materials the participant would require, as well as explaining the procedure of the test. The voluntary nature of the test was specified, to ensure that participants did not feel pressured and knew they could terminate at any point. The second page of the test was dedicated to data collection, gathering information on age, gender, subject area and GCSE mathematics grade, as well as how long it had been since the participant last studied mathematics. We also included their university ID as a variable, which was then used as an identifier. This was in case a participant chose to sit the test more than once, so that we would be able to determine whether their mark improved. The scores awarded were also linked to this identifier, so only rows with a matching identifier would be changed, with a score of one recorded for a correct answer and zero otherwise. We also asked participants what type of learner they thought they were, by providing relevant descriptions, to help us determine in later data analysis whether this had an effect on their mark. Another piece of data collected throughout the test was the time it took participants to complete each question; we did this using timestamps in PHP. This helped us determine whether any cheating took place, as well as which questions were found most difficult.
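A minimal sketch of how this recording step might look is given below; the table layout, column names and connection details are assumptions made for illustration, including a unique key on (participant_id, question) so that a resit updates the matching row rather than inserting a new one.

<?php
// Hypothetical sketch: marking an answer, timing it, and storing the result.
$mysqli = new mysqli("localhost", "user", "password", "testdb");

$started  = (int) $_POST['started'];  // timestamp stored when the question page loaded
$finished = time();                   // timestamp on submission
$seconds  = $finished - $started;

// One for a correct answer, zero otherwise, as described above.
$score = ($_POST['answer'] == $_POST['correct']) ? 1 : 0;

// Assumes a unique key on (participant_id, question), so resits update the row.
$stmt = $mysqli->prepare(
    "INSERT INTO responses (participant_id, question, score, seconds_taken)
     VALUES (?, ?, ?, ?)
     ON DUPLICATE KEY UPDATE score = VALUES(score), seconds_taken = VALUES(seconds_taken)"
);
$stmt->bind_param("siii", $_POST['participant_id'], $_POST['question'], $score, $seconds);
$stmt->execute();
?>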
All of us already had a sufficient understanding of the code needed to write up the questions themselves; as these were only tables and simple text, they were quick to produce, allowing us to concentrate on the more complex parts of the programming described below.
Some of our questions (please refer to Figures 39 to 81 in the Appendix for screenshots of our Numerical Reasoning Assessment) included images, such as pie charts and stick diagrams, to cater for different types of learners, as mentioned earlier in this report. Initially we attempted to code these diagrams rather than inserting them as images, so that we would be able to adapt them, but we soon realised this was an unrealistic target given the short time and limited skills we had. As a group, we made the decision to include them as static JPEG images instead, judging that the impact of this would be very small. In certain instances we could work around this limitation, as we were still able to randomise the question values. For the others, we decided it was more important to meet our time constraints and generate our statistics than to worry about randomisation.
As we wanted to produce questions with both multiple choice answers and manual input answers, two types of code had to be written. Writing the multiple choice questions was the more complex and time consuming task, as realistic wrong answers had to be developed so that common mistakes looked believable and the correct answer was not too obvious. However, recording both types of answer as right or wrong used the same procedure: defining the correct answer, comparing the given answer to it, and assigning a value of one or zero accordingly.
One of the main aims of our project was to build a test that provided
immediate feedback, in order to help students improve as they went along
and provide understanding if they made any mistakes. Therefore, following
every question there was a separate page with a full step-by-step solution to
show how it should have been approached.
Another goal of ours was to randomise all of our questions. This involved randomising any values used within the questions, so that although the approach and the formula stayed the same, the question values and answers would be different every time the page was opened. We chose to do this to prevent people from cheating if sitting the test alongside others. It also enabled us to see more accurately whether people's performance improved if they sat the test more than once. The process of randomisation made creating false multiple choice answers and providing feedback more complex. False multiple choice answers were created using formulas covering the common mistakes, and the values used within them had to be fetched and carried through to the PHP page that submitted scores. The same page also provided the feedback, so the values were carried through to the explanation of the formula as well.
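One way of carrying randomised values between pages is through the PHP session, sketched below; the mechanism shown is an assumption (hidden form fields, as in the earlier sketch, would serve equally well) and the feedback wording is illustrative.

<?php
// Hypothetical sketch: carrying randomised values to the feedback page
// via the PHP session, so the worked solution uses the same numbers.
session_start();

// On the question page: store the randomised values for later reuse.
$_SESSION['whole']   = $whole;
$_SESSION['percent'] = $percent;
$_SESSION['correct'] = $correct;

// On the feedback page: substitute the same values into the step-by-step solution.
echo "<p>To find {$_SESSION['percent']}% of {$_SESSION['whole']}, multiply by ";
echo "{$_SESSION['percent']} and divide by 100: ";
echo "{$_SESSION['whole']} x {$_SESSION['percent']} / 100 = {$_SESSION['correct']}.</p>";
?>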
Another of our initial aims was to make the test adaptive, so that the next question depended on whether or not the previous one had been answered correctly. The purpose of this was to enable people to reach an understanding of a topic before moving on to a more difficult question. The team began looking into various methods that would allow us to create banks of questions of varying difficulty. However, when our statistics team consulted our statistical advisor, he advised us that this would be far too hard to model, as we would have many different categories within our variables. Without a workable model we would be unable to analyse our statistics well or draw meaningful evidence from them to compare with our research. In addition, with our limited programming skills, this would have taken far too long to complete within the time frame. Being such an unrealistic target, we decided to exclude it, allowing us to concentrate more on our other objectives.
Despite the time constraints and our limited starting skills, a test capable of gathering the required data was developed within the timescale. The next step was making our test live for participants to sit. We looked into different ways of doing this, but settled on uploading it using our university's servers. This meant that anyone with the web link would be able to access and sit our test, maximising the number of potential participants. One other option we explored was paying for an online server, but this would have been costlier and was unnecessary when free resources were available. Another option was to use our university college intranet servers, but this would have limited respondents, as our test would then only have been accessible to CEMPS (College of Engineering, Mathematics and Physical Sciences) students.
2.3.3 Test distribution
To ensure that we achieved statistically significant data analysis,
our statistics subgroup required a minimum of 40 responses to the test. We
were aware that we had a short amount of time available to distribute our
tests and that there were many potential difficulties associated with getting
enough participants. As a result, we made a very concerted team effort to
distribute the test widely and as quickly as possible. We did this using a variety of social platforms, such as Facebook and WhatsApp, to raise awareness of the project and to provide a web link for people to take our test. A leaflet was also created to inform people about our test and the benefits it could provide, which we distributed on campus to encourage a wider spread of participants in terms of demographics such as degree type and age (see Figure 7).
Figure 7: A leaflet promoting our Numerical Assessment.
2.3.4 Test Analysis
The first task for the statistics team was to identify what type
of analysis we wanted to carry out on our test data. This needed to be
completed at an early point in the project so we could relay this to the
programming team. The relevant questions were then programmed into the
test. We went about this task by breaking down each of the research sections,
reading all the research findings, and then deciding the relevant statistics we
needed to look into.
1. Why is mathematics important? The Mathematics vs Numeracy Debate.
(a) Look at the correlation between test score and GCSE mathematics
performance, degree and time since studying maths to see if any
of these affects the score.
2. Why do employers test for numeracy skills?
(a) What was the average score? What was the range of scores?
(b) What was the standard deviation of scores? This can identify
whether numerical reasoning tests are able to differentiate between
people.
(c) What is the standard deviation in the score achieved by people
studying the same degree?
(d) Did anybody resit the test? Did they achieve a better score the
second time?
(e) What was the range, standard deviation and mean time taken to
complete the test?
3. Do different learners perform differently on numerical reasoning tests?
(a) Look at the correlation between score and type of learner.
(b) Break down the questions categorically into charts, tables and text
questions. Which type of question got the best score?
(c) Look at the correlation between the type of learner and the score.
Do some types of learners perform better than others?
4. How do people learn through computer-based assessments?
(a) Did people read the feedback? What was the average time taken
between the questions, on the feedback page? Plot the frequency
of time.
(b) Did people perform better on the multiple choice questions or the
manual input questions?
(c) Did people speed up as they took the test?
'Practical Regression and ANOVA using R' [21] states that regression analysis is beneficial because, firstly, predictions of future observations can be made; secondly, the relationship and effect of multiple variables can be assessed; and finally, a general understanding of the structure of the data can be gathered. Therefore, for all the statistics required in each research topic, it was necessary to build a regression model for the test scores. The same source also identified the steps taken in regression analysis as:
1. Identifying the distribution of the data.
2. Identifying the initial regression model.
3. Carrying out an initial assessment of the goodness of fit of the model.
This would be through hypothesis tests on the variables and numerous
diagnostic plots.
4. Using methods to identify the best model fit.
'Applied Regression Analysis' [5] proposed using stepwise regression to achieve the 'best' regression fit, because it avoids working with more variables than necessary while still improving the fit. Stepwise regression starts with a regression model containing one variable, and subsequently adds and removes variables until the largest coefficient of determination is achieved, identifying the most significant model. Once this best regression is found, we are able to identify which variables have the most significant effect on test scores, which is vital for answering our four research topics. We are also able to make predictions about future scores, such as what score a 'visual-learning girl, studying law, with a grade B in GCSE mathematics, who hasn't studied mathematics since GCSE' would achieve.
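For reference, the general form of the model assumed in this approach, and the coefficient of determination that stepwise selection seeks to maximise, can be written as:

\[
  Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon,
  \qquad
  R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\]

where \(Y\) is the test score, the \(X_j\) are the explanatory variables, \(\varepsilon\) is the error term, \(\hat{y}_i\) are the fitted scores and \(\bar{y}\) is the mean score.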
It was realised that we would need to collect as many responses to our test as possible. We posed ourselves the question, 'How many people need to take our test in order for the results to be significant?'. Having spoken to Dr. Ben Youngman, a University of Exeter Statistics Lecturer, we agreed that we could not fix a universal 'minimum number', since the distribution of the scores would depend on the scores of those who took the test. It was clear that as few as four scores would be insufficient to build strong arguments from our findings, so as a group we set ourselves the aim of at least 60 entries.
We decided to use R to run all of our statistical analysis. R is a leading tool for statistics and data analysis, and performs the kind of analysis we required, such as producing correlation matrices and modelling data, very efficiently. R also integrates easily with other packages such as Microsoft Excel, making it straightforward to export our MySQL database, containing all the test data, into a Microsoft Excel spreadsheet and run our analysis in R from there. Output in R is presented in a very clear way that is easy to interpret. Our final reason for using R was that everyone in the statistics team had used it before, making us familiar with its built-in functions and programming language. Additional reading in 'Practical Regression and ANOVA using R' [21] was also used to refresh and improve our knowledge of R.
Figure 8: Above is an example of R code.
2.4 Report Feedback
As a group, we recognised the importance of getting external feed-
back on our report. Our project’s main aim was not just to create a test, but
also to see how our findings related to literature and to observe their poten-
tial impact on future students. Receiving opinions on our results would give
us a more comprehensive view of our work and would enable us to perform a
more thorough and independent evaluation. We decided to contact experts
via email as we thought this would be the most efficient form of communi-
cation.
Our first thought was to seek a statistician - we needed someone to
evaluate our model and give feedback on our findings. We met with the same
person who had advised us earlier on in our project, Dr Ben Youngman. We
hoped that he would be able to advise us on anything we may have missed.
We also sent our report to Rowanna Smith, the lead Careers Consultant for the College of Engineering, Mathematics and Physical Sciences, based in the Career Zone at the University of Exeter. We wanted to find
out whether, based on our findings, the university would consider using a
similar test as a resource made available to students in preparation for job
application tests. We also wanted to find whether our results were significant
enough for the University Career Zone to consider a change in the advice
they currently offer students with regards to preparing for these kinds of as-
sessments.
Our final port of call was SHL, a provider of numerical reasoning
tests. We wondered if they would consider changing their test writing meth-
ods based on our own assessment and its findings; for instance inclusion of
feedback. We also questioned whether they would consider taking into ac-
count different learner types by adapting their tests to suit a wider range of
people and their learning habits.
2.4.1 Skill development - Graduate Skills
The project we undertook led us to develop a variety of skills as
well as gain new ones. As the project involved a very tight time frame, a
large amount of time management and task delegation had to take place to
ensure all the different sections of the project came together effectively and
on time. To enable this to happen the project was broken down into separate
sections, which helped us stay on track. These enhanced skills will prove very
useful in later life, as many graduate roles will require efficient management
of many different tasks, most likely with tight deadlines. Not only did we
have to manage our time well by setting realistic targets, but we also had
to adapt to changes and challenges that occurred along the way. Over the
course of the project, this enabled group members to become more flexible,
something required in all future aspects of life.
Working in a team has been an essential part of this project, without
which our outcome would have been completely unattainable. The ability to
work in a team is an invaluable skill for later life and prepares us for situations
both in and out of the workplace. The ability to communicate effectively with the other members was crucial in keeping the team on track and transparent, so that we were aware of any potential problems; as graduates, this will be vital for working life. Another
skill acquired during this project was the ability to research quantitatively
and qualitatively as well as to disseminate information and synthesise oth-
ers’ ideas. This process was approached in different ways, including a vast
amount of reading and contacting both employers and academic members
of staff, resulting in a well-rounded background for the report. Research
skills are essential to many roles, either directly for graduates in technical
roles, or indirectly as transferable skills by improving general analytical and
summarising abilities. Designing the test to collect our data developed the
team’s problem solving skills as we had to explore several ways to achieve
our programming criteria. It also gave us all a basic understanding of one
of the most popular scripting languages on the web, an invaluable skill to
many employers. The team has also acquired skills in data collection and
statistical analysis in order to understand and present the project’s findings,
something that many employers look for and value highly.
A large aspect of our project involved presentation, both as small
progress reports and as a final summary of our report. Through this, all
group members had a chance to present their work to an audience, gaining
beneficial speaking and presentation practice, something we get very little chance to do owing to the nature of our degree. This helps people to gain the vital social skills that employers hold in high regard and that make up a large component of job applications.
3 Findings
3.1 Survey
When it came to collecting survey results, it was reasonably simple
to analyse our data. Due to having created the survey in Google Forms, we
could monitor responses as they came in. Google Forms also produced some
basic statistical representation for us, so immediately we had an overview of
the key information. Overall, we gained 79 responses, well above our minimum aim of 40 respondents. In terms of demographics, we noticed a majority of female participants, with over 70% being women. Also, almost 70% of our respondents were in their third year of university and so dominated our responses (see Figure 9). This was likely because our own group was made up of third-year students who were predominantly female. However, given the nature of our survey and the questions asked, we did not feel this would cause any issue, especially considering that third-year students are the most likely to have encountered employability tests, and should also have a good idea of how they learn best at this stage in their education.
Figure 9: Pie Chart of Gender and Year of Study of participants in the
Survey.
Figure 10: Bar Charts of responses in two survey questions.
The first set of questions in the survey gave us information on the
different ways in which people like to learn and to be tested. The survey
worked in two ways. Firstly, it acted as preliminary data for our research,
through gathering more information and current opinions on online tests.
We planned to compare it with our test findings later in the process. Secondly, the survey provided new data for
us to compare with what the group had already learned from the research
carried out. We found that the majority of people preferred multiple choice
questions on online assessments, concurring with our research findings that
this is a popular, commonly used method. It is worth noting that, since possible answers are always provided, these questions do not require as much original thought on the part of the student. It also means that students always have some probability of selecting the correct answer by chance, in our case 20% on average, something that may influence people's preference for this style, based on its perceived comparative ease. The fact that this style was preferred was passed along to the subgroup tasked with writing the online assessment questions, so that it could be taken into account.
It was also seen that people feel they benefit significantly from feedback.
This matches the opinion we found when conducting research, based on a
Plymouth study [36]. This suggests that not only do people want feedback,
but that a student’s results can improve significantly as a result of it. This
confirmed our decision to include feedback as a major component of our own
online test, to ensure that people would be able to learn from their mistakes
in previous questions.
In terms of Mathematics vs. Numeracy, there was a mixture of
results. There were originally mixed opinions from people when asked if they
believed their mathematical skills had deteriorated since they had stopped
studying mathematics, with the majority of people taking a neutral stance
(see Figure 11). The second largest response was 'slightly agree' with the statement, implying that slightly more people may feel this to be true. The results may be somewhat skewed, as people still studying mathematics are likely to strongly disagree that their abilities have deteriorated, given that they are still using them. This defeated the purpose of the question, which was to investigate people who had stopped studying maths and consequently use it less often. It may also explain the large spike of people strongly disagreeing with the statement, which made it harder to analyse how people perceived their maths skills, as many of the results shown were not relevant.
Figure 11: Bar Chart of responses on deterioration of mathematical skills.
Figure 12: Bar Chart of responses on deterioration of mathematical skills,
excluding mathematics students.
To combat this problem, we decided to exclude mathematicians from our data and repeat our statistics (see Figure 12). This ensured that all respondents had finished studying mathematics, so we could give a fuller representation of the deterioration of mathematics skills. From our new calculations, we produced a graph closer to our expectations, showing that most people felt their skills had somewhat deteriorated since they had last used mathematics. This clearly agreed with our research, which showed a strong difference between people who currently study mathematics and those who had stopped. We could compare this with the similar effect of unemployment that we had seen in our research. It was also similar to the study on nurses [20], who performed worse on a comparable test after a gap year. However, our data consisted more of qualitative opinions than quantitative results. This difference meant that we could not draw any solid conclusions from comparing the two, but could take note of
the strong similarities. One limitation of our data may have come from the
differing opinions on when participants classed themselves as having stopped
studying maths. Some students who study more scientific or quantitative
degrees may regard themselves as still using mathematics in their degree,
given that they use it regularly in their university work. While others will
claim not to study mathematics any more, since the subject itself is not
contained in their degree title. Despite this, we felt the discrepancy did not
impact our results too heavily, as such students would still have been likely to
be of the same opinion when it came to rating their mathematical ability, and
so we could still assess the difference. Another slight limitation in comparing
our data with the literature was that in some similar studies those tested had been out of any form of study or work at the time, whereas the students in our survey were all still in academia. This would likely have affected the extent to which they felt their mathematics skills had deteriorated over time, possibly making our results less pronounced than they otherwise would have been.
In addition, our survey showed us that 67.5% of people (see Figure
13) believed numeracy and mathematics to be different things, which agreed
with much of our research regarding the Mathematics Vs. Numeracy debate.
This shows that the general consensus is that they are different disciplines
and require different skills, even if they technically overlap by definition. It
would have been beneficial to know why the students thought this, and if
they agreed with our research findings on potentially teaching them as two
separate subjects. However, due to the design of our survey we were limited to a few set answers, so it is difficult to say how meaningful these results are. We attempted to overcome any potential gaps in a participant's knowledge by giving official definitions of both words, allowing them to make a well-informed decision, which may have helped to mitigate some of this problem.
Figure 13: Pie Chart representing the opinion of participants on Mathematics
Vs. Numeracy.
Figure 14: Pie Chart representing how participants feel they learn best.
Since another large section of our research involved different kinds of learners, we included questions on this in our survey. Our research covered several learner types, but we chose to include only the main three we had focused on. The team found that the majority of participants placed themselves in a set category, with less than 4% being unsure (see Figure 14).
The smallest proportion was of those who believed themselves to be audi-
tory learners; however, this was still over a fifth of respondents. The largest
section was of the visual learners with 41.8% of people placing themselves in
this category. We mitigated the risk of people not being aware of different
types of learning or what category they may fall into by getting people to
say which description fitted them best, instead of them picking from a list
of unfamiliar definitions. However, there was still scope for people to have
misunderstood and therefore picked a category despite not being sure, which
may limit the reliability of our data. Having said this, our research showed
that most people are a combination of these different learning styles, so some crossover was always expected. In terms of how the different learner
categories work, we believed that visual learners were likely to perform better
for our chosen type of online numerical reasoning test, leaving the others at a
disadvantage. When asked in our survey whether participants believed these
online tests cater for different learners, almost a third of them responded
negatively (see Figure 15). This helps to back up our research and hypoth-
esis by showing that many people do not feel that their learning abilities
are catered for. There is always the possibility of this proportion being overestimated by people who do not perform well in these tests in general, or who feel they should have performed better regardless of what type of learner they are. Nevertheless, as we still have a strong majority, this should not have had a significant effect, and our data still shows that a significant number of people feel they are not examined effectively in online tests. We were
able to test this further in the results from our own numerical reasoning test.
Figure 15: Pie Chart representing the opinion of participants on whether
Computer-based Assessments cater for different types of learners.
3.2 Test
Our numerical assessment consisted of 20 questions split into three
difficulty levels; KS2, KS3 and GCSE. The average mark achieved was 15.28.
From Figure 16, it can be seen that the majority of participants scored highly,
with over 50% achieving a score greater than or equal to 15. Figure 17
supports this, showing an interquartile range of 6 from a score of 13 to a score
of 19. The interquartile range shows a strong concentration of high scores.
There is a negative skewness in the results. The highest score achieved was
20, showing the ability to score full marks, whereas the lowest score achieved
was 5.
Figure 16: Histogram of Total Score.
Figure 17: Boxplot of Total Score.
Figure 18 supports the negative skewness of the scores. There is an overall bell shape, suggesting an approximately normal distribution, and the slight shift of the peak to the right reflects the negative skew.
Figure 18: Density Plot of our model.
To further analyse our data, we will break down the statistics into
the four research topics previously mentioned.
3.2.1 The Maths Vs. Numeracy Debate. Why is mathematics
important?
The initial hypothesis was that a participant's score would deteriorate as the number of years since studying mathematics increased. Surprisingly, Figure 19 shows little visible relationship between score and years since studying mathematics, as the line of best fit is close to horizontal about the mean score; the correlation coefficient of −0.21 indicates only a small negative correlation between the variables.
Figure 19: Scatter plot showing the total years since studying mathematics
vs the total score.
Furthermore, from our research into Numeracy vs Mathematics, it
is implied that numerical reasoning assessments do not test the skills which
participants learn at GCSE-level maths. Therefore, years since studying
mathematics has little effect on the score achieved. Our findings support
this argument. However, due to the fact that the average age of participants
in our numerical reasoning assessment was 20.24 and the average number of
years since studying mathematics was 1.68, this does not reflect the whole
population.
The correlation between GCSE mathematics grade and score is
shown by Figure 20. It shows that a higher grade achieved at GCSE resulted
in a higher score in our numerical reasoning test. The mean score achieved
by a participant with grade B at GCSE was lower than the mean score for
an A or A* candidate. The lowest score achieved by an A* grade participant
is higher than the lower quartile of A and B grade participants. The highest
score achieved by any B grade participant is lower than the average score
of an A* grade participant. From these findings, we can see that a strong
mathematical background can result in a significantly higher numerical rea-
soning test score. As the number of years since studying mathematics has
little correlation with the score achieved, this shows that mathematics GCSE
grade and actual mathematical ability affect a participant’s score more. This
is again supported by Figure 21, which shows participants studying a math-
ematical degree. It is assumed that these students have strong mathematical
abilities, and that this is the reason they achieved higher scores. We categorised 'mathematical degrees' as Economics, Business, Medicine, Mathematics and Science. The lowest mean score was for participants studying humanities degrees. Interestingly, those studying a non-mathematical science
(such as Biology) scored higher on average than those studying a mathemat-
ical science. However, Figure 21 shows that these results are actually very
close. Therefore, we can interpret from this that all sciences require some
mathematical skills.
Figure 20: Boxplot of Test Scores and GCSE Mathematics Grade.
Figure 21: Boxplot of Test Scores and Degree.
3.2.2 Why do employers use numerical reasoning testing?
As stated above, the average score achieved was 15.28. The stan-
dard deviation of score results was 4.12. Standard deviation measures the
degree of spread of score results. Initial research into why employers use
numerical reasoning assessments showed that these tests filter out applicants
and help to differentiate between candidates with very similar applications.
As our lower quartile is 13, 75% of participants achieved a score of 13 or higher. If an employer applied a filter that cut out candidates scoring below 13, 25% of our participants would not have passed the test. This shows that numerical reasoning tests can be a useful tool for quickly removing weaker candidates from an application process.
The standard deviation of 4.12 indicates a large spread in scores. This makes the test a useful tool for differentiating between candidates, as scores are varied and spread over a wide range of values, so not all participants achieve similar results. If everyone scored 15, they would all have to complete further assessments to gauge who was the best applicant; having varied scores reduces this problem.
Figure 21 shows that the majority of interquartile ranges of the dif-
ferent degree types are large. We see that applicants with similar degrees,
where one would expect similar mathematical ability, still have a varied range
of score results. This is useful for employers as they can use numerical rea-
soning assessments to differentiate between applicants with the same degree
title.
Initially, we wanted to look into whether people had repeated the test, to see if their score improved, because our research and survey findings suggested that feedback and practice should improve scores on numerical tests.
The mean time taken to complete the test was 19.47 minutes, with participants spending an average of 57.81 seconds on each question. This helps to explain why employers enforce tight time limits on numerical reasoning assessments (commonly a minute or less per question). The time limit is not necessarily a method of filtering out participants, but, as our timings show, it puts applicants under pressure while completing the test, and employers are keen to find out whether a potential employee can work under pressure and within a set time frame. The difficulty of a numerical test can also be adjusted by changing the time limit: if our test had had a limit of 15 minutes, fewer than 50% of participants would have been able to finish. From our initial research we found that numerical reasoning tests are often used even in applications where numerical skills may not actually be necessary, and our survey showed that 37.2% of people believed it was unfair to be numerically assessed in their job applications, feeling at a disadvantage because they were not 'good at maths' and 'had not studied it in a long time'. Our findings suggest that employers could instead increase the time limit, in our test to over 35 minutes, so that every participant could complete the test at their own pace without missing questions when time ran out. This follows from the fact that the box and whiskers in Figure 22 lie entirely below 35 minutes, with only outlier times above.
Figure 22: Boxplot of Time Taken to complete the Test.
3.2.3 Do different learners perform better on numerical reasoning
tests?
Figure 23 shows that visual learners on average achieved a higher
score than auditory or kinaesthetic learners. Visual learners taking our test
had the highest average and the smallest range of scores. The literature research done at the beginning of our project, along with our initial survey findings, suggests that the numerical reasoning assessments used by employers online are not catered to auditory or kinaesthetic learners, with 64.1% of people who took our survey agreeing. An assessment being online limits the ability to make it practical and active enough to suit kinaesthetic learners. Audio numerical reasoning tests are available, but they are uncommon and usually used only for participants in special circumstances (such as those with visual impairments).
Figure 23: Boxplot of Test Scores and Learner Type.
Generally, people performed better on questions involving a visual aspect, such as a chart or graph. The average pass rate on these questions was 81.7%, whereas for text questions it was lower, at 68.6%. This may be because the image or table breaks down the information, making the figures easier for all learners to digest, whereas paragraphs of text and figures cater more towards visual learners.
3.2.4 How do people learn through computer-based assessments?
From our results, we can determine that the majority of participants neglected to read the feedback provided. The average times spent on the feedback pages for the first four questions were 6, 5, 9 and 5 seconds respectively, which is not enough time to read, understand and learn from the feedback. Research has proposed that reading feedback improves scores; see, for example, Rob Lowry in 'Computer aided assessments - an effective tool' [36]. Our initial survey, Figure 24, also shows that 89.8% of people thought feedback would be a useful tool in an online test. However, as our numerical reasoning assessment was put forward as a 'test' rather than a casual learning resource, people's priority may have been to finish the test rather than learn from it.
Figure 24: Bar chart of opinion on feedback from the survey.
If every multiple choice question was guessed, a participant would have a 20% chance of getting each one correct, so their expected overall score on those questions is 20%. A participant guessing all 12 multiple choice questions would therefore achieve 2.4 out of 12 on average. Our results show a pass rate of 81% on multiple choice questions, significantly higher than 20%, suggesting that few (if any) candidates guessed all their answers.
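As a short check of this figure, treating guessed answers as independent trials across the 12 five-option multiple choice questions:

\[
  X \sim \mathrm{Binomial}\!\left(n = 12,\; p = \tfrac{1}{5}\right),
  \qquad
  \mathbb{E}[X] = np = 12 \times \tfrac{1}{5} = 2.4,
  \qquad
  \frac{2.4}{12} = 20\%.
\]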
The average time taken and average pass rate for multiple choice questions were 50 seconds and 81% respectively; for fill-in-the-blank questions the averages were 53 seconds and 69.5%. This suggests that multiple choice questions are easier and that a candidate has a stronger chance of scoring highly on them. Simply put, if a candidate's answer does not appear among the options, they know it is wrong, and if their answer is similar to an available option, they can select that option and still have a chance of getting it correct. Neither is possible in a fill-in-the-blank question. This is supported by our survey, where 42.2% of people preferred multiple choice questions out of eight different methods.
Figure 25 shows that the average time taken to complete each question in our numerical reasoning assessment follows no trend, as the line graph has no discernible pattern. If people had learnt from the feedback provided, we would expect the time taken per question to fall as their understanding of the questions increased. It became apparent that the feedback we provided was not used, so we cannot support our initial expectation. In addition, the incorporation of three difficulty levels (KS2, KS3 and GCSE) could have counterbalanced any decrease in time taken, as the questions should have been getting more challenging.
Figure 25: Line graph of average time taken.
Furthermore, we looked at the average pass rate of the questions in each difficulty category, having divided our test into three: KS2, KS3 and GCSE. Figure 26 highlights that the average pass rate fell as the difficulty increased from KS2 to KS3: the average pass rate at KS2 level was 87.3%, whereas at KS3 it was 72.0%. The average pass rate was consistent from KS3 to GCSE level, both at 72.0%. Our research supports the idea that employers can use numerical reasoning tests of different difficulty levels to control how many applicants pass through to the next stage of the application process: participants taking a KS2-level numerical reasoning test would achieve higher scores than those taking a KS3- or GCSE-level test.
Figure 26: Bar chart of question category and average pass rate on the
questions in that section.
3.2.5 Regression Modelling
The density plot in Figure 18 supports the hypothesis that score re-
sults follow a normal distribution (as previously stated this can be concluded
from the bell shaped figure). The first multiple linear regression model fitted,
involved the following variables: degree, years since studying mathematicss,
GCSE mathematics grade and type of learner. For research into our four
topic questions, we need to evaluate the effect all these variables have on the
overall score of the participant. The full summary of our regression model
used can be viewed in the appendix. As variables, degree, GCSE mathe-
matics grade and type of learner are categorical they are interpreted in R as
factors with levels.
The regression formula for this model is:

Y = 19.084 − 0.501X_1 − 3.266X_2 − 2.254X_3 − 0.291X_4 − 0.006X_5 + 3.004X_6 − 3.817X_7 − 2.486X_8 − 1.093X_9 + 0.170W − 4.178Z_1 − 3.060Z_2 − 2.088K_1 − 0.247K_2,

where Y is the test score. By using factors we limit the auxiliary variables X_1, ..., X_9, Z_1, Z_2, K_1, K_2 to binary values (0, 1). The X variables relate to degree, the W variable to years since studying mathematics, the Z variables to GCSE mathematics grade and the K variables to the type of learner.
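As an illustration of this factor coding (the labels here are hypothetical, since the level ordering assigned by R is not reproduced in this section), a three-level learner factor with one baseline level contributes two binary variables:

\[
  K_1 =
  \begin{cases}
    1 & \text{if the participant is of learner type B}\\
    0 & \text{otherwise}
  \end{cases}
  \qquad
  K_2 =
  \begin{cases}
    1 & \text{if the participant is of learner type C}\\
    0 & \text{otherwise}
  \end{cases}
\]

with learner type A absorbed into the intercept as the baseline level.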
The p-values for the variables are: Degree = 0.061498, Years since studying maths = 0.787063, GCSE mathematics grade = 0.007201 and Type
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final
final

Contenu connexe

Tendances

Professional Learning Communities for Teacher Development: The Collaborative ...
Professional Learning Communities for Teacher Development: The Collaborative ...Professional Learning Communities for Teacher Development: The Collaborative ...
Professional Learning Communities for Teacher Development: The Collaborative ...Saide OER Africa
 
Autonomous Linke Following buggy
Autonomous Linke Following buggyAutonomous Linke Following buggy
Autonomous Linke Following buggyVishisht Tiwari
 
A Research Base Project Report on A study on physical activity recognition fr...
A Research Base Project Report on A study on physical activity recognition fr...A Research Base Project Report on A study on physical activity recognition fr...
A Research Base Project Report on A study on physical activity recognition fr...Diponkor Bala
 
Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Shantanu Deshpande
 

Tendances (7)

Professional Learning Communities for Teacher Development: The Collaborative ...
Professional Learning Communities for Teacher Development: The Collaborative ...Professional Learning Communities for Teacher Development: The Collaborative ...
Professional Learning Communities for Teacher Development: The Collaborative ...
 
Final Report
Final ReportFinal Report
Final Report
 
Autonomous Linke Following buggy
Autonomous Linke Following buggyAutonomous Linke Following buggy
Autonomous Linke Following buggy
 
TCSion PrepTest - Iqureka.com
TCSion PrepTest - Iqureka.comTCSion PrepTest - Iqureka.com
TCSion PrepTest - Iqureka.com
 
A Research Base Project Report on A study on physical activity recognition fr...
A Research Base Project Report on A study on physical activity recognition fr...A Research Base Project Report on A study on physical activity recognition fr...
A Research Base Project Report on A study on physical activity recognition fr...
 
Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques
 
Slr kitchenham
Slr kitchenhamSlr kitchenham
Slr kitchenham
 

En vedette

Recomendaciones para elaborar presentaciones
Recomendaciones para elaborar presentacionesRecomendaciones para elaborar presentaciones
Recomendaciones para elaborar presentacionesBeatriz Pérez Escamilla
 
A Dive into the Mystic
A Dive into the MysticA Dive into the Mystic
A Dive into the Mysticivan bautista
 
Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...
Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...
Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...Sean Bawden
 
Fashion diffusion and the role of media
Fashion diffusion and the role of mediaFashion diffusion and the role of media
Fashion diffusion and the role of mediajekyllisa
 
Mydoctool presention MeetUp #PHIM14
Mydoctool presention MeetUp #PHIM14Mydoctool presention MeetUp #PHIM14
Mydoctool presention MeetUp #PHIM14iPocrate
 
Cybercafe Case Studies
Cybercafe Case StudiesCybercafe Case Studies
Cybercafe Case StudiesKhaled Almusa
 

En vedette (12)

Kevin
KevinKevin
Kevin
 
Thirst
ThirstThirst
Thirst
 
Recomendaciones para elaborar presentaciones
Recomendaciones para elaborar presentacionesRecomendaciones para elaborar presentaciones
Recomendaciones para elaborar presentaciones
 
Naruto vol 01 cap 05
Naruto vol 01 cap 05Naruto vol 01 cap 05
Naruto vol 01 cap 05
 
Sample
SampleSample
Sample
 
A Dive into the Mystic
A Dive into the MysticA Dive into the Mystic
A Dive into the Mystic
 
Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...
Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...
Decision to Breastfeed a “Personal Choice”, which Need Not be Accommodated: F...
 
Fashion diffusion and the role of media
Fashion diffusion and the role of mediaFashion diffusion and the role of media
Fashion diffusion and the role of media
 
Doencas trabalhadores
Doencas trabalhadoresDoencas trabalhadores
Doencas trabalhadores
 
Mydoctool presention MeetUp #PHIM14
Mydoctool presention MeetUp #PHIM14Mydoctool presention MeetUp #PHIM14
Mydoctool presention MeetUp #PHIM14
 
Cybercafe Case Studies
Cybercafe Case StudiesCybercafe Case Studies
Cybercafe Case Studies
 
Tecnicas de analise de risco ruido
Tecnicas de analise de risco   ruidoTecnicas de analise de risco   ruido
Tecnicas de analise de risco ruido
 

Similaire à final

A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...
A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...
A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...Joaquin Hamad
 
A.R.C. Usability Evaluation
A.R.C. Usability EvaluationA.R.C. Usability Evaluation
A.R.C. Usability EvaluationJPC Hanson
 
E.Leute: Learning the impact of Learning Analytics with an authentic dataset
E.Leute: Learning the impact of Learning Analytics with an authentic datasetE.Leute: Learning the impact of Learning Analytics with an authentic dataset
E.Leute: Learning the impact of Learning Analytics with an authentic datasetHendrik Drachsler
 
Computational thinking v0.1_13-oct-2020
Computational thinking v0.1_13-oct-2020Computational thinking v0.1_13-oct-2020
Computational thinking v0.1_13-oct-2020Gora Buzz
 
NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...
NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...
NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...DistPub India
 
Undergrad Thesis | Information Science and Engineering
Undergrad Thesis | Information Science and EngineeringUndergrad Thesis | Information Science and Engineering
Undergrad Thesis | Information Science and EngineeringPriyanka Pandit
 
Al-Mqbali, Leila, Big Data - Research Project
Al-Mqbali, Leila, Big Data - Research ProjectAl-Mqbali, Leila, Big Data - Research Project
Al-Mqbali, Leila, Big Data - Research ProjectLeila Al-Mqbali
 
Thesis Nha-Lan Nguyen - SOA
Thesis Nha-Lan Nguyen - SOAThesis Nha-Lan Nguyen - SOA
Thesis Nha-Lan Nguyen - SOANha-Lan Nguyen
 
Capstone Report - Industrial Attachment Program (IAP) Evaluation Portal
Capstone Report - Industrial Attachment Program (IAP) Evaluation PortalCapstone Report - Industrial Attachment Program (IAP) Evaluation Portal
Capstone Report - Industrial Attachment Program (IAP) Evaluation PortalAkshit Arora
 
Evidence report-35-role-of-career-adaptability
Evidence report-35-role-of-career-adaptabilityEvidence report-35-role-of-career-adaptability
Evidence report-35-role-of-career-adaptabilityDeirdre Hughes
 
25Quick Formative Assessments
25Quick Formative Assessments25Quick Formative Assessments
25Quick Formative AssessmentsVicki Cristol
 
Approaches and implications of eLearning Adoption in Relation to Academic Sta...
Approaches and implications of eLearning Adoption in Relation to Academic Sta...Approaches and implications of eLearning Adoption in Relation to Academic Sta...
Approaches and implications of eLearning Adoption in Relation to Academic Sta...Nancy Ideker
 
Individual Project
Individual ProjectIndividual Project
Individual Projectudara65
 

Similaire à final (20)

A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...
A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...
A Mini-Thesis Submitted For Transfer From MPhil To PhD Predicting Student Suc...
 
gusdazjo_thesis
gusdazjo_thesisgusdazjo_thesis
gusdazjo_thesis
 
report
reportreport
report
 
A.R.C. Usability Evaluation
A.R.C. Usability EvaluationA.R.C. Usability Evaluation
A.R.C. Usability Evaluation
 
Aregay_Msc_EEMCS
Aregay_Msc_EEMCSAregay_Msc_EEMCS
Aregay_Msc_EEMCS
 
E.Leute: Learning the impact of Learning Analytics with an authentic dataset
E.Leute: Learning the impact of Learning Analytics with an authentic datasetE.Leute: Learning the impact of Learning Analytics with an authentic dataset
E.Leute: Learning the impact of Learning Analytics with an authentic dataset
 
Computational thinking v0.1_13-oct-2020
Computational thinking v0.1_13-oct-2020Computational thinking v0.1_13-oct-2020
Computational thinking v0.1_13-oct-2020
 
NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...
NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...
NMIMS Approved 2023 Project Sample - Performance Appraisal Method at UnitedHe...
 
Undergrad Thesis | Information Science and Engineering
Undergrad Thesis | Information Science and EngineeringUndergrad Thesis | Information Science and Engineering
Undergrad Thesis | Information Science and Engineering
 
final_3
final_3final_3
final_3
 
Al-Mqbali, Leila, Big Data - Research Project
Al-Mqbali, Leila, Big Data - Research ProjectAl-Mqbali, Leila, Big Data - Research Project
Al-Mqbali, Leila, Big Data - Research Project
 
Thesis Nha-Lan Nguyen - SOA
Thesis Nha-Lan Nguyen - SOAThesis Nha-Lan Nguyen - SOA
Thesis Nha-Lan Nguyen - SOA
 
Vekony & Korneliussen (2016)
Vekony & Korneliussen (2016)Vekony & Korneliussen (2016)
Vekony & Korneliussen (2016)
 
Capstone Report - Industrial Attachment Program (IAP) Evaluation Portal
Capstone Report - Industrial Attachment Program (IAP) Evaluation PortalCapstone Report - Industrial Attachment Program (IAP) Evaluation Portal
Capstone Report - Industrial Attachment Program (IAP) Evaluation Portal
 
Evidence report-35-role-of-career-adaptability
Evidence report-35-role-of-career-adaptabilityEvidence report-35-role-of-career-adaptability
Evidence report-35-role-of-career-adaptability
 
25Quick Formative Assessments
25Quick Formative Assessments25Quick Formative Assessments
25Quick Formative Assessments
 
ECE_OBE_BOOKLET_UG20_REGULATION.pdf
ECE_OBE_BOOKLET_UG20_REGULATION.pdfECE_OBE_BOOKLET_UG20_REGULATION.pdf
ECE_OBE_BOOKLET_UG20_REGULATION.pdf
 
Approaches and implications of eLearning Adoption in Relation to Academic Sta...
Approaches and implications of eLearning Adoption in Relation to Academic Sta...Approaches and implications of eLearning Adoption in Relation to Academic Sta...
Approaches and implications of eLearning Adoption in Relation to Academic Sta...
 
Individual Project
Individual ProjectIndividual Project
Individual Project
 
Student db-mgmt-system
Student db-mgmt-systemStudent db-mgmt-system
Student db-mgmt-system
 

final

  • 1. University of Exeter College of Engineering, Mathematics and Physical Sciences ECM3735 Mathematics Group Project Computer Assessment - The Challenges and Potential Solutions Authors: Candidate Numbers 003440, 035429, 006702, 000997, 019169, 008339, 011812, 006667. Advisor: Dr. Barrie COOPER
  • 2. College of Engineering, Mathematics and Physical Sciences Harrison Building Streatham Campus University of Exeter North Park Road Exeter UK EX4 4QF Tel: +44 (0)1392 723628 Fax: +44 (0)1392 217965 Email: emps@exeter.ac.uk December 7, 2015
  • 3. Abstract The purpose of this report is to explore the challenges and potential solutions of current computer-based assessments. With increasing numbers of applications for graduate jobs, there exists a growing pressure among applicants to succeed at online assessments set by employers. The vast number of applications received, compared to available positions, puts an even greater need for employers to de- velop effective and fair assessments. These can then identify the most appropriate candidates who are able to best demonstrate their abilities in numerical reasoning, which have been shown to be a reliable predic- tor of job performance. In our report we approach the four questions: How do people learn through computer-based assessment? Why is it important to study mathematics? The Numeracy Vs. Mathematics debate. Why do certain employers use numerical testing? Are certain types of learners better at numerical reasoning tests? By creating our own numerical reasoning test, we hoped to explore the factors that affect participant’s performance. The team carried out extensive sta- tistical analysis hoping to relate our findings back to our hypotheses. We found significant findings for all four of our proposed hypotheses. The overall findings of this report demonstrate that current numerical reasoning assessments and practice tests are potentially flawed. Our findings suggest that they fail to accommodate to all types of learners, and in most cases fail in providing comprehensive feedback. From our research and test findings we encourage companies and educational in- stitutions to take on board our recommendations, such as to improve both the feedback and preparation they offer to candidates.
Contents

1 Introduction
  1.1 Aims and Objectives
    1.1.1 Aims
    1.1.2 Objectives
  1.2 Preliminary Findings
    1.2.1 The 'Mathematics Vs Numeracy' debate
    1.2.2 Why is mathematics important?
    1.2.3 Do people forget mathematics skills as they get older?
    1.2.4 Why do certain employers use numerical reasoning assessments? What skills do they think it will show?
    1.2.5 How do people learn through computer-based assessment? What works and what does not?
    1.2.6 Will different types of learners (kinaesthetic, visual etc.) have different levels of numeracy?
2 Methodology
  2.1 Group Organisation
    2.1.1 Meeting Times
    2.1.2 Communication
    2.1.3 Combatting Risk
    2.1.4 Subgroups
  2.2 Data Collection
    2.2.1 Preliminary data collection
    2.2.2 Survey
  2.3 Test design, creation and analysis
    2.3.1 Producing the Questions
    2.3.2 Programming the Test
    2.3.3 Test distribution
    2.3.4 Test Analysis
  2.4 Report Feedback
    2.4.1 Skill development - Graduate Skills
3 Findings
  3.1 Survey
  3.2 Test
    3.2.1 The Maths Vs. Numeracy Debate. Why is mathematics important?
    3.2.2 Why do employers use numerical reasoning testing?
    3.2.3 Do different learners perform better on numerical reasoning tests?
    3.2.4 How do people learn through computer-based assessments?
    3.2.5 Regression Modelling
  3.3 Feedback Findings
4 Conclusion
5 Evaluation
  5.1 SWOT Analysis
    5.1.1 Strengths
    5.1.2 Weaknesses
    5.1.3 Opportunities
    5.1.4 Threats
  5.2 Improvements
  5.3 Further Research
6 Bibliography
7 Appendix
1 Introduction

In recent history our world has borne witness to some of the most revolutionary and exciting technological advances of all time. We live in an age where computers seem to hold a role in society on a par with basic necessities such as food and water. Our planet no longer revolves merely around the sun, but around all things computer-related. These advancements have caused a great evolution in human society: we have moved from a very much physical world to a more paperless, virtual one. This is even true of assessments. Nowadays companies require candidates to be assessed via online tests as opposed to the traditional pen-and-paper exam. This report endeavours to explore such computer-based assessments. In particular, we have looked at numeracy tests, exploring both the challenges they face and the potential solutions.

1.1 Aims and Objectives

1.1.1 Aims

Computer-based assessments are widely used by employers, government departments and educational organisations; the list nowadays is endless. The question is why? First and foremost, such assessments examine the aptitude of a candidate in a particular subject. Two common subjects examined via online assessments are numeracy and literacy, two key skills for employability and the applied forms of mathematics and English respectively. There is a definite need for candidates to both learn and improve from these tests. Motivated by this fundamental necessity, we focused on creating our own form of computer-based assessment as part of our project. We used this as a vehicle with which to answer several questions that target the way in which a person learns, by analysing data from a sample of people who had taken our test. We also carried out research around the subject area of computer-based assessments and used our data to compare our findings with the current literature, thus providing insight into online testing that can be useful to both universities and employers.
1.1.2 Objectives

Our objectives were clear. Firstly, we researched extensively into literature relating to education and current learning methods. This literature enabled us to understand the theory behind the ways in which people learn and improve. Our research on this was both general and specific to computer-based assessments. Furthermore, we explored the types of learners that exist. Once these were clearly classified, we were able to explore effective learning methods catered specifically to them. This was an important step, since our aim was to create an assessment which provided effective and comprehensive feedback.

We planned to develop a computerised numerical reasoning test with which to gather the data needed to support or contradict the literature and the research hypotheses that currently exist within the community. We also planned to incorporate an adaptive feature into our assessment, equipping it with the ability to tailor questions to an individual's ability based on their performance on previous questions. This would not only aid our statistical analysis but also provide participants with the relevant practice and training that they require. We further planned to provide real-time feedback, allowing participants to instantly identify where they went wrong and, more importantly, how to correct it.

The test was primarily a numerical reasoning test and so incorporated numerical skills similar to those tested in graduate job applications. We therefore felt it would be beneficial to research the question "Why test mathematics?" These findings shed light on exactly why employers incorporate such assessments into their application process and what they hope to discover through doing so.

Age is a factor which uniquely defines a person. In our project, we explored whether there is evidence to support the idea that mathematics skills can deteriorate if you stop using them; in other words, the idea that mathematics skills are not permanent and require regular revision in order to remain within one's memory. This period could fall in the years post-GCSE or post-A Level. We were thus able to examine whether people who study mathematics have an advantage compared to others. All of these findings were compared against the current literature and, after analysing the results of our test, we were able to highlight any supporting or contradicting trends.
There is a widespread and controversial debate, not only across the academic community but also across the world, as to whether mathematics and numeracy are essentially the same thing. It poses the question of whether numeracy skills essentially rely on mathematical skills and vice versa, or whether they are in fact completely separate disciplines. We felt this was a relevant area to explore, since there has been talk of the UK Government changing the current mathematics GCSE by removing numeracy from mathematics and treating them as independent subjects. We hoped to discover whether mathematics students actually have an advantage in numeracy tests, given that numeracy is a skill neither especially relevant to, nor practised at, degree level.

We modelled the data collected from our test results with relevant graphs, and analysed it further using appropriate statistical techniques. Modelling our data in appropriate media enabled us to compare our results efficiently with those found in the literature.

Finally, we aimed to assess how useful our findings are to our broader stakeholders. We set out to measure to what extent, if any, we were able to contribute to the current problem of learning through computer-based assessments. It was an objective of ours both to highlight the problems with what is currently available and, where possible, to improve on it by providing solutions based on our findings. We planned to approach professionals and experts in this field with our results in order to get reliable feedback.

1.2 Preliminary Findings

1.2.1 The 'Mathematics Vs Numeracy' debate

An ever greater need for both mathematical and numerical skills is constantly emerging. However, there is a significant debate across society as to whether mathematics and numeracy should be considered the same thing, or whether numeracy should be a subject in its own right. There are current plans for GCSE Mathematics in the UK to be split into two separate, independent GCSEs: Mathematics and Numeracy [60].

Mathematics is defined by the Oxford Dictionary as "the abstract science of number, quantity and space, either as abstract concepts (pure mathematics), or as applied to other disciplines such as physics and engineering (applied mathematics)" [18].
Moreover, numeracy is defined by the Oxford Dictionary as "the ability to understand and work with numbers" [19]. It may be concluded from these two definitions that numeracy is a subset of mathematics. However, it can also be argued that numeracy is a subject in its own right and should be separated from mathematics, as it is more applicable in society and the workplace.

Interestingly, a paper on numeracy and mathematics from the University of Limerick, Ireland, contained no universally accepted definition of numeracy [42]. This is backed up by research from the University of Arizona, which found that the difference between numeracy and elementary mathematics is analogous to the difference between quantitative literacy and mathematical literacy [29]. More importantly, no universal definition of numeracy was agreed upon, although there was much overlap between current working definitions. The most important difference between the two forms of literacy is that quantitative literacy puts more emphasis on context, whilst mathematical literacy focuses on abstraction [29].

In a paper produced by Stony Brook University's Applied Mathematics and Statistics Department, it is stated that all mathematics instruction should be devoted to developing a "deeper mastery" of core topics through computation, problem-solving and logical reasoning - which is effectively what a numerical reasoning test examines. Simple proportion problems can be integrated into fraction calculations early on. In addition, the development of arithmetic skills in working with integers, fractions and decimals should be matched with increasingly challenging applied problems, many in the context of measurement. Solving problems in different ways ought to be an important aspect of mathematical reasoning, with both arithmetic and applied problems, in order to ensure a sufficient level of numerical skill for further progression in society [56].

The Guardian newspaper produced an article exploring a worldwide problem associated with the difference between mathematics in education and mathematics in the real world. The article states that all over the world we are mostly teaching the wrong type of mathematics [28]. The Guardian went on to describe how calculation is now carried out primarily by computers, despite the fact that we still tend to train people for this task; this is true almost universally [28]. We can relate this to the context of the 'Mathematics Vs Numeracy' debate since, generally, the mathematics taught in education is too pure and distant from the real world, while on the whole the mathematics used in everyday life is numeracy.
Many companies require potential employees to sit a numeracy test before commencing employment, despite the fact that they already hold nationally recognised qualifications in mathematics. An article from an electronic journal for leaders in education explores this. Findings show that although the term "numeracy" is not widely used across the world, there does exist a strong consensus that all young people are required to become more competent and confident in using the mathematics they have been taught. Furthermore, numeracy is found to bridge the gap between school-learned mathematics and its applications in everyday life [1]. These findings support companies in their efforts to use numerical reasoning testing as a way of seeing whether candidates can efficiently use their formal qualifications in a practical environment. A candidate may have achieved high results in their school exams, but this does not necessarily mean that they will be able to use the qualifications gained for practical problem solving, which is recognised as the main use of mathematics [28].

An insufficient level of numeracy skills has been found to lead to unemployment, low wages and poor health, further highlighting the importance of numeracy [43]. The need for mathematics exists in all aspects of everyday life: within the workplace and in other practical settings such as schools and hospitals, and in understanding the news and any statistics [14].

1.2.2 Why is mathematics important?

The study of mathematics can lead to a variety of professional careers in fields such as research, engineering, finance, business and government services [14]. This is supported by the University of Arizona's Department of Mathematics, which also added the social sciences to the above fields [15]. It should be noted that these careers are fundamental to the world's economy. It is therefore important to ensure that people working within those fields have sufficient skills to guarantee correct and efficient problem solving and to prevent any detrimental consequences.

Finally, it has been suggested that poor numeracy leads to depression, low confidence and low self-esteem, resulting in social, emotional and behavioural difficulties and increasing social exclusion, truancy and crime rates [41]. In the digital age, 90% of new graduate jobs require a high level of digital skills, which are built on numeracy. Although computers are able to reduce the need for human involvement in certain calculations, sufficient numeracy skills are still required to use them efficiently [41].
1.2.3 Do people forget mathematics skills as they get older?

Research has found that a severe loss of both numeracy and literacy skills often occurs in adulthood, with 20% of adults experiencing difficulties with the basic skills needed to function in modern-day society [6] [20] [38]. Simple numerical calculations, such as percentages and powers, are found difficult despite being taught and tested to the government's standard throughout education. The effect of being unemployed has been explored for both men and women, and it has been found that numeracy skills get steadily worse the longer a person is without a job [6]. Interestingly, women experience a lesser effect than men, as their role in society tends to be more diverse, requiring them to use their numeracy and literacy skills more frequently. It has also been found that the loss of skill largely depends on the starting level of knowledge and understanding, and that those who have poor skills to begin with experience a more severe deterioration. Furthermore, numeracy skills have a smaller presence in everyday life, as more people find themselves reading than performing calculations. A decrease in literacy skills, however, leads to an even further loss of numeracy skills, as it increases the difficulty of understanding the posed question.

Important findings have been made amongst a group of nursing students, who were asked to sit a numeracy test containing questions similar to those that they would have to answer as part of their future job [20]. The average score was 56%, with the most common types of errors being arithmetic. A significant difference in results was found between students who entered higher education immediately and those who took a year out beforehand: those who started immediately scored 70% on average, while those who did not averaged only 47%. This shows that being in an environment that does not require the use of numeracy skills has a deteriorating effect, not only on the ability to perform simple calculations but also on the ability to extract the relevant information needed to set up an equation. This means that even with the use of a calculator, these students are still likely to make mistakes. Students have also been found unable to identify errors in their work, even when the result found is unreasonable and unrealistic. Such results are potentially dangerous: nursing students, for example, must perform calculations such as drug dosages, which if incorrect will cost both the public, in their suffering, and the employer, in having to provide additional training.
  • 12. especially early on in a career, a lack of numerical competence has an un- desirable effect on employment of these individuals, which in turn affects their standard of living [6] [46] [38]. Such requirements are brought about by the recent changes to the labour market, with less semi- or unskilled manual jobs available due to technological developments [46]. Unskilled workers have difficulty in both gaining and retaining employment, and so are the first to suffer in the case of downsizing or a crisis [6]. A low level skill set also limits individuals to lower and middle range jobs (bottom 10% to 20%), preventing them from experiencing career growth, and leading to severe social exclusion [46] [6] [7]. This causes a downward cycle as low skill level is passed on from parents to children, therefore, accelerating unemployment through the gen- erations [7]. The government has recognised this problem and created a ’Skills for Life’ programme, which aims to provide basic skills to adults in order to help them gain employment [38]. Other solutions include on-the-job train- ing, or, as research suggests, we can even prevent such severe skill loss by ensuring pupils reach a certain skill level whilst still in education [6]. There are, of course, other factors which lead to a low level of nu- meracy skills, such as family background, learning environment and quality of education [6] [3]. However, in this report we concentrate on how the low level of demand for numeracy in everyday life affects a student’s performance in an online test. 1.2.4 Why do certain employers use numerical reasoning assess- ments? What skills do they think it will show? In addition to this, with a constantly changing and advancing busi- ness world, the way in which people are hired may be a natural result of shifts in the business environment and modern workforces. A number of studies mentioned in A. Jenkin’s, 2001 paper speculated that the increase in numerical tests is due to the greater professionalism of the human resource sector of many businesses, as well as the inclusion of standard selection pro- cedures in their business [33]. In the 21st century, Human Resources (HR) has evolved massively, and is now an integral part of most organisations [58]. All these factors may have led to the rise of assessment centres, due to a con- tinuous desire amongst companies to gain a professional edge. They do this by searching for alternatives to traditional methods of employment, much of which is done through HR. This greater reliance on HR as a business sector 9
This greater reliance on HR as a business sector has led to the adoption of much stronger recruitment methods, which (for reasons that will follow) enable companies to meet legislative requirements and promote fair practice.

In many workforces, it has become clear in recent years that employability tests are used for purposes other than just performance testing. They provide a platform that assesses candidates on merit rather than personal criteria, reducing the impact of discriminatory practices [4]. Due to equal opportunity legislation in many countries, which most commonly relates to the differing proportions of ethnic groups hired, many employers could be vulnerable to prosecution [58]. These types of randomised psychometric tests can therefore be used as a way to reduce bias and discrimination [33]. One factor explaining the increase in the use of these tests may therefore be a prudential response to changes in hiring attitudes and legislation. On the other hand, the opposite has also been said: that companies need to keep legal compliance in mind when they use psychometric tests [12], so as not to offend candidates with irrelevant tests. In addition, the role of bias in these tests has been explored, as many psychologists and companies note that testing is an intrinsically culturally biased procedure that can cause discrimination against ethnic minorities. This is a result of cultural differences leading to consistently different answers across several different social groups. It can be noted, though, that this applies more specifically to judgement and situational tests, and not to the numerical and verbal reasoning tests on which we planned to base our research [30].

The rise of these tests could also be attributed to the workplace's lower regard for formal qualifications as a method of streaming candidates and predicting their future abilities [33]. This may be because young labour-force entrants across the EU have much higher attainments than they previously did, and hence it is harder than in the past to sort applicants at the top end of the spectrum based on attainment. This may lead employers to screen applicants much more carefully [33]. Potentially this was caused by the previous decade of education being hailed as 'too easy' [27], which caused achievements to be very high. Periods like this can have knock-on effects on recruitment methods, as a reaction to these 'more qualified' applicants filtering through the recruitment system and into the business environment. However, this may be subject to change, given that recent education reforms claiming to 'toughen up' the curriculum have yet to see their full effect - particularly in terms of employment. Examples of a lack of belief in the education system can be seen in the movements of top employers.
An example of this would be one of the 'big four' professional services firms, Ernst & Young [26], who have recently changed their application rules so that educational attainments, such as degree class, are no longer taken into account. Instead, they believe that their own in-house testing and assessment centres are a reliable enough indicator of whether candidates will succeed [26]. Another example was the introduction of the Army's own mathematics test for applicants. The reason for its development was the increasingly challenging task of using GCSE mathematics results as a discriminator amongst applicants for technician roles [33]. If formal qualifications continue to be an insufficient indicator of applicants' abilities, then companies will have to find new methods of screening them, as is happening already with the increase in psychometric testing.

When beginning our research, we went down many different routes to get a broad range of information. Through emails and other means of correspondence, we identified a few problems that employers encounter with these psychometric tests. Firstly, they are not always sat in test centres, and many are taken online. This always leaves the possibility that people may try to cheat on these tests and get other people to sit them on their behalf [4]. This is unfair on other candidates, as well as misrepresentative, causing people who may not be suited to a role to progress further in applications than they otherwise would. Having said this, most of these tests have been designed in such a way that they are fairly difficult to cheat on - for instance, by having time constraints [53]. We also found that these tests are mostly used as a means of filtering candidates, so passing them does not necessarily guarantee any further success. Secondly, some companies have said that the tests may be unrepresentative, since people only get one chance to take them [53]. Due to many different circumstances, an applicant may well underperform on the test and so not demonstrate their full potential. This could cause companies to miss out on hiring perfectly well-suited candidates, in which case the tests would be causing a misallocation of their resources. Some companies have a validation test in place that allows people who got unexpected results to retake the test; however, not all companies guard against inconsistencies in this way [53]. On the contrary, many recruiters we spoke to stated that these tests and their scores are used only to assist in the recruitment process and are not the sole factor in employing people [51]. Instead they are used as guidance to help make informed decisions on applicants, so a well-rounded application is essential in addition to these tests [4].
"Numerical reasoning is the ability to understand, interpret and logically evaluate numerical information. Numerical reasoning is a major facet of general cognitive ability, the strongest overall predictor of job performance" [44].

Because of the reasoning skills exhibited when taking them, numerical reasoning assessments are seen as the 'best overall predictor of job performance' [44]. Numerical and verbal reasoning tests are combined into an overall aptitude assessment that highlights the most well-rounded people suited for the job. Aptitude tests show employers skills that cannot be replicated in interviews, nor observed by reading CVs and looking at past references. They are a true, accurate and quick assessment of how candidates perform on the spot in a pressured environment. The 'government mathematical report' [25], alongside careers websites such as Assessment Day Ltd [37] and Inside Careers [31], agrees that the only mathematical abilities being tested in numerical assessments are addition/subtraction, multiplication, percentages, currency conversions, fractions and ratios. In addition, they test the ability to "interpret the tables and graphs correctly in order to find the right numbers to work with" [31]. Numerical reasoning tests are normally timed, in order to measure applicants' ability to think on their feet and problem-solve under time pressure.

Prospects [45], a website designed to help people looking for jobs, states that employers in most industries are looking for applicants with planning and research skills, i.e. those with the ability to find relevant information from a variety of different sources. Information can be presented in a variety of ways, such as numbers, statistics or text in tables, graphs and reports. Employees need to be able to understand, analyse and interpret research and use it appropriately. Numerical assessments test exactly these skills. In addition, tests can have varied levels of difficulty, to represent the level of numerical skill that will be needed for the specific job.

SHL Talent Measurement Assessments create a wide range of tests, ranging from aptitude and personality tests to customised tests for individual companies [8]. They create a variety of tests appropriate for different job levels and industries. Numerical reasoning tests can be adapted to have more complex questions, requiring a more advanced level of numerical knowledge and skill. Another way of making them more challenging is to shorten the time available to complete the test. SHL state that their tests represent the 'level of work and academic experience' [8] required for a specific job role.
For example, SHL released an 'Aptitude: Identify the Best Talent Faster and at Less Cost' brochure [9] stating that a semi-skilled staff job will require a VERIFY Calculation Test, whereas a director or senior managerial job will need to be tested using the far more advanced VERIFY Numerical Test. Furthermore, as numerical reasoning is just one aspect of an aptitude assessment, applicants applying for highly numerical jobs may also be asked to take a verbal reasoning test. In all jobs, an ability to communicate with colleagues is essential. This reiterates the fact that aptitude tests are used to find the overall highest-calibre applicant.

A job application process is not a simple task. For many job applications, candidates must spend hours researching the company before writing the application form and preparing for interview. Practising the skills examined in numerical tests is just another aspect of a job application that requires preparation. Does an applicant's mark improve with practice? If so, then applicants can practise in order to achieve high results, no matter what degree they study or how long it has been since they last studied mathematics. For example, even an applicant who stopped studying mathematics at GCSE level can use the numerous online resources available to prepare for numerical tests, and hence could easily 'revise' for such a test and potentially perform very well.

The overall consensus from our sources is that the numerical tests used by large companies (especially those with large numbers of applicants) are generally a candidate-streaming process. With UK education standards rising and a larger number of students receiving higher education (in January 2015, 592,000 people had applied to university, up 2% from the year before) [11], more people are eligible to apply for graduate scheme jobs. High Fliers Research presented their findings in a report, 'The Graduate Market in 2014', covered by the Telegraph [49], which stated that graduate schemes now receive approximately 39 applications for every available job. With the number of students applying to such schemes high and rising, it is extremely hard to differentiate between candidates who have all achieved high grades and well-regarded university degrees. How do you select the 'best' candidate from thousands of similar applications? Due to this difficulty, companies use these tests to reduce the number of applicants they consider in the next application step. According to Personnel Today [47], 80% of companies use standard off-the-counter numerical tests provided by companies such as SHL.
Only 18% use a test which they have tailored to measure the unique, customised skills that they are looking for. Some would argue that since off-the-counter tests are not unique to a company, such a numerical test will not truly assess competency for a specific job role.

1.2.5 How do people learn through computer-based assessment? What works and what does not?

Another topic we explored was how people learn through computer-based assessment. There are many methods that aid learning on a computer; the most popular and commonly used forms are multiple-choice or true/false questions, labelling images, rank ordering and gap filling. Computer-based assessments can be very popular with both students and teachers. They increase student confidence and are liked by students because results are rapid, if not immediate. They can even be completed in a student's own time, when they are ready to do so. A teacher is also likely to use these methods as a way of administering frequent formative or summative assessments, since less time is spent marking. Not only can teachers then spend more time adapting their teaching methods (depending on the results of these assessments), but they can do so reasonably soon after the test is taken [39].

Feedback is crucial to the learning process and, as mentioned, one of the advantages of immediate feedback is that students receive their result straight away, rather than after they have moved on from a particular topic. A study conducted at the University of Plymouth [36] compared two groups of students - one using several online materials with two levels of feedback, and another using none of them - to see how they performed in an end-of-module summative assessment. The group using the available study materials performed significantly better than the other group.

Although computer-based assessments can greatly benefit a student's learning, there are concerns that online tasks, especially multiple-choice questions, do not encourage deep thinking about a topic and so do not aid learning [34]. In order to be as beneficial as possible, these assessments need to both engage and motivate students.
1.2.6 Will different types of learners (kinaesthetic, visual etc.) have different levels of numeracy?

Our final area of research was different learner types, and whether some of them would be better at numeracy than others. According to ESL Kid Stuff, there are many different types of learners, such as tactile, global and analytic; however, most people fall into at least one of the following three categories: kinaesthetic, visual and auditory [52]. Katie Lepi [35] describes these types of learners in her article, "The 7 Styles of Learning: Which Works For You?". She describes kinaesthetic (or physical) learners as people who prefer using their bodies, hands and sense of touch. Writing and drawing diagrams are physical activities, so this sort of activity really helps them learn. Role-play is another commonly used activity for these types of learners. They often have a 'hands-on' approach, so they learn best from partaking in physical activities. Visual learners, on the other hand, do better by looking at graphs, watching someone give a demonstration or simply by reading. Finally, auditory learners are the kind of people who would rather listen to something being explained to them than read about it themselves. A common way for them to study is to recite information aloud or to listen to recordings; they also usually like to listen to music while they study [57].

There are many different learning styles and, even though most people use a combination of all three techniques, they usually have an idea of how they learn best. Knowing what type of learner you are from a young age puts you at an advantage. However, it is also important to adapt your learning techniques whilst you are young, so that you are able to use each technique effectively [48]. Our aim is to see if there is a correlation between numerical ability (based on our test results) and type of learner. We understand that online computer-based assessments mainly cater for visual learners; we do not aim to change the online test to reflect this, but instead hope to test this theory as part of our analysis.
2 Methodology

2.1 Group Organisation

In this section, we discuss how we took full advantage of the time given to complete this project by organising the group members efficiently.

2.1.1 Meeting Times

In order to make the most of our meetings, it was important to choose a suitable time for everyone. We decided it would be best to meet two to three times a week, including a weekly meeting with our project advisor. We initially discovered that there were not many slots in the week that we could all make, due to timetable clashes. To make things clearer we used the widely acknowledged online scheduling tool Doodle (see Figure 1) to pick a convenient time for all group members. The Doodle poll worked well: it was efficient and quick to carry out, and prevented the confusion we had found when suggesting times among ourselves. In the first few weeks of the project we met frequently; as term progressed, however, we settled on fixed times to meet every week: 15:30-16:30 on Mondays and 10:00-12:00 on Wednesdays. To make sure we had a private space for every meeting, we assigned one person to be responsible for booking rooms. During these meetings we would discuss the development of the project, updating each other on the progress of our individual responsibilities, and delegate future tasks.
Figure 1: An example of us using Doodle to decide on suitable times for our group meetings.

2.1.2 Communication

One in seven people now use Facebook to connect with their family and friends [32]; it is the most popular form of social media. As a result, we decided the best form of communication between group members would be through Facebook. We created a closed group (see Figure 2) so that we could share files containing any work we had completed. We also exchanged numbers and created a group chat on Whatsapp, an instant messaging application.

The team looked into using Google Documents to keep and edit our work. We found we were limited by this, as the site required a Google account, which not all group members had. It was also more difficult to facilitate comments and project-related discussions. In contrast, our Facebook group allowed all of these things, and it was quickly decided that this site would be our main form of communication, as no other platform worked more efficiently.
Figure 2: Evidence that we created a closed Facebook group with all members.

2.1.3 Combatting Risk

The decision to use Facebook as our main method of communication was ideal for our project. It minimised the possibility of losing files and data, which would have had a huge impact on our project. The use of a closed group meant every member could access and upload documents quickly and efficiently throughout the project, so that the rest of the group could edit key information or findings if necessary.

We also decided to split into subgroups, which combatted the risk of absence. If one member of a subgroup was not able to complete a certain piece of research, for example due to illness, the other members of the subgroup would be able to finish it, since they would also have a good understanding of the task, having been studying the same topic.
Initially, we went about identifying all the tasks and activities we wanted to complete throughout our group project. We were then able to create a critical path (see Figure 3) to see if we would be able to finish all these tasks within the time available. The critical path also allowed us to recognise what needed to be prioritised and what could be completed in parallel.

Figure 3: Our Critical Path Analysis.

2.1.4 Subgroups

Once we had highlighted the key parts of our project, we decided to split into subgroups to spread the workload. This enabled us to undertake multiple tasks at once so that we could collaborate to meet our timeframe. The four groups were: writing the questions, programming, statistics, and writing up the report. When deciding whom to put in which subgroup, we asked each individual what their strengths and weaknesses were in order to best utilise our skills; for instance, some members of the group preferred programming to statistics.
Deciding who would be in each subgroup was not difficult. Some members of the team were interested in the creative nature of writing the questions, while others had enjoyed computer programming modules taken in previous years. We decided to put more people into the programming subgroup, having highlighted early on that this was probably going to be the most time-consuming part of the project, and that there was not a lot of previous programming knowledge within the group. Some members had statistically analysed models in the past, so they formed the statistics subgroup. Finally, another subgroup put themselves forward for editing and compiling the final report, as they had experience working with LaTeX and enjoyed editing written material. Even though the final version of the report was passed through this subgroup, everyone took a very active role in the write-up of the report.

2.2 Data Collection

2.2.1 Preliminary data collection

The next stage for our team was to gather preliminary data to aid our project - in particular the development of our own online test. We started by doing some initial research around our topic, in order to find areas that we could look into further. After discussing our initial findings, we came up with four main topics that we would research further, as stated in our introduction. As a result, we had to forgo many other interesting areas, but we decided that these were the four most relevant areas on which to focus our objectives. We also felt that including any more areas of study would leave us without enough time to complete the project, nor would we be able to write about them in sufficient depth. We split our team into four two-person groups and assigned a different area of research to each one, so as to manage our time and resources more efficiently. The only downside of this was that not everyone in the group was fully informed on every topic. However, this was easily overcome by compiling our research into one document and making it available on every social platform that we were using.

We went about our research in a variety of ways. Firstly, we used the available literature - papers, articles, books and websites - to find evidence for or against our initial thoughts on each topic. This demanded considerable reading and analytical skill on the part of the researchers, who had to work through huge amounts of information and extract the necessary details in an articulate way.
In addition, we carried out primary data collection by emailing and contacting relevant sources, such as employers, online test providers and academics. For some of these we established individual contact, asking them specifically for advice or more information on our project, but for the bulk of employers and careers websites we generated a questionnaire to distribute to them. We decided to do this in bulk after quickly realising that not many companies were responding to our emails. This could have been because they were not interested in our group project, or because some companies were too large to assign a contact or specific department to deal with us. Using input from the separate research groups, so as to make the questionnaire as relevant and useful as possible, we asked a range of questions. This questionnaire was also in a far easier format for companies to respond to, as it saved them the time and effort of formulating unassisted responses. Bulk distribution ensured that we got as many responses as we could in the limited time frame we had to complete our research.

Once the research stage of the project had been completed and we had all our necessary sources, we began to write it up. Within our subgroups, we compiled our best findings and formalised them for our report. We each wrote up our sections, complete with references, ready to be passed along to the editing team. With this, we also included a full write-up of our reference information to go into our bibliography.

2.2.2 Survey

With the research stage of our project complete, it was time to move forward with the creation of our own online resource to test our findings. After discussing it as a group, we decided that one of the easiest and quickest ways to gather information was by creating an online survey. We felt that this was far quicker to distribute, and to analyse results from, than other methods such as focus groups, meaning we would have less of a time constraint. The aims of the survey were, firstly, to test some of the conclusions and theories formed from our research and discuss what this showed and, secondly, to help us create our computer-based assessment by finding out what students find most useful when they are learning. To do this, we asked several questions about learning techniques, types of learners and effective testing methods. We then passed this information on to the subgroup in charge of writing the questions for our online test.
They used the survey feedback to help us create a test in response to what people preferred. We felt this would give us a more tailored test, written in the most helpful way for students. The fact that the test was designed with student input in mind meant that we could try to benefit test participants, and hopefully improve on currently available tests.

Figure 4: The first page of our survey.

We created the survey using Google Documents (see Figure 4) and set it up as a form. We looked into other online survey distributors but found Google Documents to be the best platform, as most required a payment to release a survey containing more than 10 questions. Google Forms allowed us to create an unlimited number of questions, was quick and easy to use, and exported our data straight into Excel for us to analyse. Using contributions from all research groups, we generated a draft survey.
The survey was then checked by the group to ensure it was appropriate before it was released. This allowed us to make a few necessary changes to the wording and to remove overlapping questions in order to shorten the survey. We were aware that people might be put off taking our survey if it was too lengthy and therefore time-consuming. For this reason, we tried to ensure that most questions could be answered either with multiple choice or on a scale of agreement. We also tried to make sure that the survey took no longer than 15 minutes to complete.

We then distributed the survey to the public so that we could analyse our results as quickly as possible, given that we were on a tight schedule. To take the survey, all that was needed was the web link. We spread this link across as many social media platforms as we could, including Facebook and Whatsapp. We felt that this would be the quickest way to distribute our survey, as it would target our main audience, students, in a way that was easily accessible for them. The fact that the form was created online made analysis far easier, as we could see responses as they came in; by keeping an eye on the data, we were able to start analysing the feedback as soon as we had a sufficient number of responses. After approximately a week, we had gathered a large number of responses, and when numbers began to plateau we decided to start reviewing the data.

2.3 Test design, creation and analysis

2.3.1 Producing the Questions

While the programming group focused on the technical aspects of creating a computer-based assessment, those tasked with writing questions for the test had to make sure they referred back to the information we had already collected about online assessments. We started by looking at the results from our survey in order to determine what types of questions we ought to be asking. As found in our initial research, multiple-choice questions were the preferred method of answering. The survey showed us that gap filling was the least popular method; however, we decided that we would still include questions of this form in our test, for two reasons. Firstly, it is the most accurate way of seeing whether a student has really understood a question, since they cannot guess the answer; secondly, we thought it would be interesting to see whether students tended to do worse on these types of questions, as we had hypothesised.
The next stage was to decide what topics to base our questions on. We wanted to focus on the numerical reasoning style of questions found on currently available employability tests. We did this by researching these numerical reasoning tests and replicating their style of questions. This ensured our test was relevant and had the potential to prepare people for such testing.

Some initial points raised focused on the types of questions we would have to ask, what topics we would focus on, and how many levels of difficulty we should have. It was also noted that our questions would have to be both realistic to program and relevant to our research, in order for the results to provide useful information that the statistics group would then be able to analyse. Each member of the subgroup was then given a different research assignment. One member focused on how to effectively test different learner types, while the other two members focused on looking up example questions at different levels of difficulty. Having done this, it emerged that online tests naturally cater more for visual learners than for the other two learner types [10] [40] [50]. We took the decision not to focus on this aspect when writing our questions, as we would not be able to create different types of questions for any specific learning style other than visual.

Having established that a variety of levels was essential to fulfil our aim of creating an adaptive test, it remained to decide which difficulties we would pick. Since we knew that all participants would have a minimum of GCSE-level mathematics or an equivalent qualification, but not necessarily any further qualifications, we decided to make this our top level of difficulty. However, after their preliminary discussion with the statistician Dr Ben Youngman, the statistics subgroup informed us that having more than three levels of difficulty in our test would significantly hinder the statistical analysis of the data later in the study, as we would be unable to create an effective model. On the other hand, we were concerned that having too few levels would reduce the range of results: with six similar questions at the same level, it was likely that a participant who could answer one question correctly could complete them all. For this reason, we decided to incorporate KS2, KS3 and GCSE-level mathematics. The final element of the decision process involved reading through the curricula for Key Stages 2 and 3, as well as GCSE-level mathematics, in order to single out the recurring, most important topics so that we could base our test questions around them [22] [24] [23].
The final decision we made was to write two questions for each of 'percentages', 'ratios' and 'algebra' at Key Stages 2 and 3, and then to write six GCSE statistics questions, which would incorporate these topics. In this way, we would have three multiple-choice and three gap-fill questions at each level of difficulty.

Once we had made all the relevant choices, it was time to write the questions. We found examples of questions on the topics we were focusing on by looking at teaching resources websites, such as TES [2] [54]. We then adapted these to suit our own needs - not only did we want to model the questions to resemble currently available online assessments, we also had to generate wrong answers for every question that was to be multiple choice. This was the hardest element of the process, as it involved deliberately making common mistakes with the aim of generating plausible wrong answers. Luckily, this was achievable and, thanks to diligently writing down our thought processes, we were able to relate how we had created these wrong answers to the programming team, so that they had an algorithm to use in the randomisation of questions later in the process.

2.3.2 Programming the Test

In this section, we discuss the writing of our online test. Our test acted as a vehicle to provide the relevant data to help answer the theories we had posed from our research. This meant it was an integral part of our project outcome and was therefore very important to us. We began meetings regarding the creation of the test very early on, as we were aware that it would be a very time-consuming part of our project. In these, we discussed how we were going to approach the programming aspect.

Firstly, we had to choose the programming languages that we would use. We looked into a few different methods. Our first idea involved using the Exeter Learning Environment, so that all Exeter students would be able to access the test easily. We thought this would help with distribution, as this website is used by all students at the university; however, the programming behind the website was far too restrictive in terms of what we had planned with regard to coding. It also presented the problem that our results would be restricted to one university. Another language we could have used was a version of Maple that would both code and present our questions, but it became apparent that it would not facilitate certain aspects of our test, such as feedback and randomisation.
After exploring these different options with our project advisor, we decided it was best to use the popular server-side language PHP, with HTML to code the questions, and to store the data in a MySQL database. We chose PHP as it is a relatively simple language that integrates easily with HTML, which is the main language used for the appearance of web pages and was what our questions would need to be written in. It was also the most flexible language, so it would not restrict us in the design of our test and would enable us to create dynamic web pages involving randomised variables. This was very important, as many of our test aims involved randomisation and forms, something that PHP would facilitate, enabling us to move information on and off our database effectively. The only limitation was that, before we had access to an online server, we would find it difficult to practise running our code. This was overcome by using XAMPP, a free piece of software that replicates the process of using a server but can be run offline. This meant that we could run our test as it was developed, in order to check its appearance at every stage.
Figure 5: An example of the PHP code used to generate Question 1 in our Numerical Reasoning Assessment. [The code is preserved only as a screenshot in the original report.]
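Since the screenshot in Figure 5 cannot be reproduced in text, the following is a minimal sketch, purely for illustration, of what a question-generation script in this spirit might look like. The question wording, the values, the function name and the distractor formulas are all hypothetical; this is not the project's actual code.

<?php
// Hypothetical sketch of a randomised percentage question in the style of
// Figure 5. All names and values are illustrative assumptions.
function makePercentageQuestion() {
    $price   = rand(20, 200);   // random base price in pounds
    $percent = 5 * rand(1, 9);  // discount of 5%, 10%, ..., 45%

    $correct = $price * (1 - $percent / 100);  // price after the discount

    // Distractors built from common mistakes, as described in Section 2.3.1.
    // A full implementation would also check that no distractor happens to
    // coincide with the correct answer.
    $distractors = array(
        $price * $percent / 100,       // mistake: gives the discount amount itself
        $price - $percent,             // mistake: subtracts the percentage as pounds
        $price * (1 + $percent / 100)  // mistake: adds the percentage instead
    );

    $options = array_merge(array($correct), $distractors);
    shuffle($options);  // randomise the order in which the options appear

    return array(
        'text'    => "A jacket costs £{$price} and is reduced by {$percent}%. What is the new price?",
        'options' => $options,
        'correct' => $correct,
    );
}
?>

Each page load produces a fresh set of values, so the approach and formula stay the same while the numbers differ, matching the randomisation goal described later in this section.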
Figure 6: An example of HTML code being echoed in PHP, used to submit Question 1 in our Numerical Reasoning Assessment. [The code is preserved only as a screenshot in the original report.]

The subgroups had researched the current curricula and decided on the layout and contents of the test, so the next step was for the programming team to create it. Firstly, we familiarised ourselves with both PHP and HTML and got used to writing functions. We used a variety of resources from the library [13] and the internet [59], as well as our own previously acquired skills. We aimed to understand how to print text, show images and generate tables using HTML, so that we could write a well-presented and professional-looking test. We also had to learn how to interact with our online database, move data on and off it, and store our results. Following this, we split the workload between five people, each person being in charge of certain questions and aspects of the test.
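As with Figure 5, the code in Figure 6 survives only as a screenshot. A minimal sketch of the pattern it showed - an HTML answer form echoed from PHP - might look as follows. The file and field names are hypothetical, and in a real deployment the correct answer would more safely be kept server-side (for example in the session) rather than in a hidden field visible in the page source.

<?php
// Hypothetical sketch of echoing an HTML answer form from PHP, in the
// style of Figure 6. Assumes makePercentageQuestion() from the previous
// sketch; field and file names are illustrative only.
$q = makePercentageQuestion();

echo "<p>{$q['text']}</p>";
echo "<form action='submit_question1.php' method='post'>";
foreach ($q['options'] as $i => $option) {
    $shown = number_format($option, 2);  // display to two decimal places
    echo "<input type='radio' name='answer' value='{$option}' id='opt{$i}'>";
    echo "<label for='opt{$i}'>£{$shown}</label><br>";
}
// Carry the randomised correct answer through to the marking page.
echo "<input type='hidden' name='correct' value='{$q['correct']}'>";
echo "<input type='submit' value='Submit answer'>";
echo "</form>";
?>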
The main limitation we came across was the time constraint on programming, given the short ten-week period. Due to our initially low level of programming skill, a significant amount of time was spent familiarising ourselves with the chosen languages and understanding their capabilities.

The starting page of our test provided some preparatory information on the materials the participant would require, as well as explaining the procedure of the test. The voluntary nature of the test was specified, to ensure that participants did not feel pressured and knew they could terminate at any point. The second page of the test was dedicated to data collection, gathering information on age, gender, subject area and GCSE mathematics grade, as well as how long it had been since the participant last studied mathematics. We also included their university ID as one of the variables, which was then used as an identifier. This was in case a participant chose to sit the test more than once, so that we would be able to determine whether an improvement in their mark occurred. The scores awarded were also linked to this identifier, so that only rows with a matching identifier would be changed, with a score of one for a correct answer and zero otherwise. We also chose to ask participants what type of learner they thought they were, by providing relevant descriptions, to help us determine in the later data analysis whether this had an effect on their mark. Another piece of data collected throughout the test was the time it took participants to complete each question, which we recorded using timestamps in PHP. This helped us to determine whether any cheating took place, as well as which questions were found most difficult.

All of us already had a sufficient understanding of the code needed to write up the questions themselves, as they were only tables and simple text and so were quick to produce; this allowed us to concentrate on the more complex parts of the programming, described below.

Some of our questions (please refer to Figures 39 to 81 in the Appendix for screenshots of our Numerical Reasoning Assessment) included images, such as pie charts and stick diagrams, to cater for different types of learners, as mentioned earlier in this report. Initially we attempted to code these rather than simply inserting the images, so that we would be able to adapt them, but we soon realised this would be an unrealistic target given the short time and limited skills we had. As a group, we decided to include them as static JPEG images instead, judging that the impact of this would be very small. In certain instances we could avoid this limitation, as we were still able to randomise the questions; for others, we decided it was more important to meet our time constraints and generate our statistics than to worry about randomisation.
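The report does not reproduce the timing or storage code, but the mechanism described above - a PHP timestamp per question, and a score of one or zero written against the participant's identifier - could be sketched along the following lines. The table name, column names and database credentials are all assumptions introduced for illustration.

<?php
// Minimal sketch of the timing-and-scoring mechanism described above,
// assuming a hypothetical MySQL table results(student_id, question, score,
// seconds) and placeholder credentials.
session_start();

// Fragment 1: when a question page is served, stamp the start time.
$_SESSION['q1_start'] = time();

// Fragment 2: on the corresponding submission page.
$elapsed = time() - $_SESSION['q1_start'];  // seconds spent on the question
$score   = ($_POST['answer'] == $_POST['correct']) ? 1 : 0;

// Update only the row matching this participant's identifier.
$db   = new mysqli('localhost', 'user', 'password', 'testdb');
$stmt = $db->prepare(
    'UPDATE results SET score = ?, seconds = ? WHERE student_id = ? AND question = 1'
);
$stmt->bind_param('iis', $score, $elapsed, $_SESSION['student_id']);
$stmt->execute();
?>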
Some of our questions (please refer to Figures 39 to 81 in the Appendix for screenshots of our Numerical Reasoning Assessment) included images, such as pie charts and stick diagrams, to cater for different types of learners, as mentioned earlier in this report. Initially we attempted to code these rather than simply inserting the images, so that we would be able to adapt them, but we soon realised this would be an unrealistic target given the short time and limited skills we had. As a group, we decided to include them as static JPEG images instead, judging that the impact of this would be very small. In certain instances we could avoid this limitation, as we were still able to randomise the questions. For others, we decided it was more important to meet our time constraints and generate our statistics than to worry about randomisation.

As we wanted to produce questions with both multiple choice answers and manual input answers, two types of code had to be written. Writing the multiple choice questions was the more complex and time-consuming task, as realistic false answers had to be developed so that common mistakes would produce believable options and the correct answer would not be too obvious. However, recording both types of answer as either right or wrong used the same procedure: defining a correct answer, comparing the answer given against it, and assigning a value of zero or one accordingly.

One of the main aims of our project was to build a test that provided immediate feedback, to help students improve as they went along and to aid their understanding if they made any mistakes. Therefore, every question was followed by a separate page giving a full step-by-step solution showing how it should have been approached.

Another of our goals was to randomise all of our questions. This involved randomising any values used within the questions, so that although the approach and the formula remained the same, the question values and answers would be different every time the page was opened. We chose to do this to prevent people from cheating if sitting the test alongside others. It also enabled us to see more accurately whether people's performance improved if they sat the test more than once. The process of randomisation made creating false multiple choice answers and providing feedback more complex. False multiple choice answers were created using formulas covering the common mistakes, and the values used within them had to be fetched and carried through to the PHP page that submitted the scores. The same page also provided the feedback, so the values were carried through to the explanation of the formula as well.
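A minimal sketch of this randomisation and distractor mechanism is given below, assuming a hypothetical percentage-increase question; the particular 'common mistake' formulas are illustrative guesses rather than the ones we actually used.

<?php
// Hypothetical sketch of randomising question values and deriving
// multiple choice distractors from formulas for common mistakes.
session_start();

// Randomise the values so the question differs on every page load.
$old = rand(10, 50) * 10;        // e.g. January sales
$new = $old + rand(1, 20) * 10;  // e.g. February sales

// Correct answer: percentage increase from $old to $new.
$correct = round(($new - $old) / $old * 100, 1);

// Distractors built from common mistakes, so wrong answers look
// believable. A real implementation would also check that no
// distractor coincides with the correct answer.
$distractors = array(
    round(($new - $old) / $new * 100, 1), // divided by the new value
    round($new / $old * 100, 1),          // ratio instead of change
    round(($new - $old) / $old, 1),       // forgot the factor of 100
    $new - $old                           // absolute difference, not a percentage
);

// Shuffle the five options so the correct one moves around.
$options = $distractors;
$options[] = $correct;
shuffle($options);

// Keep the values in the session so the scoring page can check the
// answer and the feedback page can rebuild the worked solution
// with the same numbers.
$_SESSION["q1_old"] = $old;
$_SESSION["q1_new"] = $new;
$_SESSION["q1_correct"] = $correct;
?>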
Another of our initial aims was to make the test adaptive, so that the next question depended on whether or not the previous one had been answered correctly. The purpose of this was to enable people to reach an understanding of a topic before moving on to a more difficult question. The team began looking into various methods that would allow us to create banks of questions of varying difficulty. However, when our statistics team consulted our statistical advisor, he advised that this would be far too hard to model, as we would have many different categories within our variables. Without a workable model we would then be unable to analyse our statistics properly or draw any consequential evidence from them to compare with our research. Moreover, given our limited programming skills, this would have taken far too long to complete within the time frame. As it was such an unrealistic target, we decided to exclude it, allowing us to concentrate on our other objectives.

Despite the time constraints and our limited skills, a test capable of gathering the required data was developed within the timescale. The next step was making the test live for participants to sit. We looked into several ways of doing this, but settled on uploading it to our university's servers. This meant that anyone with the web link would be able to access and sit our test, maximising the number of potential participants. One other option we explored was paying for an online server, but this would have been more costly and was unnecessary when we had free resources available. Another option was to use our college intranet servers, but this would have limited respondents, as the test would then only have been accessible to CEMPS (College of Engineering, Mathematics and Physical Sciences) students.

2.3.3 Test distribution

To ensure statistically meaningful analysis, our statistics subgroup required a minimum of 40 responses to the test. We were aware that we had a short amount of time available to distribute the test and that there were many potential difficulties in recruiting enough participants. As a result, we made a concerted team effort to distribute the test widely and as quickly as possible. We did this using a variety of social platforms, such as Facebook and WhatsApp, to raise awareness of the project and to provide a web link for people to take the test. A leaflet was also created to inform people about the test and the benefits it could provide, which we distributed on campus to encourage a wider spread of participants in terms of demographics such as degree type and age (see Figure 7).
Figure 7: A leaflet promoting our Numerical Assessment.

2.3.4 Test Analysis

The first task for the statistics team was to identify what type of analysis we wanted to carry out on our test data. This needed to be completed early in the project so that we could relay it to the programming team, who then programmed the relevant questions into the test. We approached this task by breaking down each of the research sections, reading all the research findings, and then deciding which statistics we needed to examine.

1. Why is mathematics important? The Mathematics vs. Numeracy Debate.
(a) Look at the correlation between test score and GCSE mathematics performance, degree, and time since studying mathematics, to see whether any of these affects the score.

2. Why do employers test for numeracy skills?

(a) What was the average score? What was the range of scores?
(b) What was the standard deviation of scores? This can identify whether numerical reasoning tests are able to differentiate between people.
(c) What is the standard deviation of the scores achieved by people studying the same degree?
(d) Did anybody resit the test? Did they achieve a better score the second time?
(e) What were the range, standard deviation and mean of the times taken to complete the test?

3. Do different learners perform differently on numerical reasoning tests?

(a) Look at the correlation between score and type of learner.
(b) Break the questions down categorically into chart, table and text questions. Which type of question received the best scores?
(c) Do some types of learners perform better than others?

4. How do people learn through computer-based assessments?

(a) Did people read the feedback? What was the average time spent on the feedback page between questions? Plot the frequency of these times.
(b) Did people perform better on the multiple choice questions or the manual input questions?
(c) Did people speed up as they took the test?
'Practical Regression and ANOVA using R' [21] states that regression analysis is beneficial because, firstly, predictions of future observations can be made; secondly, the relationships and effects of multiple variables can be assessed; and finally, a general understanding of the structure of the data can be gained. Therefore, for all the statistics required by each research topic, it was necessary to build a regression model for the test scores. The same source identifies the steps of a regression analysis as:

1. Identifying the distribution of the data.
2. Identifying the initial regression model.
3. Carrying out an initial assessment of the goodness of fit of the model, through hypothesis tests on the variables and various diagnostic plots.
4. Using selection methods to identify the best-fitting model.

'Applied Regression Analysis' [5] proposes using stepwise regression to achieve the 'best' regression fit, because it avoids working with more variables than necessary while still improving the fit. Stepwise regression starts from a regression model with one variable and subsequently adds and removes variables until the largest coefficient of determination is achieved, thereby identifying the most significant model (a schematic of the model form is sketched below). Once this best regression is found, we will be able to identify which variables have the most significant effect on test scores, which is vital for answering our four research topics. We will also be able to make predictions about future scores, such as: what score would a 'visual-learning girl, studying law, with a grade B in GCSE mathematics, who has not studied mathematics since GCSE' achieve?

We realised that we would need to collect as many responses to our test as possible, and posed ourselves the question: 'How many people need to take our test in order for the results to be significant?'. Having spoken to Dr. Ben Youngman, a University of Exeter Statistics Lecturer, we agreed that we could not fix a 'minimum number', and that the distribution of the scores would depend on the scores of those who took the test. It was clear, however, that as few as four scores would be insufficient to build strong arguments from our findings, so as a group we set ourselves the aim of obtaining at least 60 entries.
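In schematic form (our notation here, not the textbook's), the model being fitted is an ordinary multiple linear regression:

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \]

where \(Y\) is the test score and the \(X_i\) are the candidate predictors (degree, GCSE grade, learner type and so on, dummy-coded where categorical). Stepwise selection adds or removes one predictor at a time, retaining a change only if it improves the fit, and stops when no single change improves it further.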
We decided to use R to run all of our statistical analysis. R is a leading tool for statistics and data analysis, and performs the type of analysis we required, such as producing correlation matrices and modelling data, very efficiently. R also integrates easily with other packages such as Microsoft Excel, making it straightforward to export our MySQL database, containing all the test data, into a Microsoft Excel spreadsheet and run our analysis in R from there. Output in R is presented clearly and is easy to interpret. Our final reason for choosing R was that everyone in the statistics team had used it before, so we were familiar with its built-in functions and programming language. Additional reading of 'Practical Regression and ANOVA using R' [21] was also used to refresh and improve our knowledge of R.

Figure 8: Above is an example of R code.

2.4 Report Feedback

As a group, we recognised the importance of obtaining external feedback on our report. Our project's main aim was not just to create a test, but also to see how our findings related to the literature and to observe their potential impact on future students. Receiving opinions on our results would give us a more comprehensive view of our work and would enable us to perform a more thorough and independent evaluation.
We decided to contact experts via email, as we thought this would be the most efficient form of communication. Our first thought was to seek a statistician; we needed someone to evaluate our model and give feedback on our findings. We approached the same person who had advised us earlier in the project, Dr. Ben Youngman, hoping that he would be able to point out anything we might have missed.

We also sent our report to Rowanna Smith, the lead Careers Consultant for the College of Engineering, Mathematics and Physical Sciences, based in the Career Zone at the University of Exeter. We wanted to find out whether, based on our findings, the university would consider offering a similar test as a resource for students preparing for job application tests. We also wanted to find out whether our results were significant enough for the University Career Zone to consider changing the advice they currently offer students on preparing for these kinds of assessments.

Our final port of call was SHL, a provider of numerical reasoning tests. We asked whether they would consider changing their test-writing methods in the light of our assessment and its findings, for instance by including feedback. We also asked whether they would consider taking different learner types into account by adapting their tests to suit a wider range of people and their learning habits.

2.4.1 Skill development - Graduate Skills

The project led us to develop a variety of existing skills as well as to gain new ones. As the project involved a very tight time frame, a great deal of time management and task delegation had to take place to ensure that the different sections of the project came together effectively and on time. To enable this, the project was broken down into separate sections, which helped us stay on track. These enhanced skills will prove very useful in later life, as many graduate roles require the efficient management of many different tasks, most likely to tight deadlines. Not only did we have to manage our time well by setting realistic targets, but we also had to adapt to changes and challenges that occurred along the way. Over the course of the project this made group members more flexible, something required in all future aspects of life.

Working in a team has been an essential part of this project, without which our outcome would have been completely unattainable.
The ability to work in a team is an invaluable skill for later life and prepares us for situations both in and out of the workplace. The ability to communicate effectively with other members was crucial in enabling the team to stay on track and to be transparent, so that we could be aware of any potential problems; as a graduate, this is vital for taking part in a working environment. Another skill acquired during this project was the ability to research quantitatively and qualitatively, and to disseminate information and synthesise others' ideas. This was approached in different ways, including a large amount of reading and contact with both employers and academic members of staff, resulting in a well-rounded background for the report. Research skills are essential to many roles, either directly for graduates in technical positions, or indirectly as transferable skills that improve general analytical and summarising abilities.

Designing the test to collect our data developed the team's problem-solving skills, as we had to explore several ways of meeting our programming criteria. It also gave us all a basic understanding of one of the most popular scripting languages on the web, an invaluable skill to many employers. The team also acquired skills in data collection and statistical analysis in order to understand and present the project's findings, something that many employers look for and value highly.

A large aspect of our project involved presentation, both as small progress reports and as a final summary of our report. Through this, all group members had the chance to present their work to an audience, gaining beneficial speaking and performance practice, something we have little opportunity to do given the nature of our degree. This builds the social skills that employers hold in high regard and that make up a large component of job applications.
3 Findings

3.1 Survey

Collecting and analysing the survey results was reasonably straightforward. Having created the survey in Google Forms, we could monitor responses as they came in, and Google Forms also produced some basic statistical summaries for us, giving us an immediate overview of the key information. Overall, we received 79 responses, comfortably above our minimum aim of 40 respondents. In terms of demographics, we noticed a higher number of female participants, with over 70% being women. In addition, almost 70% of our respondents were in their third year of university and so dominated our responses (see Figure 9). This was likely because our own group was made up of third-year students who were predominantly female. However, given the nature of our survey and the questions asked, we did not feel this would cause any issues, especially since third-year students are the most likely to have encountered employability tests, and should also, at this stage in their education, have a good idea of how they learn best.

Figure 9: Pie Chart of Gender and Year of Study of participants in the Survey.
Figure 10: Bar Charts of responses to two survey questions.

The first set of questions in the survey gave us information on the different ways in which people like to learn and to be tested. The survey worked in two ways. Firstly, it acted as preliminary data for our research, gathering further information and current opinions on online tests, which we planned to compare with our test findings later in the process. Secondly, the survey provided new data to set against what the group had already learned from the research carried out. We found that the majority of people preferred multiple choice questions in online assessments, concurring with our research finding that this is a popular and commonly used format. It is worth noting that, since possible answers are always provided, these questions do not require as much original thought on the part of the student. It also means that students always have a baseline probability of selecting the correct answer, in our case 20% on average, which may influence people's preference for this style based on its perceived comparative ease.
The fact that this style was preferred was passed along to the subgroup tasked with writing the online assessment questions, so that it could be taken into account. It was also clear that people feel they benefit significantly from feedback. This matches the opinion we encountered in our research, based on a Plymouth study [36], which suggests not only that people want feedback, but that a student's results can improve significantly as a result of it. This confirmed our decision to include feedback as a major component of our own online test, to ensure that people would be able to learn from their mistakes on previous questions.

In terms of Mathematics vs. Numeracy, there was a mixture of results. Opinions were initially mixed when people were asked whether they believed their mathematical skills had deteriorated since they stopped studying mathematics, with the majority taking a neutral stance (see Figure 11). The second largest response was 'slightly agree', implying that slightly more people feel this to be true. The results may be somewhat skewed, as people still studying mathematics are likely to strongly disagree that their abilities have deteriorated, given that they are still using them. This defeated the purpose of the question, which was to investigate people who have stopped studying mathematics and consequently use it less often. It may explain the large spike of people strongly disagreeing with the statement, which made it harder to analyse how people perceived their mathematics skills, as many of the responses shown were not relevant.

Figure 11: Bar Chart of responses on deterioration of mathematical skills.
Figure 12: Bar Chart of responses on deterioration of mathematical skills, excluding mathematics students.

To combat this problem, we decided to exclude mathematicians from our data and repeat our statistics (see Figure 12). This ensured that all respondents included had finished studying mathematics, giving a fuller representation of the deterioration of mathematics skills. From the new calculations we produced a graph closer to our expectations, showing that most people felt their skills had somewhat deteriorated since they last used mathematics. This clearly agreed with our research, which showed a strong difference between people currently studying mathematics and those who had stopped. It mirrored the effect of unemployment that we had encountered in our research, and was also similar to the study on nurses [20], who performed worse on a similar test after a year away. However, our data consisted of qualitative opinions rather than quantitative results; this difference meant we could not draw any solid conclusions from comparing the two, though we could note the strong similarities. One limitation of our data may have been participants' differing views on when they had stopped studying mathematics. Some students on more scientific or quantitative degrees may regard themselves as still using mathematics, given that they use it regularly in their university work, while others will say they no longer study mathematics because the subject itself is not contained in their degree title.
Despite this, we felt the discrepancy did not affect our results too heavily, as such students would still be likely to hold the same opinion when rating their mathematical ability, so we could still assess the difference. Another slight limitation in comparing our data with the literature was that, in some similar studies, those tested had been out of any form of study or work at the time, whereas the students in our survey were all still in academia. This would likely have affected the extent to which they felt their mathematics skills had deteriorated over time, possibly making our results less pronounced than they would otherwise have been.

In addition, our survey showed that 67.5% of people (see Figure 13) believed numeracy and mathematics to be different things, which agreed with much of our research on the Mathematics vs. Numeracy debate. This suggests a general consensus that they are different disciplines requiring different skills, even if they technically overlap by definition. It would have been useful to know why the students thought this, and whether they agreed with our research findings on potentially teaching them as two separate subjects. However, the design of our survey limited respondents to a few set answers, so it is difficult to say how consequential these results are. We attempted to mitigate any gaps in participants' knowledge by giving official definitions of both words, allowing them to make a well-informed decision.

Figure 13: Pie Chart representing the opinion of participants on Mathematics Vs. Numeracy.
Figure 14: Pie Chart representing how participants feel they learn best.
Since another large section of our research concerned different kinds of learners, we included questions on this in our survey. Our research had covered several learner types, but we chose to include only the main three we had focused on. The team found that the majority of participants fell into a set category, with fewer than 4% being unsure (see Figure 14). The smallest proportion was those who believed themselves to be auditory learners, though this was still over a fifth of respondents. The largest group was the visual learners, with 41.8% of people placing themselves in this category. We mitigated the risk of people being unaware of the different types of learning, or of which category they might fall into, by asking people which description fitted them best rather than having them pick from a list of unfamiliar definitions. There was still scope for people to misunderstand and pick a category despite being unsure, which may limit the reliability of our data; having said this, our research showed that most people combine these different learning styles, so some crossover was always expected. In terms of how the different learner categories work, we believed that visual learners were likely to perform better on our chosen type of online numerical reasoning test, leaving the others at a disadvantage. When asked in our survey whether they believed these online tests cater for different learners, almost a third of participants responded negatively (see Figure 15). This helps to support our research and hypothesis by showing that many people do not feel their learning styles are catered for. There is always the possibility that this proportion is overestimated by people who do not perform well in these tests generally, or who feel they should have performed better regardless of their learner type. Nevertheless, the proportion remains substantial, so our data still indicates that a significant number of people feel they are not examined effectively in online tests. We were able to examine this further in the results from our own numerical reasoning test.
Figure 15: Pie Chart representing the opinion of participants on whether Computer-based Assessments cater for different types of learners.

3.2 Test

Our numerical assessment consisted of 20 questions split into three difficulty levels: KS2, KS3 and GCSE. The average mark achieved was 15.28. From Figure 16, it can be seen that the majority of participants scored highly, with over 50% achieving a score of 15 or more. Figure 17 supports this, showing an interquartile range of 6, from a score of 13 to a score of 19; the interquartile range indicates a strong concentration of high scores. The results are negatively skewed. The highest score achieved was 20, showing that full marks were attainable, while the lowest score was 5.
Figure 16: Histogram of Total Score.

Figure 17: Boxplot of Total Score.
Figure 18 supports the negative skewness of the scores. There is an overall bell shape, suggesting a normal distribution, and the slight shift of the peak to the right reflects the negative skew.

Figure 18: Density Plot of our model.

To analyse our data further, we break the statistics down into the four research topics mentioned previously.

3.2.1 The Maths Vs. Numeracy Debate. Why is mathematics important?

The initial hypothesis was that a participant's score would deteriorate as the number of years since studying mathematics increased. Surprisingly, Figure 19 shows very little relationship between score and years since studying mathematics: the line of best fit is close to horizontal about the mean score, and the correlation coefficient of −0.21 indicates only a weak negative correlation between the variables.
Figure 19: Scatter plot showing the total years since studying mathematics vs the total score.

Furthermore, our research into Numeracy vs. Mathematics implied that numerical reasoning assessments do not test the skills participants learn in GCSE-level mathematics, so the number of years since studying mathematics should have little effect on the score achieved. Our findings support this argument. However, as the average age of participants in our numerical reasoning assessment was 20.24 and the average number of years since studying mathematics was 1.68, our sample does not reflect the whole population.

The correlation between GCSE mathematics grade and score is shown in Figure 20, which indicates that a higher grade at GCSE was associated with a higher score in our numerical reasoning test. The mean score achieved by participants with a grade B at GCSE was lower than the mean score for A or A* candidates.
The lowest score achieved by an A* grade participant is higher than the lower quartile of A and B grade participants, and the highest score achieved by any B grade participant is lower than the average score of an A* grade participant. From these findings, we can see that a strong mathematical background can result in a significantly higher numerical reasoning test score. Since the number of years since studying mathematics has little correlation with the score achieved, this suggests that GCSE mathematics grade and actual mathematical ability affect a participant's score more. This is again supported by Figure 21, which shows the scores of participants by degree type. We categorised 'mathematical degrees' as Economics, Business, Medicine, Mathematics and Science, and it is assumed that these students have strong mathematical abilities, which is why they achieved higher scores. The lowest mean score was for participants studying Humanities degrees. Interestingly, those studying a non-mathematical science (such as Biology) scored higher on average than those studying a mathematical science; however, Figure 21 shows that these results are actually very close. We can therefore interpret this as showing that all sciences require some mathematical skill.

Figure 20: Boxplot of Test Scores and GCSE Mathematics Grade.
Figure 21: Boxplot of Test Scores and Degree.

3.2.2 Why do employers use numerical reasoning testing?

As stated above, the average score achieved was 15.28, and the standard deviation of the scores was 4.12. The standard deviation measures the degree of spread of the scores. Our initial research into why employers use numerical reasoning assessments showed that these tests filter out applicants and help to differentiate between candidates with very similar applications. As our lower quartile is 13, 75% of participants achieved a score of at least 13; if an employer applied a filter that cut out candidates scoring below 13, 25% of our participants would not have passed the test. This shows that numerical reasoning tests can be a useful tool for quickly removing weaker candidates from an application process. The standard deviation of 4.12 indicates a large spread in scores, which makes the test a useful tool for differentiating between candidates, as the results are varied and spread over a wide range of values. Not all participants achieve similar scores: if everyone scored 15, all candidates would have to complete further assessments to gauge who was the best applicant, and varied scores reduce this problem.
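For reference, the spread statistic quoted in this subsection is the usual sample standard deviation of the \(n\) scores \(x_1, \dots, x_n\):

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \]

where \(\bar{x}\) is the mean score (15.28 here), giving \(s = 4.12\) for our data.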
Figure 21 shows that the interquartile ranges of most of the degree types are large. Applicants with similar degrees, where one would expect similar mathematical ability, still produce a varied range of scores. This is useful for employers, as they can use numerical reasoning assessments to differentiate between applicants with the same degree title.

Initially, we also wanted to look into whether people had repeated the test to see if their score improved, since our research and survey findings indicated that feedback and practice on numerical tests should improve scores.

The mean time taken to complete the test was 19.47 minutes, which works out at roughly 58 seconds per question on average. This helps to explain why employers enforce tight time limits on numerical reasoning assessments (commonly a minute or less per question). This is not necessarily a method of filtering out participants, but, as our timings show, it does put applicants under pressure when completing the test, and employers are keen to find out whether a potential employee can work under pressure and to a set time frame. The difficulty of a numerical test can also be adjusted by changing the time limit: if our test had had a time limit of 15 minutes, fewer than 50% of participants would have been able to finish it. From our initial research we found that numerical reasoning tests are often used even in applications where numerical skills may not actually be necessary, and our survey showed that 37.2% of people believed it was unfair to be numerically assessed in their job applications, feeling at a disadvantage because they were not 'good at maths' and 'had not studied it in a long time'. However, our findings suggest that employers could increase the time limit on such tests, in our test's case to over 35 minutes, so that every participant could complete the test in their own time and not miss questions because time ran out. This is concluded from the fact that the box and whiskers in Figure 22 lie entirely below 35 minutes, with only outlier times above.
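As a quick check, the per-question average quoted above follows directly from the mean completion time:

\[ \frac{19.47 \times 60}{20} \approx 58.4 \text{ seconds per question}. \]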
Figure 22: Boxplot of Time Taken to complete the Test.

3.2.3 Do different learners perform better on numerical reasoning tests?

Figure 23 shows that visual learners achieved a higher score on average than auditory or kinaesthetic learners; visual learners taking our test had both the highest average and the smallest range of scores. The literature we reviewed at the beginning of the project, along with our initial survey findings, suggests that the online numerical reasoning assessments used by employers are not catered to auditory or kinaesthetic learners, with 64.1% of our survey respondents agreeing. Being online limits the scope for making a numerical reasoning test practical and active in a way that suits kinaesthetic learners. Audio numerical reasoning tests are available, but they are uncommon and usually only used for participants in special circumstances (such as visual impairment).
Figure 23: Boxplot of Test Scores and Learner Type.

Generally, people performed better on questions involving a visual element, such as a chart or graph. The average pass rate on these questions was 81.7%, whereas for text questions it was lower, at 68.6%. This may be because an image or table breaks the information down, making the figures easier for all learners to digest, whereas paragraphs of text and figures cater more towards visual learners.

3.2.4 How do people learn through computer-based assessments?

From our results we can determine that the majority of participants neglected to read the feedback provided. The average time spent on the feedback pages following the first four questions was 6, 5, 9 and 5 seconds respectively, which is not enough time to read, understand and learn from the feedback. Research proposes that reading feedback improves scores, for example Rob Lowry in 'Computer aided assessments - an effective tool' [36], and our initial survey (Figure 24) also showed that 89.8% of people thought feedback would be a useful tool in an online test. However, as our numerical reasoning assessment was presented as a 'test' rather than a casual learning resource, people's priority may have been to finish the test rather than to learn from it.
Figure 24: Bar chart of opinion on feedback from the survey.

If every multiple choice question were guessed, a participant would have a 20% chance of getting each one correct, so we can statistically approximate that pure guessing would yield 20% overall; across our twelve multiple choice questions, a participant guessing every answer would score 2.4/12 on average (the expected value is worked through below). Our results show a pass rate of 81% on multiple choice questions, significantly higher than 20%, suggesting that few (if any) candidates guessed all their answers.
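This expected score is just the binomial mean for guessing, with each of our multiple choice questions offering five options:

\[ E[\text{correct answers}] = np = 12 \times \tfrac{1}{5} = 2.4 \text{ out of } 12, \]

i.e. an expected pass rate of 20% under pure guessing.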
The average time taken and average pass rate for multiple choice questions were 50 seconds and 81% respectively, while for fill-in-the-blank questions the average time taken was 53 seconds and the average pass rate was 69.5%. We can interpret this as showing that multiple choice questions are easier and give a candidate a stronger chance of scoring highly. Simply put, if an answer is not among the multiple choice options, the candidate knows it is wrong; and if their answer is close to one of the options, they can select that option and still have a chance of being correct, which is not possible in a fill-in-the-blank question. This is supported by our survey, in which 42.2% of people preferred multiple choice questions out of eight different question formats.

Figure 25 shows no trend in the average time taken to complete each question of our numerical reasoning assessment; the line graph has no pattern and looks random. If people had learnt from the feedback provided, we would expect the time taken per question to fall as their understanding of the questions increased. It became apparent that the feedback we provided was not used, so we cannot support our initial expectation. In addition, the incorporation of three difficulty levels (KS2, KS3 and GCSE) could have counterbalanced any decrease in time taken, as the questions should have been getting progressively more challenging.

Figure 25: Line graph of average time taken.

Furthermore, we looked at the average pass rate of the questions in each difficulty category: KS2, KS3 and GCSE. Figure 26 highlights that the average pass rate fell as the difficulty increased from KS2 to KS3: the average pass rate at KS2 level was 87.3%, whereas at KS3 it was 72.0%. The average pass rate was then consistent from KS3 to GCSE level, both being 72.0%. This supports the idea that employers can use numerical reasoning tests of different difficulty levels to control how many applicants pass through to the next stage of the application process: participants taking a KS2-level numerical reasoning test would achieve higher scores than those taking a KS3- or GCSE-level test.
Figure 26: Bar chart of question category and average pass rate on the questions in that section.

3.2.5 Regression Modelling

The density plot in Figure 18 supports the hypothesis that the score results follow a normal distribution (as previously stated, this can be concluded from the bell-shaped figure). The first multiple linear regression model fitted involved the following variables: degree, years since studying mathematics, GCSE mathematics grade and type of learner. To address our four topic questions, we need to evaluate the effect all of these variables have on a participant's overall score. The full summary of the regression model used can be viewed in the appendix. As degree, GCSE mathematics grade and type of learner are categorical variables, they are interpreted in R as factors with levels. The regression formula for this model is:

\[
\begin{aligned}
Y = {} & 19.084 - 0.501X_1 - 3.266X_2 - 2.254X_3 - 0.291X_4 - 0.006X_5 \\
& + 3.004X_6 - 3.817X_7 - 2.486X_8 - 1.093X_9 + 0.170W \\
& - 4.178Z_1 - 3.060Z_2 - 2.088K_1 - 0.247K_2,
\end{aligned}
\]

where \(Y\) is the test score. By using factors we restrict the auxiliary variables \(X_1, \dots, X_9, Z_1, Z_2, K_1, K_2\) to binary values (0, 1). The \(X\) variables relate to degree, the \(W\) variable to years since studying mathematics, the \(Z\) variables to GCSE mathematics grade and the \(K\) variables to type of learner.

The p-values for the variables are: Degree = 0.061498, Years since studying mathematics = 0.787063, GCSE mathematics grade = 0.007201 and Type