Assessment & Evaluation in Higher Education
Vol. 31, No. 3, June 2006, pp. 287–301
ISSN 0260-2938 (print)/ISSN 1469-297X (online)/06/030287–15
© 2006 Taylor & Francis
DOI: 10.1080/02602930500352857
Assertion-reason multiple-choice
testing as a tool for deep learning:
a qualitative analysis
Jeremy B. Williams*
Universitas 21 Global, Singapore
This paper reflects on the ongoing debate surrounding the usefulness (or otherwise) of multiple-
choice questions (MCQ) as an assessment instrument. The context is a graduate school of business
in Australia where an experiment was conducted to investigate the use of assertion-reason questions
(ARQ), a sophisticated form of MCQ that aims to encourage higher-order thinking on the part of
the student. It builds on the work of Connelly (2004) which produced a quantitative analysis of the
use of ARQ testing in two economics course units in a flexibly-delivered Master of Business Admin-
istration (MBA) program. Connelly’s main findings were that ARQ tests were good substitutes for
the more conventional type of multiple-choice/short-answer type questions and, perhaps more
significantly, ARQ test performance was a good predictor of student performance in essays—the
assessment instrument most widely favoured as an indicator of deeper learning. The main focus of
this paper is the validity of the second of these findings: analysis of questionnaire data casts some
doubt over whether student performance in ARQ tests can, indeed, be looked upon as a sound
indicator of deeper learning, student reactions and opinions suggesting instead that performance
may have more to do with one's proficiency in the English language.
Introduction
Since the time they were first proposed by Arthur Otis, and used on a large scale by
the US Army to measure the abilities of new recruits around the time of the First
World War (Caruano, 1999), the efficacy of multiple-choice type questions (MCQs)
as an assessment tool has attracted considerable debate. As Berk (1998) quips, the
MCQ format ‘holds world records in the categories of most popular, most unpopular,
most used, most misused, most loved, and most hated'. Indeed, the literature generated
on the subject is voluminous, the bulk of it appearing during the 1990s.
*Universitas 21 Global, 5 Shenton Way, #01-01 UIC Building, Singapore 068808. Email:
jeremy@u21global.com
288 J. B. Williams
Following in the footsteps of the United States, many countries around the world
have embraced MCQ testing as the foundation of their testing systems. The main
advantage of MCQ testing is its versatility. There are significant cost savings—
particularly where large numbers are involved—and it is a format that can provide
precision where other measurement options may be lacking (e.g. observing perfor-
mance or interviewing). Criticisms of MCQs, on the other hand, tend to centre upon
unreliability due to random effects (e.g. Burton, 2001), the inequity of the format in
terms of its bias towards certain socio-economic or ethnic groups (e.g. De Vita,
2002), and also the depth of learning the format engenders (or lack thereof) (e.g.
Leamnson, 1999, p. 111).
This paper will reflect on this ongoing debate in the context of a graduate school of
business in Australia where an experiment was conducted in the use of assertion-
reason questions (ARQ), a sophisticated form of MCQ that aims to encourage higher
order thinking on the part of the student. The paper will first provide a brief insight
into the background to the project, before reporting on the results of evaluations
undertaken to date. In particular, it extends the quantitative study of one of the
collaborators on the project (Connelly, 2004), who found ARQ test performance to
be a good predictor of student performance in essay work which, appropriately struc-
tured, is the assessment instrument most widely favoured as an indicator of deeper
learning (Brown et al., 1997; Haladyna, 1999). The findings of this paper do not
contradict the conclusion drawn by Connelly, but suggest that student performance
in ARQ tests may have as much to do with their linguistic skills and the time taken to
process complex prose as with their conceptual understanding and problem-solving
ability. The paper concludes that an ARQ format, although not widely used, consti-
tutes a useful assessment tool and one that appears to be superior to the traditional
MCQ format in terms of student learning outcomes. Importantly, though, to ensure
equitable treatment of student groups, the questions need to be tested and carefully
edited, and might be better utilised for formative purposes, as an online, self-paced
learning device. This conclusion is equally applicable to other disciplines and is not
exclusive to economics and other business subjects.
The context
The Brisbane Graduate School of Business (BGSB) is one of six schools in the
Faculty of Business at Queensland University of Technology (QUT), and was formed
in 1995 to administer the MBA—a full-fee-paying program. Commencing in 1999,
an innovative new MBA course structure was introduced offering prospective
students greater flexibility and choice, through seven-week, half-semester-long course
units. Since then, student numbers have trebled while course fees have more than
doubled, and entry standards have been raised. The BGSB currently
has around 1,000 students in the MBA and associated programs. Around three
quarters of these students are enrolled for part-time study. These students are almost
exclusively Australian residents. The majority of full-time students are international
in origin, recruited from 35 different countries, nearly all of whom speak English as
their second language. The average age of BGSB students is 33, and male
students outnumber females by a ratio of 3:2.
Funded entirely from student fees, the BGSB, like other institutions in the same
position, is very sensitive to market perceptions of its services. One strategy actively
pursued by the School has been to gain an international reputation for the flexible
delivery of its programs. Flexible delivery is, by definition, a client-oriented approach
because it is a commitment, on the part of the education provider, to tailor courses to
meet the various individual needs of its students. Furthermore, it is tacit recognition
of the fact that the student profile has changed quite dramatically—socially, culturally,
economically—and that, pedagogically, there is a need to cater for this increasingly
diverse student body.
The essence of flexible delivery is that it provides students with a number of differ-
ent options for study. It is not prescriptive in the sense that one approach to study is
identified as being superior to another. A student can chart a route through a degree
that is most compatible with their social, family and working lives, and their
preferred learning style. In short, flexible delivery is non-discriminatory, catering
equally for an international student, a single parent working part-time, or a business
executive travelling regularly overseas and interstate.
At the heart of this strategy of flexible delivery has been the development of online
learning and teaching (OLT) sites. The framework for OLT sites varies from course
unit to course unit but, typically, there is a download facility where students can
access PowerPoint lecture slides, solutions to problems, past examination papers and
the like, along with discussion forums, chat rooms, and discipline-relevant web links. Until
recently, however, little attention has been devoted to assessment, and how this might
be integrated with the OLT system.
The project
Commencing in 1999, funding was secured to investigate the use of computer-
assisted learning in the form of optional weekly, timed, MCQ tests, which could be
accessed remotely, or on campus. These traditional MCQ tests were being
conducted in class at this time as part of formal assessment, and invigilated in the
standard way. By putting them online, the idea was to enhance flexibility by provid-
ing opportunity for students unable to attend class to complete the tests, while at the
same time freeing up class time for interaction and discussion. By September 2000,
two economics course units within the MBA (GSN411: Economics of Strategy, and
GSN414: Business Conditions Analysis) were trialling online MCQ tests, albeit with
modified objectives.
The tests, accessible via the OLT system, are marked automatically, providing
students with instant feedback on their progress. Early in the trial, a student received
a mark for participation (5% for the completion of five tests, as long as they scored
70% or above in each test) rather than a mark for performance. Even this low weight-
ing was subsequently removed. The decision to go along this path arose largely
because, despite the best efforts of the project team, no solution could be found to the
problem of invigilation. Quite simply, a test involving the use of ‘point-and-click’
radio buttons was an open invitation for students to cheat if they were unsupervised.
Thus, in the absence of any cheap and readily accessible devices for online test supervision,
the project team elected to use the test banks it had developed primarily for
formative assessment purposes when delivered online. Where class tests would
continue to be held, however, ARQs would be used in preference to the
traditional MCQ format.
The questions
Although the MCQ format has been criticised almost since the time of its inception,
it perhaps met with its most formidable challenge during the 1990s as an increasing
number of educationalists, guided by the constructivist theories proffered (most
notably) by Marton & Säljö (1976a,b), Entwistle (1981), Biggs (1987, 1993), and
Ramsden (1992), argued for teaching and assessment methods that encouraged
higher order thinking skills. As Steffe and Gale (1995) point out, constructivism
offers no unitary theoretical position; whichever strand one adheres to, however,
most constructivists would agree that, essentially, learners arrive
at meaning by actively selecting and constructing their own knowledge through
experience (both individual and social).
A key criticism of MCQ has been that real world tasks are not usually multiple-
choice tasks, and passing an MCQ test is not equivalent to mastering a real world
skill. In short, MCQs suffer from lack of authenticity (Wiggins, 1990). Ideally, say the
critics, instructors ought to be asking questions that go beyond the mere memorisa-
tion of facts, encouraging students to apply, analyse and synthesise their knowledge.
Hakel (1998), for example, makes the point that recognising a correct response from
a list does not demonstrate that students can construct that response themselves,
and that the narrowness and apparent precision of MCQs inhibit consideration of
other information relevant to decision-making.
In response to the constructivists, there has been a steady stream of research work
from the statistics community (see, for example, Case & Swanson, 1996 and
Haladyna, 1999) that has tested a variety of MCQ structures for measuring complex
cognitive outcomes. A strong case for the continued use of MCQs is also advanced
by those advocating computer-assisted learning.
Bracey (1998), a known critic of MCQs, opines that when teaching is lecturing and
testing is multiple choice, one can never know for sure whether students have really
understood what one was trying to teach them. However, he does concede that, at
graduate school level, stems can be more complex and questions more subtly worded
such that understanding is demonstrated. He also acknowledges the great promise of
technology. The uptake of computer-assisted assessment has, indeed, been gathering
pace and if the number and quality of international conferences dedicated to the
subject is anything to go by, there is every chance that, if implemented with pedagog-
ical (as well as technical) awareness, it will serve the educational sector well. This was
certainly the feeling of the project team when it introduced the online ARQ tests.
There is nothing novel about the ARQ format. Heywood (1999) observes that
ARQs first appeared in UK ‘A-level’ secondary school examinations in the 1960s,
although it would seem the format was used even earlier than this in US medical
exams (see Moore, 1954, in Hubbard & Clemans, 1961). It is surprising, therefore,
that the academic literature on the subject is quite sparse. Connelly (2004)
provides an overview of the existing published material (see Newble et al., 1979;
Skakun et al., 1979; Fox, 1983; and Scouller & Prosser, 1994) before going on to
extol the main virtue of the ARQ test item; viz. that ‘its structure facilitates the
construction of questions that test student learning beyond recall. In particular,
higher level thinking and application of key concepts may sometimes be more easily
constructed using this format, than by using a conventional multiple-choice approach
alone’ (Connelly, 2004, p. 362).
A key concern of the project team, mindful of the criticisms made of MCQs, was
to develop question sets that would test reasoning (procedural knowledge) rather
than recall (declarative knowledge). In terms of Bloom’s taxonomy (Bloom, 1956),
the goal was to focus on the highest levels of learning within the cognitive domain:
analysis, synthesis and evaluation (see Figure 1). Carneson et al. (n.d.), in their application
of Bloom's taxonomy to different types of MCQs, identify ARQs as belonging
to the very highest level in the cognitive hierarchy because they contain elements of
all the other categories, and because 'one is asked to pass judgement on, for
example, the logical consistency of written material, the validity of experimental
procedures or interpretation of data' (Carneson et al., n.d., Appendix C).
Figure 1. Bloom’s taxonomy
An example question is illustrated in Figure 2. Like traditional MCQs, ARQs
present students with a number of possible solutions. In contrast to traditional
MCQs, however, ARQs also include a true/false element (CAA Centre, 2000).
Specifically, each item consists of two statements, an assertion and a reason, that are
linked by the word ‘because’. The student then selects from a multiple-choice legend
after proceeding through a number of steps. First, he or she must determine whether
the ‘assertion’ is true or false, and then whether the ‘reason’ is true or false. If one, or
both, of the statements is deemed false, then the answer will be alternative (c), (d), or
(e) accordingly. If, on the other hand, both statements are deemed true, a third step
is required whereby the respondent must determine whether the second statement
provides an accurate explanation for the first.
Traditional MCQs usually test only one issue/concept. ARQs, on the other hand,
test two per question (the assertion and the reason statements) plus the validity of the
‘because’ statement in the event assertion and reason are both correct statements. On
the basis that judging the correctness of two statements must be harder than judging
the correctness of one, it would follow that ARQs present more of an intellectual
challenge than traditional MCQs. One might put forward the case that because
options (a) and (b) require a third step of reasoning (only two steps being required if
the learner correctly identifies a false statement), questions with correct answers of
(c), (d) or (e) may be less effective in terms of learning outcomes. That is, assuming
all answers (a)–(e) occur in roughly equal proportions in an exam, the ‘because’
statement would only be tested in 40% of the questions, the depth of learning being
relatively less the remaining 60% of the time. However, this is hardly a reason for not
using ARQs, as a two-step question is still preferable to a single-step traditional MCQ.

Assertion: In a small open economy, if the prevailing world price of a good is lower than the domestic price, the quantity supplied by the domestic producer will be greater than the domestic quantity demanded, increasing domestic producer surplus.

BECAUSE

Reason: In a small, open economy, any surplus in the domestic market will be absorbed by the rest of the world. This increases domestic consumer surplus.

(a) True; True; Correct reason
(b) True; True; Incorrect reason
(c) True; False
(d) False; True
(e) False; False

(The correct answer is (d).)

Figure 2. Example ARQ question

If one wanted to reward ARQ test candidates in accordance with their depth
of learning per question, then one possibility would be to assign a proportionately
higher weighting to questions with solutions of (a) or (b). The project team elected
not to proceed along these lines, preferring, instead, to view the learning experience
of ARQ tests in their totality.
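The three-step decision procedure and its five-option legend can be sketched as a small decision function. This is an illustrative sketch only, not software from the project; the function name and signature are hypothetical:

```python
# Hypothetical sketch of the ARQ marking legend: judge the assertion,
# judge the reason, and only if both are true take the third step of
# judging whether the reason correctly explains the assertion.

def arq_legend(assertion_true: bool, reason_true: bool,
               reason_explains_assertion: bool = False) -> str:
    """Return the multiple-choice option (a)-(e) for an ARQ item."""
    if assertion_true and reason_true:
        # Third step: does the reason accurately explain the assertion?
        return "a" if reason_explains_assertion else "b"
    if assertion_true:   # assertion true, reason false
        return "c"
    if reason_true:      # assertion false, reason true
        return "d"
    return "e"           # both statements false

# The worked example in Figure 2: the assertion is judged false and
# the reason true, so the keyed answer is (d).
print(arq_legend(assertion_true=False, reason_true=True))  # → d

# If keyed answers (a)-(e) occur in equal proportions, only (a) and (b)
# reach the third ('because') step: 2 of 5 questions, i.e. 40%.
third_step_share = 2 / 5
print(third_step_share)  # → 0.4
```

A weighting scheme of the kind the project team considered but rejected could simply assign a higher mark to items whose keyed answer is (a) or (b), since those exercise all three reasoning steps.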
Presented with ten such questions per test, the students involved in the trial were
able to take each test as many times as they wished without penalty. Initially, to be
eligible for the 5% credit allowed for completion of these tests, students were required
to score at least 70% on each of the tests. To encourage students to persist until they
had got all the questions right, question feedback was generated after each attempt at
the test, without explicitly presenting the student with the correct solution.
The results
As Connelly (2004, p. 363) observes, the writing of ARQ test items for the two
course units trialling the assessment instrument was not particularly difficult, but it
took some time for the students to become accustomed to the format. Where the
questions were used offline in class, more time was required by the students to
compute the solutions than had been the case with traditional MCQs, and average
test scores were significantly lower. These outcomes notwithstanding, student evaluation
of the ARQ format has been generally encouraging. Tables 1 to 5
present the results of an online questionnaire administered during March 2001
that was open to all students who had been enrolled in the two economics course
units over the previous six months. There were 69 respondents, equating to
around 15% of the total number of students enrolled in these course units during
this time. (Note: some of these students took invigilated ARQ tests in class as well
as online.)
Table 1 clearly illustrates that this form of MCQ tests students' intellect considerably
more than traditional MCQs. Importantly, 64% of students were of the view that
the learning outcomes associated with ARQ tests were superior to those associated
with traditional MCQs, only 16% adjudging them to be inferior (Table 2).

Table 1. Level of intellectual challenge presented by ARQ

Question 1: In terms of the intellectual challenge it presented, how did you find the assertion-reason format? Score %

A. It was very challenging. 30 44
B. It was moderately challenging. 34 49
C. The challenge it presented was no different to any other type of multiple-choice testing I have encountered. 5 7
D. It was moderately easy. 0 0
E. It was childishly easy. 0 0

Examples of detailed student comments support this aggregate picture:
It forced you to learn as you progressed through the unit. It is a very good idea. [serial no. 69]
I liked it because it made you think rather than just match the best one. [serial no. 72]
I thought it was a good tool to help study content with. [serial no. 81]
To make the best use of these quizzes it is necessary to have at least read some of the
material being tested. This is a useful mechanism for slow starters. [serial no.26]
Mimics real world decision making. Useful for understanding concepts. Anything to get
away from rote learning testing, which is non-productive. [serial no. 129]
They [the assertion-reason format] are a much more challenging format from the conven-
tional multiple-choice questions [sic]. Hence seem more suited to a masters level. [serial
no. 34]
The frequency of the online tests (one per week) also received the resounding
support of the students (Table 3), and while 45% of students felt the 5% participation
mark (given earlier in the trial) was about right, a total of 42% felt a larger
weight should be attached (Table 4). This sentiment is probably a reflection of the
amount of time and effort required on the part of the student to complete the tests.
However, with a pass mark of 70% and students able to have as many attempts as
they like with no invigilation, the project team was loath to give a higher weighting.
(Indeed, for reasons explained earlier, the 5% participation mark has since been
removed altogether.)
The results from Table 5 also give the project team some cause for optimism, 56%
of respondents stating that they think the instrument could be used on other course
units, and a further 25% saying that ARQ tests should be a feature of all course units
in the MBA. Only 1.5% of respondents called for the idea to be abandoned.

Table 2. Learning outcomes produced by ARQ

Question 2: In terms of learning outcomes, how did you find the assertion-reason format compared to the more traditional multiple-choice format? Score %

A. The assertion-reason format produced far superior outcomes. 15 22
B. The assertion-reason format produced moderately superior outcomes. 29 42
C. The learning outcomes were more or less the same. 14 20
D. The assertion-reason format produced moderately inferior outcomes. 8 12
E. The assertion-reason format produced significantly inferior outcomes. 3 4

Table 3. Frequency of ARQ tests

Question 3: How useful did you find the quizzes in terms of their frequency? Score %

A. There were way too many. 1 1.5
B. There were a few more than necessary. 3 4.5
C. One a week is just about right. 61 88
D. There could have been a few more. 0 0
E. There could have been a lot more. 1 1.5
F. No answer. 3 4.5
On the downside, some student comments suggest that caution should be exercised
in the preparation of ARQ questions, implying that the degree of difficulty had a lot
to do with semantics and one’s mastery of the English language. Examples include:
Most questions were very good. Sometimes the wording was quite tricky … the assistance
given after each attempt … helped in the understanding of the subject. [serial no. 8]
… semantics seem to play a large role in determining the correct answer. [serial no. 67]
It was challenging due to ambiguity rather than degree of difficulty. [serial no. 33]
I found the assertion-reason format overly difficult in relation to testing your English skills
rather than knowledge of the subject being studied. i.e. you had to pick up on little idio-
syncrases (spelling?)[sic] on the way in which the questions were worded. This is a difficult
and onerous thing to do when already nervous and on edge in a test environment. [serial
no. 90]
To get an answer right doesn’t necessarily show you are a better economist than someone
who gets it wrong: I think what it shows is that you are better at logically answering struc-
tured problems. [serial no. 26]
With the array of options it is necessary to make sure there are no ambiguous questions or
facts [sic] stated within those questions. I guess this is the same with any multiple choice
exam. [serial no. 26]
Table 4. Weighting of ARQ assessment item

Question 4: Given that there is no mechanism for guarding against student cheating, what proportion of the course unit marks do you think should be allocated for these quizzes? Score %

A. The proportion of total marks ought to be increased significantly. 2 3
B. The proportion of total marks could be increased slightly. 27 39
C. 5 per cent is just about right. 31 45
D. The proportion of total marks could be reduced slightly. 1 1.5
E. No marks should be allocated. The assessment should be entirely formative. 7 10
F. No answer. 1 1.5
Table 5. Relative merit of ARQ assessment instrument

Question 5: Would you like to see this kind of on-line, formative assessment used more widely? Score %

A. Yes, it's a great idea—all course units should have quizzes like these on their OLT sites. 17 25
B. Yes, it could work well for some course units. 39 56
C. I don't feel strongly either way. 6 8.5
D. No, the assessment type is ok, but it shouldn't be used on-line. 4 6
E. No, it's a waste of time—abandon the idea. 1 1.5
F. No answer. 2 3
Eighteen months after this initial survey, another online questionnaire was circu-
lated, this time to all MBA students. The survey focused on assessment practices in
general, rather than ARQ tests in particular, but one of the five questions was
dedicated to ARQ questions and their relative merits when compared to traditional
MCQs. A total of 187 students responded, corresponding to approximately
20% of enrolled students. The responses to this question are presented in Table 6.
On this occasion, while the students are still showing a preference for ARQ over
traditional MCQs (39% compared to 29%), this inclination appears less pronounced
than it was 18 months previously. One possible explanation for this is that as ARQ
tests became more widely used in the School, quality control diminished. When the
project team piloted the new assessment instrument, great care was taken to avoid
ambiguity and overly complex language, and as the comments above suggest, they
were not always successful. Upon close inspection of individual comments from
students in the second survey, specifically in relation to Question 6, and through
discussion of the matter with a focus group of nine students (including eight international
students), considerable evidence emerges to support this hypothesis. A selection of
student comments is detailed below:
Assertion reason are useful, however are often worded very poorly or ambiguously. They
should be used only when the question can be framed clearly, without ambiguity. [serial
no. 625]
I have been surprised by the extensive use of MC at BGSB. I believe they can be useful as
self-paced learning tools, however I don’t believe that should contribute substantially to
the overall mark. While I understand the theory behind using assertion-reason MC, I
sometimes feel that the correct answer is too much about interpreting the phrasing of the
question, not about whether I understand the material. [serial no. 634]

Table 6. The role of multiple-choice type assessments in a business school

Question 6: Which of the following options best describes your view on the role of multiple-choice type assessments? Score %

A. The more traditional type of multiple-choice questions is a useful means of assessment because they help me learn. The assertion-reason type of multiple-choice questions is less useful. 30 16
B. The assertion-reason type of multiple-choice questions is a useful means of assessment because they help me learn. The more traditional type of multiple-choice questions is less useful. 42 22
C. The more traditional multiple-choice type questions are a useful means of assessment because they help me learn, but they should be located on OLT sites for formative assessment (self-paced learning) purposes only. 24 13
D. The assertion-reason type of multiple-choice questions is a useful means of assessment because they help me learn, but they should be located on OLT sites for formative assessment (self-paced learning) purposes only. 31 17
E. There is no place for multiple-choice type questions in a Master's level course. 33 18
F. No answer. 27† 14

†The high proportion of people who elected to submit no answer can be explained by the fact that many students had had no exposure to ARQ questions and therefore were not in a position to comment.
Assertion-reason MC test are good for black/white subjects like Economics. In soft
subjects like Entrepreneurship they are very subjective. [serial no. 781]
The assertion-reason questions should not play so much on double negatives as this does
not assist in learning. [serial no. 887]
MCQ questions will work perfectly with the absolute and exact answer questions. i.e.
Finance, Accounting or any exact answers. … For some argueable [sic] answers, these
kind of assessments create confusion and upset since we were forced to accept what the
lecturer thinks is the right answer. Indeed, at Masters’ level, the argument about what we
believe is more important than the right or wrong answer. [serial no. 945]
All the MCQ tests I have done to date—and especially the assertion reasoning ones. Those
currently being used for the 2a Entrepreneurship … unnit [sic] are a classic example: They
often show insufficient attention both to the logic behind the question, and fail particularly
from the use of imprecise language to express the question. This leads to ambiguity and
frustration: what I call speed camera questions! [serial no. 977]
Using multiple choice questions that relate to trickery rather than encouraging learning are
pointless. This has been experienced in a core subject that I have completed and is shared
among other students. [serial no. 1216]
In my personal opinion the assertion-reason questions are very good, but they are much
more difficult for international students, who do not speak English as their first language.
[serial no. 1581]
The assertion reasoning type questions should not be used a summative assessment,
because it takes up too much time especially in strict examination type setting. But it is
useful as a learning tool. [serial no. 1756]
Assertion-reason is an excellant [sic] study tool, however I have found that certain lectur-
ers use vague or misleading statements that do not serve the purpose of assertion reason.
This is the major fault in this type of assessment. [serial no. 1994]
It is not useful when some methods of assessment are unrealistically strict on time e.g.
assertion and reasoning multiple choice questions—I had an exam that was 20 minutes all
up including perusal and there were 15 very complex questions and perusal time end was
not announced. The whole exercise was a waste of time due to unrealistic timing— this
left 1 minute per question and the wording was so complex it took 1 minute to just read
the question. [serial no. 2024]
MCQ’s have an important role to play as they can differentiate understanding and
application of what is read and applied. They should test knowledge and not require you
to have a Masters in English to remove ambiguity. [serial no. 1197]
Most multiple choice type questions end up in debate over English rather than the true
purpose of the question. In real life, where are they used? [serial no. 1364]
Summary and conclusions
This paper began by pointing out that, for many, the MCQ test is an economical and
versatile assessment instrument capable of providing the necessary precision required
to measure learning outcomes. Critics of MCQs, meanwhile, question its validity in
certain settings. Typically, criticisms fall into one of three categories: those that
concentrate on unreliability arising from random effects such as guessing, those that
focus upon the inequity of the format in terms of its inherent bias towards certain
socio-economic or ethnic groups, and those that question the depth of learning the
instrument is capable of producing.
Mindful of these philosophical positions, an experiment was conducted with
graduate level business students that focused primarily on the third category of
criticisms. The aim, simply, was to investigate the robustness of an ARQ test format to
determine whether it was possible to assemble questions that induced the kind of
higher-order thinking generally required of graduate students. The first part of this
experiment was essentially quantitative in nature, and results of regression analysis
showed ARQ test performance to be a good predictor of student performance in
essays, the assessment instrument most widely favoured as an indicator of deeper
learning (Connelly, 2004). The second part of the experiment was to analyse the
qualitative data collected by the project team to ascertain whether this lent support to the
hypothesis that there is a positive correlation between student ARQ test performance
and their performance in essays.
Analysis of the qualitative data collected in this second phase of the project reveals
that student performance in ARQ tests—in this experiment, at least—may have as
much to do with a student's linguistic skills and the time taken to process complex
prose as with their conceptual understanding and problem-solving ability. This begs
the question, of course, of whether students' essay performance is also largely
determined by their proficiency in the English language. Assuming an
essay is structured to facilitate deep learning in the first place and not simply to
regurgitate text (Williams, 2004), this ought not to be the case: a student is able to construct
a response to an essay question and convey meaning, the structure and internal
consistency of this response being more important than mastery of the finer points
of English grammar. Such an active role is not possible in an ARQ test setting where
the student role is more passive.
This is not to say the ARQ model should be rejected: there is certainly sufficient
positive comment from students to suggest that ARQ tests are capable of producing
useful learning outcomes, especially if there is some interaction as a consequence of their
being online. Importantly, though, for this type of assessment instrument to be truly
effective, psychometric editing of questions is a must. This could be said of all MCQs,
of course, but given the additional complexity of ARQs, it is even more important that
the meaning is clear and the wording free of ambiguity. This means beta testing, not
just to check that test design and administration are functional, but to ensure that test
items are constructed in accordance with accepted standards and practices for 'high-
stakes' testing. Statistical analysis of the results of the beta test will reveal which
questions are too hard or too easy, which discriminate among more knowledgeable and
less knowledgeable candidates, which show evidence of not being clearly understood,
and so on. Those items that demonstrate good psychometric performance can remain
and those that do not may be edited to improve communication, be it a case of
correcting grammar or spelling, maintaining a consistent style, or removing
potentially offensive or biased language.
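The item statistics described above can be made concrete with a short sketch of classical item analysis. This is purely illustrative and not the analysis used in the study; the function name, data layout, and the choice of point-biserial correlation as the discrimination index are assumptions for the sake of the example.

```python
import math

def item_analysis(responses):
    """Classical item analysis for a beta test.

    `responses` is a list of candidate rows; each row is a list of 1s
    (correct) and 0s (incorrect), one entry per test item.
    Returns a (difficulty, discrimination) pair for each item:
      difficulty: proportion answering correctly (near 1.0 = too easy,
                  near 0.0 = too hard);
      discrimination: point-biserial correlation between the item score
                  and the total score on the *remaining* items.
    """
    n = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        p = sum(item) / n  # item difficulty (proportion correct)
        # Total score excluding this item, to avoid part-whole inflation.
        rest = [totals[i] - item[i] for i in range(n)]
        mean_rest = sum(rest) / n
        var_rest = sum((x - mean_rest) ** 2 for x in rest) / n
        if p in (0.0, 1.0) or var_rest == 0:
            r_pb = 0.0  # no variance: the item cannot discriminate
        else:
            # Mean rest-score of candidates who answered this item correctly.
            mean_correct = sum(rest[i] for i in range(n) if item[i]) / sum(item)
            # Point-biserial: (M1 - M) / sigma * sqrt(p / (1 - p)).
            r_pb = ((mean_correct - mean_rest) / math.sqrt(var_rest)
                    * math.sqrt(p / (1 - p)))
        results.append((p, r_pb))
    return results
```

Under such a scheme, items with near-zero or negative discrimination, or with extreme difficulty values, would be the candidates for the editing or removal described above.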
In the absence of such intervention, one will inevitably be subject to criticism from
those parties who maintain that MCQs produce inequitable outcomes for certain
student cohorts—in this case, those for whom English is a second language (see
Paxton, 2000). For an institution like the BGSB, with its public commitment to
flexible delivery, this kind of criticism is particularly unpalatable.
In the context of the two economics course units at the centre of this project, the
professional opinion of an academic linguist and ethicist (Gesche, 2003) is unequivocal:
For a native English speaker, your ARQ questions are very elegant, precise, concise and
logical. Many of the ARQ questions for GSN411 are linguistically and conceptually just
beautiful. I am not surprised that they received some critical acclaim. … However, for a
non-English speaking person (NESB) this lack of ‘redundancy’ (or economy of words)
plus a selection of relatively uncommon words (for a NESB person) can cause tremendous
problems. … I think it was a good idea to take the quizzes out of any timed, classroom
assessment. …
In conclusion, while the depth of learning is unlikely to parallel that emanating
from more authentic learning tasks such as case study analysis or some other aspect
of a problem-based curriculum (Williams, 2004), the experience with ARQs in this
experiment would suggest that learning outcomes are likely to be superior to those
produced by traditional MCQs, which tend to focus on recall rather than reasoning.
An important lesson to be learnt, however, is that there is absolutely no margin for
error when it comes to the authoring of questions. It is conceivable that in disciplines
other than business (science or mathematically oriented subjects, perhaps),
assertion-reason statements may be linguistically more straightforward, in which case this
will be less of an issue. However, irrespective of the discipline, if ARQs are to be used
effectively they need to be psychometrically tested prior to use. This would reduce the
likelihood of any criticism from an equity point of view. ARQs might also be more
appropriately utilised in an online environment for formative assessment purposes
only, where there is no time constraint and where there is ample opportunity for
students to master any linguistic intricacies.
Acknowledgments
The project on which this paper reports was made possible through a Small Teaching
and Learning Grant provided by Queensland University of Technology (QUT). The
author wishes to acknowledge the contribution of co-collaborator, Dr Luke Connelly,
and the assistance of Elizabeth Heathcote from the Software, Multimedia and Inter-
net Learning Environments (SMILE) section within Teaching and Learning Support
Services (TALSS) at QUT. The author also gratefully acknowledges the comments
of two anonymous referees on an earlier draft of this paper. Any remaining errors or
inaccuracies are the responsibility of the author, and the author alone.
Note on the contributor
Jeremy Williams is currently Director of Pedagogy and Assessment and Associate
Professor in E-Learning at Universitas 21 Global (U21G), and Adjunct Professor
in Economics at the Brisbane Graduate School of Business (BGSB). At U21G,
he is responsible for the oversight of all aspects of pedagogy and assessment,
specifically in relation to quality control and the application of best practice. One
of Jeremy’s main research interests is the question of authentic assessment and
the ways in which assessment items might be contextualised to promote greater
student engagement and deeper learning. In addition to his work in the e-learning
area, he has spent the last two decades teaching, researching and consulting in
the field of economics, with work experience in several countries including
Australia, the United Kingdom, France, Singapore, Malaysia and India. Before
joining Universitas 21 Global in 2003, Jeremy was Teaching Fellow and Director
of the MBA program at the BGSB.
References
Berk, R. A. (1998) A humorous account of 10 multiple-choice test-item flaws that clue testwise
students, Electronic Journal on Excellence in College Teaching, 9(2). Available online at:
http://ject.lib.muohio.edu/contents/article.php?article=170 (accessed 31 December 2004).
Biggs, J. (1987) Student approaches to learning and studying (Hawthorn, Victoria, Australian Council
for Educational Research).
Biggs, J. (1993) What do inventories of students' learning processes really measure? A theoretical
review and clarification, British Journal of Educational Psychology, 63, 3–19.
Bloom, B. S. (Ed.) (1956) Taxonomy of educational objectives: the classification of educational goals:
handbook I, cognitive domain (London, Longman Group).
Bracey, G. (1998) Put to the test: an educator's and consumer's guide to standardized testing
(Bloomington, IN, Phi Delta Kappa).
Brown, G., Bull, J. & Pendlebury, M. (1997) Assessing student learning in higher education (London,
Routledge).
Burton, R. F. (2001) Quantifying the effects of chance in multiple choice and true/false tests: question
selection and guessing of answers, Assessment and Evaluation in Higher Education, 26(1), 41–50.
Computer Assisted Assessment (CAA) Centre (2000) Designing and using objective tests (University
of Luton, CAA Centre).
Carneson, J., Delpierre, G. & Masters, K. (n.d.) Designing and managing multiple choice questions.
Available online at: http://web.uct.ac.za/projects/cbe/mcqman/mcqman01.html (accessed 31
December 2004).
Caruano, R. M. (1999) An historical overview of standardised educational testing. Available online at:
http://www.gwu.edu/~gjackson/caruano.PDF (accessed 31 December 2004).
Case, S. M. & Swanson, D. B. (1996) Constructing written test questions for the basic and clinical
sciences (Philadelphia, PA, National Board of Medical Examiners).
Connelly, L. B. (2004) Assertion-reason assessment in formative and summative tests: results from
two graduate case studies, in: R. Ottewill, E. Borredon, L. Falque, B. Macfarlane & A. Wall
(Eds) Educational innovation in economics and business VIII: pedagogy, technology and innovation
(Dordrecht, Kluwer Academic Publishers), 359–378.
De Vita, G. (2002) Cultural equivalence in the assessment of home and international business
management students: a UK exploratory study, Studies in Higher Education, 27(2), 221–231.
Entwistle, N. (1981) Styles of learning and teaching: an integrated outline of educational psychology for
students, teachers and lecturers (Chichester, John Wiley).
Fox, J. S. (1983) The multiple choice tutorial: its use in the reinforcement of fundamentals in
medical education, Medical Education, 17, 90–94.
Gesche, A. (2003) Personal e-mail (April).
Hakel, M. D. (Ed.) (1998) Beyond multiple choice: evaluating alternatives to traditional testing for
selection (Mahwah, NJ, Lawrence Erlbaum Associates).
Haladyna, T. M. (1999) Developing and validating multiple-choice test items (2nd edn) (London,
Lawrence Erlbaum Associates).
Heywood, J. (1999) Review: assessing student learning in higher education, Studies in Higher
Education, 24(1), 133–134.
Hubbard, J. P. & Clemans, W. V. (1961) Multiple choice questions in medicine: a guide for examiner
and examinee (Philadelphia, PA, Lea and Febiger).
Leamnson, R. (1999) Thinking about teaching and learning (Sterling, VA, Stylus Publishing).
Marton, F. & Säljö, R. (1976a) On qualitative differences in learning—1: outcome and process,
British Journal of Educational Psychology, 46, 4–11.
Marton, F. & Säljö, R. (1976b) On qualitative differences in learning—2: outcome as a function of
the learner’s conception of the task, British Journal of Educational Psychology, 46, 115–127.
Moore, R. A. (1954) Methods of examining students in medicine, Journal of Medical Education,
29(1), 23–27.
Newble, D. I., Baxter, A. & Elmslie, R. G. (1979) A comparison of multiple-choice and
free-response tests in examinations of clinical competence, Medical Education, 13, 263–268.
Paxton, M. (2000) A linguistic perspective on multiple choice questioning, Assessment & Evaluation
in Higher Education, 25(2), 109–119.
Ramsden, P. (1992) Learning to teach in higher education (London, Routledge).
Scouller, K. M. & Prosser, M. (1994) Students’ experiences in studying for multiple choice
question examinations, Studies in Higher Education, 19, 267–279.
Skakun, E. N., Nanson, E. M., Kling, S. & Taylor, W. C. (1979) A preliminary investigation of
three types of multiple choice questions, Medical Education, 13, 91–96.
Steffe, L. P. & Gale, J. (Eds) (1995) Constructivism in education (Hillsdale, NJ, Erlbaum).
Wiggins, G. (1990) The case for authentic assessment, Practical Assessment, Research & Evaluation,
2(2). Available online at: http://pareonline.net/getvn.asp?v=2&n=2 (accessed 31 December
2004).
Williams, J. B. (2004) Creating authentic assessments: a method for the authoring of open book
open web examinations, in: R. Atkinson, C. McBeath, D. Jonas-Dwyer & R. Phillips (Eds)
Beyond the comfort zone: proceedings of the 21st ASCILITE Conference (vol. 2). Available online
at: http://www.ascilite.org.au/conferences/perth04/procs/pdf/williams.pdf (accessed 31
December 2004).
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
B.ed spl. HI pdusu exam paper-2023-24.pdf
B.ed spl. HI pdusu exam paper-2023-24.pdfB.ed spl. HI pdusu exam paper-2023-24.pdf
B.ed spl. HI pdusu exam paper-2023-24.pdf
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
NCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfNCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdf
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 

Assertion-reason multiple-choice testing as a tool for deep learning: a qualitative analysis

Introduction

Since the time they were first proposed by Arthur Otis, and used on a large scale by the US Army to measure the abilities of new recruits around the time of the First World War (Caruano, 1999), the efficacy of multiple-choice questions (MCQs) as an assessment tool has attracted considerable debate. As Berk (1998) quips, the MCQ format 'holds world records in the categories of most popular, most unpopular, most used, most misused, most loved, and most hated'. Indeed, the literature generated on the subject is voluminous, the bulk of it appearing during the 1990s.

*Universitas 21 Global, 5 Shenton Way, #01-01 UIC Building, Singapore 068808. Email: jeremy@u21global.com
Following in the footsteps of the United States, many countries around the world have embraced MCQ testing as the foundation of their assessment systems. The main advantage of MCQ testing is its versatility. There are significant cost savings, particularly where large numbers of students are involved, and it is a format that can provide precision where other measurement options (e.g. observing performance or interviewing) may be lacking. Criticisms of MCQs, on the other hand, tend to centre upon unreliability due to random effects (e.g. Burton, 2001), the inequity of the format in terms of its bias towards certain socio-economic or ethnic groups (e.g. De Vita, 2002), and the depth of learning the format engenders, or lack thereof (e.g. Leamnson, 1999, p. 111).

This paper reflects on this ongoing debate in the context of a graduate school of business in Australia where an experiment was conducted in the use of assertion-reason questions (ARQ), a sophisticated form of MCQ that aims to encourage higher-order thinking on the part of the student. The paper first provides a brief insight into the background to the project, before reporting on the results of evaluations undertaken to date. In particular, it extends the quantitative study of one of the collaborators on the project (Connelly, 2004), who found ARQ test performance to be a good predictor of student performance in essay work which, appropriately structured, is the assessment instrument most widely favoured as an indicator of deeper learning (Brown et al., 1997; Haladyna, 1999). The findings of this paper do not contradict the conclusion drawn by Connelly, but suggest that student performance in ARQ tests may have as much to do with linguistic skills and the time taken to process complex prose as with conceptual understanding and problem-solving ability.
The paper concludes that the ARQ format, although not widely used, constitutes a useful assessment tool and one that appears to be superior to the traditional MCQ format in terms of student learning outcomes. Importantly, though, to ensure equitable treatment of student groups, the questions need to be tested and carefully edited, and might be better utilised for formative purposes, as an online, self-paced learning device. This conclusion is equally applicable to other disciplines and is not exclusive to economics and other business subjects.

The context

The Brisbane Graduate School of Business (BGSB) is one of six schools in the Faculty of Business at Queensland University of Technology (QUT), and was formed in 1995 to administer the MBA, a full-fee-paying program. Commencing in 1999, an innovative new MBA course structure was introduced offering prospective students greater flexibility and choice through seven-week, half-semester-long course units. Since this time, student numbers have trebled while course fees have more than doubled and entry standards have been raised. The BGSB currently has around 1,000 students in the MBA and associated programs. Around three-quarters of these students are enrolled for part-time study, and they are almost exclusively Australian residents. The majority of full-time students are international in origin, recruited from 35 different countries, nearly all of whom speak English as
their second language. The average age of BGSB students is 33, and male students outnumber females by a ratio of 3:2.

Funded entirely from student fees, the BGSB, like other institutions in the same position, is very sensitive to market perceptions of its services. One strategy actively pursued by the School has been to gain an international reputation for the flexible delivery of its programs. Flexible delivery is, by definition, a client-oriented approach because it is a commitment, on the part of the education provider, to tailor courses to meet the various individual needs of its students. Furthermore, it is tacit recognition of the fact that the student profile has changed quite dramatically (socially, culturally and economically) and that, pedagogically, there is a need to cater for this increasingly diverse student body.

The essence of flexible delivery is that it provides students with a number of different options for study. It is not prescriptive in the sense that one approach to study is identified as being superior to another. A student can chart a route through a degree that is most compatible with their social, family and working lives, and their preferred learning style. In short, flexible delivery is non-discriminatory, catering equally for an international student, a single parent working part-time, or a business executive travelling regularly overseas and interstate.

At the heart of this strategy of flexible delivery has been the development of online learning and teaching (OLT) sites. The framework for OLT sites varies from course unit to course unit but, typically, there is a download facility where students can access PowerPoint lecture slides, solutions to problems, past examination papers and the like, as well as discussion forums, chat rooms, and discipline-relevant web links.
Until recently, however, little attention has been devoted to assessment, and how this might be integrated with the OLT system.

The project

Commencing in 1999, funding was secured to investigate the use of computer-assisted learning in the form of optional weekly, timed, MCQ tests, which could be accessed remotely or on campus. These traditional MCQ tests were being conducted in class at this time as part of formal assessment, and invigilated in the standard way. By putting them online, the idea was to enhance flexibility by providing the opportunity for students unable to attend class to complete the tests, while at the same time freeing up class time for interaction and discussion.

By September 2000, two economics course units within the MBA (GSN411: Economics of Strategy, and GSN414: Business Conditions Analysis) were trialling online MCQ tests, albeit with modified objectives. The tests, accessible via the OLT system, are marked automatically, providing students with instant feedback on their progress. Early in the trial, a student received a mark for participation (5% for the completion of five tests, as long as they scored 70% or above in each test) rather than a mark for performance. Even this low weighting was subsequently removed. The decision to go along this path arose largely because, despite the best efforts of the project team, no solution could be found to the
problem of invigilation. Quite simply, a test involving the use of 'point-and-click' radio buttons was an open invitation for students to cheat if they were unsupervised. Thus, in the absence of any cheap and readily accessible devices for online test supervision, the project team elected to use the test banks they had developed primarily for formative assessment purposes where they would be used online. However, where class tests would continue to be held, ARQs would be used in preference to the traditional MCQ format.

The questions

Although the MCQ format has been criticised almost since the time of its inception, it perhaps met its most formidable challenge during the 1990s as an increasing number of educationalists, guided by the constructivist theories proffered (most notably) by Marton & Säljö (1976a,b), Entwistle (1981), Biggs (1987, 1993), and Ramsden (1992), argued for teaching and assessment methods that encouraged higher-order thinking skills. As Steffe and Gale (1995) point out, while constructivism offers no unitary theoretical position, whichever strand of constructivism one adheres to, most constructivists would agree that, essentially, learners arrive at meaning by actively selecting and constructing their own knowledge through experience (both individual and social).

A key criticism of MCQ has been that real-world tasks are not usually multiple-choice tasks, and passing an MCQ test is not equivalent to mastering a real-world skill. In short, MCQs suffer from a lack of authenticity (Wiggins, 1990). Ideally, say the critics, instructors ought to be asking questions that go beyond the mere memorisation of facts, encouraging students to apply, analyse and synthesise their knowledge.
Hakel (1998), for example, makes the point that recognising a correct response from a list does not demonstrate that students can construct that response themselves, the narrowness and appearance of precision in MCQs inhibiting other information relevant to decision-making. In response to the constructivists, there has been a steady stream of research from the statistics community (see, for example, Case & Swanson, 1996, and Haladyna, 1999) that has tested a variety of MCQ structures for measuring complex cognitive outcomes.

A strong case for the continued use of MCQs is also advanced by those advocating computer-assisted learning. Bracey (1998), a known critic of MCQs, opines that when teaching is lecturing and testing is multiple choice, one can never know for sure whether students have really understood what one was trying to teach them. However, he does concede that, at graduate school level, stems can be more complex and questions more subtly worded such that understanding is demonstrated. He also acknowledges the great promise of technology. The uptake of computer-assisted assessment has, indeed, been gathering pace and, if the number and quality of international conferences dedicated to the subject is anything to go by, there is every chance that, implemented with pedagogical (as well as technical) awareness, it will serve the educational sector well. This was certainly the feeling of the project team when it introduced the online ARQ tests.
There is nothing novel about the ARQ format. Heywood (1999) observes that ARQs first appeared in UK 'A-level' secondary school examinations in the 1960s, although it would seem the format was used even earlier than this in US medical exams (see Moore, 1954, in Hubbard & Clemans, 1961). It is quite surprising, therefore, that the academic literature on the subject is so sparse. Connelly (2004) provides an overview of the existing published material (see Newble et al., 1979; Skakun et al., 1979; Fox, 1983; Scouller & Prosser, 1994) before going on to extol the main virtue of the ARQ test item; viz. that 'its structure facilitates the construction of questions that test student learning beyond recall. In particular, higher level thinking and application of key concepts may sometimes be more easily constructed using this format, than by using a conventional multiple-choice approach alone' (Connelly, 2004, p. 362).

A key concern of the project team, mindful of the criticisms made of MCQs, was to develop question sets that would test reasoning (procedural knowledge) rather than recall (declarative knowledge). In terms of Bloom's taxonomy (Bloom, 1956), the goal was to focus on the highest levels of learning within the cognitive domain: analysis, synthesis and evaluation (see Figure 1). Carneson et al. (n.d.), in their application of Bloom's taxonomy to different types of MCQs, identify ARQs as belonging to the very highest level in the cognitive hierarchy because they contain elements of all the other categories, and because 'one is asked to pass judgement on, for
example, the logical consistency of written material, the validity of experimental procedures or interpretation of data' (Carneson et al., n.d., Appendix C).

Figure 1. Bloom's taxonomy

An example question is illustrated in Figure 2. Like traditional MCQs, ARQs present students with a number of possible solutions. In contrast to traditional MCQs, however, ARQs also include a true/false element (CAA Centre, 2000). Specifically, each item consists of two statements, an assertion and a reason, that are linked by the word 'because'. The student then selects from a multiple-choice legend after proceeding through a number of steps. First, he or she must determine whether the 'assertion' is true or false, and then whether the 'reason' is true or false. If one, or both, of the statements is deemed false, then the answer will be alternative (c), (d), or (e) accordingly. If, on the other hand, both statements are deemed true, a third step is required whereby the respondent must determine whether the second statement provides an accurate explanation for the first.

Figure 2. Example ARQ question

Assertion: In a small open economy, if the prevailing world price of a good is lower than the domestic price, the quantity supplied by the domestic producer will be greater than the domestic quantity demanded, increasing domestic producer surplus.

BECAUSE

Reason: In a small, open economy, any surplus in the domestic market will be absorbed by the rest of the world. This increases domestic consumer surplus.

(a) True; True; Correct reason
(b) True; True; Incorrect reason
(c) True; False
(d) False; True
(e) False; False

(The correct answer is (d).)

Traditional MCQs usually test only one issue/concept. ARQs, on the other hand, test two per question (the assertion and the reason statements) plus the validity of the 'because' link in the event that the assertion and reason are both correct statements. On the basis that judging the correctness of two statements must be harder than judging the correctness of one, it would follow that ARQs present more of an intellectual challenge than traditional MCQs. One might put forward the case that, because options (a) and (b) require a third step of reasoning (only two steps being required if the learner correctly identifies a false statement), questions with correct answers of (c), (d) or (e) may be less effective in terms of learning outcomes. That is, assuming all answers (a)-(e) occur in roughly equal proportions in an exam, the 'because' statement would only be tested in 40% of the questions, the depth of learning being relatively less the remaining 60% of the time. However, this is hardly a reason for not using ARQs, as a two-step question is still preferable to a single-step traditional
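The three-step selection procedure described above reduces to a simple decision rule. The following Python sketch is purely illustrative (the function name and signature are assumptions, not part of the study); it maps a candidate's three judgements onto the five-option legend:

```python
def arq_option(assertion_true: bool, reason_true: bool,
               reason_explains: bool = False) -> str:
    """Map the three ARQ judgements to the multiple-choice legend (a)-(e).

    Step 1: judge the assertion; step 2: judge the reason.
    Step 3 applies only when both statements are judged true: does the
    reason provide an accurate explanation for the assertion?
    """
    if assertion_true and reason_true:
        return "a" if reason_explains else "b"
    if assertion_true:
        return "c"  # assertion true, reason false
    if reason_true:
        return "d"  # assertion false, reason true
    return "e"      # both statements false

# The worked example in Figure 2 (assertion false, reason true) maps to (d):
print(arq_option(False, True))  # d
```

Note that the third judgement is ignored whenever either statement is false, mirroring the observation in the text that only answers (a) and (b) exercise the third step of reasoning.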
MCQ. If one wanted to reward ARQ test candidates in accordance with their depth of learning per question, then one possibility would be to assign a proportionately higher weighting to questions with solutions of (a) or (b). The project team elected not to proceed along these lines, preferring, instead, to view the learning experience of ARQ tests in their totality.

Presented with ten such questions per test, the students involved in the trial were able to take each test as many times as they wished without penalty. Initially, to be eligible for the 5% credit allowed for completion of these tests, students were required to score at least 70% on each of the tests. To encourage students to persist until they had all the questions right, question feedback was generated after each attempt at the test, without explicitly presenting the student with the correct solution.

The results

As Connelly (2004, p. 363) observes, the writing of ARQ test items for the two course units trialling the assessment instrument was not particularly difficult, but it took some time for the students to become accustomed to the format. Where the questions were used offline in class, more time was required by the students to compute the solutions than had been the case with traditional MCQs, and average test scores were significantly lower. These outcomes notwithstanding, student evaluation of the ARQ format has been generally encouraging.

Tables 1 to 5 present the results of an online questionnaire administered during March 2001 that was open to all students who had been enrolled in the two economics course units over the previous six months. There were 69 respondents, which equates to around 15% of the total number of students enrolled in these course units during this time. (Note: some of these students took invigilated ARQ tests in class as well as online.)
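The attempt-and-feedback cycle described above (unlimited attempts, a 70% threshold for credit, and per-question feedback that withholds the correct solution) can be sketched as follows. This is an illustration only; the function name and data shapes are assumptions, not the project's actual OLT implementation:

```python
def mark_attempt(responses, answer_key, threshold=0.7):
    """Mark one test attempt: flag each question without revealing the key,
    and report whether the attempt meets the pass threshold."""
    feedback = {}
    for q, key in answer_key.items():
        feedback[q] = "correct" if responses.get(q) == key else "incorrect, try again"
    score = sum(v == "correct" for v in feedback.values()) / len(answer_key)
    return feedback, score >= threshold

# Example: 7 of 10 questions correct meets the 70% threshold exactly.
key = {q: "a" for q in range(10)}
attempt = {q: ("a" if q < 7 else "e") for q in range(10)}
fb, passed = mark_attempt(attempt, key)
```

Because the feedback never exposes the answer itself, a student must revisit the material to improve on the next attempt, which is the formative intent described in the text.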
Table 1. Level of intellectual challenge presented by ARQ

Question 1: In terms of the intellectual challenge it presented, how did you find the assertion-reason format?

                                                                     Score    %
A. It was very challenging.                                             30   44
B. It was moderately challenging.                                       34   49
C. The challenge it presented was no different to any other type of
   multiple-choice testing I have encountered.                           5    7
D. It was moderately easy.                                               0    0
E. It was childishly easy.                                               0    0

Table 1 clearly illustrates that this form of MCQ tests students' intellect considerably more than traditional MCQs. Importantly, 64% of students were of the view that the learning outcomes associated with ARQ tests were superior to those associated with traditional MCQs, only 16% adjudging them to be inferior (Table 2). Examples of detailed student comments support this aggregate picture:
It forced you to learn as you progressed through the unit. It is a very good idea. [serial no. 69]

I liked it because it made you think rather than just match the best one. [serial no. 72]

I thought it was a good tool to help study content with. [serial no. 81]

To make the best use of these quizzes it is necessary to have at least read some of the material being tested. This is a useful mechanism for slow starters. [serial no. 26]

Mimics real world decision making. Useful for understanding concepts. Anything to get away from rote learning testing, which is non-productive. [serial no. 129]

They [the assertion-reason format] are a much more challenging format from the conventional multiple-choice questions [sic]. Hence seem more suited to a masters level. [serial no. 34]

Table 2. Learning outcomes produced by ARQ

Question 2: In terms of learning outcomes, how did you find the assertion-reason format compared to the more traditional multiple-choice format?

                                                                     Score    %
A. The assertion-reason format produced far superior outcomes.          15   22
B. The assertion-reason format produced moderately superior
   outcomes.                                                            29   42
C. The learning outcomes were more or less the same.                    14   20
D. The assertion-reason format produced moderately inferior
   outcomes.                                                             8   12
E. The assertion-reason format produced significantly inferior
   outcomes.                                                             3    4

The frequency of the online tests (one per week) also receives the resounding support of the students (Table 3), and while 45% of students felt the 5% mark for participation (given earlier in the trial) was about right, a total of 42% felt a larger weight should be attached (Table 4). This sentiment is probably a reflection of the amount of time and effort required on the part of the student to complete the tests. However, with a pass mark of 70% and students able to have as many attempts as they like with no invigilation, the project team was loath to give a higher weighting. (Indeed, for reasons explained earlier, the 5% participation mark has since been removed altogether.)

Table 3. Frequency of ARQ tests

Question 3: How useful did you find the quizzes in terms of their frequency?

                                                                     Score    %
A. There were way too many.                                              1  1.5
B. There were a few more than necessary.                                 3  4.5
C. One a week is just about right.                                      61   88
D. There could have been a few more.                                     0    0
E. There could have been a lot more.                                     1  1.5
F. No answer.                                                            3  4.5

The results from Table 5 also give the project team some cause for optimism, 56% of respondents stating that they think the instrument could be used on other course
units, and a further 25% saying that ARQ tests should be a feature of all course units in the MBA. Only 1.5% of respondents called for the idea to be abandoned. On the downside, some student comments suggest that caution should be exercised in the preparation of ARQ questions, implying that the degree of difficulty had a lot to do with semantics and one's mastery of the English language. Examples include:

Most questions were very good. Sometimes the wording was quite tricky … the assistance given after each attempt … helped in the understanding of the subject. [serial no. 8]

… semantics seem to play a large role in determining the correct answer. [serial no. 67]

It was challenging due to ambiguity rather than degree of difficulty. [serial no. 33]

I found the assertion-reason format overly difficult in relation to testing your English skills rather than knowledge of the subject being studied. i.e. you had to pick up on little idiosyncrases (spelling?) [sic] on the way in which the questions were worded. This is a difficult and onerous thing to do when already nervous and on edge in a test environment. [serial no. 90]

To get an answer right doesn't necessarily show you are a better economist than someone who gets it wrong: I think what it shows is that you are better at logically answering structured problems. [serial no. 26]

With the array of options it is necessary to make sure there are no ambiguous questions or facts [sic] stated within those questions. I guess this is the same with any multiple choice exam. [serial no. 26]

Table 4. Weighting of ARQ assessment item

Question 4: Given that there is no mechanism for guarding against student cheating, what proportion of the course unit marks do you think should be allocated for these quizzes?

                                                                     Score    %
A. The proportion of total marks ought to be increased
   significantly.                                                        2    3
B. The proportion of total marks could be increased slightly.           27   39
C. 5 per cent is just about right.                                      31   45
D. The proportion of total marks could be reduced slightly.              1  1.5
E. No marks should be allocated. The assessment should be entirely
   formative.                                                            7   10
F. No answer.                                                            1  1.5

Table 5. Relative merit of ARQ assessment instrument

Question 5: Would you like to see this kind of on-line, formative assessment used more widely?

                                                                     Score    %
A. Yes, it's a great idea; all course units should have quizzes
   like these on their OLT sites.                                       17   25
B. Yes, it could work well for some course units.                       39   56
C. I don't feel strongly either way.                                     6  8.5
D. No, the assessment type is ok, but it shouldn't be used on-line.      4    6
E. No, it's a waste of time; abandon the idea.                           1  1.5
F. No answer.                                                            2    3
Eighteen months after this initial survey, another online questionnaire was circulated, this time to all MBA students. The survey focused on assessment practices in general, rather than ARQ tests in particular, but one of the five questions was dedicated to ARQ questions and their relative merits when compared to traditional MCQs. A total of 187 students responded, corresponding to approximately 20% of enrolled students. The responses to this question are presented in Table 6.

Table 6. The role of multiple-choice type assessments in a business school

Question 6: Which of the following options best describes your view on the role of multiple-choice type assessments?

                                                                     Score    %
A. The more traditional type of multiple-choice questions is a
   useful means of assessment because they help me learn. The
   assertion-reason type of multiple-choice questions is less
   useful.                                                              30   16
B. The assertion-reason type of multiple-choice questions is a
   useful means of assessment because they help me learn. The more
   traditional type of multiple-choice questions is less useful.        42   22
C. The more traditional multiple-choice type questions are a useful
   means of assessment because they help me learn, but they should
   be located on OLT sites for formative assessment (self-paced
   learning) purposes only.                                             24   13
D. The assertion-reason type of multiple-choice questions is a
   useful means of assessment because they help me learn, but they
   should be located on OLT sites for formative assessment
   (self-paced learning) purposes only.                                 31   17
E. There is no place for multiple-choice type questions in a
   Master's level course.                                               33   18
F. No answer.                                                          27†   14

†The high proportion of respondents who elected to submit no answer can be explained by the fact that many students had had no exposure to ARQ questions and were therefore not in a position to comment.

On this occasion, while the students still showed a preference for ARQ over traditional MCQs (39% compared to 29%), this inclination appears less pronounced than it was 18 months previously. One possible explanation is that, as ARQ tests became more widely used in the School, quality control diminished. When the project team piloted the new assessment instrument, great care was taken to avoid ambiguity and overly complex language and, as the comments above suggest, they were not always successful. Upon close inspection of individual comments from students in the second survey, specifically in relation to Question 6, and in discussing the matter with a focus group of nine students (including eight international students), there is considerable evidence to support this hypothesis. A selection of student comments is detailed below:

Assertion reason are useful, however are often worded very poorly or ambiguously. They should be used only when the question can be framed clearly, without ambiguity. [serial no. 625]

I have been surprised by the extensive use of MC at BGSB. I believe they can be useful as self-paced learning tools, however I don't believe that should contribute substantially to the overall mark. While I understand the theory behind using assertion-reason MC, I
  • 11. Assertion–reason multiple–choice testing 297 sometimes feel that the correct answer is too much about interpreting the phrasing of the question, not about whether I understand the material. [serial no. 634] Assertion-reason MC test are good for black/white subjects like Economics. In soft subjects like Entrepreneurship they are very subjective. [serial no. 781] The assertion-reason questions should not play so much on double negatives as this does not assist in learning. [serial no. 887] MCQ questions will work perfectly with the absolute and exact answer questions. i.e. Finance, Accounting or any exact answers. … For some argueable [sic] answers, these kind of assessments create confusion and upset since we were forced to accept what the lecturer thinks is the right answer. Indeed, at Masters’ level, the argument about what we believe is more important than the right or wrong answer. [serial no. 945] All the MCQ tests I have done to date—and especially the assertion reasoning ones. Those currently being used for the 2a Entrepreneurship … unnit [sic] are a classic example: They often show insufficient attention both to the logic behind the question, and fail particularly from the use of imprecise language to express the question. This leads to ambiguity and frustration: what I call speed camera questions! [serial no. 977] Using multiple choice questions that relate to trickery rather than encouraging learning are pointless. This has been experienced in a core subject that I have completed and is shared among other students. [serial no. 1216] In my personal opinion the assertion-reason questions are very good, but they are much more difficult for international students, who do not speak English as their first language. [serial no. 1581] The assertion reasoning type questions should not be used a summative assessment, because it takes up too much time especially in strict examination type setting. But it is useful as a learning tool. [serial no. 
1756] Assertion-reason is an excellant [sic] study tool, however I have found that certain lectur- ers use vague or misleading statements that do not serve the purpose of assertion reason. This is the major fault in this type of assessment. [serial no. 1994] It is not useful when some methods of assessment are unrealistically strict on time e.g. assertion and reasoning multiple choice questions—I had an exam that was 20 minutes all up including perusal and there were 15 very complex questions and perusal time end was not announced. The whole exercise was a waste of time due to unrealistic timing— this left 1 minute per question and the wording was so complex it took 1 minute to just read the question. [serial no. 2024] MCQ’s have an important role to play as they can differentiate understanding and application of what is read and applied. They should test knowledge and not require you to have a Masters in English to remove ambiguity. [serial no. 1197] Most multiple choice type questions end up in debate over English rather than the true purpose of the question. In real life, where are they used? [serial no. 1364] Summary and conclusions This paper began by pointing out that, for many, the MCQ test is an economical and versatile assessment instrument capable of providing the necessary precision required to measure learning outcomes. Critics of MCQs, meanwhile, question its validity in certain settings. Typically, criticisms fall into one of three categories: those that
concentrate on unreliability arising from random effects such as guessing; those that focus on the inequity of the format, in terms of its inherent bias towards certain socio-economic or ethnic groups; and those that question the depth of learning the instrument is capable of producing.

Mindful of these philosophical positions, an experiment was conducted with graduate-level business students that focused primarily on the third category of criticisms. The aim, simply, was to investigate the robustness of an ARQ test format to determine whether it was possible to assemble questions that induced the kind of higher-order thinking generally required of graduate students. The first part of this experiment was essentially quantitative in nature, and the results of regression analysis showed ARQ test performance to be a good predictor of student performance in essays, the assessment instrument most widely favoured as an indicator of deeper learning (Connelly, 2004). The second part of the experiment was to analyse the qualitative data collected by the project team to ascertain whether this lent support to the hypothesis that there is a positive correlation between students' ARQ test performance and their performance in essays.

Analysis of the qualitative data collected in this second phase of the project reveals that student performance in ARQ tests—in this experiment, at least—may have as much to do with a student's linguistic skills and the time taken to process complex prose as with their conceptual understanding and problem-solving ability. This begs the question, of course, of whether students' essay performance is also largely determined by their proficiency in the English language.
Assuming an essay is structured to facilitate deep learning in the first place and not simply to regurgitate text (Williams, 2004), this ought not to be the case, as a student is able to construct a response to an essay question and convey meaning; the structure and internal consistency of this response matter more than mastery of the finer points of English grammar. Such an active role is not possible in an ARQ test setting, where the student's role is more passive.

This is not to reject the ARQ model: there is certainly sufficient positive comment from students to suggest that ARQ tests are capable of producing useful learning outcomes, especially if there is some interaction as a consequence of their being online. Importantly, though, for this type of assessment instrument to be truly effective, psychometric editing of questions is a must. This could be said of all MCQs, of course, but given the additional complexity of ARQs, it is even more important that the meaning is clear and the wording free of ambiguity. This means beta testing, not just to check that test design and administration are functional, but to ensure that test items are constructed in accordance with accepted standards and practices for 'high-stakes' testing. Statistical analysis of the results of the beta test will reveal which questions are too hard or too easy, which discriminate between more and less knowledgeable candidates, which show evidence of not being clearly understood, and so on. Those items that demonstrate good psychometric performance can remain, and those that do not may be edited to improve communication, be it a case of correcting grammar or spelling, maintaining a consistent style, or removing potentially offensive or biased language.
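The statistical screening of beta-test results described above corresponds, in classical test theory, to item analysis: an item's difficulty is the proportion of candidates answering it correctly, and its discrimination is the point-biserial correlation between the item score and the candidate's total score. The following is a minimal illustrative sketch, not part of the study; the flagging thresholds are common rules of thumb and are assumptions, not values taken from the paper.

```python
# Illustrative classical item analysis on beta-test results.
# responses[i][j] = 1 if candidate i answered item j correctly, else 0.

def item_statistics(responses):
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]              # each candidate's total score
    mean_total = sum(totals) / len(totals)
    sd_total = (sum((t - mean_total) ** 2 for t in totals) / len(totals)) ** 0.5
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        p = sum(item) / len(item)                         # difficulty: proportion correct
        # Point-biserial correlation between item score and total score.
        if 0 < p < 1 and sd_total > 0:
            mean_correct = sum(t for t, x in zip(totals, item) if x) / sum(item)
            r_pb = (mean_correct - mean_total) / sd_total * (p / (1 - p)) ** 0.5
        else:
            r_pb = 0.0                                    # degenerate item: no variance
        stats.append({"item": j, "difficulty": p, "discrimination": r_pb})
    return stats

def flag_items(stats, p_lo=0.2, p_hi=0.9, r_min=0.2):
    """Flag items that are too easy/hard or fail to discriminate
    between stronger and weaker candidates (thresholds are rules of thumb)."""
    return [s["item"] for s in stats
            if not (p_lo <= s["difficulty"] <= p_hi) or s["discrimination"] < r_min]
```

Items flagged in this way would be candidates for the kind of editing, or removal, that the paragraph above recommends.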
In the absence of such intervention, one will inevitably be subject to criticism from those parties who maintain that MCQs produce inequitable outcomes for certain student cohorts—in this case, those for whom English is a second language (see Paxton, 2000). For an institution like the BGSB, with its public commitment to flexible delivery, this kind of criticism is particularly unpalatable.

In the context of the two economics course units at the centre of this project, the professional opinion of an academic linguist and ethicist (Gesche, 2003) is unequivocal:

For a native English speaker, your ARQ questions are very elegant, precise, concise and logical. Many of the ARQ questions for GSN411 are linguistically and conceptually just beautiful. I am not surprised that they received some critical acclaim. … However, for a non-English speaking person (NESB) this lack of 'redundancy' (or economy of words) plus a selection of relatively uncommon words (for a NESB person) can cause tremendous problems. … I think it was a good idea to take the quizzes out of any timed, classroom assessment. …

In conclusion, while the depth of learning is unlikely to parallel that emanating from more authentic learning tasks such as case study analysis or some other aspect of a problem-based curriculum (Williams, 2004), the experience with ARQs in this experiment would suggest that learning outcomes are likely to be superior to those produced by traditional MCQs, which tend to focus on recall rather than reasoning. An important lesson to be learnt, however, is that there is absolutely no margin for error when it comes to the authoring of questions. It is conceivable that in disciplines other than business (science or mathematically oriented subjects, perhaps), assertion-reason statements may be linguistically more straightforward, in which case this will be less of an issue.
However, irrespective of the discipline, if ARQs are to be used effectively they need to be psychometrically tested prior to use. This would reduce the likelihood of any criticism from an equity point of view. ARQs might also be more appropriately utilised in an online environment for formative assessment purposes only, where there is no time constraint and where there is ample opportunity for students to master any linguistic intricacies.

Acknowledgments

The project on which this paper reports was made possible through a Small Teaching and Learning Grant provided by Queensland University of Technology (QUT). The author wishes to acknowledge the contribution of co-collaborator, Dr Luke Connelly, and the assistance of Elizabeth Heathcote from the Software, Multimedia and Internet Learning Environments (SMILE) section within Teaching and Learning Support Services (TALSS) at QUT. The author also gratefully acknowledges the comments of two anonymous referees on an earlier draft of this paper. Any remaining errors or inaccuracies are the responsibility of the author, and the author alone.

Note on the contributor

Jeremy Williams is currently Director of Pedagogy and Assessment and Associate Professor in E-Learning at Universitas 21 Global (U21G), and Adjunct Professor
in Economics at the Brisbane Graduate School of Business (BGSB). At U21G, he is responsible for the oversight of all aspects of pedagogy and assessment, specifically in relation to quality control and the application of best practice. One of Jeremy's main research interests is the question of authentic assessment and the ways in which assessment items might be contextualised to promote greater student engagement and deeper learning. In addition to his work in the e-learning area, he has spent the last two decades teaching, researching and consulting in the field of economics, with work experience in several countries including Australia, the United Kingdom, France, Singapore, Malaysia and India. Before joining Universitas 21 Global in 2003, Jeremy was Teaching Fellow and Director of the MBA program at the BGSB.

References

Berk, R. A. (1998) A humorous account of 10 multiple-choice test-item flaws that clue testwise students, Electronic Journal on Excellence in College Teaching, 9(2). Available online at: http://ject.lib.muohio.edu/contents/article.php?article=170 (accessed 31 December 2004).
Biggs, J. (1987) Student approaches to learning and studying (Hawthorn, Victoria, Australian Council for Educational Research).
Biggs, J. (1993) What do inventories of students' learning process really measure? A theoretical review and clarification, British Journal of Educational Psychology, 83, 3–19.
Bloom, B. S. (Ed.) (1956) Taxonomy of educational objectives: the classification of educational goals: handbook I, cognitive domain (London, Longman Group).
Bracey, G. (1998) Put to the test: an educator's and consumer's guide to standardized testing (Bloomington, IN, Phi Delta Kappa).
Brown, G., Bull, J. & Pendlebury, M. (1997) Assessing student learning in higher education (London, Routledge).
Burton, R. F. (2001) Quantifying the effects of chance in multiple choice and true/false tests: question selection and guessing of answers, Assessment and Evaluation in Higher Education, 26(1), 41–50.
Carneson, J., Delpierre, G. & Masters, K. (n.d.) Designing and managing multiple choice questions. Available online at: http://web.uct.ac.za/projects/cbe/mcqman/mcqman01.html (accessed 31 December 2004).
Caruano, R. M. (1999) An historical overview of standardised educational testing. Available online at: http://www.gwu.edu/∼gjackson/caruano.PDF (accessed 31 December 2004).
Case, S. M. & Swanson, D. B. (1996) Constructing written test questions for the basic and clinical sciences (Philadelphia, PA, National Board of Medical Examiners).
Computer Assisted Assessment (CAA) Centre (2000) Designing and using objective tests (University of Luton, CAA Centre).
Connelly, L. B. (2004) Assertion-reason assessment in formative and summative tests: results from two graduate case studies, in: R. Ottewill, E. Borredon, L. Falque, B. Macfarlane & A. Wall (Eds) Educational innovation in economics and business VIII: pedagogy, technology and innovation (Dordrecht, Kluwer Academic Publishers), 359–378.
De Vita, G. (2002) Cultural equivalence in the assessment of home and international business management students: a UK exploratory study, Studies in Higher Education, 27(2), 221–231.
Entwistle, N. (1981) Styles of learning and teaching: an integrated outline of educational psychology for students, teachers and lecturers (Chichester, John Wiley).
Fox, J. S. (1983) The multiple choice tutorial: its use in the reinforcement of fundamentals in medical education, Medical Education, 17, 90–94.
Gesche, A. (2003) Personal e-mail (April).
Hakel, M. D. (Ed.) (1998) Beyond multiple choice: evaluating alternatives to traditional testing for selection (Mahwah, NJ, Lawrence Erlbaum Associates).
Haladyna, T. M. (1999) Developing and validating multiple-choice test items (2nd edn) (London, Lawrence Erlbaum Associates).
Heywood, J. (1999) Review: assessing student learning in higher education, Studies in Higher Education, 24(1), 133–134.
Hubbard, J. P. & Clemans, W. V. (1961) Multiple choice questions in medicine: a guide for examiner and examinee (Philadelphia, PA, Lea and Febiger).
Leamnson, R. (1999) Thinking about teaching and learning (Sterling, VA, Stylus Publishing).
Marton, F. & Säljö, R. (1976a) On qualitative differences in learning—1: outcome and process, British Journal of Educational Psychology, 46, 4–11.
Marton, F. & Säljö, R. (1976b) On qualitative differences in learning—2: outcome as a function of the learner's conception of the task, British Journal of Educational Psychology, 46, 115–127.
Moore, R. A. (1954) Methods of examining students in medicine, Journal of Medical Education, 29(1), 23–27.
Newble, D. I., Baxter, A. & Elmslie, R. G. (1979) A comparison of multiple-choice and free-response tests in examinations of clinical competence, Medical Education, 13, 263–268.
Paxton, M. (2000) A linguistic perspective on multiple choice questioning, Assessment & Evaluation in Higher Education, 25(2), 109–119.
Ramsden, P. (1992) Learning to teach in higher education (London, Routledge).
Scouller, K. M. & Prosser, M. (1994) Students' experiences in studying for multiple choice question examinations, Studies in Higher Education, 19, 267–279.
Skakun, E. N., Nanson, E. M., Kling, S. & Taylor, W. C. (1979) A preliminary investigation of three types of multiple choice questions, Medical Education, 13, 91–96.
Steffe, L. P. & Gale, J. (Eds) (1995) Constructivism in education (Hillsdale, NJ, Erlbaum).
Wiggins, G. (1990) The case for authentic assessment, Practical Assessment, Research & Evaluation, 2(2). Available online at: http://pareonline.net/getvn.asp?v=2&n=2 (accessed 31 December 2004).
Williams, J. B. (2004) Creating authentic assessments: a method for the authoring of open book open web examinations, in: R. Atkinson, C. McBeath, D. Jonas-Dwyer & R. Phillips (Eds) Beyond the comfort zone: proceedings of the 21st ASCILITE Conference (vol. 2). Available online at: http://www.ascilite.org.au/conferences/perth04/procs/pdf/williams.pdf (accessed 31 December 2004).