Writing Listening Speaking in the California Framework
Annotated Bibliography
The Missing Link: Student Learning Outcomes and Language Proficiency Assessment
TESOL 2010 Boston, March 26 2010
Kevin B. Staff
Alvarez, I. (1987). A rationale for discrete-point proficiency/placement testing in the
Southwestern College bilingual office administration program. Unpublished
master’s thesis, SDSU.
The first of several master’s theses by SDSU students cited. After a review of the literature,
featuring the work of Henning, Oller, and Spolsky, Alvarez demonstrates that a discrete-point
multiple-choice test can be an adequate assessment instrument in lieu of the time- and
labor-intensive process of obtaining and evaluating writing samples for placement purposes in one
particular program.
ASCCC (2009). Coding the student progress pathway through basic skills English, ESL,
mathematics and reading courses in California community colleges. Sacramento:
Academic Senate for California Community Colleges.
As part of the Basic Skills Initiative for California Community Colleges, a set of rubrics has been
developed and discussed in committees and one-day conferences among teachers of ESL,
English, mathematics, and reading. These serve as metrics describing a standardized set of
expected outcomes for basic skills courses that can be used to determine equivalencies across
the various campuses of the California Community College system. In the case of ESL, both a
credit and non-credit rubric—with many similarities to each other—are now in place to describe
six levels and outcomes for their corresponding courses that will bring a student’s language
proficiency up to transfer level for “freshman English.”
Ashwell, T. (2000). Patterns of teacher response to student writing in a multiple-draft
composition classroom: Is content feedback followed by form feedback the best
method? Journal of second language writing 9.3.227-257.
Ashwell finds that students seem to rely more on form feedback, i.e. error correction, than on
content feedback. No significant differences were found when one form of feedback was provided
before the other.
Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford
University Press.
Probably the ultimate introductory textbook to testing methods. Bachman covers the uses to
which language tests might be put, a model of communicative language ability that builds on the
well-known Canale & Swain competencies, item and task selection, statistical methods, validity
and reliability, and “persistent problems” that—nearly twenty years after the book’s publication—
are still just as persistent.
Ballard, B. & Clanchy, J. (1991). Assessment by misconception: Cultural influences and
intellectual traditions. In L. Hamp-Lyons (Ed.), Assessing second language
writing in academic contexts (Pp. 19-35). Norwood, NJ: Ablex Publishing Corp.
An interesting analysis of the problems of second language writing in academic contexts in terms
of three factors: (1) Language itself, (2) The structuring of ideas, or rhetoric, (3) Attitudes toward
knowledge, or epistemologies. The authors speculate that the last of these can be divided into three
kinds of approaches to knowledge: (1) Reproductive, strongly identified with education in Asian
cultures, (2) Analytical, i.e. critical thinking, (3) Speculative. The speculative approach is strongly emphasized in
education in Australia, though Asian students tend initially to find the approach, with its deliberate
searching for new possibilities, pointlessly argumentative.
Blumner, J. (1999). Authority and initiation: Preparing students for discipline-specific
language conventions. In W. Barnett & J. Blumner (Eds.), Writing centers and
writing across the curriculum programs (Pp. 33-44). Westport, CT: Greenwood
Press.
A discussion of WAC (Writing Across the Curriculum) programs, which seek to teach students
how to produce “appropriate discourse”. The author concludes that much of the knowledge
necessary to write in specific disciplines comes from reading and in fact requires knowledge of
content rather than simply language itself. These kinds of advanced writing skills are much more
important for graduate students than for undergraduates, who are required primarily to “relay
information rather than create knowledge”, though undergraduate study would seem an excellent
time to raise awareness of discipline-specific conventions.
Brown, J.D. & Hudson, T. (1998). The alternatives in language assessment. TESOL
Quarterly 32.4.653-675.
A short, concise identification of the assessment options available to language teachers and
programs. The article points out the significance of “washback”, the positive effect assessment
can have on program objectives and instruction. The authors also point out the importance of
using a variety of measures in assessment.
California Community Colleges Chancellor’s Office (2000). California pathways: The
second language student in public high schools, colleges, and universities.
Glendale: CATESOL Publications.
A landmark document that has become influential in the formulation of educational policy,
particularly at the community college level. It includes language proficiency descriptors for the
four skills, based on the ACTFL scale. These descriptors have become the reference guide for the CB-21
coding of ESL courses, a common description of equivalencies for courses below the “freshman
English” level. Also includes one of the earliest discussions of the “Generation 1.5” phenomenon.
Carlson, S. (1991). Program evaluation procedures: Reporting the program publicly
within the political context. In Hamp-Lyons (Pp. 293-320).
The first of several articles on the political and public policy aspects of language proficiency
assessment. The author prescribes several considerations to limit controversy and maximize the
perceived fairness of evaluation: (1) Assessment instruments that test writing ability in specific
genres and types of writing, (2) Advance notice and preparation in these genres and in the types
of tasks that will form the basis of evaluation. Carlson advises that in some cases, “teaching to
the test” is not necessarily a bad thing. Recurring questions that users of writing assessment
instruments periodically have to address include (1) How can a student who receives good
grades fail the writing test? (2) Why might there be a discrepancy between a writing test and a
writing sample from another situation? (3) Why do readers of an assessment instrument need
training when they ought to be able to recognize “good writing”? (4) How can in-class timed
writings be reliable and valid instruments if scores assigned by readers are discrepant? (5) How
can papers contain errors and still receive high scores? (6) Do superficial characteristics in the
writing unduly influence scores? (7) Why is a “top paper” in one program not so in another?
CASAS (2003). CASAS skill level descriptors for ESL & ABE. San Diego: CASAS.
CASAS is a non-profit organization that provides a comprehensive evaluation system, also
helpful in the development of instruction. It is used extensively in non-credit adult education ESL
programs. The two CASAS documents, ESL and Adult Basic Education (ABE), show the
influence of the ACTFL scale, and incorporate skills from SCANS (Secretary’s Commission on
Achieving Necessary Skills). The reading/writing descriptors emphasize skills needed to function in
everyday life, with little reference to academic skills other than the very general “can read and
interpret most non-simplified materials.” This approach to written language demonstrates a very
early “split” between basic life skills that involve reading/writing and the needs of students in
secondary or post-secondary academic programs.
Cumming, A. et al (2001). Scoring TOEFL essays and TOEFL 2000 prototype writing
tasks: An investigation into raters’ decision making and development of a
preliminary analytic framework. Princeton: Educational Testing Service.
The full version of an article that appeared in the Fall 2009 issue of the TESOL Quarterly. The
authors identify a total of 29 strategies and decision-making behaviors employed by 10
experienced ESL/EFL instructors assessing 60 TOEFL essays. The behaviors were categorized
under the three macro-considerations of: Self-monitoring focus, task fulfillment (rhetorical and
ideational) focus, and language focus. Under each macro-consideration, the strategies were
further categorized as either interpretation strategies or judgment strategies. While the list of
behaviors itself is probably too large to be digestible by the average person trying to assess a
given piece of writing, it provides the strongest descriptive framework of what goes on in a rater’s
mind of any reference in this bibliography. The full report is available at:
http://www.ets.org/portal/site/ets/menuitem.c988ba0e5dd572bada20bc47c3921509/?vgnextoid=4d21af5e44df4010VgnVCM10000022f95190RCRD&vgnextchannel=d35ed898c84f4010VgnVCM10000022f95190RCRD
Cuyamaca College (2009). SLO assessment plan, 2009-2014. El Cajon: Department of
Communication Arts.
Documentation of the English as a Second Language section of the college’s English Department,
showing progress on the development of SLOs and a timeline for assessing them.
Damrau, A. & Price-Machado, D. (1998). Integrating SCANS skills in the ESL classroom.
Workshop presented at Palomar College, 2/27/98.
A demonstration that I attended many years ago, when SCANS was the “hot topic”. The
Secretary’s Commission on Achieving Necessary Skills developed a list of skills that could be
incorporated into most educational settings, regardless of discipline. In an adult education context,
the foundation skills are identified as (1) Basic skills such as reading, writing, and quantitative
operations, (2) Thinking skills such as making decisions and reasoning, (3) Personal qualities
such as responsibility and honesty, (4) Resource management, which includes allocating time,
money, and resources, (5) Interpersonal skills such as working in teams and in a culturally
diverse setting, (6) Information management, which includes acquiring facts and interpreting
information, (7) Systems management, which includes understanding of social organization and
technological systems, (8) Technology, i.e. using computers for simple tasks.
Donigan, L. (2009). Community college rap session: CB-21 codes and ESL rubrics.
Collaborative workshop at CATESOL ’09, Pasadena.
A very ambitious yet successful session where a large group of community college ESL
instructors evaluated and chose to adapt the proficiency scale from the California Pathways
documentation as a set of rubrics for CB-21 coding, the descriptors of ESL course equivalencies
below the “freshman English” level.
Elbow, P. (1996). Writing assessment: Do it better; do it less. In E. White et al (Eds.),
Assessment of writing: Politics, policies, practices (Pp. 120-134). New York: The
Modern Language Association of America.
The author argues—in his inimitable style—that portfolio assessment is the only fair and
professional way to evaluate student writing, citing 19 articles and studies critical of holistic
assessment. He refers to a holistic score as “nothing but a single point on a yea-boo applause
meter.” In the end, however, he acknowledges that a limited amount of holistic scoring in a timed
test situation may be needed, though he much prefers multiple trait scoring where practicable.
Ferris, D. & Hedgcock, J. (1998). Teaching ESL composition. Mahwah, NJ: Lawrence
Erlbaum Associates.
The authors provide a persuasive argument that reading proficiency is a good—but not perfect—
indicator of writing ability. Reading consists largely of constructing meaning through schemata, i.e.
using knowledge to build knowledge. Writing is an improvable skill, best learned by doing. Ferris
& Hedgcock also devote more attention than most authors in this bibliography to the problem of
“authenticity” in portfolio assessment, though on balance they feel that a portfolio approach
provides a good learning experience in “process”.
Forstrom, J. (2009). Assessing English literacy civics. CATESOL News 40.3.1-5.
This article provides a good overview of the federally funded grant that connects non-credit ESL
classroom-based learning with student success in the community. In California, evaluation is
conducted through pre- and post-CASAS testing as well as EL Civics assessments developed
locally. There is some reference to and use of the U.S. Department of Labor’s SCANS
(Secretary’s Commission on Achieving Necessary Skills) in describing desired outcomes, and
though EL Civics probably comes as close as anything in an American educational context to a
standardized national curriculum, students can be surveyed for their interests, with lessons and
assessments developed around the needs of specific educational contexts. This is particularly so
when EL Civics is used in conjunction with CBET classes. The focus of instruction in EL Civics
is distinctly adult education for practical purposes rather than for acquiring academic skills, and
provides an interesting alternate view of what it means to “know a language.”
Forstrom, J. et al (2009). Teaching writing across the levels: Pre-assessment,
implementation, and evaluation. Workshop presented at CATESOL ’09,
Pasadena.
A very practical and concise approach to getting a handle on teaching and evaluating writing at
various levels, including the selection of tasks as learning experiences and evaluation
instruments. Though not identified as such in the materials, some of the writing tasks come from
the EL Civics curriculum, and entail practicing real-life writing tasks such as reporting an accident.
Gearhart, M. (1994). Toward the instructional utility of large-scale writing assessment:
Validation of a new narrative rubric. Project 3.1. Studies in improving classroom
and local assessments. Portfolio assessment: Reliability of teachers’ judgments.
Los Angeles: National Center for Research on Evaluation, Standards, and
Student Testing.
A report on the Writing What You Read (WWYR) rubric developed for assessing the writing of
elementary school students. An interesting contrast to the issues of assessing adult ESL learners’
abilities, with good discussion of rubrics in general for the assessment of writing, their purposes
and shortcomings, and their three main types: Holistic, primary trait, and analytical. The WWYR is
analytical, and used to rate the quality of narrative writing specifically. Its categories are:
(1) Theme, (2) Character, (3) Setting, (4) Plot, and (5) Communication. The mechanics of
punctuation, grammatical accuracy and such are not addressed in the WWYR.
Greenberg, I. (1993). Building on the past, looking toward the future: An ESL
teacher reference for writing instruction in adult education. Unpublished master’s
thesis, SDSU.
Another gem among the master’s theses in the SDSU Library. Excellent literature review and
summary of recurring issues and insights, with a discussion of why writing skill has been so often
de-emphasized in adult education ESL. Greenberg advocates the “process-based” approach,
with free expression followed by revisions. Lots of good information, but not really applicable to
the kind of “classroom genres”, such as responding to essay prompts, that students encounter in
an academic context. One interesting insight is the fact that writing instructors often encourage
free expression, then grade primarily on surface-level features.
Hamp-Lyons, L. (1996). The challenges of second-language writing assessment. In
White et al (Pp. 226-240).
Hamp-Lyons, editor of the single most useful book in this bibliography, Assessing second
language writing in academic contexts, contributes a chapter to White’s Assessment of writing:
Politics, policies, practices. She cites studies showing that university faculty are in general more
tolerant of errors in writing by nonnative speakers of English than of those by native speakers, and also points out
that rhetorical styles are a strong influence on the judgment of writing quality. This means that an
instructor used to working with Japanese students might become more tolerant of errors and
unconventional usages common to Japanese students than they would be toward nonnative
students from other language backgrounds. The article includes a discussion of TOEFL scores
and the TWE (Test of Written English) portion of the TOEFL, which is not always considered in
the admissions process.
Hamp-Lyons, L. & Henning, G. (1991). Communicative writing profiles: An investigation
of the transferability of a multiple-trait scoring instrument across ESL writing
assessment contexts. Language Learning 41.3.337-373.
The article features a rubric called the New Profile Scale (NPS) used to assess 91 essays written
for the Test of Written English section of the TOEFL and 79 essays written for the University of
Michigan Writing Assessment. The authors found it to be reliable in composite assessment, but
also found little psychometric support for assessing certain individual components of the rubric.
There is some discussion of “unidimensionality”, the assumption that a composite profile
operationally defines a single latent continuum of ability. The rubric evolved from Hamp-Lyons’s
work with the British Council, and has nine bands or levels. The seven components were based
on observations by readers rather than on an underlying linguistic theory. They are
(1) Communicative Quality, (2) Interestingness, (3) Referencing, (4) Organization, (5)
Argumentation, (6) Linguistic Accuracy, and (7) Linguistic Appropriacy.
Higgs, T. (1984). Language teaching and the quest for the holy grail. In T. Higgs (Ed.),
Teaching for proficiency, the organizing principle (Pp. 1-9). Lincolnwood, IL:
National Textbook Co.
A classic article by the late Ted Higgs, building on the previously published The push toward
communication, co-authored with Ray Clifford while the latter was dean of the Defense Language Institute.
Provides a description of the ACTFL and ILR (Interagency Language Roundtable) scales and the
kinds of generalized behaviors exhibited at each level of proficiency. ILR Level 2+, called
“Superior”, is referred to as an “instructional ceiling”, beyond which the language probably must
be “lived” for proficiency to continue to improve. Probably most applicable to oral language, but
provides an excellent introduction to the nature of these important general proficiency scales.
Hirsch, E.D. (2010). Creating a curriculum for the American people. American Educator
33.4.6-13.
A well-written critique of the progressive movement, or “anti-curriculum movement” that took hold
in public secondary education in the latter half of the 20th century. The author argues that shared
knowledge is essential to language comprehension as well as sense of community, and laments
the emphasis of the movement on critical thinking skills rather than facts. For the author, resisting
a rigorous academic curriculum in favor of encouraging children to develop their skills using
whatever content they find engaging is contrary to a large body of cognitive science research,
and has resulted in a reduction in shared knowledge among the populace and a surprising
ignorance of what several generations ago would have been regarded as common knowledge.
His proposals for implementing a “common core curriculum” are not unlike the description of EL
Civics administration in Forstrom (2009), allowing for local autonomy and a variety of forms of
instruction while providing a guiding structure and central core elements common to all citizens.
Horowitz, D. (1991). ESL writing assessments: Contradictions and resolutions. In
Hamp-Lyons (Pp. 71-85).
My favorite article by a late acquaintance who passed long before his time was due. Horowitz
asks whether a common core of academic writing ability might exist, when writing tasks vary
greatly both by discipline and by genre. He poses the laugh-out-loud rhetorical question of
whether any writing test can claim validity unless it is written for a particular individual in a
particular course in a particular program at a particular time. Inherent contradictions include the
tendency of test designers to seek generality, i.e. trying to mitigate differences in background
knowledge, while the designers of academic tasks seek specificity, i.e. trying to find evidence of
mastery of a body of knowledge. By way of solutions, Horowitz argues that both timed essay
exams and out-of-class writings with editing and revision are needed for assessment, and cites
the TOEFL’s TWE section as an admirable attempt to provide two generalized writing tasks that
minimize cultural and knowledge bias.
Hyland, K. & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly
41.2.235-253.
The short answer to the question they pose is “No.” The authors cite corpus research of the
widely used AWL (Academic Word List) to demonstrate that lexical items occur and behave
differently across disciplines. Well… they do occur, don’t they?! The article also seems to carry a
bit of knowing smugness at the fact that systematic analysis of text using modern computational
methods often turns widely held presumptions about language behavior on their heads.
James, M. (2009). “Far” transfer of learning outcomes from an ESL writing course: Can
the gap be bridged? Journal of second language writing 18.2.69-84.
In this study, 30 advanced ESL undergraduates enrolled in a one-semester academic writing
class were interviewed on their use of learning outcomes from the class in performing a writing
task on a science article they had read. It was found that over half of the students did not
purposely or consciously make use of the learning outcomes, and the author poses the question
of how transfer can be achieved most effectively. Perhaps a better question would be whether
specific outcomes/strategies need to be purposely employed.
Jeffrey, J. (2009). Constructs of writing proficiency in U.S. state and national writing
assessments: Exploring variability. Assessing writing 14.1.3-24.
A very comprehensive analysis of prompt/genre demands and assessment scoring criteria in the
nationally administered ACT, SAT, and NAEP exams as well as 41 state writing exams for
secondary school students. The prompts for the state exams were categorized by genre, with the
number of states employing each genre in at least one of the writing tasks for their exams, as:
Persuasive (24), Argumentative (18), Narrative (10), Explanatory (10), Informative (3), and
Analytic (3). Wisconsin is the only state that provides information on the theoretical underpinnings
of the tasks and assessment criteria. The writing tasks on the nationally administered exams are
described as reflecting greater consciousness of genre and more coherent conceptualizations of
proficiency than are nearly all of the state exams.
Jeffries, M. & Youngjoo, Y. (2008). Relationship between spoken and written discourse
of a generation 1.5 student in a college ESL composition class. The CATESOL
Journal 20.1.65-81.
This is a case study of a German-speaking “Generation 1.5” ESL student in a college composition
class. Like many Generation 1.5ers, the student writes as she speaks and has difficulty producing
appropriate academic discourse. Explicit instruction was found to be partially effective. The
authors identify three categories of revision suggestions used to guide the student to producing
more appropriate discourse: (1) Sentence-combining revisions, (2) Use of formal rather than
informal language, and (3) Additions of connectors and explanations, due to the nature of writing
as a medium with greater “distance” between the writer and the intended audience. It is cautioned
that these types of errors or inappropriate usages are not unique to ESL students. “Focused
reading” is recommended as a means of explicit instruction, though in my own experience it is
one of many partially effective techniques that some students “get” while others don’t.
Johns, A. (1991). Faculty assessment of ESL student literacy skills: Implications for
writing assessment. In Hamp-Lyons (Pp. 167-179).
Here Ann wrestles with the problems of how to instill academic literacy in a group of student
writers who seem to lack not only language skills but the background knowledge needed to
succeed in an undergraduate political science class. She describes activities to provide students
with a “sense of audience”, having them answer questions about the intended audience for a
given piece of writing, the prospective readers’ academic background, biases, and knowledge of
the world. She shares Horowitz’s concern about how to construct generalized writing tasks useful
for academic writing practice, and proposes two genres with wide applicability: (1) Argumentation,
taking the form of claim/warrant/data, and (2) Problem/solution.
Johns, A. (1995). Teaching classroom and authentic genres: Initiating students into
academic cultures and discourses. In D. Belcher & G. Braine (Eds.), Academic
writing in a second language: Essays on research and pedagogy (Pp. 277-291).
Norwood, NJ: Ablex Publishing Corp.
The author provides a brief history of recent trends in the teaching of writing, including the
“process movement”, manifested in two distinct approaches: (1) Expressivism, or free writing as a
means of eliciting a quantity of output, and (2) Cognitivism, based on pre-planning and thoughtful
revision. The approach entailed no conscious awareness-raising of genre, and the author
perceives a need to go beyond such an approach even at the undergraduate level, through the
introduction of “classroom genres” that don’t necessarily resemble real-world writing tasks but
nonetheless provide an introduction to genre awareness. An ATP (Academic Task Portfolio) is
proposed, consisting of five types of tasks: (1) Data-driven writing, based on an interview with a
subject matter expert, (2) Library assignment, where students synthesize insights from various
sources, (3) Abstract writing, the summary of an article, (4) An extended essay, written out of
class with revisions, (5) An in-class writing, as response to an exam prompt.
Kawaguchi, L. (2009). What does proficiency look like on the ACCJC rubric? Rostrum,
September 2009, 6-7.
A good institutional overview of the development of SLOs and their importance in the
accreditation process. ACCJC is the Accrediting Commission for Community and Junior Colleges,
one of three commissions under the larger entity of Western Association of Schools and Colleges
(WASC). ACCJC is responsible for the accrediting of associate degree granting institutions in
California, Hawaii, and the former Pacific Trust Territories. While the federal government’s
Department of Education has an interest in the development of SLOs in higher education, there is
a lot of local autonomy and very little “enforcement” other than the authority of non-governmental
commissions such as WASC to bestow or withhold accreditation based on an educational
institution’s progress in developing and assessing course level, department level, and degree
level SLOs, and eventually reaching the goal of “Sustainable Continuous Quality Improvement.”
Kermane, B. (2009). The broken window syndrome: Bad spelling, poor grammar? No
problem! Questions on evaluating student writing. Paper presented at
CATESOL ’09, Pasadena.
This was less a research project than a discussion session early one morning at the conference. A small
group of attendees compared notes on the challenges of teaching Generation 1.5 students and
the pressure to show measurable results with a student population that often just doesn’t seem to
“get it.” The question of portfolio vs. in-class timed writing was re-visited, with consensus that the
latter is probably a more accurate measure of actual proficiency. The former, however, provides
opportunities for learning experiences that might lead to improvements in overall language
proficiency and the production of appropriate academic writing. The problem is with assigning a
meaningful grade to such projects.
Kovach, C. (1992). Understanding essay prompts: Suggestions for teaching English for
academic purposes. Unpublished master’s thesis, SDSU.
In this third SDSU master’s thesis the author, an instructor at San Diego City College, explores in
detail the problem of developing appropriate essay prompts for in-class timed writings in content-
area classes. Second language students often have trouble with this particular “classroom genre”,
particularly when content-area instructors closely scrutinize spelling and grammatical errors. Lack
of a “sense of audience” is a recurring theme, particularly for students schooled in the process
approach. Major stumbling blocks in the essay prompts include the use of metaphor and idiomatic
expressions unfamiliar to many second language students, linguistically complicated sentences
or the use of more than one sentence, and the use of vague instructional verbs.
Larson, J. & Jones, R. (1984). Proficiency testing for the other language modalities. In
Higgs (Pp. 113-138).
Most notable for its dearth of advice on the testing of writing proficiency, this chapter begins by
drawing a distinction between communicative competence and accuracy of usage, suggesting
that the latter is a more appropriate definition for most contexts that entail daily interaction with
native speakers of the language. The high intercorrelation of test components among large test
populations provides strong evidence for the interrelationship of the four skills, and Oller’s “unitary
factor hypothesis” is briefly resurrected. The discussion of writing skill begins by stating that
“there is a much greater difference in ability among both first- and second-language users in
writing than in any of the other modalities.” Five general types of writing tasks are identified: (1)
Correspondence, (2) Providing essential information, (3) Completing forms, (4) Taking notes, and
(5) Formal papers. The last, obviously, is the most difficult and the most diverse across genres
and disciplines. Larson & Jones suggest that writing, like speaking in the OPI (Oral Proficiency
Interview), be tested directly and evaluated according to a proficiency description.
Liesberg, H. (1999). A comparative analysis of English placement tests: Computer
adaptive vs. traditional methods. Unpublished master’s thesis, SDSU.
In a study similar to that of Alvarez, the author concludes that the LOEP (Levels of English
Proficiency) test, a computer adaptive instrument that adjusts item difficulty to student responses,
is an adequate assessment instrument for placement purposes in lieu of eliciting and evaluating
writing samples. The study was conducted at Grossmont College.
Liskin-Gasparro, J. (1984). The ACTFL proficiency guidelines: A historical perspective.
In Higgs (Pp. 11-42).
The author traces the evolution of the guidelines from their initial development in the 1950s at the
U.S. Foreign Service Institute. Some earlier history of teaching and proficiency assessment in
government language-teaching programs, including the roots of the audiolingual movement, is
also covered, traceable to a pre-WWII intensive language project developed by the ACLS
(American Council of Learned Societies) on a Rockefeller Foundation grant. Since 1968, the
government’s version of the general proficiency scale has been known as the ILR (Interagency
Language Roundtable) definitions. The ACTFL guidelines are the result of a U.S. Department of
Education study entitled “A Design for Measuring and Communicating Foreign Language
Proficiency.” They are intended as an organizing principle, around which various methods,
approaches, materials, and curricula might be reconciled.
Lutz, W. (1996). Legal issues in the practice and politics of assessment in writing. In
White et al (Pp. 33-44).
The author addresses the important issue of legal implications in the use of assessment
instruments. While courts have shown a self-imposed restraint on second-guessing professional
educators in the public sector, challenge is possible under two main bases: (1) Title VI of the Civil
Rights Act, and (2) The Equal Protection and the Due Process clauses of the 14th amendment of
the U.S. Constitution.
Macken-Horarik, M. (2002). Something to shoot for: A systemic functional approach to
teaching genre in secondary school science. In A. Johns (ed.) Genre in the
classroom: Multiple perspectives (Pp. 17-42). Mahwah, NJ: Lawrence Erlbaum
Associates.
Eight key genres used in the teaching of writing across the curriculum in Australia are identified,
using a systemic functional linguistics approach. This differs from the English for specific
purposes (ESP) approach in its concern with “elemental genres in society” rather than with
discourse communities. The key genres are categorized as: (1) Recount, (2) Informational report,
(3) Explanation, (4) Exposition, (5) Discussion, (6) Procedure, (7) Narrative, (8) News story.
McDonald, M. (2002). Systematic assessment of learning outcomes: Developing
multiple-choice exams. Sudbury, MA: Jones and Bartlett Publishers.
The author demonstrates the usefulness of multiple-choice exams for measuring learning
outcomes for the training of nurses. The exams measure acquisition of very specific information
with clear right/wrong answers. The contrast with language training is clear, and the inadequacy
of such exams for ESL purposes, especially if used alone, becomes apparent. The author draws
an interesting distinction between formative and summative evaluation, i.e. how the student is
progressing vs. what the student knows.
Mowry, M. (1996). Thirty years of first and second language composition theory and its
relevance in the contemporary composition classroom. Unpublished master’s
thesis, SDSU.
In the final master’s thesis cited here, the author provides a rich review of the rise and fall and
resurrection of various approaches, with the interesting perspective of an English major rather
than an ESL or applied linguistics major. Good discussion of the relationship of L1 to L2 writing
theories. The author advises students to be aware that “school, work, and community are
different domains of literacy.”
Nam, M. et al (2008). Writing socialization for South Korean graduate students in a
North American academic context. The CATESOL Journal 20.1.49-64.
A non-empirical review of literature and studies concerning the difficulty of teaching appropriate
academic writing to students who lack a background in performing academic writing tasks even in
their native culture. Uses a contrastive rhetoric approach to explain some difficulties, such as the
tendency of a thesis statement to appear at the end of an article in Korean writing. Includes
several hardly surprising insights, such as that language socialization (LS) and legitimate
peripheral participation (LPP), i.e., acquiring a sense of appropriateness from observation, are
key to socialization into the target academic community.
North, B. (2000). The development of a common framework scale of language
proficiency. New York: Peter Lang Publishing.
A reference I’d have missed if it weren’t sitting next to my master’s thesis in the SDSU Library.
Describes early work on developing the 6-level Common European Framework of Reference for
Languages under the auspices of the Council of Europe. More updated information and
applications to specific languages, including English through Cambridge ESOL, are available online.
The link on my presentation outline goes to the latter. A more general description of the
framework is available at: http://www.coe.int/T/DG4/Portfolio/?M=/main_pages/levels.html/
Palomar College (2007). Course outlines for ESL levels 1-6. San Marcos: Palomar
College ESL Department.
The internal departmental documentation for each level in the college’s ESL program follows a
format similar to that of other institutions, specifying (1) The catalogue description, (2)
Prerequisites, (3) Entrance skills, (4) Course content, i.e. skills to be addressed and developed,
(5) Course objectives, (6) Method of evaluation, (7) Special materials required of the student, (8)
Minimum instructional facilities, (9) Method of instruction, (10) Texts and references, and (11) Exit
skills. Numbers 5 & 6 are in the process of being subsumed under the category of Student
Learning Outcomes… So, where do Exit Skills (#11) fit into this new way of looking at things in
terms of SLOs?
Palomar College (2009). Assessment tools. Documentation from Palomar College
Learning Outcomes Council summer workshop.
The most salient point in the workshop is the need for “triangulation” in SLO (Student Learning
Outcomes) assessment, i.e. using a variety of different tasks to assess. Several dichotomies in
types of assessment data and assessment methods are defined: (1) Direct/Indirect data, or
measurement of an exact value vs. evaluation of a trait, (2) Qualitative/Quantitative data, or
descriptive information vs. numerical/statistical values, (3) Formative/Summative assessment, or
feedback for development vs. final determination, (4) Criterion-/Norm-referenced assessment, or
scoring according to standards vs. ranking among individuals, (5) Embedded/Standardized
assessment, or assessment that occurs within regular class activity vs. tests developed for broad
public usage and data comparison.
Perkins, K. (1983). On the use of composition scoring techniques, objective measures,
and objective tests to evaluate ESL writing ability. TESOL Quarterly 17.4.651-71.
The author identifies four main types of assessment instruments for evaluating writing ability: (1)
Holistic, or a single score based on a scale or descriptive rubric, (2) Analytical, or a series of
scores usually based on a rubric with several categories, (3) Primary trait scoring, where a piece
of writing is evaluated for a single feature with other features not taken into consideration, and (4)
Objective, i.e. a multiple choice test. Perkins feels the literature supports the conclusion that
objective measures, even though they do not evaluate writing directly, work well much of the time.
Pike, J. & Weldele, C. (2009). Generation 1.5 students: Diverse avenues to academic
literacy. Paper presented at CATESOL ’09, Pasadena.
Perhaps the best of several presentations at the conference on “Generation 1.5” students, the
children of immigrants who are functionally bilingual in oral language but lack academic skills,
particularly in writing. BSI (the Basic Skills Initiative) was implemented largely with these kinds of
students in mind, and necessitates a heavy dependence on content area instructors to recognize
the language needs of second language students and adjust instruction accordingly by providing
a form of “sheltered immersion”. A laudable idea, but will content area instructors embrace it?
Richards, J. (1985). The context of language teaching. Cambridge: Cambridge
University Press.
A collection of previously published papers by The Old Master. Particularly insightful is Chapter 3,
“The secret life of methods”, in which Richards argues that broad issues of curriculum
development and evaluation should take precedence over the comparison of particular
methodologies. Chapter 10, “The status of grammar in the language curriculum”, provides
support for my view that a test of discrete-point grammatical knowledge should be a component
of SLOs measurement. Though skeptical in some of his writings of the usefulness of general
proficiency guidelines—at least beyond the lowest levels—his interest in outcomes is not
inconsistent with the concerns of the “proficiency movement”.
Ruth, L. & Murphy, S. (1988). Designing writing tasks for the assessment of writing.
Norwood, NJ: Ablex Publishing Corp.
The most comprehensive book in the literature, citing many psychometric studies. Contains the
maxim: “If specifying form, leave content open. If specifying topic, liberate form.” The authors
specify that any task should: (1) Be interesting to the writer, (2) Be interesting to the evaluator, (3)
Furnish data to start the task from, (4) Be meaningful within the writer’s experience, (5) Elicit a
specific response and place limits on content or form, (6) Suggest an audience, and (7) Have
more than just a title as guidance.
Ryan, B. (2004). Advanced composition for ESL students. Durham, NC: Carolina
Academic Press.
A teaching textbook in which Ryan designs projects around eight specific tasks or genres:
(1) Narratives, (2) Description of processes, (3) Description of people, places, and things,
(4) Comparison and contrast, (5) Evaluation, i.e. describing and comparing, (6) Problem/solution,
(7) Cause and effect, (8) Research.
Scott, C. (2009). Issues in the development of a descriptor framework for classroom-
based teacher assessment of English as an additional language. TESOL
Quarterly 43.3.530-535.
This rather concise article in a special issue of the TQ concerned with classroom-based teacher
assessment identifies factors that make the use of a common framework or a single scale for
describing the understanding and use of language problematic. The four main issues concern:
(1) Different learner groups, including children at different stages of cognitive development,
learners with different levels of formal education and acculturation, and learners whose native
language does not use the Roman alphabet; (2) Proper categorization of descriptors in terms of
the 4 skills vs. the genre/field/tenor/mode categories of the systemic functional approach;
(3) Organizing the descriptors by level while taking into account the different learner groups; and
(4) The cognitive-affective dimension, meaning fatigue (or extreme hesitation) due to language
overload.
Song, B. & August, B. (2002). Using portfolios to assess the writing of ESL students: A
powerful alternative? Journal of second language writing 11.1.49-72.
A fairly recent article that I’d be remiss in not citing, but proof that there’s really nothing new under
the sun. A re-visit to the arguments in favor of portfolio assessment as an alternative means of
assessing writing proficiency.
Stevens, D. & Levi, A. (2005). Introduction to rubrics. Sterling, VA: Stylus Publishing.
A practical guide to developing and using rubrics in various disciplines. The authors identify the
four components of a rubric as: (1) The task description, including a descriptive title for the task,
(2) The scale, generally with four possible levels of achievement that correspond to grades A-D,
(3) Dimensions, which outline the skills and knowledge involved in task accomplishment, and
(4) Specific feedback, in the form of descriptors for each level of performance on the scale. A
website to accompany the book: styluspub.com/resources/introductiontorubrics/aspx.
Tedick, D. & Matheson, M. (1995). Holistic scoring in ESL writing assessment: What
does an analysis of rhetorical features reveal? In D. Belcher and G. Braine (Pp.
205-230).
The authors in general dislike holistic scoring, feeling it too susceptible to cultural values in the
evaluator. They identify two main criteria in the evaluation of rhetorical features: (1) Framing, or
the way the writer sets the scene for the rest of the exposition, and (2) Elements of task compliance,
or the somewhat arbitrary way a rubric or set of evaluation criteria is analyzed by the evaluator.
A writing sample with good framing might be evaluated highly even if the elements of task
compliance are weak, simply because the writing gives a good first impression. Writers from
some cultural backgrounds, where acceptable rhetorical style encourages minimal framing with
discussion saved for the end, might be at a disadvantage in such an evaluation even though the
other elements of task compliance are strong.
TESOL (1999). Position statement on the acquisition of academic proficiency in English.
Alexandria: Teachers of English to Speakers of Other Languages, Inc.
Among TESOL’s periodic pronouncements, perhaps one of the few that have had a discernible
impact on the educational community. Its main tenets are that: (1) Language acquisition is a long-
term process, (2) There is a clear distinction between social language and academic language (It
is implied but not stated that acquisition of “fluency” in everyday situations does not equal
acquisition of appropriate language for academic success), (3) Students need to attain rigorous
standards for the use of culturally appropriate English in both social settings and academic
content areas, (4) Students are heterogeneous in background, with variations in learning, (5)
There exist identifiable predictors of student success, most specifically including content-based
instruction.
Turner, C. & Upshur, J. (2002). Rating scales derived from student samples: Effects of
the scale marker and the student sample on scale content and student scores.
TESOL Quarterly 36.1.49-70.
The authors’ main argument is that any rating scale based on general theory will not be
appropriate for assessing performance on any specific task. Their study found that specific writing
samples used to train for interrater reliability tend to take on a life of their own and have a greater
effect on ratings than the scale itself.
Vaughn, C. (1991). Holistic assessment: What goes on in the rater’s mind? In Hamp-
Lyons (Pp. 111-125).
Not unlike Turner and Upshur, Vaughn finds that rater fatigue can have a deleterious effect on
ratings, stating that inevitably a large number of papers become “one long discourse” and come
to be compared with each other rather than with the rubric or scale criteria. With this in mind, the
author feels strongly that holistic scoring by “trained experts” should not replace the judgment of
the classroom teacher who works on a regular basis with the students being evaluated. Again
and finally, an argument that portfolio assessment should be at least a supplement to formal
evaluations. Favorite quote from the article: “Holistic assessment is a lonely act.”
U.S. Department of Education (2006). A test of leadership: Charting the future of U.S.
higher education. Jessup, MD: Education Publications Center.
A report commissioned by the Secretary of Education, containing various recommendations for
the improvement of higher education in the U.S. Clarion call for the student learning outcomes
movement. Available online at: http://ed.gov/about/bdscomm/list/hiedfuture/reports/final-report.pdf
Xu, Y. & Liu, Y. (2009). Teacher assessment knowledge and practice: A narrative
inquiry of a Chinese college EFL teacher’s experience. TESOL Quarterly
43.3.493-513.
This second article from the special TQ issue on classroom-based teacher assessment is a case
study of the problems of assessment/decision making when teachers’ own judgment conflicts with
external demands such as social realities and power arrangements. It makes the argument that a
teacher’s personal judgment is important, however much guidance is provided by rubrics or
training. The conflict between organizational expectations and personal judgment is referred to as
“sacred stories” vs. “secret stories.” Notable quote: “…the interactive and context-dependent
nature of teacher-based assessment suggests that teachers need space and resources to
develop their own interpretations and adjustments of rubrics according to their students’ learning,
even though a common understanding has been considered a prerequisite for valid assessment.”
Conclusions Drawn From the Bibliography
●General proficiency scales work best at the lower levels. The description of writing proficiency is
more problematic than the description of oral proficiency. Nonetheless, it is possible to use a general
proficiency scale as a “chassis” on which to build content-specific descriptions applicable to specific
contexts.
●The traditional six-level program carries a certain amount of psychological reality for placement
and assessment purposes, dividing learners or candidates into distinct groups with many similarities in
general proficiency.
●For the assessment of writing, portfolios and timed essays have long battled for supremacy as the
most desirable means of determining students’ true ability. My own conclusion is that portfolios are fine
for formative assessment and as learning experiences, but timed essays are better summative measures.
●Indirect measures of writing ability, such as multiple-choice tests measuring discrete-point
grammatical knowledge, work reasonably well, especially for placement purposes. However, when
practicable a direct measure, i.e. a writing sample, is preferable.
●Writing effectively in different genres is acquired primarily through exposure, i.e. through reading
in different genres. The specific teaching of genre awareness is more effective with some individual
students than with others.
●Regardless of the amount of standardization training, quality of rubrics, or ethical and legal
implications of the decisions made, the assessment of writing always has an element of subjectivity. High
inter-rater reliability is often possible nonetheless.
●Whenever practicable, “triangulation” through the use of multiple assessment instruments is
desirable, keeping in mind that students’ levels of oral and written proficiency may be very different.
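The inter-rater reliability mentioned in the conclusions above can be put into numbers in several ways. The sketch below uses Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. The choice of this statistic is an assumption for illustration (none of the cited sources is claimed to use it), and the scores shown are invented sample data on a six-level scale.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same papers on a categorical scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of papers.")
    n = len(rater_a)
    # Observed agreement: share of papers given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on how often each rater used each score.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same score throughout
    return (observed - expected) / (1 - expected)


# Invented holistic scores for six writing samples (hypothetical data).
scores_rater_a = [4, 3, 5, 2, 4, 3]
scores_rater_b = [4, 3, 4, 2, 4, 2]
kappa = cohen_kappa(scores_rater_a, scores_rater_b)
print(round(kappa, 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance. Because kappa treats every disagreement alike, a weighted variant may suit ordered scales better, since a one-level difference is less serious than a three-level difference.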