Beyond instrumentation: redesigning measures
and methods for evaluating the graduate college
experience
Patricia L. Hardré & Shannon Hackett
Received: 19 December 2013 /Accepted: 21 September 2014 /Published online: 5 October 2014
© Springer Science+Business Media New York 2014
Abstract This manuscript chronicles the process and products of a redesign for
evaluation of the graduate college experience (GCE) which was initiated by a univer-
sity graduate college, based on its observed need to reconsider and update its measures
and methods for assessing graduate students’ experiences. We examined the existing
instrumentation and procedures; met with and interviewed staff and stakeholders
regarding individual and organizational needs; collected systematic questionnaire data
on stakeholder perceptions; and then redesigned, developed, and tested new evaluation
instruments, systems, and procedures. The previously paper-based, one-time global exit
questionnaire was redesigned into a digitally administered, multi-event assessment
series, with content relevant to students’ incremental academic progress. Previously
discrete items were expanded into psychometrically coherent variable scales in parallel
forms to assess change over time (entry, mid-point, exit, post-graduation). They were
also strategically designed to be stable and independent enough that administrators could
vary the timing and sequence of administration to fit their ongoing needs. The team
conducted two testing cycles, gathering pertinent information on the redesigned as-
sessment and procedures (N=2,835). The final redesigned evaluation serves as an
exemplar of evaluation that enhances assessment quality including psychometric prop-
erties and multiple stakeholder validation, more effectively addresses the organization’s
incremental evaluation needs, increases timeliness of data collection, improves reach to
and participation of distributed students, and enables longitudinal data collection to
provide ongoing trajectory-of-change evaluation and a research data stream. Product
and process analysis informs strategies for more effectively and dynamically assessing
graduate education.
Keywords Graduate experience · Assessment design, development, and testing ·
Program evaluation · Higher education · Graduate education
Educ Asse Eval Acc (2015) 27:223–251
DOI 10.1007/s11092-014-9201-6
P. L. Hardré (*) · S. Hackett
Department of Educational Psychology, Jeannine Rainbolt College of Education, University of
Oklahoma, 820 Van Vleet Oval, ECH 331, Norman, OK 73019-2041, USA
e-mail: hardre@ou.edu
This project involved entirely reconceptualizing, extending, and expanding a research
university graduate college’s program evaluation measures and methods. The redesign
process elicited original need assessment and ongoing feedback from students and
faculty across the university’s graduate programs. The team tested and refined instru-
mentation and system designs iteratively, based on stakeholder perspectives. The new
instruments and system were designed to provide direct information needed by the
Graduate College and also to provide data-driven feedback to graduate departments and
programs to support their continual program improvement. This 2-year-long systematic
design and development process replaced a one-page, qualitative exit questionnaire
with a multi-event, systematic design; digital, online administration; and psychometri-
cally sound instruments, aligned with current organizational goals and reporting needs.
We went beyond instrumentation, to include redesign of the administration media,
timing, and other systemic features.
This manuscript will first present a review of the relevant foundational and current
research and evaluation literature. Then, it will present an overview of the project
methods, over phase I (needs analysis, instrument and systems redesign, and alpha testing)
and phase II (iterative revision and beta testing). Following the overview, each phase of
the process and instrumentation will be broken down into sequential, detailed procedures
and specifications, with results of each analysis and implications leading to the next phase
or to final recommendations as appropriate. It will conclude with both evaluation lessons
learned and principles supported and the important contributions of this work to academic
program assessment and more general evaluation research and practice.
1 Literature review
More than 1.5 million people are enrolled in graduate programs in the USA each year
(Gardner and Barnes 2007; Allum et al. 2012) and many times that number worldwide
(Council of Graduate Schools 2012). Contrary to popular belief, many major research
universities enroll more graduate students than undergraduates (US Department of
Education 2005). Yet, relatively little systematic research is conducted that informs
more than a very small subset of those who teach, manage, and make policy to support
graduate students (Nesheim et al. 2006).
2 Studies of the graduate experience
A number of studies have been done focused on various elements of the graduate
college experience (GCE). Some of these studies have been localized, focused on a
single discipline or program (e.g., Benishek and Chessler 2005; Coulter et al. 2004;
Gardner and Barnes 2007; Hegarty 2011; Schram and Allendoerfer 2012). Other
studies have focused on very specific groups, such as alumni, dropout, or non-
attendees, and only addressed a few key variables such as why they chose to leave or
not attend (e.g., Belcher 1996; Delaney 2004; Lipschultz and Hilt 1999). Some studies
conducted internationally have combined disciplinary and institutional factors with
broader cultural factors, generating deeply contextualized data to inform local needs
(e.g., Kanan and Baker 2006).
Others have attempted to reach more broadly but faced low return on the population
sampled, raising questions about their representativeness (e.g., Davidson-Shivers et al.
2004; Farley et al. 2011). In each of these cases, different methods and instruments
have been used and different constructs and characteristics studied, making it difficult to
compare findings. The generally discrete nature of the samples has made it difficult
even to synthesize the findings in ways that inform graduate education. In many
universities, each college or department devises its own measures, making comparisons
even within the institution problematic. The body of research on the GCE could be
more effective and productive across universities if there were accessible, consistent,
and comparable instrumentation to measure some common characteristics and goals of
graduate programs and institutions.
In spite of the lack of comparability across these studies, a few principles are clear,
both from the collection of findings and from the more global literature on the
psychologies of adult education and human experience. Major changes of context
and experience, such as going to graduate school, cause people to go through identity
transitions and experience dramatic change in their self-perceptions and how they
understand themselves and others (Austin et al. 2009; Chism et al. 2010; Hephner
LaBanc 2010), often including very strong self-doubt and anxiety (Gansemer-Topf
et al. 2006; Brinkman and Hartsell-Gundy 2012). Graduate education involves
redirecting cognitive attention and emotional energy in ways that can impact key
relationships and cause family and emotional crisis (Baker and Lattuca 2010).
Success in graduate school depends on interpersonal and social relationships, as well
as on intellectual mastery (Cicognani et al. 2011). Being back in acadème after years
away can be a tremendous adjustment, which is amplified when the return is to a
different discipline, culture, and context, requiring substantial reacculturation and
socialization (Fu 2012; Hardré et al. 2010b).
3 Need for graduate-level information and feedback
Various sources cite attrition from graduate programs as high as 50 % or more (Lovitts
2001; Offstein et al. 2004). Given the life changes attributable to returning to graduate
education, it is easy to understand that many students might not make those shifts easily
without substantial support. Graduate education is a huge investment of time, funding,
and expertise, by faculty, departments, and institutions (Stone et al. 2012; Smallwood
2004). Institutions, research units and policy-making bodies need clear, useful infor-
mation about graduate education (Gansemer-Topf et al. 2006).
Much research and scholarly attention on the graduate experience has been focused
on academic abilities and aptitudes (Golde 2000), and success has been largely
attributed to academic preparation (Fu 2012). Popular measures of these characteristics
include (1) standardized tests (such as the graduate record examination (GRE), required
by most graduate programs nationally) and (2) grade point averages (GPAs) from
previous and current coursework. These measures are easy because they are simple,
quantified, and standardized, and thus comparable and generalizable.
However, academics are only part of the story that explains graduate students’
academic success. Interacting with them are numerous other elements of graduate life,
such as scholarly and professional development, personal satisfaction, identity, stress
and anxiety, social support, peer relationships and community, and overall well-being
(Gansemer-Topf et al. 2006; Offstein et al. 2004). Some studies have addressed
socialization into graduate school and into the scholarly culture and values of students’
disciplines and professions, generating sets of factors that influence these processes
(e.g., Gardner and Barnes 2007; Weidman et al. 2001). However, it is unclear how the
characteristics and circumstances of an increasingly diverse and ever-changing profile
of the graduate student interact with both institutional constants and discipline-based
cultural nuances to support their learning and professional development (see also
Hardré 2012a, b; Hardré and Chen 2005, 2006).
This information needs to include insight into the current and authentic nature of the
graduate college experience, its impacts on students, other impacts on students’ success
within it, and students’ perceptions of their journeys. Perceptions are important in any
novel experience and particularly in transitions, as the nature and impacts of transition
depend less on the actual, measurable events than on the participants’ individual and
collective perceptions of those events (Hardré and Burris 2011; Schlossberg et al. 1995;
Bloom et al. 2007). Stress is a core component of the graduate experience, and people
handle stressful circumstances very differently (Offstein et al. 2004; Williams-Tolliver
2010). Goals and goal attainment have tremendous impact on how people work and learn
(Kenner and Weinerman 2011). We have seen goals and expectations studied among
higher education faculty, showing significant effects (e.g., Hardré et al. 2011), yet little
systematic research has included the goals and expectations that graduate students bring
into their educational experience and the reasons why they make choices along the way.
Some theorists and practitioners have called for more concerted institutional efforts at
understanding and supporting graduate students’ experiences and success, similar to
those traditionally focused on undergraduates (e.g., Gansemer-Topf et al. 2006;
Hyun et al. 2006).
4 Need for instrument and system design and fit
Various efforts have been made to produce standardized measures and create national
databases of information on graduate students. More than a decade ago, the National
Doctoral Program Questionnaire, funded by the Alfred P. Sloan Foundation, was
heralded as a grassroots attempt to use data about the graduate experience to improve
graduate education nationally (Fagen and Suedkamp Wells 2004). The Higher
Education Research Institute (HERI) project and the American Educational Research
Association (AERA) graduate questionnaire project strove to generate data via ques-
tionnaire instruments for comparing student experiences and faculty perceptions of
their work climates (HERI 2012). However, in centralized systems such as these,
neither the measurement parameters (instruments, participants, sampling, timing) nor
the resulting raw data sets are directly accessible to, or controlled by, potential users
(researchers or institutions), which severely limits their utility.
Researchers and administrators in graduate education need instruments that gener-
alize and transfer across institutions and contexts (Hyun et al. 2006). Having adaptive,
useful, and efficient tools to investigate the graduate experience in higher education
could help address the need for more scholarly research in this critical area for higher
education (Gardner and Barnes 2007). Having the right tools and information could
help administrators assess and address issues with attention to specialized local needs
(Nesheim et al. 2006). It is clear that a need exists for systematically designed and well-
validated tools for assessing a range of dimensions of the graduate experience, to
address issues relevant to graduate program development and improvement, as seen
through graduate students’ perspectives. Beyond instrumentation, graduate institutions
need insight into administrative systems, timing, and related strategies to support
optimal assessment.
5 Method
5.1 Context and reflexivity
This project occurred in a public research university in the Southwestern USA. The
Graduate College is more than 100 years old and enrolls over 4,000 advanced degree
students annually. It confers doctoral and masters degrees in hundreds of academic
majors, both traditional programs and continuing education degree programs and
certificates. Some programs are very structured with students in cohorts, while others
are more fluid and adaptive, allowing students to cover curricula at their own pace and
schedule, supported by their academic advisors. The institutional culture gives auton-
omy to colleges and departments to determine graduate academic program require-
ments, and the graduate college oversees curriculum revisions, monitors progress, and
maintains accountability. The graduate student body is 70 % US domestic and 30 %
international from 42 countries; ages range from 21 to 90, and it is about evenly divided
by gender. Full-time students make up 60 % of the graduate population, and the
remaining 40 % attend part-time; many graduate students also work outside of school
and have families.
The evaluator and assessment designer was a senior graduate faculty member in the
university, with specialized training and expertise in this area, who also did external
evaluation design and consulting professionally. The Graduate College Dean invited
the faculty member to take on the evaluation redesign project, based on the advice of
the university Provost. The evaluator worked on this project without personal financial
compensation, but with the understanding that she could use the data gathered for
research presentation and publication. The Graduate College did provide one graduate
assistantship (0.5 FTE) to assist with the primary project tasks. The evaluator also utilized
a team of other graduate assistants on particular components of the project.
5.2 Process and procedures overview
5.2.1 Phase I: needs analysis, redesign, and alpha testing
Invited by the Graduate College Dean to redesign its assessment of the graduate
experience, the team reviewed the relevant literature to gain a general scope of coverage
and variables of interest. Consistent with evaluation standards, we also involved others
with interest in the outcomes (faculty and administrative stakeholders) to define the
evaluation (Yarbrough et al. 2011). We conducted focus groups and interviews and
administered generative, paper-based instruments with students, faculty, and
administrators. The goal at this early stage was to determine the most appropriate
variables and indicators and to include nuanced information for client and program
needs.
Based on this information, the team designed and developed the first (alpha) version
of the GCE assessment instrument. Given the need to reach a distributed group of
technology-active participants with multiple tools, it was decided to use online
administration, and the first (alpha) instruments were developed with the SurveyMonkey®
administrative software. At this stage, three initial versions of the instruments were
developed. Over 500 students completed the alpha test instruments, producing data
adequate to demonstrate generally good psychometric characteristics and also deter-
mine refinements necessary to improve on the GCE assessments.
5.2.2 Phase II: revision and beta testing
Following the analysis of the development and alpha test data, the evaluation team
generated a revised version of the GCE instrument. During the alpha testing, the team
recognized relevant limitations in the original (SurveyMonkey®) digital administration
system. In consultation with the Graduate College administration, it was decided to develop
the beta instrument with the more adaptive Qualtrics® digital administration system.
In its beta versions, the GCE evaluation contained refined scales and items. It was
also extended to include forms for five participant groups, the original three (entrance,
mid-point, and exit) plus two additional (non-attendees and alumni). These additional
forms extended the range of information the evaluation package provided to the GC
client. Over 2,000 student participants completed the beta instrument. In addition to the
student respondents, the evaluation team sent the beta instrument to faculty who
instruct and mentor graduate students across all academic colleges for feedback on its
fit and relevance to their program evaluation and improvement needs. This strategy was
based on a general interest in faculty perceptions (as key stakeholders), plus the
Graduate College’s organizational goal of producing data useful to graduate programs.
The beta data yielded additional information for further refining all five forms of the
instruments and baseline findings for the GC clients. These data were analyzed in two ways
for two types of outcomes: instrument performance and participant response patterns.
6 Phase I: needs analysis, redesign, and alpha testing
6.1 Needs analysis
The purpose of the needs assessment and analysis was to determine how students,
faculty, staff, and administrators defined the nature, parameters, and goals of the
graduate experience. The results of this process provided information to guide the
scope, definitions, and instrument development, as well as the testing plan.
6.1.1 Participant stakeholder groups
Four stakeholder groups were identified to provide input for the redesign and data
testing: 13 graduate students, 23 faculty, 10 staff, and 5 administrators of the Graduate
College. A convenience sample was drawn from a list of individuals generated
by the Graduate College and evaluation team. All of the identified members of
the stakeholder groups participated in focus groups and some in additional
follow-up interviews to inform needs and determine the scope and content of the
GCE instruments.
Graduate students and graduate college assistants The sample group to determine the
definition of the graduate experience was derived from the pool of graduate students at
the university. This sample included graduate assistants working in the graduate
college, and members of an interdisciplinary group of graduate student program
representatives, along with students they recruited. Diverse graduate students partici-
pated at this stage in the process to help frame instrument language appropriate across
all groups.
Faculty, staff, and administrators Faculty, staff, and administrators at the univer-
sity have unique perspective on the role of the Graduate College and concepts
of the graduate experience. To better understand these issues, the evaluators
solicited feedback from graduate program professors and administrators from
various colleges.
6.1.2 Procedure
To define and clearly identify components of the graduate experience, the
evaluation team used focus groups, interviews, and open-ended questionnaire
instruments. Due to their exploratory nature and the designers’ developmental
interest in dialogue with stakeholders, these first questionnaires were paper-
based. Responses were transcribed and coded in analysis. Participants were
recruited through targeted e-mails and mailings using contact lists of current
graduate students, faculty, staff, and administrators provided by the Graduate
College.
Focus groups Focus groups (of six to ten participants) discussed issues related
to the graduate experience (time ≈60 min). The format was semi-structured with
some direct questions available to guide the meeting and address relevant goals.
A sample question was “What events and activities are part of the graduate student’s
experience at [univ]?”
Interviews Each individual interview was conducted in a semi-structured format
(time ≈60 min). Each interview concerned either feedback on instrument development
or more detailed understanding of issues raised in a previous focus group. Twenty-two
questions were created as options to ask the interviewee concerning the graduate
experience. Sample question was “Please define for you what constitutes the graduate
experience at [univ].”
Open-ended questionnaires Participants completed a 12-question (≈30-min) question-
naire. A sample question was “What is your perception of the Graduate College?”
6.1.3 Results of needs analysis
Data from focus groups, interviews, and open-ended questionnaires provided the
definition of the scope and terms to develop the GCE alpha questionnaire assessment
instruments. The information from these participants generated the scales and language
used for the first formal questionnaire items.
From the needs analysis and developmental data, the following points were clear:
• All of the stakeholder groups agreed that a new and more comprehensive assess-
ment of the GCE was needed.
• There were differences among groups as to what should be included and what
should be emphasized in the new instrument.
• There were, however, enough points of convergence and even consensus to draft
and test an instrument that reflected both the client’s interests and the breadth of
other stakeholders’ needs. The single-event, end-of-degree administration via paper
questionnaire needed to be redesigned and replaced with methods more attentive to
current activities, goals, and needs.
Based on these results, the evaluators proceeded with designing and testing a new
assessment instrument and system.
7 Redesign of administration timing, media, and methods
Parameters of this redesign needed to be accessible and salient for students. To meet
this need, the redesign included various administrations spread over students’ graduate
experience (which lasted from 2 to 10 years). A challenge of timing (given the
variability in duration across degrees and among full-time and part-time stu-
dents) was identifying the key points in progress at which students would
receive each instrument. Program faculty and department administrators need
prompt, timely feedback to support program improvement. This could be
achieved in part by the multi-event, incremental assessment design, and further
enhanced by creating parallel forms of instruments that offered data on developmental
change over time. Based on client and stakeholder input, the following potential
improvements were indicated.
• Appropriateness of item content for participant users could be improved by
the incremental administration redesign so students received questions at
times more proximate to their actual experiences. Utility and primary inves-
tigator potential for administrative users (the Graduate College, academic
programs) could be improved by the incremental administration redesign, so
they received data before students graduated, making responsive improve-
ments more timely and meaningful.
• Administration efficiency and utility for the client could be improved by digital
administration that eliminated manual data entry. Administration rescheduling and
timeliness of access for users could be improved by online digital administration
that they could access from remote and distributed sites.
• Administration potential to share data with academic programs (a stated goal) and
ability to track change over time would both be vastly improved by the redesign
using both incremental administration and digital instrumentation.
7.1 Procedure
The evaluation team developed item sets to address relevant constructs and outcomes
indicated by the developmental data. Multiple team members independently examined
the transcripts, discussed and collaborated to develop the overall instrument scope and
content (topical scales and items). Then, the team organized the content for initial
administration to students, as appropriate to their degree (masters/doctoral) and
progress-toward-degree (entrance/mid-point/exit). All administration occurred
in an asynchronous online questionnaire administration system, with all participant
identification separated from item responses. Testing participants were recruited via e-
mail invitation, using lists of eligible students provided by the Graduate College. All
study activities were consistent with human subjects’ requirements and approved by the
institutional IRB. De-identified responses were analyzed and stored according to IRB
standards for data security and confidentiality.
7.2 Participants
Participants were 504 graduate students invited to take the form of the questionnaire
appropriate to their point-in-program, whether they were at the beginning (130), middle
(118), or end (256). Detailed participant demographics are shown in Table 1. Students
were demographically representative of the larger graduate student population on
campus, with similar distributions of genders, ethnicities, colleges, and degree types
(within ±6.1 %).
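As a simple illustration of this representativeness check, the degree-type percentages reported in Table 1 can be compared against the ±6.1 % tolerance. This is our own sketch (the function name and dictionary layout are not part of the original analysis):

```python
# Degree-type percentages from Table 1 (institution vs. alpha sample)
institution = {"Masters": 72.5, "Doctoral": 27.5}
alpha_sample = {"Masters": 75.0, "Doctoral": 25.0}

def representative(sample, population, tol=6.1):
    """True if every subgroup's sample percentage is within ±tol points
    of the corresponding population percentage."""
    return all(abs(sample[g] - population[g]) <= tol for g in population)

print(representative(alpha_sample, institution))  # → True (both groups differ by 2.5 points)
```

The same comparison would apply per gender, ethnicity, and college, using the remaining rows of Table 1.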
The eventual intent of the instrument design was to assess developmental trajectories
of experiences in the same graduate students over time (a within-subjects sample).
However, in order to collect sample data efficiently, in a single year, we used different
graduate students as proxy groups for progress-in-program (a between-subjects
sample).
7.3 Instruments
A total of 149 items were developed for the first round (alpha) instruments: 21
demographic items (selection and fill-in), 97 Likert-type items, 19 dichotomous (yes/
no) items, and 12 open-ended items. For the Likert-type items, after consultation and
discussion with the client regarding the tradeoffs in various scale lengths and config-
urations, an eight-point scale (1=strongly disagree, 8=strongly agree) without a neutral
mid-point was used. In addition to the formal quantitative items, open-response fields
were provided and participants encouraged to “explain any responses” or “provide any
additional information.”
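On a 1-8 agreement scale like the one above, reverse-coding a negatively worded item amounts to mapping a score s to 9 − s. A minimal sketch, assuming integer responses (the helper name is ours):

```python
def reverse_code(score, scale_max=8):
    """Reverse-code a Likert response on a 1..scale_max agreement scale."""
    if not 1 <= score <= scale_max:
        raise ValueError(f"score {score} outside 1..{scale_max}")
    # On a 1-8 scale this maps 1 -> 8, 4 -> 5, 8 -> 1
    return (scale_max + 1) - score
```

This keeps all items oriented in the same direction before reliability and factor analyses are run.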
The items were organized into theoretical and topical clusters and subscales. The
sections were (1) Why Graduate School?, (2) Admissions Process, (3) Decision to
Attend, (4) Financial Aid, (5) The Graduate Experience, (6) Graduate College Advising
and Staff, (7) Graduate College Events, (8) Graduate College Media and Materials, (9)
Program of Study Satisfaction, (10) Social Interaction, and (11) University Resources
and Services. Table 2 shows a summary of the scope and organization of the instru-
mentation, as well as reliabilities and factor structures.
Based on the initial redesign feedback, three web-based forms of the questionnaire
instruments were created. Each was designed to measure satisfaction with the graduate
experience at specific incremental points in students’ progress-toward-degree: entry,
mid-program, and exit. Participants were recruited via e-mail and provided with active,
generic hyperlinks to the questionnaires, which they could access from any location.
Timing for the assessments was at three key time points in their specific programs: at
entrance (their first semester), mid-point (first semester of second year for masters; first
Table 1 Alpha participant demographic characteristics
Frequency (All / Masters / PhD) Percentage (Institution / Sample)
Degree type
Masters 375 – – 72.5 75.0
Doctoral 125 – – 27.5 25.0
Gender
Male 237 164 70 51.7 47.6
Female 261 209 51 48.3 52.4
Ethnicity
African American/black 31 23 8 5.0 5.7
Asian American/Asian 44 29 15 5.1 8.1
Pacific Islander/native Hawaiian 2 2 – 0.2 0.4
Hispanic/Latino 25 23 2 5.2 4.6
Native American/American Indian 29 24 5 4.9 5.4
White/Caucasian 397 295 98 72.7 73.3
Other 14 9 5 6.9 3.6
Colleges
Architecture 6 6 – 2.2 1.2
Arts and Sciences 217 148 69 37.0 43.0
Atmospheric and Geographic Sciences 6 5 1 3.5 1.2
Business 37 30 6 8.3 7.4
Earth and Energy 11 9 2 5.2 2.2
Education 60 42 18 18.0 11.9
Engineering 43 30 13 14.1 8.6
Fine Arts 25 17 7 5.6 5.0
Journalism and Mass Communication 10 6 4 1.8 2.0
International Studies 11 11 – 0.4 2.2
Liberal Studies 46 44 2 2.9 9.2
Dual Degree/Interdisciplinary 31 26 3 0.8 6.2
semester of third year for doctoral students), and exit (graduating semester). At this
stage of development, all students completed the same questionnaire sections, with two
exceptions: Admissions Process (entry students only) and Career Preparation
(mid-point and exit only).
8 Analysis
Once questionnaires were completed, data were exported to SPSS® for statistical analysis.
Means and standard deviations were computed for each Likert-type question. Additional
subgroup mean comparison statistics were computed for significant differences (by
degree type—masters and doctoral, and progress-toward-degree groups). Exploratory
factor analyses (EFAs) were conducted on theoretical and topical sections with more than
five items, to examine structural nuances and help determine the appropriateness of items
within sections. Reliabilities for the theoretically coherent subscales were computed using
Cronbach’s alpha (target α≥0.80). Additional generative commentary and questions
provided qualitative information to utilize in evaluation and revision.
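The reliability computation described above can be sketched directly. This is a generic Cronbach's alpha over an n-respondents × k-items score matrix, not the authors' SPSS® procedure:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) array of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)
```

Subscales meeting the stated target would return values of 0.80 or above, as reported for the subscales in Table 2 (reverse-coding negatively worded items first).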
8.1 Alpha measurement testing results
The alpha testing focused on measurement performance, with the system test implicit at
all stages, from development through response patterns. The assessment of validity at
this stage was a preliminary look at the appropriateness of instrument scope, item
Table 2 Section overview (alpha version)
Section Type of scale No. of items Alpha No. of factors
Why graduate school Item cluster 8 – –
Admissions process Subscale 6 0.864 1
Decision to attend Item cluster 8 – 3
Financial aid Item cluster 8 – –
The graduate experience
Graduate experience satisfaction Subscale 13 0.928 2
To me, the graduate experience includes… Item cluster 12 – 3
Graduate college advising and staff Subscale 4 0.813 1
Graduate college events Item cluster 10 – –
Graduate college media and materials Item cluster 5 – 1
Program of study satisfaction
Program of study Subscale 9 0.806 2
Academic advisor Subscale 7 0.975 1
Academic program faculty Subscale 12 0.950 1
Career preparation Subscale 6 0.841 1
Social interaction Subscale 9 0.830 2
University resources and services Subscale 19 0.881 4
Negatively worded items were reverse-coded both for the reliability and factor analyses
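The reverse-coding noted here is simple arithmetic on the response scale; a minimal sketch, assuming the eight-point scale these instruments used:

```python
def reverse_code(score: int, scale_max: int = 8, scale_min: int = 1) -> int:
    """Reverse-code a Likert response so that strong agreement with a
    negatively worded item scores like strong disagreement with a positive one."""
    return scale_max + scale_min - score

# On the 8-point scale, 1 <-> 8, 2 <-> 7, and so on
print([reverse_code(s) for s in [1, 2, 7, 8]])  # -> [8, 7, 2, 1]
```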
content, and overall fit. The assessment of reliability at this stage was to assess subscale
and section range and coherence, along with the nature and roles of item contributions.
Data on section-level relationships and item contributions would support refinement for
both content appropriateness and instrument efficiency, without reducing instrument
coherence or sacrificing measurement scope. Qualitative responses and commentary
provided information on how particular participants and subgroups processed and
interpreted the instrument content, which further informed revision and refinement.
8.2 Validity
The first goal of the alpha testing analysis (focused on validity) was to assess the
appropriateness, scope, and fit of the instrumentation for addressing the target variables
and indicators (Cook and Beckman 2006; Messick 1995), overall and for each sub-
group at point-in-program. This included not only the item and section content but also
the instructions and administration system. Validity information was contained in the
developmental data (from the “Needs analysis” section), based on both expert-client
and user-stakeholder perspectives on what should be included. From the alpha testing,
analysis of authentic user responses added empirical grounding to those hypothetical judgments.
The EFAs were conducted on all sections (criterion: primary loadings at 0.80, with cross-
loadings not exceeding 0.30). This analysis would confirm that the language used in
the items (taken from the generative contributions of various stakeholders) communicated
what was intended, and that the items related appropriately to one another
as interpreted by end-users. Additionally, the open-ended fields inviting additional
explanation and commentary were analyzed for contributions to the scope, content,
and appropriateness of the instrument and sections, as well as for system issues. Most
of the scales showed adequate and consistent loadings, and those falling short of target
criteria provided information needed to refine them. The sample was inadequate to
demonstrate discriminatory capacity for all subgroups of interest, but its differential
performance in some global groups (such as between masters and doctoral students)
showed promise. Overall, the synthesis of validity data showed both current strength
and future promise in the redesigned GCE assessment.
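The EFA screening criteria quoted above (primary loadings at 0.80, cross-loadings not exceeding 0.30) amount to a per-item filter over the loading matrix; an illustrative sketch with a hypothetical two-factor solution:

```python
import numpy as np

def screen_items(loadings, primary_min=0.80, cross_max=0.30):
    """Flag items whose strongest (primary) loading meets the criterion and
    whose remaining cross-loadings stay at or below the cutoff."""
    keep = []
    for row in np.abs(np.asarray(loadings)):
        primary = row.max()
        cross = np.delete(row, row.argmax())
        keep.append(bool(primary >= primary_min and (cross <= cross_max).all()))
    return keep

# Hypothetical two-factor loading matrix for four items
L = [
    [0.85, 0.10],  # clean loader on factor 1 -> retain
    [0.82, 0.45],  # strong primary but excessive cross-loading -> flag
    [0.05, 0.88],  # clean loader on factor 2 -> retain
    [0.55, 0.40],  # weak primary loading -> flag
]
print(screen_items(L))  # -> [True, False, True, False]
```

Items flagged here would be the ones "falling short of target criteria" that the text says informed refinement.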
8.3 Reliability
The second goal (focused on reliability) was to conduct a preliminary assessment of
subscales’ internal coherence and item-level contributions, along with their discrimi-
natory capacity. As evidence of internal reliability, all of the theoretically coherent
sections (scales) were assessed for internal coherence using Cronbach’s alpha (criterion
of 0.80). Some met the test immediately, and others varied based on nuances in
participant responses. These data analyses demonstrated how those scales and sections
falling short of the standard could be refined to meet it and thus improve the measure.
All instruments demonstrated high stability over multiple administrations.
8.4 Divergence of “should be” versus “is”
Notably more comments were received on the item set defining the Nature of the
Graduate Experience. That section’s item stem was phrased as: “For me, the graduate
experience includes….” followed by the list of descriptors supplied by students, faculty,
and staff during the needs analysis process. Comments on this section converged to the
question of whether that section’s instructions were intended to address what the
student’s actual experience did include or an ideal perception of what the graduate
experience should include. The original item had been written to address the former, the
student’s actual experience, but the frequency of these comments illuminated a pattern
of fairly widespread perceptions that there was a difference between the two. That is,
they suggested a need to inquire as to how graduate students’ actual experiences
differed from their expectations of what they should be. In addition, factor structuring
showed a divergence of content focus between perceptions that clustered as preferences
and perceptions that clustered as quality indicators as a proxy for satisfaction, indicating
a need to further restructure this section.
8.5 Perceived length
A common global comment was that the whole instrument was very long. We
recognized that, at just over 100 substantive items, it was longer than the questionnaires
most students commonly completed (particularly in the current climate of short, quick digital
questionnaires). However, the internal systems data also confirmed that average time-
on-task for users who completed all items was only about 30 min. This was within
the task time we had predicted (below the maximum time stated in our
participant consent document). It was also within the time frame considered reasonable
for an online administration, with the caveat that some users may perceive it to be much
longer.
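The average time-on-task figure reduces to simple arithmetic over the administration system's start and submit timestamps; a sketch with hypothetical session logs:

```python
from datetime import datetime

# Hypothetical start/submit timestamps from the administration system's logs
sessions = [
    ("2014-03-01 10:00:00", "2014-03-01 10:28:30"),
    ("2014-03-01 11:05:00", "2014-03-01 11:37:15"),
    ("2014-03-02 09:10:00", "2014-03-02 09:39:45"),
]

FMT = "%Y-%m-%d %H:%M:%S"
minutes = [
    (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds() / 60
    for start, end in sessions
]
mean_minutes = sum(minutes) / len(minutes)
print(f"average time-on-task: {mean_minutes:.1f} min")  # ~30 min, as reported
```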
8.6 Online administration
The cumulative data on system redesign (both method and tooling) indicated that the
digital, online administration was more effective and appropriate for reaching more
graduate students, including distance learners and part-time students, than the previous
(paper-based) method. The specific tool chosen (SurveyMonkey®) had presented
challenges in development, requiring a good deal of specialized back-end programming
for configuring it to deliver the instrument as designed. In particular, differential options
that required skip logic and similar special presentations were tedious to develop. In
addition, some critical issues that arose in compatibility with user-end systems required
intervention. For a new evaluation package used over time and across platforms, we
decided to seek a new administration tool that would add ease for both developers and
end-users.
9 Conclusions and measurement revisions
The evidence and information provided by the full range of data produced in the first
round of instrument testing demonstrated that the GCE redesign was largely successful
to date. This reasonable sample yielded strong evidence for both the validity and
reliability of the instruments at this stage. It also provided a good deal of both
psychometric data and direct user feedback on how they could be further improved
for the beta testing. Based on all of the information accrued, the evaluators made the
following revisions for the next round of testing:
& Given the users’ qualitative feedback on the “Nature of the Graduate
Experience,” we adopted the dual range of their responses, one general and
ideal, the other personal and perceptual. In the beta, this item cluster was presented as
two parallel clusters: the graduate experience should include and my graduate
experience does include.
& At the client’s request, two more participant group versions were added: alumni
and non-attendees. The first increased the scope and range of assessment of
program effects beyond students’ perceived preparation for careers, to include their
experiential perceptions after graduation and entry into their professions. It consti-
tuted a fourth sequential assessment for students who attended this institution. The
second addressed the client’s interest in what caused candidates accepted into
graduate programs not to enter them, to support recruitment and retention efforts.
It constituted an entirely different instrument for a new participant group.
& Based on multiple item-level and scale-level analyses, we determined that approx-
imately 17 items could be removed to shorten the assessment without reducing
subscale reliabilities. However, we retained those items in the beta versions, in
order to test those conclusions with a second round of testing and a larger, more
diverse sample.
& We acknowledged that our revision decisions included significantly increasing the
length of the instrumentation and that the users already perceived it to be long.
However, we wanted to gain evidence for the full range of possible redesign
decisions from the retest data in the beta cycle. We determined that with the
independent sample test-retest data, we would be better equipped with ample
evidence to make those revision decisions for the final client handoff.
& Based on the weaknesses found in the initial tool, we selected a different develop-
ment and administration system for the beta testing, with more sophisticated
development functionality and the added benefit of institutional licensure
accessibility.
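The third bullet's conclusion, that items could be removed without reducing subscale reliabilities, rests on "alpha if item deleted" style checks; a minimal sketch with hypothetical data (a noisy fifth item is removable, the coherent four are not):

```python
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

def alpha_if_deleted(items):
    """Recompute alpha with each item removed; items whose removal raises
    alpha are candidates for shortening the instrument."""
    return [cronbach_alpha(np.delete(items, j, axis=1))
            for j in range(items.shape[1])]

# Hypothetical subscale: four coherent items plus one unrelated (noisy) item
rng = np.random.default_rng(1)
core = rng.integers(1, 9, size=(300, 1)) + rng.integers(-1, 2, size=(300, 4))
noisy = rng.integers(1, 9, size=(300, 1))
data = np.clip(np.hstack([core, noisy]), 1, 8)

full = cronbach_alpha(data)
dropped = alpha_if_deleted(data)
# Removing the noisy fifth item raises alpha; removing coherent items lowers it
print(round(full, 3), [round(a, 3) for a in dropped])
```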
10 Phase II: redesign and beta testing—student questionnaires
10.1 Procedure
All administration occurred in an asynchronous online questionnaire administration
system, with all participant identification separated from item responses. A new
(between-subjects) group of testing participants was recruited via e-mail invitation,
using lists of eligible students provided by the Graduate College. Participants were
offered small individual incentives (tee-shirts for the first 100 completing the instru-
ments) and all participants entered into a drawing for a larger incentive (a
digital device). All study activities were consistent with human subject requirements
and approved by the institutional IRB. All participant data was de-identified and kept
confidential.
10.2 Participants
The 2,081 current or potential student participants were invited to take the form of the
questionnaire appropriate to their identity and point-in-program: individuals who were
admitted but chose not to attend (22); students at the beginning (661),
middle (481), or end of their program (672); or alumni (245). Detailed participant
demographics are shown in Table 3. Participants were demographically representative
of the larger graduate student population on campus, with similar distributions of
genders, ethnicities, and colleges (within ±6.6 %). Two colleges were overrepresented:
Liberal Studies (+9.5 %) and Dual Degree/Interdisciplinary (+16.0 %). Masters stu-
dents were also overrepresented (+13.6 %). Response rate from e-mail lists was 72.6 %
(2,081 out of 2,865).
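The response rate and representativeness figures reported above reduce to simple arithmetic on the invitation counts and the institution/sample percentages from Table 3:

```python
# Figures as reported in the text and Table 3 (degree-type percentages)
invited, completed = 2865, 2081
response_rate = 100 * completed / invited
print(f"response rate: {response_rate:.1f}%")  # -> 72.6%, as reported

institution = {"Masters": 72.5, "Doctoral": 27.5}
sample = {"Masters": 86.1, "Doctoral": 13.9}
deviation = {group: sample[group] - institution[group] for group in institution}
print(deviation)  # Masters overrepresented by about +13.6 percentage points
```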
Table 3 Beta participant demographic characteristics
Frequency Percentage
All Masters PhD Institution Sample
Degree type
Masters 1431 – – 72.5 86.1
Doctoral 230 – – 27.5 13.9
Gender
Male 863 716 146 51.7 46.9
Female 1019 904 114 48.3 54.1
Ethnicity
African American/black 166 151 15 5.0 8.8
Asian American/Asian 146 108 38 5.1 7.8
Pacific Islander/native Hawaiian 5 5 – 0.2 0.3
Hispanic/Latino 110 96 12 5.2 5.8
Native American/American Indian 85 77 8 4.9 4.5
White/Caucasian 1,297 1,118 179 72.7 68.9
Other 74 66 8 6.9 3.9
Colleges
Architecture 30 30 – 2.2 1.5
Arts and Sciences 652 523 111 37.0 33.3
Atmospheric and Geographic Sciences 36 26 8 3.5 1.8
Business 113 102 9 8.3 5.8
Earth and Energy 52 49 3 5.2 2.7
Education 225 172 44 18.0 11.5
Engineering 146 109 32 14.1 7.5
Fine Arts 47 34 13 5.6 2.4
Journalism and Mass Communication 31 24 7 1.8 1.6
International Studies 53 51 – 0.4 2.7
Liberal Studies 243 225 6 2.9 12.4
Dual Degree/Interdisciplinary 328 277 29 0.8 16.8
10.3 Instruments
A total of 268 items were administered for the second round (beta) questionnaires: 17
demographic items (selection and fill-in), 237 Likert-type items, 9 dichotomous (yes/
no) items, and 5 open-ended items. Similar to the alpha questionnaires, for theoretically
continuous items, an eight-point Likert scale (1=strongly disagree, 8=strongly agree)
was used. Open-response fields enabled participants to “explain any responses” or
“provide any additional information.”
The 11 sections for the alpha questionnaires largely remained, with some new
sections added and refined, based specifically on the data and feedback from the alpha
testing. After the revisions, a total of three sections were added to create a better
understanding of the Graduate College experience.
Five forms of the questionnaire instruments were created: non-attend, entrance, mid-
point, exit, and alumni. The expanded design was for graduate students to be assessed
at four time points in their programs: at entrance (their first semester), at mid-point
(first semester of their second year for masters students, or first semester of
their third year for doctoral students), at exit (their graduating semester), and at
2 years post-graduation. The fifth version would be completed only by students
who were accepted but chose not to attend, to help the Graduate College gain
information about why. All student forms were parallel except for the Admissions
(entry only) and Career Preparation (mid-point, exit, and alumni). Further, some
items within sections were unique to alumni, relevant to post-graduation expe-
riences. The non-attend version of the questionnaire was much shorter and
different in content from the other instruments, as appropriate to its purpose
and target group. The various sections and subscales are described below, with the
results of their reliability and factor analyses as appropriate. The summary of statistical
results is also shown in Table 4.
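The form routing described above can be sketched as a simple mapping; the section names below are illustrative shorthand, not the instruments' exact labels:

```python
# Hypothetical routing mirroring the text: Admissions appears on the entrance
# form only; Career Preparation on the mid-point, exit, and alumni forms; the
# non-attend form is a much shorter, separate instrument.
CORE = ["graduate_experience", "program_of_study", "social_interaction"]

FORM_SECTIONS = {
    "non-attend": ["reasons_for_not_attending"],
    "entrance": ["admissions"] + CORE,
    "mid-point": CORE + ["career_preparation"],
    "exit": CORE + ["career_preparation"],
    "alumni": CORE + ["career_preparation", "post_graduation_items"],
}

def sections_for(form: str) -> list:
    """Return the questionnaire sections presented to a participant group."""
    return FORM_SECTIONS[form]

print(sections_for("entrance"))
```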
10.4 Subscales and item clusters
The 14 sections were divided into subscales and/or item clusters as follows:
Why graduate school? This section was designed to determine the reasons that
students attend graduate school. It presented the item stem “I am pursuing a
graduate degree,” and then listed 17 different reasons, each with a Likert-type
scale. Sample item was “I am pursuing a graduate degree…to gain a competitive
advantage in my field.” The EFA showed four factors.
Admission process. This section presented items about the individual’s admission
experience, process, and satisfaction. First, a single item addressed whether or not
students used the (then still optional) online system (dichotomous). Second, a
subscale addressed participants’ satisfaction with their admissions process (four
items; Likert-type; α=0.866). Sample item was “The instructions for completing
the application were adequate and easy to understand.” The EFA confirmed a
single factor.
Decision to attend. This section assessed reasons why students chose to
come to this university. It first asked if this was the student’s first choice
school (dichotomous). Then, a summary Likert-type item to endorse was
“I am happy with my decision to attend [univ].”¹ The third component was an item
cluster (14 items; Likert-type scale). Item stem was “My decision to attend [univ]
was influenced by…” followed by 16 different responses to endorse (e.g., “having
similar research interests as professors in the department”).
Financial aid. This section asked students to identify the sources and types of their
support for attending and engaging in graduate studies (e.g., graduate assistant-
ships, tuition waivers).
Graduate experience. This section consisted of three parts: satisfaction, what it
should be, what it is (all on Likert-type scales). Satisfaction with the graduate
experience (12 items; α=0.901). Sample item was “I would recommend [univ] to
prospective graduate students.” The EFA confirmed one factor. Students’
¹ These items presented the university’s acronym, replaced here with the generic “[univ]”.
Table 4 Summary of instrument structure and statistical performance (beta version)
Type of scale No. of Items Alpha No. of factors
Why graduate school Item cluster 18 – 4
Admissions process Subscale 4 0.866 1
Decision to attend Item cluster 17 – 4
Financial aid Item cluster 7 – –
The graduate experience
Graduate experience satisfaction Subscale 12 0.901 1
To me, the graduate experience should include… Item cluster 34 – 4
To me, the graduate experience did include… Item cluster 34 – 3
Graduate college advising and staff Subscale 6 0.879 1
Graduate college events Item cluster 2 – 1
Graduate college media and materials Subscale 8 0.924 1
Graduate program self-efficacy
Success in graduate program Subscale 6 0.808 1
Success in chosen profession Subscale 7 0.873 1
Program of study satisfaction
Program of study Subscale 12 0.865 2
Academic advisor Subscale 9 0.987 1
Academic program faculty Subscale 12 0.971 1
Career preparation satisfaction
Career preparation Subscale 10 0.973 1
Utility/value of degree Subscale 5 0.938 1
Professional competence Subscale 10 0.957 1
Social interaction Subscale 21 0.855 3
University resources and services Subscale 19 0.929 3
Final thoughts Qualitative 2 – –
Negatively worded items were reverse-coded both for the reliability and factor analyses
perceptions of what the graduate experience “should include” and the
parallel of what it “does include” for that student (34 items each) both
presented item stems followed by lists of parallel characteristics, each for
the student to endorse. Sample item was “To me, the graduate experience
should include…developing close connections with faculty.” The EFA showed that
the “should include” scale loaded on four factors, while the “does include” loaded
on three.
Graduate college advising and staff. This section first asked students whether they
had experienced direct contact with the GC staff, for advising or other assistance,
then presented items assessing their understanding of its role and services (five
items; Likert-type; α=0.879). Sample item was “I understand the role of the
Graduate College.” The EFA confirmed a single factor.
Graduate College events. This section assessed students’ participation in various
GC-sponsored activities, to support ongoing program planning. Sample items were
“I attended activities during [event]” (dichotomous), and “I often attend Graduate
College sponsored events” (Likert-type).
Graduate College media and materials. This section assessed students’ satisfaction
with, and perceived benefit from, the GC website and other informational materials
(eight items; Likert-type; α=0.924). Sample item was “Viewing information on
the Graduate College’s website benefits me.”
Graduate program and career self-efficacy. This section (two subscales) assessed
students’ perceptions of self-efficacy (positioning for success) in their graduate
programs and professions. Program self-efficacy consisted of six items (Likert-
type; α=0.808) and professional self-efficacy of seven items (Likert-type;
α=0.873). Sample items were “I am certain that I will do well in this graduate
program.” and “I am just not sure if I will do well in this field.” EFA confirmed one
factor for each subscale.
Program of study satisfaction and career preparation. This section (four subscales)
assessed students’ satisfaction with various components of their graduate pro-
grams: program (focus on content and curriculum) (12 items; α=0.848; 2 factors),
program faculty (focus on teaching and advising) (20 items; α=0.966; 2 factors),
career preparation (9 items; α=0.973; 1 factor), and career utility and value of
degree (5 items; α=0.938; 1 factor) (all Likert-type items). Sample items were
program (“I believe that the level of difficulty in my coursework is appropriate”),
faculty (“The faculty in my program are fair and unbiased in their treatment of
students.”), career preparation (“My program area course content is preparing me
to practice effectively in the field.”), and career utility and value of degree
(“My graduate degree will open up current and future employment opportunities.”).
Professional competence and identity development. This subscale assessed stu-
dents’ perceptions of becoming competent professionals (ten items; Likert-type;
α=0.957). Sample item was “More and more, I am becoming a scholar in my
field.” EFA confirmed a single factor.
Social interaction. This subscale assessed participants’ social interaction and
engagement in the graduate community (21 items; Likert-type; α=0.855). Some
items differed for alumni, as appropriate. Sample items were current students
(“I have many friends in this university”) and alumni (“I am still in contact
with friends from my graduate program”).
University resources and services. This section assessed participants’ satisfaction
with university campus resources and services (19 items; Likert-type; α=0.929).
Sample item was “I am happy with the condition of the building(s) containing my
classrooms.”
Final thoughts. Participants were also asked to answer two qualitative questions describ-
ing notable positive and challenging experiences in graduate school. Items were
“Please describe one of your most meaningful and important graduate experiences
at this university to date. Give as much detail as possible. Include the reasons why
it was so meaningful and important for you,” and “Please describe one of your
most challenging graduate experiences at this university to date. Give as much
detail as possible. Include the reasons why it was so challenging for you.”
11 Analysis
The same instrument performance analyses were conducted for the beta test data as for
the alpha test, utilizing SPSS® (see Table 4). In addition, the larger beta sample size
made it possible to perform more fine-grained subgroup mean comparison statistics
with greater statistical power, to confirm that the instruments maintained reliability
within subgroups, and determine if they also demonstrated some discriminatory
power for within-group differences (see Table 5). To assess their discriminatory
potential, we used two key subgroups, by-degree (masters and doctoral) and
progress-toward-degree (entry, mid-point, exit). Student subgroup data demonstrated
good consistency of performance across subgroups, with some discrimination of mean
differences.
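The subgroup mean comparisons described above (SPSS® in the original) can be illustrated with a two-sample statistic; a sketch using Welch's t with hypothetical subscale scores (the group sizes mirror Table 3's degree types, the score distributions are invented):

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic for comparing two subgroup means."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

# Hypothetical subscale scores for the two degree-type subgroups
rng = np.random.default_rng(2)
masters = rng.normal(6.4, 1.2, size=1431)
doctoral = rng.normal(5.6, 1.2, size=230)

t = welch_t(masters, doctoral)
print(f"Welch t = {t:.2f}")  # a large |t| indicates between-group discrimination
```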
12 Phase II: redesign and beta testing—faculty questionnaires
12.1 Procedure
Faculty members were also asked to give feedback regarding the various forms and
subscales on the student questionnaires. Five forms of web-based questionnaire instru-
ments were created to parallel the five versions of the student beta questionnaires,
presenting faculty members with screenshots of the student instruments and unique
response items for faculty. Participants were recruited via e-mail and provided with
active, generic hyperlinks to the questionnaires. They responded regarding the value
and fit of that information for their program development and improvement.
12.2 Participants
Faculty participants were invited from a list of faculty who teach and advise graduate
students. The list was randomly divided into five groups, and each received one of the
five forms of the student questionnaires (all sections). Faculty responses (N=199) were
divided as follows: 43 non-Attend, 33 entrance, 42 mid-point, 44 exit, and 37 alumni.
Detailed participant demographics are shown in Table 6.
12.2.1 Instruments
Faculty members reviewed screen captures of each section of the student questionnaires
and responded to six items (three Likert and three open-response).
12.2.2 Perceived appropriateness
This section assessed how appropriate (applicable, coherent, and useful) the faculty
members found the student assessment sections (three items; Likert-type; α=0.80).
Items were “The items in this section are applicable to our graduate department/program;”
“The items in this section are cohesive, providing perspective related to the section topic;”
and “The results from this section will be useful to know about our graduate students.”
Table 6 Frequency of faculty participant demographic characteristics
All
Gender
Male 115
Female 51
Other gendered 1
Ethnicity
African American/black 3
Asian American/Asian 5
Pacific Islander/native Hawaiian –
Hispanic/Latino 4
Native American/American Indian 2
White/Caucasian 144
Other 6
Colleges
Architecture 9
Arts and Sciences 109
Atmospheric and Geographic Sciences 8
Business 10
Earth and Energy 6
Education 10
Engineering 19
Fine Arts 5
Journalism and Mass Communication 7
International Studies –
Liberal Studies 1
Dual Degree/Interdisciplinary –
Professorial rank
Assistant professor 31
Associate professor 58
Full professor 80
Other 3
12.2.3 Open-response items
Three additional generative items invited original faculty input: (1) “Are there any
additional items that you believe need to be added to this section? If so, please identify
which items those are, and why they are needed here;” (2) “Are there any items here
that you believe should be removed from this section? If so, please identify which items
those are, and why they should be removed;” and (3) “Other comments.”
13 Analysis
Analyses were conducted utilizing SPSS®. Reliabilities for the fit scale were
computed as Cronbach’s alpha (target α≥0.80). De-identified questionnaire re-
sponses were analyzed and stored according to IRB standards for data security and
confidentiality.
On the three quantitative fit items, faculty members reported finding the information
applicable, cohesive, and useful for their programs (M=6.34, SD=1.42). Overall
appropriateness of each questionnaire was as follows: non-attend (M=6.17, SD=
1.41), entrance (M=6.15, SD=1.71), mid-point (M=6.21, SD=1.41), exit (M=6.79,
SD=1.15), and alumni (M=6.38, SD=1.42). Tables 7 and 8 show the subscale item
means and standard deviations of the faculty feedback.
14 Overall measurement performance results
These data together constitute an independent-samples test-retest of the GCE instru-
ment and system redesign. The beta testing cycle was a confirmatory retest, along with
some extension and refinement, of the alpha testing. Its analysis addressed the same
goals, assessing the validity, reliability and fit of the new GCE assessment, through
both direct administration and stakeholder feedback.
Understanding the item-level contributions, particularly across two testings with
independent, authentic user samples, supported final instrument refinement for align-
ment and efficiency. We had retained longer versions of the evaluation instruments
knowing that the second testing would confirm or disconfirm which items could be
removed to retain optimal evaluative effectiveness and efficiency. In addition, the two
rounds of testing (alpha and beta) provided independent confirmation of the psycho-
metric properties of these measures. Results from the beta instrument testing were
similar to those from the alpha cycle.
The first goal of the beta testing analysis was to assess the appropriateness, scope,
and fit of the refined instrument content in addressing the target variables and indica-
tors, overall and for key student subgroups (by degree type and point-in-program). The
scales and sections performed with a high degree of consistency across the whole
group, while also demonstrating the capacity to discriminate between groups both by
degree type (masters/doctoral) and by progress-in-program (entry, mid-point, exit).
The scales and sections once again loaded consistently, demonstrating good
test-retest stability as evidence of reliable performance and validity in assessing the
target constructs.
The second goal was to conduct a confirmatory assessment of subscale reliability,
subscale and section range and coherence, and item contributions. Consistent with their
performance in the previous cycle, nearly all subscales met the target criteria in both
internal consistency and factor loadings. Those that demonstrated less coherence
(generally the newly added and reorganized sections) demonstrated statistically how
they could be refined to meet the criteria. Across the two testing cycles, the scales and
sections also demonstrated a high level of test-retest stability and external consistency.
In addition to their performance with students, the instruments received favorable
perceptions of fit from faculty members across colleges and disciplines. Few additions
and deletions were recommended, and those suggested were specific to particular fields
rather than generally appropriate to the broader graduate faculty. Overall, the revised
(beta version) GCE instrument demonstrated excellent measurement performance.
Table 7 Faculty feedback on scale fit reliabilities, means, and standard deviations

Note: All subscales are measured on an eight-point Likert scale (1=strongly disagree, 8=strongly agree). Also, “Success in graduate program” and “Success in chosen profession” were measured as one section; “Academic advisor” and “Academic program faculty” were measured as one section; and “Career preparation” and “Utility/value of degree” were measured as one section.

Fit: Alpha Mean SD
Demographics 0.905 6.09 1.65
Why graduate school 0.958 6.32 1.79
Admissions process 0.960 6.55 1.67
Decision to attend 0.942 6.75 1.44
Financial aid 0.912 6.73 1.41
The graduate experience
Graduate experience satisfaction 0.956 6.81 1.23
To me, the graduate experience should include… 0.951 6.47 1.50
To me, the graduate experience does include… 0.956 6.84 1.27
Graduate college advising and staff 0.923 6.55 1.52
Graduate college events 0.966 5.70 1.90
Graduate college media and materials 0.928 6.57 1.19
Graduate program self-efficacy
Success in graduate program 0.968 6.41 1.59
Success in chosen profession 0.968 6.41 1.59
Program of study satisfaction
Program of study 0.969 6.80 1.29
Academic advisor 0.918 7.06 1.25
Academic program faculty 0.918 7.06 1.25
Career preparation satisfaction
Career preparation 0.967 6.72 1.36
Utility/value of degree 0.967 6.72 1.36
Professional competence 0.958 6.50 1.38
Social interaction 0.958 6.49 1.38
University resources and services 0.948 6.47 1.45
Final thoughts 0.989 6.64 1.56
The new administration system (Qualtrics®) required something of a learning curve
in development but paid off with a high degree of clarity and usability for both
developers and end-users. A few user comments included confusion regarding the
interface, but those were easily addressed. As to time-on-task required to complete the
beta version, participants took only a few minutes more than the alpha version (37 min
on average). One in-system revision identified prior to implementation was to simplify
the programming logic, as the originally complex skip logic appeared to confound use
of the progress bar.
15 Data-driven findings demonstrating evaluation enhancement
While the research-based findings are the topic of separate manuscripts, it is important
here to underscore those that constitute evidence of the value-added of this particular
Table 8 Faculty section means
Applicability (N=141) Cohesiveness (N=137) Usefulness (N=137)
Demographics 6.34 5.99 6.09
Why graduate school 6.70 6.62 6.32
Admissions process 6.37 6.33 6.27
Decision to attend 6.79 6.73 6.72
Financial aid 6.70 6.87 6.64
The graduate experience
Graduate experience satisfaction 6.94 6.75 6.77
To me, the graduate experience should include… 6.51 6.48 6.44
To me, the graduate experience does include… 6.89 6.85 6.79
Graduate college advising and staff 6.58 6.67 6.48
Graduate college events 5.73 5.91 5.60
Graduate college media and materials 6.62 6.67 6.40
Graduate program self-efficacy
Success in graduate program 6.49 6.38 6.34
Success in chosen profession 6.49 6.38 6.34
Program of study satisfaction
Program of study 6.89 6.66 6.82
Academic advisor 7.19 6.96 7.09
Academic program faculty 7.19 6.96 7.09
Career preparation satisfaction
Career preparation 6.82 6.64 6.72
Utility/value of degree 6.82 6.64 6.72
Professional competence 6.56 6.50 6.47
Social interaction 6.44 6.57 6.41
University resources and services 6.51 6.54 6.37
Final thoughts 6.65 6.61 6.66
246 Educ Asse Eval Acc (2015) 27:223–251
25. redesign strategy. One powerful product of this project was the instruments themselves,
developed from the direct input of faculty, staff, administrators, and students, then
tested and refined through authentic use. In addition, among data-driven findings are
potentially important patterns illuminated by specific elements of the instrument and
system design. For example, subgroup differences by degree type and by point in
progress-toward-degree had not been demonstrated in the previously published
literature, nor had they ever been analyzed or compared in this Graduate College's
evaluation process, because the previous design did not allow for this type of
comparison. Also important was the general pattern of a mean-score drop at the
mid-point across multiple perceptions: this trajectory had been demonstrated in very
focused, small-scale studies, but not in a diverse, interdisciplinary group of graduate
students across an entire university population, again because the published studies
did not report instrumentation and implementation designs of this scope.
Similarly, the development of parallel scales, such as the two forms of the Graduate
Experience scale (“should” and “is”) and the two self-efficacy scales (program and
career), support direct comparison of differential perceptions in these potentially
important nuanced constructs. In the test samples, there were some striking differences
in these perceptions. In addition, the redesign to include both graduate college and
program-level outcomes, explicitly endorsed by graduate faculty, supported the grad-
uate college giving back de-identified data to departments and programs. The redesign
included moving to online administration, which resulted in dramatically improved
participation rates in the graduate college evaluation overall, and the dual-phase testing
process included testing two different development and administration systems and
identifying weaknesses in one before full implementation. These results underscore the
value and importance of a redesign that encompasses the range and types of perceptual
instruments, the development and delivery system, and the multi-point (trajectory)
administration schedule.
16 Limitations
A limitation of this developmental design and analysis is implicit in the available
sample: it was (1) a volunteer sample (as required by the IRB) rather than a
comprehensive one, and (2) drawn from independent samples (resulting in between-
subjects rather than within-subjects analysis). These sampling constraints introduce
variability beyond that for which the instruments were designed. However, following
implementation, the authentic, within-subjects sample will be accessed over the next
5 years. An additional limitation is that the sample came from a single institution;
future goals include a multi-institutional test.
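The practical consequence of this between- versus within-subjects constraint can be sketched numerically. The following is an illustrative example only (the ratings are invented, not drawn from the study's data): with independent samples, person-to-person variability inflates the standard error of a stage comparison, whereas a paired, within-subjects design removes it, so the same mean drop is estimated with greater precision.

```python
# Hypothetical illustration of the sampling limitation noted above.
# The ratings below are invented for demonstration; they are not study data.
from statistics import mean, stdev
from math import sqrt

entry = [6.8, 7.1, 5.9, 6.5, 7.3, 6.2]  # satisfaction ratings at program entry
midpt = [6.2, 6.6, 5.3, 6.0, 6.8, 5.7]  # same scale at program mid-point

def se_between(a, b):
    """Standard error of the mean difference, treating a and b as
    independent samples (the between-subjects case)."""
    return sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))

def se_within(a, b):
    """Standard error of the mean paired difference, treating a and b as
    repeated measures on the same people (the within-subjects case)."""
    d = [x - y for x, y in zip(a, b)]
    return stdev(d) / sqrt(len(d))

# The estimated mean drop is identical either way; only the precision differs.
drop = mean(entry) - mean(midpt)
```

With these invented numbers, `se_within` is far smaller than `se_between`, which is why the planned 5-year within-subjects sample should detect the mid-point drop with much less noise than the current independent samples allow.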
17 Conclusions
Based on the instrument and system performances, the evaluators recommended
transfer to the client for full implementation, with a list of items the client could choose
to delete without reducing the overall quality of the measure. It was important to
underscore that assessment efficiency is not the only criterion for item selection or
inclusion. Efficiency must be balanced with effectiveness as operationalized by scope
and range of each scale or section. The evaluators proposed length reduction using the
criteria of maximum efficiency without reducing scale reliabilities (below 0.80) or
unduly constraining the scope of assessment to exclude a critical subgroup of students
or disciplines typically represented in a research university. Based on these criteria, a
maximum of 55 items could be removed. After discussion with the client, only 19 items
were removed for initial implementation, to maintain a robust instrument with the
greatest range for possible nuanced differences among colleges and disciplines.
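The reduction criterion described above (prune items only while scale reliability stays at or above 0.80) can be sketched as a greedy alpha-if-item-deleted procedure. This is a minimal illustration of the logic, not the authors' actual analysis code, and the response data in the test are hypothetical:

```python
# Sketch of greedy item pruning under a reliability floor, mirroring the
# criterion described above: remove items only while Cronbach's alpha
# for the remaining scale stays >= 0.80. Illustrative only.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of responses per item (all items same length).
    Returns Cronbach's alpha = k/(k-1) * (1 - sum(item var)/var(total))."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

def prune_items(items, floor=0.80):
    """Repeatedly drop the item whose removal leaves the highest alpha,
    stopping when removal would push alpha below the floor or only two
    items remain."""
    items = list(items)
    while len(items) > 2:
        best_i, best_a = None, None
        for i in range(len(items)):
            a = cronbach_alpha(items[:i] + items[i + 1:])
            if best_a is None or a > best_a:
                best_i, best_a = i, a
        if best_a is not None and best_a >= floor:
            items.pop(best_i)
        else:
            break
    return items
```

In practice the client retained more items than the criterion alone would permit, as the passage above notes, precisely because efficiency was balanced against the scope needed to capture nuanced differences among colleges and disciplines.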
These redesigned program evaluation methods and measures offer substantive
benefits consistent with the Graduate College’s expressed goals and emergent needs.
Product and process outcomes include the following:
1. Updated, reasoned, multi-event administrative process, attentive to organization
and programs across the university
2. Psychometrically sound instrumentation that produces objectively verifiable, de-
fensible results
3. Excellent validity evidence on internal context, scope, structure and substance of
the instrumentation, with perceived fit and value-added perceptions of faculty
across disciplines
4. Excellent reliability evidence, including internal coherence, external and factor
structures, test-retest reliability with students, and consistency across subgroups
5. Self-contained, stand-alone variable subscales and item clusters that enable admin-
istrators to utilize part or all sections of the instrument as needed
6. Updated administrative system and media to reach a larger group including off-site
and distributed graduate students
The team also emphasized that, in addition to administering the complete instrument
at once, each subscale and section is designed as a potential stand-alone section. With
this design, it is feasible for an institution or unit to remove sections that address
issues of less immediate priority, or to administer sections at different times. If the
user intends to compare responses across sections, administering them at the same
time would control for some order and administration effects. Future directions for
this project include extension via longitudinal testing with the dependent sample for
which the system was originally designed. Those data may also provide additional
confirmatory insight into the performance of the shorter (revised) version.
18 Discussion
The redesign of assessments goes beyond instrumentation. Rethinking assessments is
much more than generating a new set of items, or even user instructions. Effective
redesign requires re-examining the full range of features, contexts, and conditions,
including timing, technology, tools, reframing of longitudinal instrumentation, and so
on, to produce a whole-system redesign. Many institutional assessments are moving to
digital administration systems, a shift that is more than simple digitization, involving
translation, as well as transfer (Bandilla et al. 2003; Hardré et al. 2010a).
Administrators need to consider design features (Vincente and Reis 2010) as well as
system and context elements that may influence user behaviors and consequent data
outcomes (Hardré et al. 2012). Tools and systems need to be tested in authentic ways
with real user participants (Patton 2012), so that test data not only reflect an accurate
product but also illuminate process issues that may need adjustment before final
implementation. This systematic and systemic approach to assessment design, development, and
testing provides the rigor needed to demonstrate accurate assessment and validate data
meaningfulness and use.
References
Allum, J. R., Bell, N. E., & Sowell, R. S. (2012). Graduate enrollment and degrees: 2001 to 2011.
Washington: Council of Graduate Schools.
Austin, J., Cameron, T., Glass, M., Kosko, K., Marsh, F., Abdelmagid, R., & Burge, P. (2009). First semester
experiences of professionals transitioning to full-time doctoral study. College Student Affairs Journal,
27(2), 194–214.
Baker, V. L., & Lattuca, L. R. (2010). Developmental networks and learning: toward an interdisciplinary
perspective on identity development during doctoral study. Studies in Higher Education, 35(7), 807–827.
Bandilla, W., Bosnjak, M., & Altdorfer, P. (2003). Survey administration effects? A comparison of web-based
and traditional written self-administered surveys using the ISSP environment model. Social Science
Computer Review, 21, 235–243.
Belcher, M. J. (1996). A survey of current & potential graduate students. Research report 96–04. Boise: Boise
State University.
Benishek, L. A., & Chessler, M. (2005). Facilitating the identity development of counseling graduate students
as researchers. Journal of Humanistic Counseling Education and Development, 44(1), 16–31.
Bloom, J. L., Cuevas, A. E. P., Evans, C. V., & Hall, J. W. (2007). Graduate students’ perceptions of
outstanding graduate advisor characteristics. NACADA Journal, 27(2), 28–35.
Brinkman, S. N., & Hartsell-Gundy, A. A. (2012). Building trust to relieve graduate student research anxiety.
Public Services Quarterly, 8(1), 26–39.
Chism, M., Thomas, E. L., Knight, D., Miller, J., Cordell, S., Smith, L., & Richardson, D. (2010). Study of
graduate student perceptions at the University of West Alabama. Alabama Counseling Association
Journal, 36(1), 49–55.
Cicognani, E., Menezes, I., & Nata, G. (2011). University students’ sense of belonging to the home town: the
role of residential mobility. Social Indicators Research, 104(1), 33–45.
Cook, D. A., & Beckman, T. J. (2006). Current concepts in validity and reliability for psychometric
instruments: theory and application. The American Journal of Medicine, 119, 166.e7–166.e16.
Coulter, F. W., Goin, R. P., & Gerard, J. M. (2004). Assessing graduate students’ needs: the role of graduate
student organizations. Educational Research Quarterly, 28(1), 15–26.
Council of Graduate Schools. (2012). Findings from the 2012 CGS international graduate admissions survey.
Phase III: final offers of admission and enrollment. Washington: Council of Graduate Schools.
Davidson-Shivers, G., Inpornjivit, K., & Sellers, K. (2004). Using alumni and student databases for evaluation
and planning. College Student Journal, 38(4), 510–520.
Delaney, A. M. (2004). Ideas to enhance higher education’s impact on graduates’ lives: alumni recommen-
dations. Tertiary Education and Management, 10(2), 89–105.
Fagen, A. P., & Suedkamp Wells, K. M. (2004). The 2000 national doctoral program survey: an online study
of students’ voices. In D. H. Wulff, A. E. Austin, & Associates (Eds.), Paths to the professoriate:
strategies for enriching the preparation of future faculty (pp. 74–91). San Francisco: Jossey-Bass.
Farley, K., McKee, M., & Brooks, M. (2011). The effects of student involvement on graduate student
satisfaction: a pilot study. Alabama Counseling Association Journal, 37(1), 33–38.
Fu, Y. (2012). The effectiveness of traditional admissions criteria in predicting college and graduate success
for American and international students. Doctoral dissertation, University of Arizona.
Gansemer-Topf, A. M., Ross, L. E., & Johnson, R. M. (2006). Graduate and professional student development
and student affairs. New Directions for Student Services, 2006(115), 19–30.
Gardner, S. K., & Barnes, B. J. (2007). Graduate student involvement: socialization for the professional role.
Journal of College Student Development, 48(4), 369–387.
Golde, C. M. (2000). Should I stay or should I go? Student descriptions of the doctoral attrition process. The
Review of Higher Education, 23(2), 199–227.
Hardré, P. L. (2012a). Scalable design principles for TA development: lessons from research, theory, testing
and experience. In G. Gorsuch (Ed.), Working theories for teaching assistant and international teaching
assistant development (pp. 3–38). Stillwater: NewForums.
Hardré, P. L. (2012b). Teaching assistant development through a fresh lens: a self-determination theory
framework. In G. Gorsuch (Ed.), Working theories for teaching assistant and international teaching
assistant development (pp. 113–136). Stillwater: NewForums.
Hardré, P. L., & Burris, A. (2011). What contributes to TA development: differential responses to key design
features. Instructional Science, 40(1), 93–118.
Hardré, P. L., & Chen, C. H. (2005). A case study analysis of the role of instructional design in the
development of teaching expertise. Performance Improvement Quarterly, 18(1), 34–58.
Hardré, P. L., & Chen, C. H. (2006). Teaching assistants learning, students responding: process,
products, and perspectives on instructional design. Journal of Graduate Teaching Assistant
Development, 10(1), 25–51.
Hardré, P. L., Crowson, H. M., & Xie, K. (2010a). Differential effects of web-based and paper-based
administration of questionnaire research instruments in authentic contexts-of-use. Journal of
Educational Computing Research, 42(1), 103–133.
Hardré, P. L., Nanny, M., Refai, H., Ling, C., & Slater, J. (2010b). Engineering a dynamic science learning
environment for K-12 teachers. Teacher Education Quarterly, 37(2), 157–178.
Hardré, P. L., Beesley, A., Miller, R., & Pace, T. (2011). Faculty motivation for research: across disciplines in
research-extensive universities. Journal of the Professoriate, 5(2), 35–69.
Hardré, P. L., Crowson, H. M., & Xie, K. (2012). Examining contexts-of-use for online and paper-based
questionnaire instruments. Educational and Psychological Measurement, 72(6), 1015–1038.
Hegarty, N. (2011). Adult learners as graduate students: underlying motivation in completing graduate
programs. Journal of Continuing Higher Education, 59(3), 146–151.
Hephner LaBanc, B. (2010). Student affairs graduate assistantships: an empirical study of the perceptions of
graduate students’ competence, learning, and professional development. Doctoral dissertation, Northern
Illinois University.
Higher Education Research Institute (HERI) (2012). Faculty satisfaction survey. http://www.heri.ucla.edu/index.php.
Accessed 15 June 2013
Hyun, J., Quinn, B. C., Madon, T., & Lustig, S. (2006). Needs assessment and utilization of counseling
services. Journal of College Student Development, 47(3), 247–266.
Kanan, H. M., & Baker, A. M. (2006). Student satisfaction with an educational administration
preparation program: a comparative perspective. Journal of Educational Administration, 44(2),
159–169.
Kenner, C., & Weinerman, J. (2011). Adult learning theory: applications to non-traditional college students.
Journal of College Reading and Learning, 41(2), 87–96.
Lipschultz, J. H., & Hilt, M. L. (1999). Graduate program assessment of student satisfaction: a method for
merging university and department outcomes. Journal of the Association for Communication
Administration, 28(2), 78–86.
Lovitts, B. E. (2001). Leaving the ivory tower: the causes and consequences of departure from doctoral study.
Lanham: Rowman & Littlefield.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational
Measurement: Issues and Practice, 14(4), 5–8.
Nesheim, B. E., Guentzel, M. J., Gansemer-Topf, A. M., Ross, L. E., & Turrentine, C. G. (2006). If you want
to know, ask: assessing the needs and experiences of graduate students. New Directions for Student
Services, 2006(115), 5–17.
Offstein, E. H., Larson, M. B., McNeill, A. L., & Mwale, H. M. (2004). Are we doing enough for
today’s graduate student? The International Journal of Educational Management, 18(6/7),
396–407.
Patton, M. Q. (2012). Essentials of utilization-focused evaluation. Thousand Oaks: Sage.
Schlossberg, N. K., Waters, E. B., & Goodman, J. (1995). Counseling adults in transition: linking practice
with theory (2nd ed.). New York: Springer.
Schram, L. N., & Allendoerfer, M. G. (2012). Graduate student development through the scholarship of
teaching and learning. Journal of Scholarship of Teaching and Learning, 12(1), 8–22.
Smallwood, S. (2004). Doctor dropout. Chronicle of Higher Education, 50 (19), A10. Retrieved from: http://
chronicle.com/article/Doctor-Dropout/33786
Stone, C., van Horn, C., & Zukin, C. (2012). Chasing the American Dream: recent college graduates and the
Great Recession. New Brunswick: John J. Heldrich Center for Workforce Development.
US Department of Education, National Center for Education Statistics. (2005). Integrated post-secondary
education data system, Fall 2004. Washington: US Department of Education.
Vincente, P., & Reis, E. (2010). Using questionnaire design to fight nonresponse bias in web surveys. Social
Science Computer Review, 28(2), 251–267.
Weidman, J. C., Twale, D. J., & Stein, E. L. (2001). Socialization of graduate and professional students in
higher education: a perilous passage? San Francisco: Jossey-Bass.
Williams-Tolliver, S. D. (2010). Understanding the experiences of women, graduate student stress, and lack of
marital/social support: a mixed method inquiry. Doctoral dissertation, Capella University.
Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation
standards: a guide for evaluators and evaluation users. Los Angeles: Sage.