A Tool to Measure Continuous Complexity
John C. Thomas, John T. Richards, Cal Swart, Jonathan Brezin
IBM T.J. Watson Research Center
P.O. Box 218, Yorktown Heights, New York 10598 USA
ABSTRACT
In this paper we describe two types of complexity: continuous complexity and discontinuous complexity. We describe a tool to measure continuous complexity that we developed with an eye both to psychological theory and to the practicality of the software development process. The use of the tool requires developers to change perspective from system functionality to the tasks required of the users, and this in itself has value. The quantitative output of the tool provides metrics to measure progress in reducing overall complexity as well as pinpointing where a required task is particularly complex. Although this tool focuses on continuous complexity, it also allows the developer to document instances of discontinuous complexity. We illustrate the output of the tool for anonymized installations.

Author Keywords
Complexity, metrics, usability, modeling, methods, tools.

ACM Classification Keywords
H5.m. Information interfaces and presentation: Miscellaneous; H.1.2 Models and principles: User/machine systems; I.6.3 Simulation and modeling: Applications.

INTRODUCTION
Complexity is a widely used concept in fields as diverse as biology, computation, economics and psychology. Here, we are concerned primarily with the psychology of complexity. The terms "psychological complexity" and "cognitive complexity" are often used interchangeably, although "psychological complexity" is potentially a broader term that could encompass, for example, the emotional complexity of relationships, the perceptual complexity of a fine Cognac, or the motor complexity of a finely tuned golf swing. Although these types of complexity might impact usability, investigators in Human Computer Interaction have generally focused on "cognitive complexity" [6].

It is useful to distinguish between intrinsic complexity (that supplied by the nature of the environment or the task itself) and gratuitous complexity (additional complexity imposed by a tool, system, or artifact). In general, we want to minimize gratuitous complexity, although in a few contexts that may not be the case. For instance, in vigilance tasks, or in educational, aesthetic, or entertainment settings, minimization of complexity might not be the goal. Regardless of just how complex one wishes to make a system, however, having a reliable and valid way to measure complexity is important.

APPROACHES TO MEASURE COMPLEXITY
One useful approach to cognitive complexity is the work on the "Cognitive Dimensions of Complexity" [4]. It is probably better thought of as a useful tool to aid discussion than as a "metric" of complexity. More quantitative methods of modeling human behavior include, notably, GOMS [2] and EPIC [5]. The use of these models is not limited to human-computer interaction, but they can
certainly be applied there. In cases where an economically
significant number of users will be using a product for a
significant amount of time, such approaches can be quite
useful [3]. However, in other cases, systems and
applications are developed for a small number of users;
increasingly, end users are even creating essentially ad hoc
applications for themselves. In addition, many applications
and systems are subject to updates that are made so
frequently that such an extensive approach to modeling is
not economically feasible. Finally, such fine modeling is
best done after special training in addition to a background in cognitive psychology. For these reasons, we set out to develop a tool to help developers quickly obtain reliable and useful quantitative metrics about the complexity of what they were building.

CONTINUOUS AND DISCONTINUOUS COMPLEXITY
Our main goal was to develop a tool that produced metrics of complexity. We envisioned this to be something that would give metrics in terms of a fairly continuous scale. In our experience both with observing usability and in attempting to build usable applications, we also see cases where the most natural description is that usability is discontinuous. That is, the user does not simply feel somewhat more frustrated, take a little longer to do a task, or make a few more errors along the way. Rather, it often happens that the user is completely prevented from making any further progress without outside intervention. For this reason, we wanted to include in the tool a simple facility for documenting such cases.

As an example of discontinuous complexity, an installation process was meant to install several components and the installation kept failing. The associated log file was empty. After several tries, the expert user attempted to install one of the components by itself. In this case, an informative error message was returned indicating that there was not enough memory. The user added memory, re-ran the installation, and succeeded. Here, an underlying informative error message was somehow "blocked" in the over-arching process and not surfaced despite its criticality.

THE CONTEXT OF OUR MODEL
Special purpose tools are ideally developed with respect to a particular context, set of tasks, and set of users. In our case, we wanted to develop a tool that would be useful to developers doing actual software development.

The Software Development Process
Software development has become, in many ways, a race against time. While it is difficult to amass overall statistics, even the potential outliers provide some insight. For example, one software system issued successive releases on 6-19-2001, 7-23-2001 and 8-17-2001. Another site lists release dates as 4-20-2005, 7-7-2005, 9-28-2005, 2-28-2006, and 3-10-2006. Realistically, how much do such schedules allow for unit and functional testing, let alone user testing or constructing detailed psychological models? The educated guess would be: little time indeed.

The Culture of Metrics and Tools
Our particular corporate culture places a high value on "objective" measurement. To the extent that we can provide a tool that offers a way to measure complexity with a minimum of interpretation on the input side and a maximum of quantification on the output side, that increases the likelihood of adoption as well as acceptance of results. That may or may not be ideal, but it is a reality that faced our team as designers.

The Utility of Faster Feedback in Learning
Studies have long indicated that delayed feedback can be very confusing and disruptive; for instance, talking with delayed auditory feedback or watching one's motor performance with delayed visual feedback can be very disruptive. When feedback is delayed, it can also make learning extremely difficult. Mere passage of time makes learning more difficult and, in addition, in the real world it often makes the attribution of error (and therefore, choosing among potentially corrective strategies) more ambiguous. There is a trade-off, therefore, between tools that provide the greatest verisimilitude to real-world usage (which requires, ultimately, real users using the tool with real documentation, real product and real support systems) and those that are available as early as possible in the design-development-deployment life cycle.

THE MODEL UNDERLYING THE TOOL
The complexity model underlying the tool is based on the work of Brown, Keller and Hellerstein [1]. This model measures complexity along three dimensions: the number of steps in a process, the number of context shifts, and the working memory load required at each step.

Rationale for Number of Steps
Of course, not all "steps" are equal, and so using the sheer number of steps as a metric is somewhat arbitrary. However, in most of the tasks we have studied so far (installation, configuration, and simple administration), the steps can be defined fairly objectively. In GUIs, every new screen is considered one step. In line-oriented interfaces, every "Enter" is considered another step. Typically, in comparing alternative products or various versions of one product, the "steps" are fairly similar in complexity (except as captured in the other two metrics, i.e., memory load and context shifts). There are two major shortcomings of the model as applied to straight-line processes. One is that it does not capture the complexity of the reading that is required either on the screen or with accompanying documentation. The second is that it does not measure how much background knowledge is required to decide which items need to be noted for future reference. Nonetheless, in general, as processes gain more steps, there is a fairly uniform increase in the chance of an error and, certainly, an increase in time. As these tasks are performed in the real world, each additional step also increases the probability of being interrupted by some other task, which again increases both the chance of error and the time required to recover state.

Rationale for Memory Load
Memory load is increased whenever the user sees something on a screen that must be stored and used for some future step. Again, in detail, we know that the actual memory load will depend on the type of item that needs to be stored and on the user's experience and strategies. However, as a first approximation, each new "item" that the user must take note of and remember increases felt complexity as well as increasing the chance for error. Even without error, it takes longer to recover a particular item from working memory if there are more items in working memory.

Rationale for Context Shifts
Context shifts were originally defined by the model builders in terms of computing contexts (e.g., server vs. client, or operating system vs. database). We have kept such changes as context shifts but broadened the definition to include shifts between applications or between installation components. If an installation requires the installation of three sub-components, these components often have somewhat different appearances and conventions. Context shifts can be disruptive to working memory. In addition, different contexts often employ different conventions, and this can cause interference resulting in longer latencies, a greater chance of error, or both.
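To make the three dimensions concrete, the following sketch shows one way a task model might be represented and scored in JavaScript (the language in which, as described below, the tool keeps its task models). The field names and the produce/consume treatment of per-step memory load are illustrative simplifications, not the tool's actual code.

  // Illustrative sketch only: field names and scoring details are
  // simplifying assumptions, not the tool's actual implementation.
  var task = {
    // Each action is one "step"; contextShift marks a change of context.
    actions: [
      { label: "Launch installer",                   contextShift: false },
      { label: "Note license key on summary screen", contextShift: false },
      { label: "Switch to component configurator",   contextShift: true  },
      { label: "Enter license key",                  contextShift: false }
    ],
    // Data usage: the step at which an item is produced (and must be
    // remembered) and the step at which it is finally consumed.
    dataItems: [
      { label: "license key", producedAt: 1, consumedAt: 3 }
    ]
  };

  function scoreTask(task) {
    var steps = task.actions.length;
    var contextShifts = 0;
    var memoryLoad = [];            // working-memory load at each step
    var i, j, item;
    for (i = 0; i < steps; i++) {
      if (task.actions[i].contextShift) { contextShifts++; }
      memoryLoad[i] = 0;
    }
    // An item adds one unit of load at every step from the one where
    // it is produced through the one where it is consumed.
    for (j = 0; j < task.dataItems.length; j++) {
      item = task.dataItems[j];
      for (i = item.producedAt; i <= item.consumedAt; i++) {
        memoryLoad[i] += 1;
      }
    }
    return { steps: steps, contextShifts: contextShifts, memoryLoad: memoryLoad };
  }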
ITERATIVE TOOL DEVELOPMENT PROCESS
The original model that we built on took a detailed XML description of the task as input. We thought it unlikely that developers would use a complexity metric that required this. Therefore, we developed a GUI tool to allow users to define tasks, action steps, context switches, and memory load without having to use XML. The tool was used by a small group of people for some months. Interviews, observations, and spontaneous comments were all used to drive a continuous round of changes in the interface.

THE STRUCTURE OF THE TOOL AND DATABASE
The tool we developed to facilitate the data entry required by the model is a classic dynamic HTML application with the persistent data held in a relational database at the server. The client is implemented with XHTML and JavaScript, and the server with PHP and MySQL.

There are two phases from the point of view of user input: creating or choosing the task to work on and, for a fixed task, entering its details. Working with the task list is simple and handled easily with a drop-down list and a button or two. The essential problem is to use the screen real estate effectively to enter the details for a fixed task, which involves two parallel lists, actions and data, both of which can be expected to be on the order of several tens long, perhaps a hundred or so. There is also a much smaller list of contexts, less than half a dozen or so in the vast majority of cases. Actions take place in contexts, and they produce and consume data, so it should be convenient to enter new items of any of these types at any point. To answer this need, we used a tabbed display, one tab each for actions, data, and contexts, and one for the final scoring of the model. (The tabs are implemented using the Dojo JavaScript toolkit.)

The model for a single task is sufficiently small that it can easily be kept in JavaScript data structures, so moving between tabs involves no delay. When, for instance, the actions tab is selected, a list of actions is shown (in the order in which they occurred), and one action is shown as selected. The selected action's details are visible for editing, and buttons are used to maintain the list: inserting a new action, reordering those already there, or marking the points at which context switches occur. Editing changes are immediately transmitted by an asynchronous HTTP POST to the server, where they are written to a database that is used to maintain the persistent state of the task model. The data and context tabs differ from the actions tab only in their view of the content of the model.
The relational database is straightforward: it requires tables for tracking users, tasks, actions, data, contexts, and data usage. The latter tracks pairs of data items and actions to record which actions produce, and which consume, what data. The users table allows us to use a single database to serve many users. Each task table entry has a column that indicates which user's task it is. Each of the remaining tables (actions, etc.) has a column that indicates which task it belongs to. Thus each action, etc., belongs to a single task, and each task to a single user.
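The produce/consume pairs in the data-usage table are exactly what the scoring sketch above assumed. The following illustrative conversion (the names are again assumptions, not the actual schema) recovers, for each data item, the span of steps over which it must be held in working memory.

  // Illustrative: derive each data item's working-memory span from
  // produce/consume pairs like those in the data-usage table, given a
  // map from action id to step position. Names are assumed.
  function usageToSpans(usagePairs, positionOfAction) {
    var spans = [];
    for (var i = 0; i < usagePairs.length; i++) {
      var u = usagePairs[i];
      spans.push({
        label: u.dataLabel,
        producedAt: positionOfAction[u.producerActionId],
        consumedAt: positionOfAction[u.consumerActionId]
      });
    }
    return spans;
  }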
For purposes of communicating the model to other applications that might wish to use it, there is an XML Schema that describes an XML document for the raw data associated with a single task. A Schema for the scored model is forthcoming as well. One common and convenient method of working is to open two instances of the tool: one where successive action steps are noted and one where data items are noted in order to calculate memory load. In some cases, tool users watch an expert perform a task, take notes, and then code the result. In other cases, an expert performs a task such as installation and captures each step via screen shots, which are then sent to the tool user for coding.

EXAMPLE CONTINUOUS RESULTS
The tool allows developers and managers to calculate overall metrics for their products and to gauge progress through successive releases. Figures 1 and 2 show anonymized results for installation of comparable products in terms of steps and memory load, respectively. The first figure shows that products differ significantly in the number of steps required and that custom installs require only a few more steps than taking all the defaults.
Figure 1. Action steps needed to install comparable products when taking all defaults and for a custom installation.

Figure 2 shows, however, that custom installs require considerably more memory load. Taken together, these figures illustrate that simple but meaningful comparisons between whole software packages are possible using our tool.

Figure 2. Memory load needed to install comparable products when taking all defaults and for a custom installation.

In addition to allowing overall comparisons, the tool can help pinpoint specific areas of complexity in terms of memory load, as shown in Figure 3 below.

Figure 3. Blue "spikes" illustrate action steps that require a particularly high memory load.
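Identifying such spikes is a simple scan over the per-step loads. The sketch below, which reuses scoreTask from the earlier sketch, flags steps above a fixed threshold; the threshold itself is an arbitrary illustration, and a real report might instead compare each step against the task's mean load.

  // Illustrative: flag steps whose working-memory load exceeds a
  // threshold, yielding the "spikes" plotted in Figure 3.
  function findSpikes(memoryLoad, threshold) {
    var spikes = [];
    for (var i = 0; i < memoryLoad.length; i++) {
      if (memoryLoad[i] > threshold) {
        spikes.push({ step: i, load: memoryLoad[i] });
      }
    }
    return spikes;
  }

  // For example: findSpikes(scoreTask(task).memoryLoad, 4)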
USAGE
Seventy-five people have registered to use the tool. Interviews with a subset of users find general agreement that the current user interface is relatively straightforward to use and a significant improvement over the first iteration. The tool is being used by personnel in a number of different product lines within our company, by both developers and UI practitioners.

CONCLUSION
The tool, though based on a simplified model of cognitive complexity, is useful and usable by developers during development. To the extent that development teams take the effort to really engage in "Outside-In Design" and specify relatively detailed user tasks in advance of system design, the tool can be used even earlier in the overall system development process. The tool helps management determine roughly the competitive position of their products with respect to complexity and whether progress toward simplification is being made with successive versions. For developers and HCI professionals, the tool also provides pointers to those places in a task which require a particularly large memory load, thereby focusing efforts to improve usability.

ACKNOWLEDGMENTS
We thank anonymous reviewers as well as our colleagues for comments and suggestions regarding this paper. We thank our users of the tool and our funders.

REFERENCES
1. Brown, A. B., Keller, A., and Hellerstein, J. L. (2005). A model of configuration complexity and its application to a change management system. In Proceedings of the Ninth IFIP/IEEE International Symposium on Integrated Network Management (IM 2005).
2. Card, S. K., Moran, T. P., and Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, N.J.: Erlbaum.
3. Gray, W. D., John, B. E., Stuart, R., Lawrence, D., and Atwood, M. E. (1990). GOMS meets the phone company: Analytic modeling applied to real-world problems. In Proceedings of IFIP '90: Human Computer Interaction, 29-34.
4. Green, T. R. G. (1989). Cognitive dimensions of notations. In A. Sutcliffe and L. Macaulay (Eds.), People and Computers V. Cambridge: Cambridge University Press.
5. Kieras, D. and Meyer, D. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391-438.
6. Rauterberg, M. (1996). How to measure cognitive complexity in human-computer interaction. In Cybernetics and Systems '96, 815-820. Vienna: Austrian Society for Cybernetic Studies.