CIT
DATA EXPLORATION AND INFORMATION RETRIEVAL IN THE B2C MARKETING DEPARTMENT OF THE FINANCIAL TIMES
Seun Odueko
December 2013
ABSTRACT
A case study of marketing professionals in the B2C Marketing department of the Financial
Times investigated their patterns of behaviour when interacting with data sources and
analytics tools to perform work tasks. The study was conducted within the framework of
search and discovery proposed by Russell-Rose and Tate. Data was gathered by observing the
participants in their natural work environment; their data exploration goals and strategies
were analysed using Attfield and Blandford's Information Journey Model and Russell-Rose
et al.'s Dimensions of Search User Experience, respectively. The findings reveal that the
participants’ goals were mainly static rather than dynamic, but they performed ‘exploratory
search’ activities to fulfil them. Evidence was also found in support of Search Modes, and their
occurrence in clusters of Mode Chains, within a live work environment. Recommendations for
design features of analytics tools that facilitate the observed patterns of behaviour were
made, with examples of how they could be implemented.
Keywords: User Goal; Search Mode; Mode Chain; Data Exploration; Information Retrieval.
1 Table of Contents
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
2 PROBLEM DESCRIPTION AND OBJECTIVES
2.1 Background
2.2 Question
2.3 Objectives
2.4 Outcomes
2.5 Beneficiaries
2.6 Structure
3 ACADEMIC CONTEXT
3.1 Data Exploration
3.2 User Type
3.3 User Goal
3.4 Search Context
3.5 Search Mode
3.6 Data Type
3.7 Analytics Tool
4 METHODS
4.1 Participant Selection
4.2 Data Gathering
4.2.1 User Type
4.2.2 User Goal
4.2.3 Search Context
4.2.4 Observation
4.3 Data Preparation
4.3.1 Tools
4.3.2 Coding
5 RESULTS AND DISCUSSION
5.1 User Goal
5.1.1 Code Frequency
5.1.2 Code Sequences
5.1.3 Designing for User Goals
5.2 Search Mode
5.2.1 Code Frequency
5.2.2 Code Sequences
6 EVALUATION, REFLECTIONS AND CONCLUSIONS
6.1 User Goal
6.2 Search Mode
6.3 Project Management
6.4 Further Research
7 GLOSSARY
7.1 Usability
7.2 User Experience
7.3 User Interface
7.4 Cognition
8 REFERENCES
LIST OF FIGURES
Figure 1: Serialist
Figure 2: Holist
Figure 3: Frequency of User Goals for All Participants
Figure 4: Frequency of User Goals per Participant
Figure 5: Code Sequences - Participant H
Figure 6: Code Sequences - All Participants
Figure 7: Number of Occurrences per Code
Figure 8: Number of Participants per Code
Figure 9: Workbook Connections
Figure 10: Evaluation Comments
Figure 11: Current Selections
Figure 12: Code Sequences
Figure 13: Dashboard Supporting Monitor-Compare-Evaluate
Figure 14: Dashboard Supporting Explore-Compare-Evaluate
Figure 15: Dashboard Supporting Explore-Compare-Evaluate
LIST OF TABLES
Table 1: Domain Expertise of Participants
Table 2: Technical Expertise of Participants
Table 3: Learning Style of Participants
Table 4: User Type of Participants
Table 5: Job Roles of Participants
Table 6: Summary of Observation Sessions
Table 7: Codebook of User Goals
Table 8: Codebook of Search modes (those highlighted in green are my own “grounded codes”)
2 PROBLEM DESCRIPTION AND OBJECTIVES
2.1 Background
Business data has evolved from being perceived primarily as a by-product of business
processes (Magal and Word, 2009, p.18) to a key productivity driver through which
organisations derive competitive advantage (Barton and Court, 2012). In the present era of
“big data”, in which organisations face a deluge of different forms of data arriving from
various internal and external sources at near real-time speed (Beyer and Laney, 2012, cited in
Sicular, 2013), those that have successfully harnessed it in their decision-making processes
have achieved average improvements of 5% in relative productivity and 6% in relative
profitability (McAfee and Brynjolfsson, 2012).
Research into user attitudes towards business intelligence software, conducted in October
2012 by Ventana Research (Smith, 2013), revealed that a greater percentage of respondents
were concerned about usability (63%) than about functionality (49%) or reliability (46%).
Analytics tools tend to be optimised for functionality; although they are accessible to experts,
end-users are unable to make the most of them, or even to use them at all, because of their
complexity (Barton and Court, 2012). IDC researchers forecast that the business analytics
software market would grow at a compound annual rate of 9.8% between 2012 and 2016 (Vesset
et al., 2012), reaching $50.7 billion (Taft, 2012). Senior executives are increasingly deploying
analytics tools in their organisations, contributing to a shortage of Information Technology (IT)
and analytics staff with the necessary skills to use them (Vesset et al., 2012). Hence there is a
requirement to design tools that fit the way managers and frontline staff, as opposed to data
analysts, explore information (Smith, 2013). An understanding of how these users explore data
and gain actionable insight is important both to senior executives making strategic decisions on
how data is used across the organisation and to data analytics and visualisation professionals
developing tools and models for end-users.
Any organisation producing goods or services has a requirement to identify and fulfil the needs
of its target customers (The Chartered Institute of Marketing, 2009) and as such, the marketing
function is central to the existence of organisations of virtually all sizes and sectors. Marketing
departments are forecast to spend more on technology than IT departments by 2017
(McLellan, 2012) due to growth in electronic commerce (e-commerce) sales volumes and
increasing adoption of social media and mobile channels for engaging customers (Murphy,
2012), with spending largely focused on business and web analytics (Moth, 2013). The
Business-to-Consumer (B2C) marketing department of the Financial Times – a global business
news organisation – includes customer acquisition, customer retention, customer relationship
management (CRM) and payment optimisation teams. Current analytics tools are based on
user-centred Microsoft Excel (Microsoft, n.d.) models and expert-retrieved Oracle SQL (Oracle,
n.d.) reports; as a member of the marketing operations team, I am responsible for retrieving
SQL reports and building Excel models for the marketing professionals. These are in the
process of being consolidated in Microsoft Excel 2010, using the PowerPivot (Microsoft, n.d.)
add-in to empower end-user exploration of data on OLAP cubes (Microsoft, n.d.) and retrieval
of information with minimal expert intervention. Thus the data exploration patterns of
behaviour exhibited by members of the department may yield useful insight into how analytics
tools may be optimised for the marketing function.
Such insight will also complement scientific and industrial research into the data exploration
patterns of behaviour of non-experts in a work context. The study offers an opportunity to build
on existing insight into the search patterns of professionals, including sales and marketing
specialists (O’Day and Jeffries, 1993), and to feed into forthcoming research into data
exploration tools designed for non-IT experts working in a variety of sectors (The Technology
Strategy Board, 2013).
2.2 Question
This study seeks to contribute to the body of knowledge about the characteristics of non-
experts and their data exploration patterns of behaviour, which can be used to inform the
design of business analytics tools with high usability for staff and thus higher returns on
investment (ROI) for organisations.
The research question to be answered is “how do Financial Times’ B2C marketing professionals
in the customer acquisition, customer retention, CRM and payment optimisation teams
explore customer data and retrieve required information?” and this will be investigated from
two perspectives:
• Why and how do they seek information?
• What strategies do they employ to find and understand information?
Data is usually distinguished from information, with the former consisting of numbers and
words that exist in a raw, unprocessed state while the latter has added qualities of
organisation and relationships that make it meaningful (Yacci, 1999; Pohl, 2001). For the
purposes of this study, however, no distinction is made between data and information and the
terms are used interchangeably, following a precedent established in previous research (Baker
et al., 2009).
2.3 Objectives
My primary objective is to research and analyse the behavioural processes exhibited by the
marketing professionals as they interact with data sources and analytical tools to fulfil specific
work tasks.
My secondary objective is to use the results of the analyses to suggest user interface (UI)
design features of analytics tools that support the behavioural processes.
2.4 Outcomes
The project outcomes will be:
• Closed-format questionnaires will be used to gather data, which will be analysed to classify
the participants according to their domain expertise, technical expertise, cognitive style
and learning style. Plans to classify the marketing professionals’ work data according
to Cool and Belkin’s dimensions (2002, cited in Russell-Rose and Tate, 2013, p.58) have
been discontinued in order to keep the study’s scope manageable.
• Participants will be observed in their natural working environment while performing a
regular work task; the observation data will be coded and analysed to yield a
description of the methods and reasoning processes followed in selecting and
interpreting information. The analysis framework to be used has been changed from
Sensemaking (Pirolli and Card, 2005) to Dimensions of Search User Experience
(Russell-Rose and Tate, 2013, p.1) as the latter generates a more robust insight into data
exploration processes.
• Desirable UI features for supporting those processes that can be implemented in
Microsoft Excel 2010 and alternative business analytics tools will be suggested,
drawing on prior research, existing software features and creative thinking. There
were initial plans to go beyond suggesting the features and implement them by
designing dashboards in Microsoft Excel for the marketing professionals, which would
be subsequently evaluated. However, these plans have been shelved due to time constraints.
2.5 Beneficiaries
The intended beneficiaries of the project are:
• Academics and researchers in the fields of information retrieval and user experience
(UX) design.
• Senior executives who make investment decisions on business analytics software for
their organisations.
• Data analysts and IT professionals who deploy analytics tools in organisations and
design models for end users.
• Business analytics software vendors that develop the applications.
2.6 Structure
The project report continues in Chapter 3 with a review of relevant academic literature in the
field of information seeking to identify previous research and established theories that have
shaped the current body of knowledge. These will provide the study with requisite analytical
frameworks that inform the data gathering and result evaluation strategies.
In Chapter 4, suitable methods identified in the reviewed literature are applied to the
study, with a description of the participant selection process and explanations of how data
was gathered and analysed to yield insight.
Chapter 5 presents the results of the analyses and discusses the findings in light of the
objectives of the study and the extent to which established theories are supported, and makes
recommendations for implementing the findings.
The project report concludes in Chapter 6 with an evaluation of the management and
execution of the project as well as its outcomes. Suggestions for follow-up research are also
made, and personal lessons learned from undertaking the project are reflected upon.
3 ACADEMIC CONTEXT
The subject of data exploration has received considerable attention from the academic
community and a range of established theories have emerged from empirical research. The
research findings provide sound frameworks for understanding the various facets of data
exploration, identifying gaps in the body of knowledge that need to be addressed and
providing direction for future research efforts. Hence a review of academic literature has been
conducted to elucidate the impact of personal and professional traits on search behaviour, the
differing effects of both information requirements and types of task on search process as well
as the strategies employed in retrieving information and gaining insight. The appropriate
format for representing different types of data, and analytics software features suitable for
data exploration are also examined. For each facet, limitations of existing research are
highlighted where applicable. The facets provide the analytical framework for this study, with
the theories guiding the choice of data to be gathered and the criteria for evaluating results.
3.1 Data Exploration
Exploration in its various forms – investigating, studying, analysing, testing, experimenting,
discovering or examining (“Exploring,” n.d.) – is a fundamental human experience and lies at
the heart of our individual and collective advancement. It is the process by which we satisfy
our curiosity (Edelman, 1997), whether in fulfilment of a defined objective or as an end in itself
(Berlyne, 1954, cited in Loewenstein, 1994, p.77).
In the domain of information science, exploration is synonymous with the discovery or
synthesis of new knowledge from existing data; it is a journey rather than a race towards a
pre-conceived outcome (Gossen et al., 2012). It can involve “Exploratory Data Analysis”,
whereby data is analysed and information that cannot be immediately explained is
hypothesised about and tested (Tukey, 1977). Although a statistical technique, “Exploratory
Data Analysis” can be generalised to the manner in which users confronted with puzzling
information iterate through possible explanations, testing each against the available data to
prove or disprove it (Gossen et al., 2012), making it synonymous with data exploration.
In general terms, a task can be categorised as data exploration if it involves the examination of
data without prior knowledge of the information to be discovered (Grinstein and Ward, 1997;
Tukey, 1977; Tukey, 1980, cited in Baker et al., 2009). Data exploration is a search activity
(Tukey, 1977), that is, looking into or over data carefully or thoroughly in an effort to find or
discover information (“Search,” n.d.). However, there is a distinction between simple item
retrieval that does not require examination and ‘exploratory search’, which employs learning
and investigating methods for information discovery (Marchionini, 2006).
In order to appreciate the intricacies of data exploration, it is beneficial to view it through the
four “dimensions of search user experience” that frame information search and discovery
(Russell-Rose and Tate, 2013, p.1), namely User Type, User Goal, Search Context and Search
Mode, together with two further facets: data type and analytics tool.
3.2 User Type
The “level of knowledge and expertise” (Russell-Rose and Tate, 2013, p.1) of users has a
significant impact on the way a given data set is explored and thus it ought to be taken into
consideration in the design of search and analytics tools.
“Domain experts” possess a high level of subject-matter knowledge and experience relative to
“domain novices” while “technical experts” are more versed in the use of information systems
to extract information in comparison to “technical novices” (Russell-Rose and Tate, 2013, p.4).
Users who possess both domain and technical expertise tend to drill deeply into a narrow
subset of data and use advanced techniques to evaluate information, in contrast to novices in
both areas who explore a data set widely without venturing beyond the surface and engage in
minimal evaluation (Jenkins et al., 2003). In-between these polar extremes are users who are
domain experts and technical novices, exploring widely with rigorous evaluation of
information, and others who are domain novices and technical experts, exploring both widely
and deeply with less sophisticated evaluation (Jenkins et al., 2003).
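The matrix of domain and technical expertise described above can be sketched as a small lookup table. This is an illustrative sketch only: the names `ExplorationPattern`, `PATTERNS` and `expected_pattern` are hypothetical, the labels paraphrase the summary of Jenkins et al. (2003) given above, and depth is left as `None` where that summary does not state it.

```python
from typing import NamedTuple, Optional


class ExplorationPattern(NamedTuple):
    breadth: str            # how widely the data set is explored
    depth: Optional[str]    # how far the user drills in (None = not stated above)
    evaluation: str         # how rigorously information is evaluated


# Keys are (is_domain_expert, is_technical_expert).
# Values paraphrase the text above; they are labels, not measured quantities.
PATTERNS = {
    (True, True): ExplorationPattern("narrow", "deep", "advanced"),
    (False, False): ExplorationPattern("wide", "surface", "minimal"),
    (True, False): ExplorationPattern("wide", None, "rigorous"),
    (False, True): ExplorationPattern("wide", "deep", "less sophisticated"),
}


def expected_pattern(domain_expert: bool, technical_expert: bool) -> ExplorationPattern:
    """Look up the exploration pattern associated with a user type."""
    return PATTERNS[(domain_expert, technical_expert)]


if __name__ == "__main__":
    # A domain novice with technical expertise explores widely and deeply.
    print(expected_pattern(domain_expert=False, technical_expert=True))
```

A table of this kind makes the two axes explicit, which is convenient when participants are later classified by both forms of expertise.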
The study by Jenkins et al. (2003) and related research by Kim (2001) focused on web searches
and it would be interesting to see how their findings apply to business users of analytics tools.
Although a study – involving participants with a broad range of domain knowledge and mostly
high technical expertise – has been conducted into the use of interactive information systems
in the execution of work tasks (Li and Belkin, 2010), the effects of user knowledge and
expertise were neutralized rather than incorporated into the study.
As there is a requirement to design analytics tools that fit the way business users explore
information (Fisher et al., 2012), an understanding of this matrix of domain and technical
expertise as it applies to the marketing professionals can inform UI design and potentially
increase their speed in exploring data and making discoveries, although greater speed raises
the hazards of lax analysis and poor judgment (Fisher et al., 2012). Expertise is a function of
time and experience and as such it is not advisable to design for a specific group (Russell-Rose
and Tate, 2013, p.9); rather, UI features can be used to support the transition of users from
novice to expert so that the tools remain relevant at each stage of the users’ development.
In the same vein, users’ cognitive style of information processing and learning style of concept
representation (Russell-Rose and Tate, 2013, p.14) affect how they explore data and can be
supported by the UI of analytics tools. “Holistic thinkers” process information by starting with
a high-level view of a given data set in order to grasp its wider context while conversely, “serial
thinkers” zero in on details and analyse its constituent parts (Russell-Rose and Tate, 2013,
p.11). “Holistic thinkers” are “global processors” who progress from developing a broad
overview to focusing on specifics, in contrast to serial thinkers who are “analytic processors”,
progressing from a step-by-step grasping of facts to an understanding of the overall concept
(Dunn and Griggs, 2003, cited in Denig, 2004). This suggests that cognitive style influences
users’ starting point and progression path when seeking information and the business analytics
software industry may benefit from empirical research into the relationship between
information-seeking patterns of behaviour and cognitive styles of users.
Similar understanding of the preferred learning styles of business users can inform the
representation of data, with tables and charts provided for verbal and visual learners
respectively (Russell-Rose and Tate, 2013, p.14). On the other hand, representing data to suit
such preferences may not always be appropriate as the nature of the work task and
exploration requirements may determine the format of the data; visual learners may be better
served by tabular data for in-depth analysis, for example.
3.3 User Goal
Lookup, Learn and Investigate have been identified as search tasks conducted by users in
fulfilment of their goals (Marchionini, 2006). Lookup tasks are carried out as a systematic
information retrieval process when the goal is clearly defined, such as finding the answer to an
unambiguous question like the number of new customers acquired each week during a
particular year. On the other hand, goals that are vaguely defined or unknown during the
search activity, such as comparing the performances of all acquisition channels employed or
analysing the causes of a drop in acquisition numbers in the third quarter of the year, may
require an iterative analysis of multiple sets of data before satisfactory insight is derived. In
other words, the more complex and loosely defined the User Goal, the more exploratory
iterations are needed to uncover relevant insight (Li and Belkin, 2010). These Learn and
Investigate tasks are classified under ‘exploratory search’ (Marchionini, 2006).
Gossen et al. (2012, p.290) defined ‘exploratory search’ as “a highly dynamic process of a user
to interact with an information space in order to satisfy an information need that requires
learning about structure and/or content of the information space.” The exploration process
was seen as being shaped by the user’s perspective, which may in turn be transformed by the
insight gained and thus lead to further exploration. This notion is corroborated by dynamic
models of information seeking, such as the Information Journey Model (Attfield and Blandford,
2010, p.29), which acknowledge that the act of interpreting data modifies the original goals
(Belkin, 1993).
It has been argued that the fundamental goal of users is to synthesise and further explore their
initial information requirements and that, therefore, the process of exploration is often more
important than the information discovered, since the successful discovery of information does
not fulfil the users’ goals but rather leads to new requirements (Gossen et al., 2012, p.290).
However, while the importance of the exploration process is acknowledged, the postulation
about the fulfilment of user goals could be challenged by the recognition that exploration
does not iterate indefinitely; there are entry and exit points in the search process, the former
arguably triggered by the desire to fulfil a requirement and the latter arguably triggered by
the fulfilment of the requirement. Bridging these two positions is the view that a user’s
goal in exploring data is sensemaking, that is, developing an understanding of the information
and synthesising it before responding to it in fulfilment of a requirement (Baker et al., 2009).
From a different perspective, the Information Journey Model highlights that the entry point is
not restricted to the desire to fulfil a need; rather it could also be an accidental discovery of
information or serendipitous insight gained during the evaluation of information (Russell-Rose
and Tate, 2013, p.27). Likewise, the exit point may not necessarily be triggered by the
fulfilment of the exact requirement; the demand placed on users’ attention by the volume
and variety of information explored, further amplified by time constraints, may lead to an exit
from the search process when the requirements have been “satisficed” (Simon, in Greenberger,
1971), that is, met well enough, as opposed to fulfilled.
A common thread is that an information journey, or data exploration, has a start and an end
point. Regardless of their trigger, both are only meaningful when aligned with a pre-defined or
newly formed goal of the user. Hence identification of actual, as opposed to simulated, work
goals of users in business organisations is essential to understanding their data exploration
behaviour. Deriving findings with a higher degree of validity for business organisations
requires work-based research, which the majority of research conducted in this field of study
has not provided.
3.4 Search Context
Baeza-Yates and Ribeiro-Neto (2011, p.5) distinguish between data retrieval and information
retrieval (IR), with the former concerned with matching query parameters to structured data
and the latter involving concepts of interpretation and meaning. Hence context is a critical
factor in evaluating the IR process.
The context of an information-seeking task determines its scope, technique, time length,
success criteria and resources used. Building on research by Järvelin and Ingwersen (2004),
Russell-Rose and Tate (2013, p.49) devised a model for representing search context in four
progressive layers:
• “information retrieval”, akin to Baeza-Yates and Ribeiro-Neto’s data retrieval definition
and Marchionini’s Lookup task;
• “information seeking”, akin to Baeza-Yates and Ribeiro-Neto’s IR definition and
Marchionini’s Learn and Investigate tasks;
• “work task”, the organisational need or personal motive that motivates information
seeking; and
• “cultural context”, the expectations associated with the organisation or person
performing a work task.
Each successive layer encompasses the preceding one; hence an organisation’s cultural context
informs the importance attached to work tasks, which in turn are fulfilled by information
seeking activities that may involve information retrieval.
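The nesting of the four layers can be sketched as an ordered list, innermost first. This is a hypothetical illustration rather than part of Russell-Rose and Tate's model: the names `LAYERS` and `enclosing_layers` are assumptions introduced here, and the only fact encoded is the ordering given above.

```python
from typing import List

# Layers ordered from innermost to outermost, as described above.
LAYERS: List[str] = [
    "information retrieval",
    "information seeking",
    "work task",
    "cultural context",
]


def enclosing_layers(layer: str) -> List[str]:
    """Return the layers that encompass `layer`, nearest first."""
    index = LAYERS.index(layer)  # raises ValueError for an unknown layer
    return LAYERS[index + 1:]


if __name__ == "__main__":
    # Information seeking sits inside a work task, which sits inside a cultural context.
    print(enclosing_layers("information seeking"))
```

Reading the list from right to left gives the chain of influence: cultural context shapes work tasks, which are fulfilled by information seeking, which may involve information retrieval.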
Work tasks in organisations are characteristically exploratory in nature and require clear
communication of results to non-experts (Fisher et al., 2012). Functional teams such as
marketing or finance identify an information need and rely on the technical expertise of the
analytics team to retrieve the information (Fisher et al., 2012). This has a number of
drawbacks: the time lag between the request for information and the communication of
results; the increased potential for aspects of the original information need to be ‘lost in
translation’; and less opportunity to explore the available data and gain additional insight
beyond the original information need. Hence there is a requirement for organisations both to
build robust data models and to provide easy-to-use tools with which frontline staff can
interact directly with the models (Barton and Court, 2012).
Furthermore, work task highlights the influence of Search Context on User Goal and associated
search activities. In Li and Belkin's (2010) experimental study of the relationship between work
task and the process of using information systems to search for information, the type of work
task – defined as any activity performed by people in fulfilment of their work responsibility,
similar to Russell-Rose and Tate’s organisational work task – determined how users searched
for information. Work tasks that involved knowledge acquisition, for example, resulted in
‘exploratory search’ activities and were perceived to be difficult to accomplish, with increased
time and information requirements. Conversely, work tasks with clearly defined goals that
required only Lookup search activities took less effort to accomplish. Findings that different
types of tasks utilised different search processes (Toms et al., 2003) corroborate Li and Belkin’s
conclusions.
Hence a better understanding of organisational needs, including the marketing function, could
lead to improved UI design of analytics tools that support users’ information seeking
processes. This would provide easier access to relevant information in less time.
3.5 Search Mode
While Search Context is concerned with external factors that influence users’ information
seeking activities (Russell-Rose and Tate, 2013, p.48), Search Mode focuses on the strategies
employed in seeking information. The process of exploring data involves not only data retrieval
but also analysing and interpreting the retrieved data to build understanding (Russell-Rose and
Tate, 2013, p.2). Thus a clear grasp of the entire cycle of exploration, that is, Search Mode, is
useful for defining search behaviour and supportive features of analytics tools (Russell-Rose and
Tate, 2013, p.73).
The journey from realising an information need to satisfactorily fulfilling it often involves a
series of search tasks (O’Day and Jeffries, 1993). Building on the search tasks of Lookup, Learn
and Investigate (Marchionini, 2006), insights gained from the behaviour of business intelligence
applications have been distilled into nine modes of enterprise search and discovery for which
analytics tools can be optimised, ranging from location of objectively defined data to
subjective interpretation of associated sets of data (Russell-Rose et al., 2011). In a similar vein,
research into how professional clients of a library interpret and use search results pinpointed
three Search Modes – monitoring, planned and exploratory (O’Day and Jeffries, 1993). Users
tend to switch between different modes as a search activity progresses, forming a pattern of
Mode Chains until evaluation is completed and synthesis is achieved (Russell-Rose et al.,
2011). This was demonstrated by the library clients’ use of multiple Search Modes (O’Day and
Jeffries, 1993).
O’Day and Jeffries focused on the specific information needs and resultant search behaviour of
professionals; however their study involved the use of librarians as intermediaries in the
search process. Hence there remains an opportunity to apply and possibly extend their
research findings to the direct IR activities of professionals.
The finding that the nature of work tasks may affect the level of effort and amount of
information required by users to complete them (Li and Belkin, 2010) underscores the utility of
optimising UI for specific Search Modes, since modes are determined by the task being
performed. This has implications for the amount of detail presented and level of drill-down
depth made available, depending on task complexity.
3.6 Data Type
Extensive research has been conducted to examine various facets of information search and
knowledge discovery. The established body of knowledge is predominantly focused on the
exploration of textual data through the use of search engines and library databases. This leaves
a gap in the understanding of how quantitative data is explored and analysed to yield insight.
Tukey (1977) coined the term “exploratory data analysis” as a paradigm for simplifying the
description of quantitative data and actively looking beneath its surface to discover new
insight (Tukey, 1977, p.v). Data exploration is likened to a detective investigation, discovering
and evaluating clues and following their trail until insight is gained (Tukey, 1977, p.1). The
preferred investigation tool is a visual display of data, which is seen as vital to the discovery
process (Church, 1979).
Visualised data, consisting of a fixed scene, variable objects within the scene and defining
characteristics of each object, provides a graphical means by which users perceive data and by
extension, the underlying context which the data represents (Baker et al., 2009). Visual
representations such as graphs and charts are a popular way of depicting and enhancing the
perception of quantitative data. Although non-experts might be inclined to represent
quantitative data with graphs, the Cognitive Fit theory (Vessey, 1991) was developed as a
framework for determining the optimal conditions for visualisation and conversely, when such
data are best represented in tabular form. The research concluded that graphs are suited to
spatial tasks, requiring an overview of associations and relationships within the data, while
tables are suited to symbolic tasks, requiring an analysis of discrete components of the data
(Vessey, 1991).
While the validity of the conclusions is accepted within the relevant academic community, it is
worth noting that the type of data, rather than user, determines the proposed cognitive fit. It
may thus be of interest to find out whether users’ cognitive style and learning style have an
impact on the performance of graphs and tables. Specifically, it might be useful to investigate
the relative performance of graphs in spatial tasks carried out by serialists and verbal learners
as well as the relative performance of tables in symbolic tasks carried out by holists and visual
learners.
3.7 Analytics Tool
Marchionini (2006) asserts that data retrieval tools such as database systems and general-
purpose search engines, which have varying levels of technical expertise requirements, can
efficiently fulfil lookup tasks with human interaction limited to query formulation. Conversely,
learning and investigating tasks require tools that facilitate continuous human interaction in
the ‘exploratory search’ process (Marchionini, 2006). The need for continuous interaction with
retrieved data places users, rather than tools, at the heart of the search process (Belkin, 1993),
with design implications for user interaction with tools.
Users’ ability to explore data and discover associations is enhanced by the development of
bisociative (relationships between items in different data sets), as well as what I term
intrasociative (relationships between items in one data set), knowledge discovery tools, with
appropriate UI design. Gossen et al. (2012, p.291) argue that such UIs need to support: dynamic
information seeking, identification of associations between data sets, creative ways of
exploring data and real-time access to live data. However, the exploration of live data, which
are typically cloud-based, increases the complexity and cost of the analytics tool used and as
such, it may be more productive and feasible to explore subsets of live data stored locally
(Fisher et al., 2012).
Designing UI to facilitate creative exploration implies a flexible environment with loose
navigation structures to avoid shepherding users along a predetermined path, and this may be
daunting for non-experts and unsuitable for certain tasks. Hence, more robust knowledge
discovery tools would cater for different search mode sequences (Russell-Rose, Lamantia and
Burrell, 2011, cited in Russell-Rose and Tate, 2013, p.85), each with a customised UI.
The inherent superiority of graphs over lists in revealing associations between data sets
(Gossen et al., 2012, p.292) is tempered by advances in analytics tools, particularly with
quantitative data. Microsoft Excel (Microsoft, n.d.), one of the most pervasive tools, provides
pivot tables as a means of aggregating and combining data to yield insight. PowerPivot
(Microsoft, n.d.), a more advanced version implemented in Excel 2010 and above, includes
additional abilities to combine separate data sets - albeit with at least one field in common -
for analysis as well as the Slicer tool (Microsoft, n.d.) for flexible and interactive data filtering.
Specialised analytics tools such as Tableau (Tableau Software, n.d.) and QlikView (QlikTech,
n.d.) are optimised for non-experts to explore associations between data sets in a flexible and
interactive manner. The drawback from an organisation’s perspective is the extra cost of
deploying such tools, in contrast to Microsoft Excel, which is virtually available by default.
While QlikView offers a free version for individual use, accessing corporate data and sharing
insights require a paid license.
The challenges of evaluating the usability of knowledge discovery tools involve the creation of
realistic scenarios and recruiting of representative users (Gossen et al., 2012, p.295). An
effective way of sidestepping these challenges is to eschew laboratory-based testing and
conduct the evaluation in an actual working environment with typical end-users completing
real tasks. The main drawback here would be the risk of participants modifying their normal
behaviour as a result (Oates, 2006, p.204), whether consciously or subconsciously.
4 METHODS
This is an explanatory case study (Yin, 2003, cited in Oates, 2006, p.143) investigating the data
exploration and information retrieval patterns of behaviour of four sub-groups of marketing
professionals in a business news organisation. The ubiquity of the marketing profession in all
industries and organisation sizes, established in section 1.1, makes the findings generalisable
to other B2C marketing professionals performing similar tasks. The type of generalisation
made is the “rich insight” yielded by the results (Walsham, 1995, cited in Oates, 2006, p.146).
The four “dimensions” of User Type, User Goal, Search Context and Search Mode (Russell-Rose
and Tate, 2013, p.1) provided the framework under which the study was conducted. The
participants completed closed-format questionnaires, which were subsequently analysed to
classify them according to their domain expertise, technical expertise, cognitive style and
learning style. A description of the goals was obtained from each participant at the beginning
of the observation session and analysis of the observation data yielded further insight into
their goals. The participants’ job roles, obtained from their LinkedIn profiles, and specific
details of their work task, gathered during the observation, defined the context of the
information retrieval activities. The observations were conducted in the participants’ natural
work environment while performing a regular work task, and qualitative analysis of the data
revealed their Search Modes, that is, strategies employed in selecting and interpreting
information. Further analysis yielded the sequence of occurrence of the Search Modes, and
the sequences with the highest levels of clustering were taken as indicative of the Mode Chains
followed by the participants.
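The step from a coded sequence of Search Modes to candidate Mode Chains can be illustrated with a minimal Python sketch. It assumes that each observation session has been reduced to an ordered list of Search Mode codes, and it simply counts how often each run of consecutive modes occurs. The function name and the sample sequence are hypothetical and are not taken from the study data, and the sketch is not the software-assisted analysis actually used in this study.

```python
from collections import Counter


def mode_chain_counts(sequence, length=2):
    """Count every run of `length` consecutive Search Modes in one coded sequence."""
    # zip over shifted copies of the sequence yields each consecutive run once.
    runs = zip(*(sequence[i:] for i in range(length)))
    return Counter(runs)


# Hypothetical coded sequence for one observation session (not study data).
session = ["Locate", "Verify", "Compare", "Locate", "Verify", "Analyse"]

pairs = mode_chain_counts(session)
most_common_pair, frequency = pairs.most_common(1)[0]
```

Runs that recur most often across sessions would then be the candidates for Mode Chains; longer chains can be inspected by raising `length`.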
Desirable UI features supporting the Search Modes and Mode Chains – as well as User Goals –
which can be implemented in Microsoft Excel 2010 and alternative business analytics tools are
suggested, drawing on prior research, existing software features and creative thinking.
4.1 Participant Selection
A presentation was given to members of the B2C marketing department explaining the
purpose and format of the study. A detailed participant information sheet (Appendix C) was
provided to all twelve members of the department based in the London offices at the
commencement of the study, out of which eleven agreed to participate and signed an
informed consent form (Appendix D). One subsequently withdrew from the study, leaving a
total of ten participants for the duration of the study. Members of the department based in
the New York, Hong Kong and Manila offices were not invited to participate for logistical
reasons.
All sub-divisions of the B2C Marketing department were represented, namely Audience
Development; Customer Acquisition – Optimisation; Customer Acquisition – Direct Marketing;
Customer Acquisition – Emerging Platforms; Customer Retention – Retention; Customer
Retention – Customer Relationship Management; and Marketing Operations – Payments
Optimisation. One participant was at Executive level, six were at Managerial level, one was at
Senior Managerial level and a further two were at Departmental Head level.
4.2 Data Gathering
Russell-Rose and Tate propose the use of four “dimensions” as a framework for analysing
search behaviour, namely User Type, User Goal, Search Context and Search Mode. While this study
investigates the last of these dimensions, and User Goal to a lesser degree, data was gathered
about the other dimensions through preliminary questionnaires and the actual observation in
order to define the boundaries of the Search Modes observed. As such, the data gathered
about User Type and Search Context is presented in this chapter rather than in Chapter 5,
since they do not form a central part of the study.
4.2.1 User Type
Participants completed an 11-question self-assessment of their proficiency in using IT tools to
analyse data, a 5-item questionnaire to determine their preferred learning style and a rod-and-frame
diagram exercise to indicate how they process information. Data about participants’
domain expertise was gathered from their LinkedIn profiles, with experts defined as holding a
relevant degree and more than three years of professional experience.
4.2.1.1 Domain Expertise
Data about participants’ marketing domain expertise was gathered from their LinkedIn
profiles, factoring in relevant academic qualifications, professional experience and industry
certifications. For the purposes of this study, the benchmark for classification as a “domain
expert” is the possession of a relevant first degree or higher and more than three years of
professional experience, based on the Chartered Institute of Marketing’s criteria (The
Chartered Institute of Marketing, n.d.). Those not meeting these criteria are considered a
“domain novice”. Table 1 shows the participants’ qualifications and number of years of
experience, with all meeting the “domain expert” criteria.
Table 1: Domain Expertise of Participants
4.2.1.2 Technical Expertise
An 11-question self-assessment of proficiency in using IT tools to analyse data (Appendix E),
based on extracts from Bowling Green State University’s self-assessment questionnaire
(Bowling Green State University Career Center, n.d.), was completed by participants. Each
question had five answer choices ranging from ‘very low’ to ‘very high’ and a corresponding
number range of 1 to 5. The questionnaire was scored by adding up the answer choices,
dividing by 55 (the maximum possible total) and multiplying by 1000 to rebase the values onto a
scale with a maximum of 1000; the lowest attainable score is therefore 200. Isograd’s
TOSA proficiency scoring scale (Isograd, n.d.) was adopted in interpreting the scores, with
participants classified as ‘beginner’, ‘basic’, ‘productive’, ‘advanced’ or ‘expert’. Those with a
classification of productive or greater are considered “technical experts” while those classified
as basic or lower are considered “technical novices”. Table 2 shows the results of the
participants’ self-assessment with nine meeting the “technical expert” criteria and one falling
into the “technical novice” category.
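The scoring arithmetic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure's actual implementation: the function names and the example answers are hypothetical, and because the score boundaries of the TOSA scale are not reproduced in this section, the sketch takes the proficiency level as given instead of deriving it from the score.

```python
# Proficiency levels in ascending order, as named in the text above.
LEVELS = ["beginner", "basic", "productive", "advanced", "expert"]


def rebased_score(answers, scale_max=5):
    """Rebase the summed answers so that the maximum possible total maps to 1000."""
    if any(a < 1 or a > scale_max for a in answers):
        raise ValueError("each answer must be between 1 and scale_max")
    # For 11 questions this equals summing, dividing by 55 and multiplying by 1000.
    return sum(answers) * 1000 / (len(answers) * scale_max)


def technical_category(level):
    """'productive' or above is a technical expert; 'basic' or below is a novice."""
    if LEVELS.index(level) >= LEVELS.index("productive"):
        return "technical expert"
    return "technical novice"


# Hypothetical answers to the 11 questions (not a participant's actual responses).
example_answers = [4, 3, 5, 4, 4, 3, 2, 4, 5, 3, 4]
example_score = rebased_score(example_answers)
```

Because every answer is at least 1, eleven answers of 'very low' give 11/55 of the maximum, so the rebased score cannot fall below 200.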
4.2.1.3 Learning Style
A 5-question Learning Scenario Questionnaire (Mayer and Massa, 2003) (Appendix F), each
describing a learning scenario and having an answer choice of a verbal or visual learning style,
was used to determine participants’ preferred style of learning. Participants selecting a verbal
learning style three or more times were classified as verbal learners while those selecting a
visual learning style three or more times were classified as visual learners. Table 3 shows
participants’ answers, with all falling into the visual learning style category.
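The classification rule for the Learning Scenario Questionnaire amounts to a simple majority over five answers, which the following Python sketch illustrates. The function name and the sample answers are hypothetical and do not represent any participant's responses.

```python
def learning_style(answers):
    """Classify a participant from five answers, each either 'verbal' or 'visual'."""
    if len(answers) != 5 or any(a not in ("verbal", "visual") for a in answers):
        raise ValueError("expected five answers, each 'verbal' or 'visual'")
    verbal_count = answers.count("verbal")
    # With five two-way answers, one style is always chosen three or more times.
    return "verbal" if verbal_count >= 3 else "visual"


# Hypothetical answers (illustrative only).
sample_answers = ["visual", "verbal", "visual", "visual", "verbal"]
sample_style = learning_style(sample_answers)
```

Since the number of questions is odd and each has two answer choices, a tie is impossible and every participant receives exactly one classification.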
Table 2: Technical Expertise of Participants
Table 3: Learning Style of Participants
4.2.1.4 Cognitive Style
A rod-and-frame diagram exercise (Russell-Rose and Tate, 2013, p.9) (Appendix G) was
completed by participants to indicate how they process information. Those who drew a line
parallel to the edges of the rectangle were classified as “serialists” (Figure 1) while those who
drew a line along the north-south axis of the square background were classified as “holists”
(Figure 2). The diagrams indicated that eight participants are “holists” while two are
“serialists”.
4.2.1.5 User Type
The data of participants’ domain expertise, technical expertise, learning style and cognitive
style were combined to form a picture of the types of users whose search modes are being
studied (Table 4). This information mainly contextualises the observation data within Russell-
Rose and Tate's (2013, p.3) framework but efforts were also made to identify any influence of
participants’ characteristics on their data exploration activities.
Figure 1: Serialist
Figure 2: Holist
Table 4: User Type of Participants
4.2.2 User Goal
Although the objective of this study is to gain insight into how users interact with data in order
to fulfil their needs, rather than the actual nature of the needs, knowledge of the participants’
goals provides a useful backdrop to their respective search modes. Prior to the observation,
the following questions were sent to participants to ensure their tasks had a clearly defined
goal:
1. What specific information do you need?
2. What triggered the need?
3. Which data set will you explore to meet the need?
4. Which software application is the data set contained in?
5. Which software application is the data set analysed in?
6. How will you act on the information once it is retrieved?
Where a participant performed more than one task, the questions had to be answered
separately for each task. The questions correspond to the four steps of the Information
Journey Model (Attfield and Blandford, 2010), which covers the lifecycle of information need
fulfilment. Questions 1 and 2 correspond to “recognising an information need” while questions
3 and 4 correspond to “acquiring information”. The third step, “interpreting and validating the
information”, can be inferred from question 5 and question 6 corresponds to “using the
information”.
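The correspondence between the six questions and the four steps of the Information Journey Model can be written down as a small lookup, sketched here in Python. The structure and names are illustrative assumptions, and the sample answers are hypothetical rather than drawn from any participant.

```python
# Question number -> step of the Information Journey Model, as described above.
JOURNEY_STEP_BY_QUESTION = {
    1: "recognising an information need",
    2: "recognising an information need",
    3: "acquiring information",
    4: "acquiring information",
    5: "interpreting and validating the information",
    6: "using the information",
}


def answers_by_step(answers):
    """Group answers (keyed by question number) under the step each one informs."""
    grouped = {}
    for question, answer in sorted(answers.items()):
        step = JOURNEY_STEP_BY_QUESTION[question]
        grouped.setdefault(step, []).append(answer)
    return grouped


# Hypothetical answers for a single task (illustrative only, not participant data).
sample_task = {
    1: "weekly subscription figures",
    2: "regular team report",
    3: "sales data set",
    4: "web-based reporting platform",
    5: "spreadsheet",
    6: "share findings with the team",
}
sample_grouped = answers_by_step(sample_task)
```

Where a participant performed more than one task, one such set of answers would be held per task.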
A potential limitation of defining User Goals before the data exploration activity – as opposed
to capturing them during the activity – is that it takes a static view of the goals by not
accommodating dynamic changes along the way. Hence while participants were required to
consider the questions in choosing a suitable observation task and answer them at the
beginning of the observation, further evidence of their goals was derived from the analysis of
the observation data after it was transcribed and coded.
4.2.3 Search Context
To aid an understanding of the context in which information seeking is conducted, Russell-Rose
and Tate (2013, p.50) propose the viewing of data exploration activity from four progressive
perspectives: “information retrieval”, “information seeking”, “work task” and “cultural
context”. The work task layer in particular provides the professional context in which
participants engage in data exploration and information retrieval. The participants’ job
descriptions and general areas of responsibilities within the organisation were obtained from
their LinkedIn profiles (Table 5) while information about the specific work task being fulfilled
was gathered during the observation (Table 6), based on their answers to the User Goal
questions in Chapter 4.2.2. Their general job roles and the specific duties carried out during
the observation convey the organisational need that motivated the information seeking. This
information mainly contextualises the observation data within Russell-Rose and Tate's (2013,
p.47) framework and to a lesser extent, was used in analysing the influence of participants’
work task on their data exploration activities.
4.2.4 Observation
Gaining insight into the behavioural processes followed by the participants as they explore
data and retrieve information necessitated a case study research strategy, as it allows in-depth
study of the phenomenon with all variables taken into consideration, and within their natural
setting of occurrence (Oates, 2006, p.141). Basing the case study on a regular task makes it
representative and thus generalisable to marketing professionals performing similar tasks
while providing an opportunity to test the applicability of established theories (Oates, 2006,
p.146), specifically the Information Journey model (Attfield and Blandford, 2010) and Russell-
Rose and Tate's (2013) framework for search and discovery.
Data was gathered by observing the participants while performing their tasks which, as a
“practitioner-researcher” working in the organisation, I was able to carry out after obtaining
necessary permissions without the need for familiarisation or access negotiations (Oates,
2006, p.209). However, challenges of balancing the study with my work responsibilities as well
as minimising interference with participants’ work schedule had to be managed. Despite
drawbacks of being relatively time-consuming and generating large amounts of unstructured
data (Rogers et al., 2011, p.261), as well as the risk of participants modifying their normal
behaviour, observation was selected over other data gathering methods such as interviews
and questionnaires because it captures what is being done and how; this is more reliable than
potentially selective verbal descriptions (Oates, 2006, p.198). Furthermore, observing
participants in their natural work environment rather than in a controlled environment
avoided the probable effects of artificial work conditions limiting the relevance of the results
(Rogers et al., 2011, p.261).
Observation sessions were scheduled for a maximum of 60 minutes but flexibility was allowed
to address concerns about the potential impact on participants’ work schedule and
productivity. Ten minutes were scheduled for introduction and equipment set up, followed by up
to 50 minutes of direct observation at their workstations. Participants were observed while
performing a regular task during their normal work schedule, using their normal data sets and
analytics tools.
Participants were required to think aloud as they performed their tasks so as to verbalise their
thought processes (Rogers et al., 2011, p.256) and questions were asked for clarification during
the observation session rather than in a follow-up interview in order not to depend on
memory recall.
Table 5: Job Roles of Participants
4.2.4.1 Introduction
At the beginning of each session, I explained:
• The participant’s role in the study;
• My role as the moderator;
• The format of the study;
• The think-aloud technique.
4.2.4.2 Equipment Set Up
The sessions were conducted in the natural work environment of the participants, using their
regular furniture and Windows laptop computer as well as data sources and data analysis
software routinely used to complete tasks.
The Screencast-O-Matic (n.d.) screen capture Java applet was used to record on-screen activities,
with participants’ voices recorded simultaneously by the applet using the computer’s built-in
microphone. The audio-visual recordings were exported in MP4 file format at the end of the
sessions.
4.2.4.3 Observation Tasks
At the beginning of each session, participants gave a description of their goals regarding:
• The specific information needed;
• The trigger of the need;
• The data set to be explored to meet the need;
• The software application the data set is contained in;
• The software application the data set will be analysed in;
• How the information will be acted on once it is retrieved.
Participants were requested beforehand to select a task that was performed regularly –
defined as at least once a week – for the observation sessions. Nine participants selected one
task while one participant selected two tasks. Of the eleven tasks observed, seven met this
criterion. One, which was performed once a month at the time of the observation, was
due to move to a weekly schedule. Two tasks were performed regularly but without a fixed
schedule; the last task, although performed on an ad-hoc basis, was considered to be
representative by the participant who otherwise did not perform regular data exploration
tasks.
Seven of the observed tasks involved the creation of a report on a spreadsheet, of which six
were based on quantitative data extracted from web-based sources and one based on
quantitative data in an existing spreadsheet. Two tasks involved the analysis of quantitative
data on an existing spreadsheet and a further two tasks involved the analysis of quantitative
and qualitative data on web-based platforms. The shortest observation session lasted for
almost 7 minutes while the longest lasted for just over 80 minutes. The total length of the
observation sessions was approximately 4 hours and 50 minutes.
A task was deemed completed when:
• The initially identified information need had been satisfied; and/or
• Subsequent information needs triggered by the exploration activity had been satisfied; or
• The exploration was terminated because the information need could not be satisfied.
Nine participants satisfied the identified information needs; Participant I was unable to
retrieve all the required information due to technical problems with the data source but gave a
verbal description of how the task would have been performed while providing on-screen
demonstrations with the available data.
A summary of the observation sessions is shown in Table 6:
Table 6: Summary of Observation Sessions
4.3 Data Preparation
4.3.1 Tools
The Screencast-O-Matic Java applet was used to record the computer screen and sound during
each observation, and the recordings were saved in MP4 file format. All but one of the
recordings were of good audio-visual quality, enabling accurate transcription. The exception
was the recording of Participant F’s session, which suffered from poor audio quality and hence
left gaps in the transcript, although this was partially mitigated by interpreting on-screen actions.
The transcription of each recording was produced in HyperTranscribe 1.6 (Researchware, n.d.)
and exported as a Microsoft Word (Microsoft, n.d.) document (Appendix O). The transcripts
consisted of the participants’ spoken commentary, supplemented where appropriate by
descriptions of on-screen events in different-coloured text to distinguish them from
participants’ own words. Transcribed sentences were timestamped to facilitate referencing.
The transcripts were imported into QDA Miner 4.1 (Provalis Research, n.d.) for coding and
software-assisted analysis, with the results exported as Microsoft Excel documents for in-
depth analysis.
4.3.2 Coding
A prerequisite for analysing the qualitative data gathered in the observations was to look for
themes and patterns that emerge from them. The transcripts were read, interpreted and
coded with the themes reflected in each passage (Appendices I and N). Duplicate coding was
avoided by interpreting the passages and coding each unique occurrence of a theme once,
regardless of how many times it may have been referred to. However, where a passage
was perceived to reflect more than one theme, it was coded with all applicable themes.
Additional steps were taken to ensure themes were coded in their correct sequence by re-
arranging the subject and predicate where appropriate. Descriptions of on-screen actions (in
red text) were added to complement verbal descriptions, where deemed necessary.
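The two coding rules described above (each theme coded once per passage, and a passage coded with every theme it reflects) can be sketched in Python as follows. The function names and the sample passages are hypothetical; the actual coding was an interpretive exercise carried out by reading the transcripts, which this sketch does not attempt to reproduce.

```python
def codes_for_passage(themes_in_order):
    """Keep each theme once, in the order in which it first appears in the passage."""
    # dict.fromkeys preserves insertion order, so repeats are dropped without reordering.
    return list(dict.fromkeys(themes_in_order))


def code_transcript(passages):
    """Apply the rule to every passage; a passage may carry more than one theme."""
    return [codes_for_passage(themes) for themes in passages]


# Hypothetical themes noted while reading three passages (not study data).
noted = [
    ["Locate", "Verify", "Locate"],
    ["Compare"],
    ["Analyse", "Evaluate", "Analyse", "Evaluate"],
]
coded = code_transcript(noted)
```

Preserving first-mention order matters here because the later sequence analysis depends on the order in which themes occur.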
A combination of descriptive and analytic coding techniques (Gibbs and Taylor, 2010) was used
in identifying the methods and reasoning processes followed by the participants in their
selection and interpretation of information. Efforts were made to ensure consistency of coding
by comparing similarly coded passages within each transcript and with transcripts of other
participants (Gibbs and Taylor, 2010). Explanatory comments to justify coding decisions were
also added where deemed necessary. However, it is acknowledged that coding decisions are
subjective and based on the interpretations of a single coder, without the benefit of checking
coding agreements with other coders.
The transcripts were coded separately for User Goals and Search Modes and both dimensions
were analysed independently.
4.3.2.1 User Goals
The Information Journey Model (Attfield and Blandford, 2010) provided the underlying theory
for identifying the participants’ goals (Table 7). The four steps of the model support a dynamic,
as opposed to static, view of information seeking (Russell-Rose and Tate, 2013, p.26), and
the participants’ activities were interpreted within this framework.
Table 7: Codebook of User Goals
Code                      Description
Recognising Need          Recognising an information need
Acquiring Information     Acquiring information to fulfil the need, whether an entire data set or an item within a data set
Interpreting Information  Interpreting and validating the acquired information
Using Information         Using the information to fulfil the need
4.3.2.2 Search Modes
The nine search modes developed by Russell-Rose et al. (2011), distilled from over one
hundred conceptual user scenarios of search and business intelligence applications, provided
the “a priori codes” (Gibbs and Taylor, 2010) used for the coding. These comprise all the codes in
Table 8 except Forecast, Collaborate, Measure, Recognise and Update. Collaborate, Measure,
Recognise and Update are “grounded codes” (Gibbs and Taylor, 2010) that I developed from the
observation data, independent of pre-existing theories. Forecast, though identified as a search
activity (Marchionini, 2006), is not a distinct search mode under Russell-Rose et al.’s classification.
Russell-Rose et al. classified each Search Mode according to the three types of search activities
developed by Marchionini, that is, Lookup, Learn and Investigate. Search Modes that entail
finding information are classified under Lookup; those that involve knowledge acquisition and
development are classified under Learn while others concerned with dissecting and combining
information or using judgement are classified under Investigate. Lookup searches retrieve
precise information to meet clearly defined needs whereas Learn and Investigate searches are
exploratory in nature, with users analysing information and making sense of it (Russell-Rose
and Tate, 2013, p.72).
These three top-level categories were in turn used in classifying the “grounded codes” with the
exception of Update, as it does not fit precisely into any category.
Table 8: Codebook of Search Modes (those marked * are my own “grounded codes”)
Code          Description
Locate        LOOKUP: To find a specific (possibly known) item
Verify        LOOKUP: To confirm that an item meets some specific, objective criterion
Monitor       LOOKUP: To maintain awareness of the status of an item for the purposes of management or control
Recognise *   LOOKUP: To identify an item based on prior knowledge
Compare       LEARN: To examine two or more items to identify similarities and differences
Comprehend    LEARN: To generate independent insight by interpreting patterns within a data set
Explore       LEARN: To investigate an item or data set for the purposes of knowledge discovery
Measure *     LEARN: To determine the quantitative value of an item
Analyse       INVESTIGATE: To examine an item or data set to identify patterns and relationships
Evaluate      INVESTIGATE: To use judgement to determine the value of an item with respect to a specific goal
Synthesise    INVESTIGATE: To create a novel or composite artefact from diverse inputs
Forecast      INVESTIGATE: To extrapolate a future value of an item from current data (added from Marchionini's taxonomy)
Collaborate * INVESTIGATE: To seek the input of others in interpreting an item
Update *      Editing the value of an item
The “grounded codes” were deemed to be at a similar level of abstraction to the “priori codes”,
yet conceptually distinct from them, and so to merit inclusion. Forecast – classified under Investigate by
Marchionini – was added to describe the derivation of future values from existing data.
Collaborate stood out as an Investigate search mode that draws on the expertise and
professional judgement of colleagues to gain insight about an item. Measure was identified as
a Learn search mode peculiar to quantitative data, typically involving performing calculations
on a given data set. Recognise, classified under Lookup, was considered to be sufficiently
different from Locate as it emphasises the use of prior knowledge gained through experience
in identifying an item. Update, perhaps technically not a search activity as it involves adding or
changing items, was considered important as it describes a distinct activity within data analysis
and interpretation.
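The classification described above can be written down as a small lookup structure. The following is a minimal sketch, assuming the categories listed in Table 8; the `origin` labels ("priori", "grounded", "marchionini") and the helper names are hypothetical conveniences for illustration and were not part of the coding software used in the study.

```python
# Codebook of Search Modes as classified in Table 8.
# Each entry maps a code to (category, origin). The origin labels are
# names chosen for this sketch only.
CODEBOOK = {
    "Locate": ("Lookup", "priori"),
    "Verify": ("Lookup", "priori"),
    "Monitor": ("Lookup", "priori"),
    "Recognise": ("Lookup", "grounded"),
    "Compare": ("Learn", "priori"),
    "Comprehend": ("Learn", "priori"),
    "Explore": ("Learn", "priori"),
    "Measure": ("Learn", "grounded"),
    "Analyse": ("Investigate", "priori"),
    "Evaluate": ("Investigate", "priori"),
    "Synthesise": ("Investigate", "priori"),
    "Forecast": ("Investigate", "marchionini"),
    "Collaborate": ("Investigate", "grounded"),
    "Update": (None, "grounded"),  # does not fit any of the three categories
}


def codes_in(category):
    """Return the codes classified under a top-level category, sorted by name."""
    return sorted(code for code, (cat, _) in CODEBOOK.items() if cat == category)


def codes_from(origin):
    """Return the codes with a given origin label, sorted by name."""
    return sorted(code for code, (_, org) in CODEBOOK.items() if org == origin)


print(codes_in("Lookup"))
print(codes_from("grounded"))
```

Keeping the category and the origin together in one place makes it straightforward to aggregate coded observations by Lookup, Learn or Investigate at the analysis stage.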
Russell-Rose et al. (2011) proposed three properties that a Search Mode should possess:
‘consistency’; ‘orthogonality’; and ‘comprehensiveness’. While ‘consistency’ and
‘orthogonality’ can be evaluated without any data, ‘comprehensiveness’ can only be assessed
after data has been analysed; as such, the degree to which the “grounded codes” possess
the three characteristics is evaluated in Chapter 5.2.1.3.
5 RESULTS AND DISCUSSION
The participants were observed as they performed a regular task in their natural work settings.
The observation data was coded separately for their User Goals and Search Modes, with
coding themes derived from Attfield and Blandford’s Information Journey Model and Russell-
Rose and Tate’s search and discovery framework respectively. The individually coded
transcripts were combined within QDA Miner to derive an aggregate view of the data
exploration and information retrieval behaviour patterns of the marketing professionals. Using
the built-in tools of QDA Miner, the number of occurrences (frequency) and order of
occurrences (sequence) of the codes were computed and the results exported in Microsoft
Excel file format for analysis.
5.1 User Goal
The User Goal provides insight into the need that motivates information-seeking activity and
the different stages that lead to the fulfilment of the need. Such understanding can be
leveraged by designing analytics tools to support users at each stage of their information
seeking process as well as their transition between stages.
5.1.1 Code Frequency
The code frequency conveys the total number of times each code occurred during the
information seeking process, and is indicative of the relative significance of each User Goal to
the process.
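As an illustration of what a code frequency is, the sketch below counts occurrences in a coded transcript. It is not the QDA Miner computation, whose internals are not known here; the sample sequence `coded_activities` is hypothetical and stands for the ordered codes assigned to one participant's observed activities.

```python
from collections import Counter

# Hypothetical coded transcript: one User Goal code per observed activity,
# in the order in which the activities occurred.
coded_activities = [
    "Recognising Need",
    "Acquiring Information",
    "Interpreting Information",
    "Acquiring Information",
    "Interpreting Information",
    "Using Information",
]


def code_frequency(codes):
    """Return a mapping of code -> number of occurrences."""
    return Counter(codes)


frequency = code_frequency(coded_activities)
for code, count in frequency.most_common():
    print(f"{code}: {count}")
```

Summing such per-participant counters gives the aggregate view across all participants.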
Of the combined 491 occurrences of all goals, the predominant goal of the participants was
Interpreting Information, occurring 161 times, followed by Acquiring Information at 155, Using
Information at 101 and Recognising Need at 74 (Figure 3). This partially aligns with the
assertion that in the Information Journey Model, information interpretation and use are the
key drivers of information seeking behaviour (Russell-Rose and Tate, 2013, p.26), although
information acquisition seems to play a comparably dominant role.
Figure 3: Frequency of User Goals for All Participants (Recognising Need 74; Acquiring Information 155; Interpreting Information 161; Using Information 101)
In fulfilling recognised needs, the participants were roughly three times as engaged in information
acquisition and interpretation activities (316 occurrences combined) as they were in using
information (101 occurrences). There is a range of probable reasons for the higher frequency of
these activities. It is theoretically possible that
the process of searching, retrieving and analysing data is demanding and labour-intensive
relative to the process of using it once it is prepared. Neither time nor effort was measured
however, so this cannot be corroborated by evidence.
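The "three times" comparison can be checked against the counts reported in Figure 3. The short calculation below is a worked example only; the variable names are chosen for illustration.

```python
# Occurrences of each User Goal across all participants (Figure 3).
goal_counts = {
    "Recognising Need": 74,
    "Acquiring Information": 155,
    "Interpreting Information": 161,
    "Using Information": 101,
}

total = sum(goal_counts.values())

# Share of all coded occurrences accounted for by each goal.
shares = {goal: count / total for goal, count in goal_counts.items()}

# Acquisition and interpretation combined, relative to usage.
acquire_and_interpret = (
    goal_counts["Acquiring Information"] + goal_counts["Interpreting Information"]
)
ratio = acquire_and_interpret / goal_counts["Using Information"]

print(f"Total occurrences: {total}")
for goal, share in shares.items():
    print(f"{goal}: {share:.1%}")
print(f"Acquire + interpret vs use: {ratio:.2f}")
```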
It could also be due to difficulty in locating the required data or understanding its
meaning. This is unlikely however, given the regularity of the observed tasks and “domain
expertise” of the participants.
Another plausible factor could be relative difficulty in using the data sources and analytic tools,
making the process less efficient. A distinguishing feature of ‘technical novices’ compared to
experts is a higher tendency of the former to reformulate queries, resulting in longer time and
effort expended in acquiring information (Russell-Rose and Tate, 2013, p.4,6). The sole
‘technical novice’ in the group, Participant A, whose technical proficiency self-assessment was
‘Basic’, had a ratio of information acquisition to information usage of 2 (Figure 4), in line with
the group average (Appendix H). Participant I, whose self-assessment was ‘Advanced’, had a
higher ratio of 3, suggesting lower efficiency. On the other hand, Participant K, similarly
self-assessed as ‘Advanced’, had a below-average ratio of 1, suggesting higher efficiency. The results
are thus insufficient to demonstrate a link to “technical expertise”. The possible effects of the
relative complexity of the tasks have not been investigated, however.
Alternatively, the higher proportion of engagement in acquiring and interpreting information
could suggest that the participants are comfortable with both the content and the tools used
in extracting and analysing it, hence they are willing and able to explore the data in enough
detail to gain insight. The regularity of the tasks and the participants’ “domain expertise” lend
some credence to this hypothesis but perhaps more important is the nature of the tasks – the
performance of the marketing campaigns and factors that influenced them needed to be
understood and explained for monitoring and reporting purposes, placing greater emphasis on
information acquisition and interpretation. Thus the majority of activities involved exporting
data from its source into a separate analytics tool as well as identifying specific items within a
data set for examination (Appendix I). Conversely, using the results for insight or decision
making (Attfield and Blandford, 2010, p.33), or condensing them into a report, involved fewer
iterations.
Figure 4: Frequency of User Goals per Participant (vertical axis: frequency, 0 to 45; series: Recognising Need, Acquiring Information, Interpreting Information, Using Information)
It is striking that unlike other participants, the predominant goal observed in Participant H was
Using Information, occurring 11 times compared to 9 and 8 for Acquiring and Interpreting
Information respectively (Figure 4). The participant’s information seeking path is indicative, as
seen in the code sequences heat map which uses brighter colours to indicate higher
occurrences (Figure 5). Of the 35 code transitions recorded for this participant, 6 were from
information acquisition directly to information usage, without an apparent interpretation process. The nature of the
task performed offers a possible explanation, as the “information seeking process” involved
iteratively acquiring data from one source and inputting it into another before analysing the
combined data. This is evident in statements such as “…and just go through, find the actual
SegID and then input the numbers here…”, “…and then just put that number in here…”, “…and
put it in here…” and “…so then basically once I fill in everything, I'm done with this data
source…”. Other participants demonstrated similar information seeking behaviour albeit to a
lesser extent (Appendix J) and this may be a reasonably pervasive process, as noted by Pirolli
and Card (2005) in their study of intelligence analysis: “much day-to-day intelligence mainly
consists of extracting information and repackaging it without much actual analysis”.
Figure 5: Code Sequences - Participant H
5.1.2 Code Sequences
The code sequences convey the number of times one code is followed by another and are
indicative of the path followed in seeking information, that is, how users transition from one
goal to another until their information need is met.
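A code sequence count of this kind can be sketched as a tally of consecutive pairs. This is an assumption about the general technique rather than a description of QDA Miner's algorithm, and the sample sequence is hypothetical.

```python
from collections import Counter


def code_sequences(codes):
    """Count how often each code is immediately followed by another.

    Returns a Counter keyed by (preceding code, following code).
    A transcript of n codes yields n - 1 transitions.
    """
    return Counter(zip(codes, codes[1:]))


# Hypothetical coded transcript for one participant.
coded_activities = [
    "Recognising Need",
    "Acquiring Information",
    "Interpreting Information",
    "Acquiring Information",
    "Interpreting Information",
    "Using Information",
]

transitions = code_sequences(coded_activities)
for (first, second), count in transitions.most_common():
    print(f"{first} -> {second}: {count}")
```

Arranging these pair counts in a grid, with the preceding code as the row and the following code as the column, gives the kind of matrix shown in the code sequence figures.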
Further analysis of the aggregated code sequences of all participants (Figure 6) yields results
that are in line with the progressive steps of the Information Journey Model (Attfield and
Blandford, 2010, p.30). The total number of times one code was followed by another is 485,
with Recognising Need followed by Acquiring Information 62 times, which was in turn followed
by Interpreting Information 100 times and the latter followed by Using Information 68 times.
This suggests that when seeking information, the participants are likely to follow a linear path
of need recognition – information acquisition – information interpretation – information
usage; hence analytics tools could be optimised to smooth their journey along the path.
Figure 5 data: number of times that the User Goal in a row (1) is immediately followed by the User Goal in the intersecting column (2)
Columns (2): Recognising Need, Acquiring Information, Interpreting Information, Using Information
Row values (the positions of the heat map's empty cells are not preserved):
Recognising Need: 5, 3
Acquiring Information: 3, 6
Interpreting Information: 3, 1, 4
Using Information: 4, 3, 2, 1
Figure 6: Code Sequences - All Participants
Considering the theory of the dynamic model of information seeking which the Information
Journey Model is built on (Russell-Rose and Tate, 2013, p.26), it is interesting that Interpreting
Information was followed by Acquiring Information 38 times. Examples of information
acquisition following interpretation without the explicit recognition of a need can be seen in
Participants A (“but then if you look at this, this is showing that clicks on this tweet…saying we
had 7000 clicks and then when you look at the Bitly one, so that records the actual clicks on link
so 598 clicks, but then when you go back and look at this MatterSight report, it only actually
delivers 7 registrations”) and D (“so this gives us averages, so weekly increase or decrease, so
we can see that was a particularly high week, so you get a flow of...subs, we should have
acquisitions in here as well, that is the dark blue so we can see we've kind of been hovering
around the 1000 mark recently which is really nice”).
Similarly, Using Information was followed by Acquiring Information 37 times, bypassing
Recognising Need. This is apparent in Participant G (“the last thing we need to do here is add
these numbers in here so I can see the UK split so now that I've done that, the UK numbers will
change for the week just so I can have the trial and sub numbers together… so the next bit, I'll
just then take the end figures here as well as this graph”) and Participant I (“that I would try
and answer with the data in a certain view and I'll be breaking that out further, but also I'd try
and take qualitative data to answer that”).
These seem to support the standard and cognitive models of information seeking that
emphasise the iterative acquisition and interpretation of information until an initially
recognised need is fulfilled (Russell-Rose and Tate, 2013, p.24). The smaller number of instances of
acquiring and interpreting information leading to the recognition of a new need suggests that
the participants’ goals were mostly static in nature, in contrast to goals that are dynamically
modified as information is acquired and interpreted. Although information acquisition that
does not directly proceed from a recognised need could alternatively suggest serendipitous
discovery, where “[information is encountered] without explicitly looking for it” (Blandford,
featured in Russell-Rose and Tate, 2013, p.43), this could not be confidently inferred from the
data available.
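The figures discussed in this section can be reproduced from the transition counts reported for Figure 6. The sketch below is a worked example under that assumption; the groupings ("forward path", "return to acquisition", "new need") are labels introduced here to mirror the discussion, not categories defined by the Information Journey Model.

```python
GOALS = [
    "Recognising Need",
    "Acquiring Information",
    "Interpreting Information",
    "Using Information",
]

# Rows: preceding goal; columns: following goal (counts as reported for Figure 6).
MATRIX = [
    [3, 62, 6, 3],
    [15, 19, 100, 21],
    [24, 38, 31, 68],
    [23, 37, 25, 10],
]

total = sum(sum(row) for row in MATRIX)

# Linear path: need -> acquire -> interpret -> use.
forward = sum(MATRIX[i][i + 1] for i in range(len(GOALS) - 1))

# Interpreting or using information followed directly by further acquisition.
back_to_acquire = MATRIX[2][1] + MATRIX[3][1]

# Acquiring or interpreting information followed by recognition of a new need.
into_need = MATRIX[1][0] + MATRIX[2][0]

print(f"All transitions: {total}")
print(f"Forward path: {forward} ({forward / total:.1%})")
print(f"Return to acquisition: {back_to_acquire} ({back_to_acquire / total:.1%})")
print(f"New need after acquiring/interpreting: {into_need} ({into_need / total:.1%})")
```

On these counts, returns to acquisition are roughly twice as frequent as transitions into a newly recognised need, which is the contrast the static-goal interpretation rests on.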
5.1.3 Designing for User Goals
Implementing specific features in business analytics software can support the observed
information seeking processes, thereby improving the efficiency of users.
Figure 6 data: number of times that the User Goal in a row (1) is immediately followed by the User Goal in the intersecting column (2)
(1) \ (2)                 Recognising Need  Acquiring Information  Interpreting Information  Using Information
Recognising Need          3                 62                     6                         3
Acquiring Information     15                19                     100                       21
Interpreting Information  24                38                     31                        68
Using Information         23                37                     25                        10
5.1.3.1 Acquiring and Interpreting Information
User Goals that involve using information that has been acquired without need for
interpretation, as demonstrated by Participant H, are viable candidates for automation.
Standardised data that are regularly exported into business analytics software would benefit
from direct connection to the data source, which could be refreshed with a single click.
Marketing professionals could thus invest a greater proportion of their effort in using the
information.
Participants did not merely acquire information for usage, however; monitoring and reporting
on the performance of marketing campaigns involved a high proportion of interpretation in
order to make sense of the data. Multiple data sets from one or several data sources were
typically acquired for interpretation, a practice described as building a “shoebox” collection of
all data items before they are interpreted for relevance and meaning (Pirolli and Card, 2005).
Russell-Rose and Tate (2013, p.34) assert that enabling rapid population of the “shoebox” can
support this process and this reinforces the utility of a seamless integration between analytics
software and data sources, empowering users to acquire and interpret data in a single
workflow.
Moreover, judging the relevance and meaning of data items is an internal mental process that
benefits from externalisation to prevent memory overload and potentially allow collaboration
(Russell-Rose and Tate, 2013, p.34). One way of facilitating this is to reorganise and reformat
the data to make it suitable for interpretation (Pirolli and Card, 2005) and simple-to-use tools
in Microsoft Excel for filtering and sorting partly fulfil the requirement. On the other hand,
more complex processing such as charts for visualisation or pivot tables for sub-selection and
aggregation requires more technical nous and could benefit from simplification. The relative
ease of switching between tabular and graphical representation of data in Tableau, for
instance, demonstrates a more refined implementation that could be adopted more widely.
5.2 Search Mode
Search Mode describes the different strategies employed by users in exploring data during the
full cycle of their information seeking activity. A clearer understanding of such strategies, and of
how they tend to be combined, could inform the design of analytics tools that better support data
exploration and information retrieval processes.
5.2.1 Code Frequency
The code frequency conveys the total number of times each code occurred during the
information seeking process, and is indicative of the relative significance of each Search Mode
to the process.
The number of times each code was used and the number of participants in which it appeared
were computed to identify recurring themes that could be suitable candidates for
generalisation. There was a late discovery of inconsistency with respect to Recognise, which
was initially named Recall. The code was renamed part way through the coding process and it
was not updated in previously coded participants. Although the figures for Recognise and
Recall could have been added to derive the total frequency, it might not be a suitable solution
for deriving the code sequences (section 4.2.2) as the computation process used in QDA Miner
is unknown. Since the error could not be rectified due to the expiry of the trial licence of QDA
Miner, the frequency of Recognise was deemed unreliable and thus discounted from the
results. The remaining thirteen codes had a total frequency of 433, ranging between 9 and 61
(Figure 7), and each code could feature in a maximum of 10 participants (Figure 8).
Figure 7: Number of Occurrences per Code
Overall, participants performed more ‘exploratory searches’ than Lookup searches, with 313
and 96 occurrences respectively. 160 of the ‘exploratory searches’ were of a learning nature
while 153 were investigative. A further 24 instances of updating items were conducted. The
higher occurrence of ‘exploratory search’ activities was reflected in all ten participants with no
significant variations between those performing analysis on web-based platforms and those
importing data into Microsoft Excel for analysis (Appendix L). Moreover the highest occurring
search activity in each participant was exploratory, with the exception of Participant G
(Appendix L). The prevalence of Locate in Participant G is not correlated with the nature of the
observation task, as this was not repeated in other participants who extracted weekly
customer acquisition data from web-based platforms and analysed them in Microsoft Excel for
management reports, including Participants F and H (Chapter 4.2.4.3: Table 6). No explanation
for the divergence can be inferred from the data.
The results suggest that finding information is a minor component of the marketing
professionals’ data exploration and information retrieval patterns of behaviour; the majority of
their activities involve analysing and making sense of information once it has been located. The
implication for data sources and business analytics software used by such professionals is the
provision of improved features to enhance analytical and sensemaking processes rather than
focusing on finding information (Russell-Rose and Tate, 2013, p.72).
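The relative weight of each type of search activity follows directly from the category totals reported above. The calculation below is a worked example; treating Update as its own group reflects the fact that it was not assigned to any of the three categories.

```python
# Occurrences per type of search activity, as reported in this section
# (Recognise is excluded because its frequency was discounted).
category_counts = {
    "Lookup": 96,
    "Learn": 160,
    "Investigate": 153,
    "Update": 24,  # not classified under Lookup, Learn or Investigate
}

total = sum(category_counts.values())
exploratory = category_counts["Learn"] + category_counts["Investigate"]

print(f"Total occurrences: {total}")
print(f"Exploratory (Learn + Investigate): {exploratory} ({exploratory / total:.1%})")
print(f"Lookup: {category_counts['Lookup']} ({category_counts['Lookup'] / total:.1%})")
```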
5.2.1.1 Analyses of Most Frequent Priori Codes
The ‘priori codes’ (those derived from Russell-Rose et al.’s Search Modes) with the highest
number of occurrences offer a relatively high degree of confidence for generalisation, since
they occur in reasonable quantities to suggest commonality.
The most frequently occurring code was Compare at 61, appearing in the observed tasks of all
participants except Participant J. As a Learn activity concerned with common and/or contrasting
qualities between two or more items, it appears to be a vital element in the interpretation of
quantitative data. It was a recurring theme mainly in three contexts:
Figure 8: Number of Participants per Code
• Assessing periodic performance against set targets, as seen in such statements as “so
at the moment we're 500 short of that target, actually we're about 400 short of that
target cuz that's minus 100 (Participant F)”, “so we've got a column to tell us what we
should be achieving and we can look to see if that's happening or not (Participant C) ”
and “so again that's not quite hitting target but it's in line with weekly average
(Participant D)”;
• Assessing performance across different periods, for example “so we're seeing there
were 1 or 2% of percentage point drop on a weekly basis then we get to week 10 and
all of a sudden it drops by 40 percentage points (Participant I)”, “and then on week 29,
right around here, we switched the sign up forms to just the new version of the sign up
forms that performed better across the board on other sources, and you can see that
on week 29 there was a jump (Participant H)” and “September got higher click-through
(Participant A)”;
• Assessing the validity of data sources, evidenced by comments like “what we've done
is really said, if iJento is showing a huge number and MatterSight isn't, and it's big,
we'll take some action in looking into it (Participant H)”, “and then I'll have a call on
Thursday with the search agency and then we discuss the figures that they're getting
through their tracking, Google Analytics, match it up against what we've got here, first
to see how accurate it is (Participant G)” and “but then if you look at this, this is
showing that clicks on this tweet, so it could be our handle or anywhere around it,
saying we had 7000 clicks and then when you look at the Bitly one, so that records the
actual clicks on link, so 598 clicks (Participant A)”.
Evaluate also featured highly, garnering 60 occurrences across all 10 participants. There were
two contexts in which this Investigate activity was repeatedly identified:
• Drawing an objective conclusion as a follow-up to a comparison, which can be seen in
extracts such as “so we've got a column to tell us what we should be achieving and we
can look to see if that's happening or not. So, we can see that we're generally doing
pretty well on registration barriers; on subscription barriers we're struggling a bit
(Participant C)”, “so clearly Friday is our poor one, a poor day for click-through and
open rate, and consistently I'm seeing here Wednesday and Thursday actually which is
quite interesting, consistently we've found Monday is a good day to send news on, and
Sundays people are on their iPads and they're highly engaged as well (Participant K)”
and “so for example for the last few weeks, SEO has delivered quite a phenomenal
amount of subs which is great, but we're trying to understand why it's been delivering
more subs than usual...so that said to me it wasn't necessarily an increase it was just
more accurate attribution, so that could be important because we've had quite good
acquisition results over the last few weeks, and it's quite important to know why we've
been up… so that was quite good cuz then we could say there hadn't been a big
increase in SEO, it's down to more the programmes that we've run (Participant D)”;
• Making subjective professional judgements, as demonstrated by “the US one is
exactly the same or we might see it on the site and think that might work well
(Participant A)”, “maybe we can modify that by including live Twitter content now or
advertising particular Twitter niche fields based on their industry... that would do
better, but just to say follow us on Twitter, it's a bit ambiguous, there's no real drive,
call to action is not very strong (Participant K)” and “this one I probably wouldn't use
because it's probably out of date, with the exception if I'm doing something with
comment pieces, they tend to last for a bit longer (Participant C)”.
With a frequency of 57, Explore had the third highest occurrence and featured in 8 of the 10
participants. Two common contexts were reflected in this Learn activity:
• Sifting through a data set to gain insight about a specific item, illustrated by “has
there been any sort of changes in the traffic, have we put more money towards
something, have we done any optimisation, because we're seeing an increase in
subscriptions, why is that the case (Participant H)”, “so we're seeing there were 1 or
2% of percentage point drop on a weekly basis then we get to week 10 and all of a
sudden it drops by 40 percentage points, that then gives me to say why, and then I'd
try and drill down further and say ok, what happened at that week (Participant I)” and
“so from there I would just dig down and I can see where is this problem (Participant
L)”;
• Scanning through a data set for items that stand out without seeking a specific
outcome, exemplified by “interestingly 6% have forgotten their password…the
upgrade to standard and premium didn't get a click, then you've got the email
briefings' done quite well, 11% (Participant K)”, “I just look at that as a reference, to
see if I can see anything coming out, in terms of content (Participant C)” and “there are
other bits, so this is quite interesting to see number of followers, so that would be how
many people have started to follow @FT, which obviously is good because it means
they'll then be getting the updates in their feed as well which might help with
engagement; retweets are interesting because people are obviously interested in the
content (Participant A)”.
Locate, a Lookup activity, occurred 39 times across all 10 participants in two recurrent
contexts:
• Finding the value of an item, including “so I'll be looking at this section here and I'll be
looking at what the total subs is, FT.com, how many individual subs we've acquired
(Participant D)”, “I would start with this, so a visualisation, how are print subs doing
(Participant I)” and “so looking at this grand totals for Welcome for registered users,
we've got 28% open rate, 3.64% click through (Participant K)”.
• Identifying a specific item within a data set, such as “so, I know pretty much off by
heart the SegIDs in mind, so I can select them (Participant C)”, “this [Sub Source] is
what I'm interested in (Participant F)” and “yeah do it side-by-side, and just go through,
find the actual SegID (Participant H)”.
Another Lookup activity, Monitor, occurred 39 times in 9 participants. Two recurring contexts
were identified in the occurrences:
• Maintaining passive awareness of the status of an item to provide a backdrop to
assessed items, as seen in “the Asia and the US I'm not so worried about cuz they're
nice to know but they're not for us to manage, therefore they're managed from those
regions but I do take the number out of MatterSight just so I can see how they're going,
just to see cross-region comparisons (Participant G)”, “then obviously we'll look at the
number of subscribers that have cancelled, and payment failure, and then look at the
net increase for the week so that's sort of quite good to know about but for me it's
really how many acquisitions we've achieved (Participant D)” and “we do take insight
from that, it's a temperature check more than anything else (Participant K)”.
• Maintaining active awareness of the status of an item in order to assess it,
demonstrated by “so within Splunk we have a payment dashboard, so we can see for
different time spans - go to 7 days - the new payment flow, and this will populate, so
we have debit successes, debit failures and debit errors, so this is when there's a
technical error. So we can see that at a glance at any point, and then we've got the
reasons and the error messages that are coming through here, and then some more
data, and then the same for the old flow (Participant L)”, “this was the SegID we put
against this tweet, that was a registration push, so I started to look at this to see how
many did it actually deliver (Participant A)” and “so we track these every week, so that
we can see any fluctuation, try and identify where any marketing activity that's being
done is affecting these SegIDs (Participant C)”.
Analyse occurred 37 times in nine participants, with two contexts encountered repeatedly in
this Investigate activity:
• Determining factors contributing to the value of an item, exemplified by “hey we've
achieved what looks like an extra 20, 30 subs, and in this case she came back and said
that we had sent out, we had communicated with a lot more people because there'd
been a glitch in previous weeks, and so that explains the large number (Participant
D)”, “what channel did they come in from, why are they doing well...all of them came
in from email and what letter series did they have, they didn't have a letter, they had
emails so we then have a hypothesis or something to answer which is, we think people
that come in through email and only respond by email are better, are retained better
(Participant I)” and “try and identify where any marketing activity that's being done is
affecting these SegIDs (Participant C)”.
• Discovering patterns within a data set, illustrated by “I'd then go back to the pivot
and see, I'd then try and see if I can break it down by currency and see, maybe put
some, it's mostly ad-hoc, and say is it affecting one currency more than another
(Participant L)”, “we can chart over time to see trends regionally and see whether the
growth..., just by looking at their email behaviour, whether regions are growing in
volume and in engagement levels as well, this is from January to August, I'd have to
export this into a spreadsheet and chart it, that's what I would generally do, I take this
data, export it and then play with it (Participant K)” and “the time that we push it out
we might think about differently, so should we push it out more on a weekend because
it's more of a video, and, it's different, apparently 9 o'clock on a Sunday night works
quite well for Twitter as well (Participant A)”.
Synthesise, although only occurring 18 times, was the only code other than Evaluate and
Locate to feature in all 10 participants. One recurring context was identified in this Investigate
activity:
• Condensing multiple items into a management report, for example “so this is the full
report that I send to [Senior Manager]...a much more condensed version (Participant
F)”, “I'll just then take the end figures here as well as this graph, and include it in my
weekly report which is in the form of an email (Participant G)” and “so out of this I
write a weekly report for the Optimisation team, which looks at these SegIDs and any
barriers and numbers and any reasons why that might be (Participant C)”.
Russell-Rose et al.’s Search Modes were based on user scenarios derived from customers
during the development of search and business intelligence applications on Endeca Latitude
(Oracle, n.d.), an enterprise data discovery platform. Future work suggested by the researchers
included “empirical research and observation of knowledge workers in context to validate and
refine the discovery modes and triggers that give rise to the observed patterns of usage”.
In this research, the ‘priori codes’ occurred within the observation data of marketing
professionals, who were using a mixture of data sources and analytics tools to perform regular
work tasks under normal working conditions. Hence the data supports the applicability of the
Search Modes within a live work environment.
The implications for marketing professionals include the design of data and analytics software
for optimal user experience, based on their data exploration and information retrieval patterns
of behaviour (Russell-Rose and Tate, 2013, p.76). By having a clearer understanding of the
Search Modes of business users, interaction designers of analytics software would be better
equipped to enhance the UI with relevant usability, functionality and content features (Rogers
et al., 2011, p.15).
5.2.1.2 Designing Information Systems for Search Modes
The observation tasks were highly concentrated on assessing the performance of marketing
campaigns for monitoring and reporting purposes, which is the raison d'être for business
analytics within the marketing function (SAS, n.d.). Therefore using the three highest occurring
codes – Compare, Evaluate and Explore – some insight can be gained into ways business
analytics tools might be designed to support the data exploration and information retrieval
processes of marketing professionals.
5.2.1.2.1 Compare – Assessing periodic performance against set targets / Assessing
performance across different periods
Given that all participants described themselves as visual learners, it might be helpful to
present the target and actual data in Microsoft Excel as column or bar charts to facilitate
comparisons. The process could be automated by designing a dashboard-style worksheet
containing the charts, which are linked by formulae to a table on a different worksheet where
the data is updated periodically. Likewise, the dashboard could include a line chart to show
trends over time, enabling the spotting of sharp changes at a glance.
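The two comparisons described above (actual against target, and one period against the previous one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PeriodResult` record, its field names and the `compare` function are hypothetical and are not taken from the Excel models used in the department.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeriodResult:
    period: str    # hypothetical period label, e.g. "2013-10"
    target: float  # target set for the period (assumed non-zero)
    actual: float  # performance achieved in the period (assumed non-zero)


def compare(results: List[PeriodResult]) -> List[dict]:
    """Return one row per period with the percentage variance against
    target and the percentage change on the previous period."""
    rows = []
    previous: Optional[PeriodResult] = None
    for result in results:
        # Variance of actual performance against the set target.
        variance_pct = (result.actual - result.target) / result.target * 100
        # Change against the previous period; undefined for the first period.
        change_pct = None
        if previous is not None:
            change_pct = round(
                (result.actual - previous.actual) / previous.actual * 100, 1
            )
        rows.append(
            {
                "period": result.period,
                "variance_pct": round(variance_pct, 1),
                "change_pct": change_pct,
            }
        )
        previous = result
    return rows
```

The resulting rows correspond to the table that the dashboard charts would be linked to: the variance column feeds a column or bar chart, and the change column feeds a line chart of trends over time.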
Building pivots for the tables and charts would facilitate deeper exploration and analysis, as
double-clicking on any point of interest on the chart can access the underlying data. Other
analytics tools such as QlikView might offer a richer user experience, since their dashboard
design and exploratory capabilities can be more intuitive than those of Excel. That
said, PowerPivot – available in Excel 2010 and above – offers more advanced features that
allow Excel to compete more strongly in the business analytics market.
Usability would be further enhanced by connecting the analytics tool to web-based data
sources where applicable, enabling direct retrieval of data from within the tool using
single-click refresh (Figure 9).
5.2.1.2.2 Evaluate – Drawing an objective conclusion as a follow up to a comparison /
Making subjective professional judgements
Evaluation is a sensemaking activity that participants engage in following the retrieval and
analysis of an item. It may be useful to capture the objective or subjective conclusions drawn
for future reference; the ability to annotate data enables this, reducing reliance on
memory recall (Russell-Rose and Tate, 2013, p.39).
Excel provides the ability to add comments to cells (Figure 10). Although these can be
viewed easily, the process of editing, extracting or collating them is arguably unwieldy. An
added function to view all comments on a spreadsheet might be beneficial for retrieving
evaluations without having to comb through individual cells.
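A consolidated view of comments of the kind suggested above might behave as in the following sketch. It assumes, purely for illustration, that the comments have already been read out of the workbook into a nested dictionary keyed by worksheet name and cell reference; the function names and the data layout are hypothetical and do not describe an Excel feature.

```python
def collate_comments(comments_by_sheet):
    """Flatten {sheet: {cell: comment}} into a sorted list of
    (sheet, cell, comment) tuples so that all evaluations can be read
    in one place."""
    collated = []
    for sheet, comments in comments_by_sheet.items():
        for cell, text in comments.items():
            collated.append((sheet, cell, text.strip()))
    # Plain tuple sort: by sheet name, then by cell reference as text.
    return sorted(collated)


def find_comments(comments_by_sheet, keyword):
    """Return only the collated comments that mention the keyword,
    ignoring case."""
    keyword = keyword.lower()
    return [
        entry
        for entry in collate_comments(comments_by_sheet)
        if keyword in entry[2].lower()
    ]
```

Keeping the worksheet name and cell reference alongside each comment means that a retrieved evaluation can still be traced back to the data it refers to.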
Figure 9: Workbook Connections
Figure 10: Evaluation Comments
5.2.1.2.3 Explore – Sifting through a data set to gain insight about a specific item
The ease with which a data set can be explored depends on both the analytics tool deployed
and the structure of the data. Building a relational data table that models the relationships
between different sets of data would expedite investigations into how other items might have
affected the item of interest. QlikView’s associative data model (QlikTech, n.d.) and Excel’s
PowerPivot linked tables (Microsoft, n.d.) are two features that support this. Displaying
related data items on an interactive dashboard that only requires clicking and selecting,
rather than writing queries in a technical language, facilitates serendipitous discovery of
knowledge (Russell-Rose et al., 2011, p.38).
QlikView has the added benefit of displaying a ‘breadcrumb’ of selected fields in its ‘Current
Selections’ pane, which enables users to keep track of the layers of selections and retrace their
steps if needed (Figure 11), further encouraging exploration (Russell-Rose et al., 2011, p.79).
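The idea of layering selections and retracing them can be illustrated with a short sketch. It is not a description of how QlikView implements its ‘Current Selections’ pane; the `SelectionTrail` class, its method names and the example field names are hypothetical, and the data is assumed to be a simple list of records.

```python
class SelectionTrail:
    """Keeps an ordered trail of field selections over a list of records."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.trail = []  # ordered (field, value) pairs chosen by the user

    def select(self, field, value):
        """Add one more layer of selection."""
        self.trail.append((field, value))

    def back(self):
        """Retrace one step by dropping the most recent selection."""
        if self.trail:
            self.trail.pop()

    def current(self):
        """Return the records that match every selection in the trail."""
        return [
            row
            for row in self.rows
            if all(row.get(field) == value for field, value in self.trail)
        ]

    def breadcrumb(self):
        """Render the trail as text, oldest selection first."""
        return " > ".join(f"{field}={value}" for field, value in self.trail)
```

For example, selecting `channel=Email` and then `campaign=Autumn` narrows the records in two layers, and calling `back()` returns the user to the records for `channel=Email` alone without starting the exploration again.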
5.2.1.2.4 Explore – Scanning through a data set for items that stand out without seeking a
specific outcome
The visual learning preference of the participants suggests that a dashboard using charts or
other graphical formats to visualise the data would expedite such exploration activities. This is
supported by the observed behaviours of Participants K and L, who made use of the
visualisation features of their respective web-based data sources to identify items that stood
out, and Participant I, who used the chart on an Excel-based data cube to identify areas of
interest for further investigation.
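A dashboard could complement such visual scanning by highlighting candidate items automatically. The sketch below is my own illustrative assumption rather than something observed in the study: it flags any item whose value lies more than a chosen number of standard deviations from the mean, and the function name, the threshold of 2 and the example data layout are all hypothetical.

```python
from statistics import mean, pstdev


def items_that_stand_out(values_by_item, threshold=2.0):
    """Return the labels of items whose value is more than `threshold`
    population standard deviations away from the mean of all values."""
    values = list(values_by_item.values())
    if len(values) < 2:
        return []
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so nothing stands out.
        return []
    return [
        label
        for label, value in values_by_item.items()
        if abs(value - average) / spread > threshold
    ]
```

Flagged items would only be prompts for further investigation; the judgement about whether they matter remains with the marketing professional.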
Figure 11: Current Selections
Data Exploration and Information Retrieval in the B2C Marketing Department of the Financial Times

  • 1. Page | CIT DATA EXPLORATION AND INFORMATION RETRIEVAL IN THE B2C MARKETING DEPARTMENT OF THE FINANCIAL TIMES Seun Odueko December 2013
  • 2. Page | i ABSTRACT A case study of marketing professionals in the B2C Marketing department of the Financial Times investigated their patterns of behaviour when interacting with data sources and analytics tools to perform work tasks. The study was conducted within the framework of search and discovery proposed by Russell-Rose and Tate. Data was gathered by observing the participants in their natural work environment, and their data exploration goals and strategies were analysed respectively, using Attfield and Blandford's Information Journey Model and Russell-Rose et al.'s Dimensions of Search User Experience. The findings reveal that the participants’ goals were mainly static rather than dynamic, but they performed ‘exploratory search’ activities to fulfil them. Evidence was also found in support of Search Modes, and their occurrence in clusters of Mode Chains, within a live work environment. Recommendations for design features of analytics tools that facilitate the observed patterns of behaviour were made, with examples of how they could be implemented. Keywords: User Goal; Search Mode; Mode Chain; Data Exploration; Information Retrieval.
  • 3. Page | ii 1 Table of Contents ABSTRACT....................................................................................................................................... i LIST OF FIGURES............................................................................................................................iv LIST OF TABLES..............................................................................................................................iv 2 PROBLEM DESCRIPTION AND OBJECTIVES............................................................................ 1 2.1 Background ................................................................................................................... 1 2.2 Question........................................................................................................................ 2 2.3 Objectives...................................................................................................................... 2 2.4 Outcomes...................................................................................................................... 2 2.5 Beneficiaries.................................................................................................................. 3 2.6 Structure ....................................................................................................................... 3 3 ACADEMIC CONTEXT............................................................................................................. 4 3.1 Data Exploration............................................................................................................ 4 3.2 User Type ...................................................................................................................... 5 3.3 User Goal....................................................................................................................... 
6 3.4 Search Context.............................................................................................................. 7 3.5 Search Mode ................................................................................................................. 7 3.6 Data Type ...................................................................................................................... 8 3.7 Analytics Tool................................................................................................................ 9 4 METHODS............................................................................................................................ 11 4.1 Participant Selection ................................................................................................... 11 4.2 Data Gathering............................................................................................................ 12 4.2.1 User Type ............................................................................................................ 12 4.2.2 User Goal............................................................................................................. 15 4.2.3 Search Context .................................................................................................... 15 4.2.4 Observation......................................................................................................... 16 4.3 Data Preparation......................................................................................................... 20 4.3.1 Tools.................................................................................................................... 20 4.3.2 Coding ................................................................................................................. 20 5 RESULTS AND DISCUSSION.................................................................................................. 
23 5.1 User Goal..................................................................................................................... 23 5.1.1 Code Frequency .................................................................................................. 23 5.1.2 Code Sequences.................................................................................................. 25 5.1.3 Designing for User Goals..................................................................................... 26 5.2 Search Mode ............................................................................................................... 28 5.2.1 Code Frequency .................................................................................................. 28 5.2.2 Code Sequences.................................................................................................. 42
  • 4. Page | iii 6 EVALUATION, REFLECTIONS AND CONCLUSIONS............................................................... 46 6.1 User Goal..................................................................................................................... 46 6.2 Search Mode ............................................................................................................... 47 6.3 Project Management .................................................................................................. 47 6.4 Further Research......................................................................................................... 49 7 GLOSSARY............................................................................................................................ 50 7.1 Usability....................................................................................................................... 50 7.2 User Experience .......................................................................................................... 50 7.3 User Interface.............................................................................................................. 50 7.4 Cognition..................................................................................................................... 50 8 REFERENCES........................................................................................................................ 51
  • 5. Page | iv LIST OF FIGURES Figure 1: Serialist......................................................................................................................... 14 Figure 2: Holist ............................................................................................................................ 14 Figure 3: Frequency of User Goals for All Participants ............................................................... 23 Figure 4: Frequency of User Goals per Participant ..................................................................... 24 Figure 5: Code Sequences - Participant H................................................................................... 25 Figure 6: Code Sequences - All Participants................................................................................ 26 Figure 7: Number of Occurrences per Code ............................................................................... 28 Figure 8: Number of Participants per Code ................................................................................ 29 Figure 9: Workbook Connections................................................................................................ 34 Figure 10: Evaluation Comments................................................................................................ 34 Figure 11: Current Selections...................................................................................................... 35 Figure 12: Code Sequences......................................................................................................... 42 Figure 13: Dashboard Supporting Monitor-Compare-Evaluate.................................................. 44 Figure 15: Dashboard Supporting Explore-Compare-Evaluate................................................... 45 Figure 14: Dashboard Supporting Explore-Compare-Evaluate................................................... 
45 LIST OF TABLES Table 1: Domain Expertise of Participants.................................................................................. 12 Table 2: Technical Expertise of Participants ............................................................................... 13 Table 3: Learning Style of Participants........................................................................................ 13 Table 4: User Type of Participants .............................................................................................. 14 Table 5: Job Roles of Participants ............................................................................................... 16 Table 6: Summary of Observation Sessions................................................................................ 19 Table 7: Codebook of User Goals................................................................................................ 21 Table 8: Codebook of Search modes (those highlighted in green are my own “grounded codes”) ..................................................................................................................................................... 22
  • 6. Page | 1 2 PROBLEM DESCRIPTION AND OBJECTIVES 2.1 Background Business data has evolved from being perceived primarily as a by-product of business processes (Magal and Word, 2009, p.18) to a key productivity driver through which organisations derive competitive advantage (Barton and Court, 2012). In the present era of “big data” where organisations face a deluge of different forms of data from various internal and external sources and at near real-time speed (Beyer and Laney, 2012, cited in Sicular, 2013), those who have successfully harnessed it in their decision-making process achieved average improvements of 5% in relative productivity and 6% in relative profitability (McAfee and Brynjolfsson, 2012). Research into user attitudes towards business intelligence software, conducted in October 2012 by Ventana Research (Smith, 2013), revealed that a greater percentage of respondents are more concerned about usability (63%) than functionality (49%) or reliability (46%). Analytics tools tend to be optimised for functionality and although accessible to experts, end- users are unable to make the most of them or even use them at all due to their complexity (Barton and Court, 2012). IDC researchers concluded that the business analytics software market will grow at a compound annual rate of 9.8% between 2012 and 2016 (Vesset et al., 2012), reaching $50.7 billion (Taft, 2012). Senior executives are increasingly deploying analytics tools in their organisations, contributing to a shortage of Information Technology (IT) and analytics staff with the necessary skills to use them (Vesset et al., 2012). Hence there is a requirement to design tools that fit the way managers and frontline staff, as opposed to data analysts, explore information (Smith, 2013). 
The understanding of how they explore data and gain actionable insight is important to senior executives making strategic decisions on how data is used across the organisation as well as data analytics and visualisation professionals developing tools and models for end-users. Any organisation producing goods or services has a requirement to identify and fulfil the needs of its target customers (The Chartered Institute of Marketing, 2009) and as such, the marketing function is central to the existence of organisations of virtually all sizes and sectors. Marketing departments are forecast to spend more on technology than IT departments by 2017 (McLellan, 2012) due to growth in electronic commerce (e-commerce) sales volumes and increasing adoption of social media and mobile channels for engaging customers (Murphy, 2012), with spending largely focused on business and web analytics (Moth, 2013). The Business-to-Consumer (B2C) marketing department of the Financial Times – a global business news organisation – includes customer acquisition, customer retention, customer relationship management (CRM) and payment optimisation teams. Current analytics tools are based on user-centred Microsoft Excel (Microsoft, n.d.) models and expert-retrieved Oracle SQL (Oracle, n.d.) reports; as a member of the marketing operations team, I am responsible for retrieving SQL reports and building Excel models for the marketing professionals. These are in the process of being consolidated in Microsoft Excel 2010, using the PowerPivot (Microsoft, n.d.) add-in to empower end-user exploration of data on OLAP cubes (Microsoft, n.d.) and retrieval of information with minimal expert intervention. Thus the data exploration patterns of behaviour exhibited by members of the department may yield useful insight into how analytics tools may be optimised for the marketing function.
This will also complement scientific and industrial research into the data exploration patterns of behaviour of non-experts in a work context. It offers an opportunity to build on existing insight into the search patterns of professionals including sales and marketing specialists (O’Day and Jeffries, 1993) and feed into forthcoming research into data exploration tools designed for non-IT experts working in a variety of sectors (The Technology Strategy Board, 2013).
2.2 Question
This study seeks to contribute to the body of knowledge about the characteristics of non-experts and their data exploration patterns of behaviour, which can be used to inform the design of business analytics tools with high usability for staff and thus higher returns on investment (ROI) for organisations. The research question to be answered is “how do Financial Times’ B2C marketing professionals in the customer acquisition, customer retention, CRM and payment optimisation teams explore customer data and retrieve required information?” and this will be investigated from two perspectives:
• Why and how do they seek information?
• What strategies do they employ to find and understand information?
Data is usually distinguished from information with the former consisting of numbers and words that exist in a raw, unprocessed state while the latter has added qualities of organisation and relationships that make it meaningful (Yacci, 1999; Pohl, 2001). For the purposes of this study however, no distinction is made between data and information and they are used interchangeably, following a precedent established in previous research (Baker et al., 2009).
2.3 Objectives
My primary objective is to research and analyse the behavioural processes exhibited by the marketing professionals as they interact with data sources and analytical tools to fulfil specific work tasks. 
My secondary objective is to use the results of the analyses to suggest user interface (UI) design features of analytics tools that support the behavioural processes.
2.4 Outcomes
The project outcomes will be:
• Closed-format questionnaires will be used to gather data, which will be analysed to classify the participants according to their domain expertise, technical expertise, cognitive style and learning style. Plans to classify the marketing professionals’ work data according to Cool and Belkin’s dimensions (2002, cited in Russell-Rose and Tate, 2013, p.58) have been discontinued in order to keep the study’s scope manageable.
• Participants will be observed in their natural working environment while performing a regular work task; the observation data will be coded and analysed to yield a description of the methods and reasoning processes followed in selecting and interpreting information. The analysis framework to be used has been changed from Sensemaking (Pirolli and Card, 2005) to Dimensions of Search User Experience (Russell-Rose and Tate, 2013, p.1) as the latter generates a more robust insight into data exploration processes.
• Desirable UI features for supporting those processes that can be implemented in Microsoft Excel 2010 and alternative business analytics tools will be suggested, drawing on prior research, existing software features and creative thinking. There were initial plans to go beyond suggesting the features and implement them by designing dashboards in Microsoft Excel for the marketing professionals, which would be subsequently evaluated. However, this has been shelved due to time constraints.
2.5 Beneficiaries
The intended beneficiaries of the project are:
• Academics and researchers in the fields of information retrieval and user experience (UX) design.
• Senior executives that make investment decisions on business analytics software for their organisations.
• Data analysts and IT professionals that deploy analytics tools in organisations and design models for end users.
• Business analytics software vendors that develop the applications.
2.6 Structure
The project report continues in Chapter 3 with a review of relevant academic literature in the field of information seeking to identify previous research and established theories that have shaped the current body of knowledge. These will provide the study with requisite analytical frameworks that inform the data gathering and result evaluation strategies. 
In Chapter 4, suitable methods identified in the reviewed literature will be applied to the study, with a description of the participants’ selection process and explanations of how data has been gathered and analysed to yield insight. Chapter 5 presents the results of the analyses, and the findings are discussed in light of the objectives of the study, the extent to which established theories are supported and recommendations for their implementation. The project report concludes in Chapter 6 with an evaluation of the management and execution of the project as well as the outcomes. Suggestions for follow-up research are also made and personal lessons learned from undertaking the project are reflected upon.
3 ACADEMIC CONTEXT
The subject of data exploration has received considerable attention from the academic community and a range of established theories have emerged from empirical research. The research findings provide sound frameworks for understanding the various facets of data exploration, identifying gaps in the body of knowledge that need to be addressed and providing direction for future research efforts. Hence a review of academic literature has been conducted to elucidate the impact of personal and professional traits on search behaviour, the differing effects of both information requirements and types of task on search process as well as the strategies employed in retrieving information and gaining insight. The appropriate format for representing different types of data, and analytics software features suitable for data exploration are also examined. For each facet, limitations of existing research are highlighted where applicable. The facets provide the analytical framework for this study, with the theories guiding the choice of data to be gathered and the criteria for evaluating results.
3.1 Data Exploration
Exploration in its various forms – investigating, studying, analysing, testing, experimenting, discovering or examining (“Exploring,” n.d.) – is a fundamental human experience and lies at the heart of our individual and collective advancement. It is the process by which we satisfy our curiosity (Edelman, 1997), whether in fulfilment of a defined objective or as an end in itself (Berlyne, 1954, cited in Loewenstein, 1994, p.77). In the domain of information science, exploration is synonymous with the discovery or synthesizing of new knowledge from existing data; it is a journey rather than a race having a pre-conceived outcome (Gossen et al., 2012). It can involve “Exploratory Data Analysis” whereby data is analysed and information that cannot be immediately explained is hypothesised about and tested (Tukey, 1977). 
Although a statistical technique, “Exploratory Data Analysis” can be generalised for the manner in which users confronted with puzzling information go through an iterative process of possible explanations, which are tested with the available data to prove or disprove them (Gossen et al., 2012), making it synonymous with data exploration. In general terms, a task can be categorised as data exploration if it involves the examination of data without prior knowledge of the information to be discovered (Grinstein and Ward, 1997, Tukey, 1977, Tukey, 1980, cited in Baker et al., 2009). Data exploration is a search activity (Tukey, 1977), that is, looking into or over data carefully or thoroughly in an effort to find or discover information (“Search,” n.d.). However, there is a distinction between simple item retrieval that does not require examination and ‘exploratory search’, which employs learning and investigating methods for information discovery (Marchionini, 2006). In order to appreciate the intricacies of data exploration, it is beneficial to view it through the four “dimensions of search user experience” that frame information search and discovery (Russell-Rose and Tate, 2013, p.1), namely User Type, User Goal, Search Context and Search Mode, as well as data type and analytics tool.
3.2 User Type
The “level of knowledge and expertise” (Russell-Rose and Tate, 2013, p.1) of users has a significant impact on the way a given data set is explored and thus it ought to be taken into consideration in the design of search and analytics tools. “Domain experts” possess a high level of subject-matter knowledge and experience relative to “domain novices” while “technical experts” are more versed in the use of information systems to extract information in comparison to “technical novices” (Russell-Rose and Tate, 2013, p.4). Users who possess both domain and technical expertise tend to drill deeply into a narrow subset of data and use advanced techniques to evaluate information, in contrast to novices in both areas who explore a data set widely without venturing beyond the surface and engage in minimal evaluation (Jenkins et al., 2003). In between these polar extremes are users who are domain experts and technical novices, exploring widely with rigorous evaluation of information, and others who are domain novices and technical experts, exploring both widely and deeply with less sophisticated evaluation (Jenkins et al., 2003). The study by Jenkins et al. (2003) and related research by Kim (2001) focused on web searches and it would be interesting to see how their findings apply to business users of analytics tools. Although a study – involving participants with a broad range of domain knowledge and mostly high technical expertise – has been conducted into the use of interactive information systems in the execution of work tasks (Li and Belkin, 2010), the effects of user knowledge and expertise were neutralised rather than incorporated into the study. 
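The matrix of domain and technical expertise described above can be sketched as a simple lookup table. This is only an illustrative paraphrase of the findings attributed to Jenkins et al. (2003); the names `ExplorationProfile`, `PROFILES` and `expected_profile` are my own assumptions, and a depth of `None` marks the one combination for which no depth is stated above.

```python
from typing import NamedTuple, Optional


class ExplorationProfile(NamedTuple):
    breadth: str            # how much of the data set is covered
    depth: Optional[str]    # None where the text above states no depth
    evaluation: str         # sophistication of information evaluation


# Keys are (is_domain_expert, is_technical_expert).
# Values paraphrase the behaviour described above; they are not measured data.
PROFILES = {
    (True, True): ExplorationProfile("narrow", "deep", "advanced"),
    (True, False): ExplorationProfile("wide", None, "rigorous"),
    (False, True): ExplorationProfile("wide", "deep", "less sophisticated"),
    (False, False): ExplorationProfile("wide", "shallow", "minimal"),
}


def expected_profile(is_domain_expert: bool,
                     is_technical_expert: bool) -> ExplorationProfile:
    """Return the exploration pattern suggested for a given User Type."""
    return PROFILES[(is_domain_expert, is_technical_expert)]
```

For example, `expected_profile(True, False)` describes a domain expert who is a technical novice: wide exploration with rigorous evaluation.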
As there is a requirement to design analytics tools that fit the way business users explore information (Fisher et al., 2012), an understanding of this matrix of domain and technical expertise as it applies to the marketing professionals can inform UI design and potentially increase their speed in exploring data and making discoveries; this raises the hazards of lax analysis and poor judgment, however (Fisher et al., 2012). Expertise is a function of time and experience and as such it is not advisable to design for a specific group (Russell-Rose and Tate, 2013, p.9); rather UI features can be used to support the transition of users from novice to expert in order to remain relevant at each stage of their development cycle. In the same vein, users’ cognitive style of information processing and learning style of concept representation (Russell-Rose and Tate, 2013, p.14) affect how they explore data and can be supported by the UI of analytics tools. “Holistic thinkers” process information by starting with a high-level view of a given data set in order to grasp its wider context while conversely, “serial thinkers” zero in on details and analyse its constituent parts (Russell-Rose and Tate, 2013, p.11). “Holistic thinkers” are “global processors” who progress from developing a broad overview to focusing on specifics, in contrast to serial thinkers who are “analytic processors”, progressing from a step-by-step grasping of facts to an understanding of the overall concept (Dunn and Griggs, 2003, cited in Denig, 2004). This suggests that cognitive style influences users’ starting point and progression path when seeking information and the business analytics software industry may benefit from empirical research into the relationship between information-seeking patterns of behaviour and cognitive styles of users. 
Similar understanding of the preferred learning styles of business users can inform the representation of data, with tables and charts provided for verbal and visual learners respectively (Russell-Rose and Tate, 2013, p.14). On the other hand, representing data to suit such preferences may not always be appropriate as the nature of the work task and exploration requirements may determine the format of the data; visual learners may be better served by tabular data for in-depth analysis, for example.
3.3 User Goal
Lookup, Learn and Investigate have been identified as search tasks conducted by users in fulfilment of their goals (Marchionini, 2006). Lookup tasks are carried out as a systematic information retrieval process when the goal is clearly defined, such as finding the answer to an unambiguous question like the number of new customers acquired each week during a particular year. On the other hand, goals that are vaguely defined or unknown during the search activity, such as comparing the performances of all acquisition channels employed or analysing the causes of a drop in acquisition numbers in the third quarter of the year, may require an iterative analysis of multiple data before satisfactory insight is derived. In other words, the more complex and loosely defined the User Goal, the more exploratory iterations are needed to uncover relevant insight (Li and Belkin, 2010). These learning tasks and investigating tasks are classified under ‘exploratory search’ (Marchionini, 2006). Gossen et al. (2012, p.290) defined ‘exploratory search’ as “a highly dynamic process of a user to interact with an information space in order to satisfy an information need that requires learning about structure and/or content of the information space.” The exploration process was seen as being shaped by the user’s perspective, which may in turn be transformed by the insight gained and thus lead to further exploration. This notion is corroborated by dynamic models of information seeking, such as the Information Journey Model (Attfield and Blandford, 2010, p.29), which acknowledge that the act of interpreting data modifies the original goals (Belkin, 1993). 
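The Lookup example mentioned above, the number of new customers acquired each week during a particular year, can be sketched as a single well-defined aggregation, in contrast to the open-ended iteration that Learn and Investigate tasks require. This is a minimal sketch under stated assumptions: the function name, the use of ISO week numbering and the sample dates are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from datetime import date


def weekly_new_customers(acquisition_dates, year):
    """Count acquisitions per ISO week for dates falling in the given ISO year."""
    counts = Counter()
    for acquired_on in acquisition_dates:
        iso_year, iso_week, _ = acquired_on.isocalendar()
        if iso_year == year:
            counts[iso_week] += 1
    # Return weeks in ascending order for readability.
    return dict(sorted(counts.items()))


# Hypothetical acquisition dates, for illustration only.
sample_dates = [
    date(2013, 1, 7),
    date(2013, 1, 9),
    date(2013, 1, 15),
    date(2012, 12, 20),  # excluded: belongs to ISO year 2012
]
weekly_counts = weekly_new_customers(sample_dates, 2013)
```

The question has one unambiguous answer that a single pass over the data produces, which is what distinguishes it from the exploratory goals discussed in this section.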
It has been argued that the fundamental goal of users is to synthesize and further explore their initial information requirements and therefore, the process of exploration is often more important than the information discovered since the successful discovery of information does not fulfil the users’ goals but rather leads to new requirements (Gossen et al., 2012, p.290). However, while the importance of the exploration process is acknowledged, the postulation about the fulfilment of user goals could be challenged by the recognition that exploration does not iterate indefinitely; there are entry and exit points in the search process, the former arguably triggered by the desire to fulfil a requirement and the latter arguably triggered by the fulfilment of the requirement. Bridging these two positions is the view that a user’s goal in exploring data is sensemaking, that is, developing an understanding of the information and synthesizing it before responding to it in fulfilment of a requirement (Baker et al., 2009). From a different perspective, the Information Journey Model highlights that the entry point is not restricted to the desire to fulfil a need; rather it could also be an accidental discovery of information or serendipitous insight gained during the evaluation of information (Russell-Rose and Tate, 2013, p.27). Likewise the exit point may not necessarily be triggered by the fulfilment of the exact requirement; the demand placed on users’ attention by the volume and variety of information explored, further amplified by time constraints, may lead to an exit from the search process when the requirements have been “satisficed” (Simon, in Greenberger, 1971) as opposed to fulfilled. A common thread is that an information journey, or data exploration, has a start and an end point. Regardless of their trigger, both are only meaningful when aligned with a pre-defined or newly formed goal of the user. 
Hence identification of actual, as opposed to simulated, work goals of users in business organisations is essential to understanding their data exploration behaviour. In order to derive findings with a higher degree of validity for business organisations, work-based research is essential and the majority of research conducted in this field of study has not addressed this.
3.4 Search Context
Baeza-Yates and Ribeiro-Neto (2011, p.5) distinguish between data retrieval and information retrieval (IR), with the former concerned with matching query parameters to structured data and the latter involving concepts of interpretation and meaning. Hence context is a critical factor in evaluating the IR process. The context of an information-seeking task determines its scope, technique, time length, success criteria and resources used. Building on research by Järvelin and Ingwersen (2004), Russell-Rose and Tate (2013, p.49) devised a model for representing search context in four progressive layers of “information retrieval” – akin to Baeza-Yates and Ribeiro-Neto’s data retrieval definition and Marchionini’s Lookup task; “information seeking” – akin to Baeza-Yates and Ribeiro-Neto’s IR definition and Marchionini’s Learn and Investigate tasks; “work task” – organisational need or personal motive that motivates information seeking; and “cultural context” – expectations associated with the organisation or person performing a work task. Each successive layer encompasses the preceding one; hence an organisation’s cultural context informs the importance attached to work tasks, which in turn are fulfilled by information seeking activities that may involve information retrieval. Work tasks in organisations are characteristically exploratory in nature and require clear communication of results to non-experts (Fisher et al., 2012). Functional teams such as marketing or finance identify an information need and rely on the technical expertise of the analytics team to retrieve the information (Fisher et al., 2012). This has a number of drawbacks: the time lag between the request for information and the communication of results; the increased potential for aspects of the original information need to be ‘lost in translation’; and less opportunity to explore the available data and gain additional insight beyond the original information need. 
Hence there is a requirement for organisations to both build robust data models and provide easy-to-use tools for frontline staff to directly interact with the models (Barton and Court, 2012). Furthermore, work task highlights the influence of Search Context on User Goal and associated search activities. In Li and Belkin's (2010) experimental study of the relationship between work task and the process of using information systems to search for information, the type of work task – defined as any activity performed by people in fulfilment of their work responsibility, similar to Russell-Rose and Tate’s organisational work task – determined how users searched for information. Work tasks that involved knowledge acquisition, for example, resulted in ‘exploratory search’ activities and were perceived to be difficult to accomplish, with increased time and information requirements. Conversely, work tasks with clearly defined goals that called for Lookup search activities required less effort to accomplish. Findings that different types of tasks utilised different search processes (Toms et al., 2003) corroborate Li and Belkin’s conclusions. Hence a better understanding of organisational needs, including those of the marketing function, could lead to improved UI design of analytics tools that support users’ information seeking processes. This would provide easier access to relevant information in less time.
3.5 Search Mode
While Search Context is concerned with external factors that influence users’ information seeking activities (Russell-Rose and Tate, 2013, p.48), Search Mode focuses on the strategies
employed in seeking information. The process of exploring data involves not only data retrieval but also analysing and interpreting the retrieved data to build understanding (Russell-Rose and Tate, 2013, p.2). Thus a clear grasp of the entire cycle of exploration, that is, Search Mode, is useful for defining search behaviour and supportive features of analytics tools (Russell-Rose and Tate, 2013, p.73). The journey from realising an information need to satisfactorily fulfilling it often involves a series of search tasks (O’Day and Jeffries, 1993). Building on the search tasks of Lookup, Learn and Investigate (Marchionini, 2006), insights gained from the behaviour of business intelligence applications have been distilled into nine modes of enterprise search and discovery for which analytics tools can be optimised, ranging from location of objectively defined data to subjective interpretation of associated sets of data (Russell-Rose et al., 2011). In a similar vein, research into how professional clients of a library interpret and use search results pinpointed three Search Modes – monitoring, planned and exploratory (O’Day and Jeffries, 1993). Users tend to switch between different modes as a search activity progresses, forming a pattern of Mode Chains until evaluation is completed and synthesis is achieved (Russell-Rose et al., 2011). This was demonstrated by the library clients’ use of multiple Search Modes (O’Day and Jeffries, 1993). O’Day and Jeffries focused on the specific information needs and resultant search behaviour of professionals; however, their study involved the use of librarians as intermediaries in the search process. Hence there remains an opportunity to apply and possibly extend their research findings to the direct IR activities of professionals. 
The finding that the nature of work tasks may affect the level of effort and amount of information required by users to complete them (Li and Belkin, 2010) underscores the utility of optimising UI for specific Search Modes, since modes are determined by the task being performed. This has implications for the amount of detail presented and level of drill-down depth made available, depending on task complexity.
3.6 Data Type
Extensive research has been conducted to examine various facets of information search and knowledge discovery. The established body of knowledge is predominantly focused on the exploration of textual data through the use of search engines and library databases. This leaves a gap in the understanding of how quantitative data is explored and analysed to yield insight. Tukey (1977) coined the term “exploratory data analysis” as a paradigm for simplifying the description of quantitative data and actively looking beneath its surface to discover new insight (Tukey, 1977, p.v). Data exploration is likened to a detective investigation, discovering and evaluating clues and following their trail until insight is gained (Tukey, 1977, p.1). The preferred investigation tool is a visual display of data, which is seen as vital to the discovery process (Church, 1979). Visualised data, consisting of a fixed scene, variable objects within the scene and defining characteristics of each object, provides a graphical means by which users perceive data and by extension, the underlying context which the data represents (Baker et al., 2009). Visual representations such as graphs and charts are a popular way of depicting and enhancing the perception of quantitative data. Although non-experts might be inclined to represent quantitative data with graphs, the Cognitive Fit theory (Vessey, 1991) was developed as a
framework for determining the optimal conditions for visualisation and conversely, when such data are best represented in tabular form. The research concluded that graphs are suited to spatial tasks, requiring an overview of associations and relationships within the data, while tables are suited to symbolic tasks, requiring an analysis of discrete components of the data (Vessey, 1991). While the validity of the conclusions is accepted within the relevant academic community, it is worth noting that the type of data, rather than user, determines the proposed cognitive fit. It may thus be of interest to find out whether users’ cognitive style and learning style have an impact on the performance of graphs and tables. Specifically, it might be useful to investigate the relative performance of graphs in spatial tasks carried out by serialists and verbal learners as well as the relative performance of tables in symbolic tasks carried out by holists and visual learners.
3.7 Analytics Tool
Marchionini (2006) asserts that data retrieval tools such as database systems and general-purpose search engines, which have varying levels of technical expertise requirements, can efficiently fulfil lookup tasks with human interaction limited to query formulation. Conversely, learning and investigating tasks require tools that facilitate continuous human interaction in the ‘exploratory search’ process (Marchionini, 2006). The need for continuous interaction with retrieved data places users, rather than tools, at the heart of the search process (Belkin, 1993), with design implications for user interaction with tools. Users’ ability to explore data and discover associations is enhanced by the development of bisociative (relationships between items in different data sets), as well as what I term intrasociative (relationships between items in one data set), knowledge discovery tools, with appropriate UI design. Gossen et al. 
(2012, p.291) argue that such UIs need to support: dynamic information seeking, identification of associations between data sets, creative ways of exploring data and real-time access to live data. However, the exploration of live data, which are typically cloud-based, increases the complexity and cost of the analytics tool used and as such, it may be more productive and feasible to explore subsets of live data stored locally (Fisher et al., 2012). Designing UI to facilitate creative exploration implies a flexible environment with loose navigation structures to avoid shepherding users along a predetermined path, and this may be daunting for non-experts and unsuitable for certain tasks. Hence, more robust knowledge discovery tools would cater for different search mode sequences (Russell-Rose, Lamantia and Burrell, 2011, cited in Russell-Rose and Tate, 2013, p.85), each with a customised UI. The inherent superiority of graphs over lists in revealing associations between data sets (Gossen et al., 2012, p.292) is tempered by advances in analytics tools, particularly with quantitative data. Microsoft Excel (Microsoft, n.d.), one of the most pervasive tools, provides pivot tables as a means of aggregating and combining data to yield insight. PowerPivot (Microsoft, n.d.), a more advanced version implemented in Excel 2010 and above, includes additional abilities to combine separate data sets – albeit with at least one field in common – for analysis as well as the Slicer tool (Microsoft, n.d.) for flexible and interactive data filtering. Specialised analytics tools such as Tableau (Tableau Software, n.d.) and QlikView (QlikTech, n.d.) are optimised for non-experts to explore associations between data sets in a flexible and
interactive manner. The drawback from an organisation’s perspective is the extra cost of deploying such tools, in contrast to Microsoft Excel, which is virtually available by default. While QlikView offers a free version for individual use, accessing corporate data and sharing insights require a paid license. The challenges of evaluating the usability of knowledge discovery tools involve the creation of realistic scenarios and recruiting of representative users (Gossen et al., 2012, p.295). An effective way of sidestepping these challenges is to eschew laboratory-based testing and conduct the evaluation in an actual working environment with typical end-users completing real tasks. The main drawback here would be the risk of participants modifying their normal behaviour as a result (Oates, 2006, p.204), whether consciously or subconsciously.
4 METHODS
This is an explanatory case study (Yin, 2003, cited in Oates, 2006, p.143) investigating the data exploration and information retrieval patterns of behaviour of four sub-groups of marketing professionals in a business news organisation. The ubiquity of the marketing profession in all industries and organisation sizes, established in section 2.1, makes the findings generalisable for other B2C marketing professionals performing similar tasks. The type of generalisation made is the “rich insight” yielded by the results (Walsham, 1995, cited in Oates, 2006, p.146). The four “dimensions” of User Type, User Goal, Search Context and Search Mode (Russell-Rose and Tate, 2013, p.1) provided the framework under which the study was conducted. The participants completed closed-format questionnaires, which were subsequently analysed to classify them according to their domain expertise, technical expertise, cognitive style and learning style. A description of the goals was obtained from each participant at the beginning of the observation session and analysis of the observation data yielded further insight into their goals. The participants’ job roles, obtained from their LinkedIn profiles, and specific details of their work task, gathered during the observation, defined the context of the information retrieval activities. The observations were conducted in the participants’ natural work environment while they performed a regular work task, and qualitative analysis of the data revealed their Search Modes, that is, strategies employed in selecting and interpreting information. Further analysis yielded the sequence of occurrence of the Search Modes and those with the highest levels of clustering are indicative of the Mode Chains followed by the participants. 
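One way to picture how Mode Chains might be surfaced from coded observation data is to count how often one Search Mode is immediately followed by another across sessions. This is a minimal sketch under stated assumptions, not the analysis procedure actually used in the study: the function name, the mode labels and the sample sessions are hypothetical placeholders.

```python
from collections import Counter


def count_mode_transitions(sessions):
    """Count adjacent (mode, next_mode) pairs across coded observation sessions."""
    transitions = Counter()
    for modes in sessions:
        # zip pairs each coded mode with the one that immediately follows it.
        for current_mode, next_mode in zip(modes, modes[1:]):
            transitions[(current_mode, next_mode)] += 1
    return transitions


# Hypothetical coded sessions; the labels are placeholders, not the study's codebook.
sample_sessions = [
    ["Locate", "Verify", "Compare"],
    ["Locate", "Verify", "Analyse"],
    ["Monitor", "Compare"],
]
transitions = count_mode_transitions(sample_sessions)
most_frequent = transitions.most_common(1)
```

Pairs that recur most often across sessions are candidate links in a Mode Chain; longer chains could be explored in the same way by counting sequences of three or more modes.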
Desirable UI features supporting the Search Modes and Mode Chains – as well as User Goals – which can be implemented in Microsoft Excel 2010 and alternative business analytics tools are suggested, drawing on prior research, existing software features and creative thinking.
4.1 Participant Selection
A presentation was given to members of the B2C marketing department explaining the purpose and format of the study. A detailed participant information sheet (Appendix C) was provided to all twelve members of the department based in the London offices at the commencement of the study, out of which eleven agreed to participate and signed an informed consent form (Appendix D). One subsequently withdrew from the study, leaving a total of ten participants for the duration of the study. Members of the department based in the New York, Hong Kong and Manila offices were not invited to participate for logistical reasons. All sub-divisions of the B2C Marketing department were represented, namely Audience Development; Customer Acquisition – Optimisation; Customer Acquisition – Direct Marketing; Customer Acquisition – Emerging Platforms; Customer Retention – Retention; Customer Retention – Customer Relationship Management; and Marketing Operations – Payments Optimisation. One participant was at Executive level, six were at Managerial level, one was at Senior Managerial level and a further two were at Departmental Head level.
4.2 Data Gathering
Russell-Rose and Tate propose the use of four “dimensions” as a framework for analysing search behaviour, namely User Type, User Goal, Search Context and Search Mode. While this study investigates the last of these dimensions, and User Goal to a lesser degree, data was gathered about the other dimensions through preliminary questionnaires and the actual observation in order to define the boundaries of the Search Modes observed. As such, the data gathered about User Type and Search Context is presented in this chapter rather than in Chapter 5, since they do not form a central part of the study.
4.2.1 User Type
Participants completed an 11-question self-assessment of their proficiency in using IT tools to analyse data, a 5-item questionnaire to determine their preferred learning style and a rod-and-frame diagram exercise to indicate how they process information. Data about participants’ domain expertise was gathered from their LinkedIn profiles, with experts defined as possessing a relevant degree and more than three years of professional experience (see 4.2.1.1).
4.2.1.1 Domain Expertise
Data about participants’ marketing domain expertise was gathered from their LinkedIn profiles, factoring in relevant academic qualifications, professional experience and industry certifications. For the purposes of this study, the benchmark for classification as a “domain expert” is the possession of a relevant first degree or higher and more than three years of professional experience, based on the Chartered Institute of Marketing’s criteria (The Chartered Institute of Marketing, n.d.). Those not meeting these criteria are considered “domain novices”. Table 1 shows the participants’ qualifications and number of years of experience, with all meeting the “domain expert” criteria.
Table 1: Domain Expertise of Participants
4.2.1.2 Technical Expertise
An 11-question self-assessment of proficiency in using IT tools to analyse data (Appendix E), based on extracts from Bowling Green State University’s self-assessment questionnaire (Bowling Green State University Career Center, n.d.), was completed by participants. Each question had five answer choices ranging from ‘very low’ to ‘very high’ and a corresponding number range of 1 to 5. The questionnaire was scored by adding up the answer choices, dividing by 55 and multiplying by 1000 to rebase the totals onto a 1000-point scale (with eleven answers of 1 to 5 each, the totals run from 11 to 55, so the rebased scores fall between 200 and 1000). Isograd’s TOSA proficiency scoring scale (Isograd, n.d.) was adopted in interpreting the scores, with participants classified as ‘beginner’, ‘basic’, ‘productive’, ‘advanced’ or ‘expert’. Those with a classification of productive or greater are considered “technical experts” while those classified as basic or lower are considered “technical novices”. Table 2 shows the results of the participants’ self-assessment, with nine meeting the “technical expert” criteria and one falling into the “technical novice” category.
Table 2: Technical Expertise of Participants
4.2.1.3 Learning Style
A 5-question Learning Scenario Questionnaire (Mayer and Massa, 2003) (Appendix F), in which each question describes a learning scenario and offers a choice between a verbal and a visual learning style, was used to determine participants’ preferred style of learning. Participants selecting a verbal learning style three or more times were classified as verbal learners while those selecting a visual learning style three or more times were classified as visual learners. Table 3 shows participants’ answers, with all falling into the visual learning style category.
Table 3: Learning Style of Participants
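The two scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the function names are hypothetical, the sample answers in the usage note are invented, and the band boundaries are an assumption, since the text names the five TOSA levels but does not state their cut-off scores.

```python
from typing import Sequence

# ASSUMED upper bounds for each band; the study adopts Isograd's TOSA scale
# but does not list the cut-off scores, so these values are illustrative only.
ASSUMED_BANDS = [
    (350, "beginner"),
    (550, "basic"),
    (725, "productive"),
    (875, "advanced"),
    (1000, "expert"),
]


def technical_score(answers: Sequence[int]) -> float:
    """Sum eleven answers (1 to 5 each), divide by 55 and multiply by 1000."""
    if len(answers) != 11 or any(a < 1 or a > 5 for a in answers):
        raise ValueError("expected eleven answers, each between 1 and 5")
    return sum(answers) * 1000 / 55


def technical_level(score: float) -> str:
    """Map a rebased score onto the (assumed) proficiency bands."""
    for upper_bound, label in ASSUMED_BANDS:
        if score <= upper_bound:
            return label
    raise ValueError("score cannot exceed 1000")


def is_technical_expert(level: str) -> bool:
    """'Productive' or greater counts as a technical expert, as in the text."""
    return level in {"productive", "advanced", "expert"}


def learning_style(choices: Sequence[str]) -> str:
    """Classify five 'verbal'/'visual' answers by the three-or-more rule."""
    if len(choices) != 5 or any(c not in ("verbal", "visual") for c in choices):
        raise ValueError("expected five answers, each 'verbal' or 'visual'")
    return "visual" if choices.count("visual") >= 3 else "verbal"
```

For example, a hypothetical participant answering 3 to every question would total 33, giving a rebased score of 600. Because five answers are given, exactly one of the two learning styles always reaches three selections, so the classification rule never produces a tie.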
4.2.1.4 Cognitive Style
A rod-and-frame diagram exercise (Russell-Rose and Tate, 2013, p.9) (Appendix G) was completed by participants to indicate how they process information. Those who drew a line parallel to the edges of the rectangle were classified as “serialists” (Figure 1) while those who drew a line along the north-south axis of the square background were classified as “holists” (Figure 2). The diagrams indicated that eight participants are “holists” while two are “serialists”.
Figure 1: Serialist
Figure 2: Holist
4.2.1.5 User Type
The data on participants’ domain expertise, technical expertise, learning style and cognitive style were combined to form a picture of the types of users whose search modes are being studied (Table 4). This information mainly contextualises the observation data within Russell-Rose and Tate's (2013, p.3) framework but efforts were also made to identify any influence of participants’ characteristics on their data exploration activities.
Table 4: User Type of Participants
4.2.2 User Goal
Although the objective of this study is to gain insight into how users interact with data in order to fulfil their needs, rather than the actual nature of the needs, knowledge of the participants’ goals provides a useful backdrop to their respective search modes. Prior to the observation, the following questions were sent to participants to ensure their tasks had a clearly defined goal:
1. What specific information do you need?
2. What triggered the need?
3. Which data set will you explore to meet the need?
4. Which software application is the data set contained in?
5. Which software application is the data set analysed in?
6. How will you act on the information once it is retrieved?
Where a participant performed more than one task, the questions had to be answered separately for each task. The questions correspond to the four steps of the Information Journey Model (Attfield and Blandford, 2010), which covers the lifecycle of information need fulfilment. Questions 1 and 2 correspond to “recognising an information need” while questions 3 and 4 correspond to “acquiring information”. The third step, “interpreting and validating the information”, can be inferred from question 5, and question 6 corresponds to “using the information”.
A potential limitation of defining User Goals before the data exploration activity – as opposed to capturing them during the activity – is that it takes a static view of the goals by not accommodating dynamic changes along the way. Hence, while participants were required to consider the questions in choosing a suitable observation task and answer them at the beginning of the observation, further evidence of their goals was derived from the analysis of the observation data after it was transcribed and coded.
4.2.3 Search Context
To aid an understanding of the context in which information seeking is conducted, Russell-Rose and Tate (2013, p.50) propose the viewing of data exploration activity from four progressive perspectives: “information retrieval”, “information seeking”, “work task” and “cultural context”. The work task layer in particular provides the professional context in which participants engage in data exploration and information retrieval. The participants’ job descriptions and general areas of responsibilities within the organisation were drawn from their LinkedIn profiles (Table 5) while information about the specific work task being fulfilled was gathered during the observation (Table 6), based on their answers to the User Goal questions in Chapter 4.2.2. Their general job roles and the specific duties carried out during the observation convey the organisational need that motivated the information seeking. This information mainly contextualises the observation data within Russell-Rose and Tate's (2013, p.47) framework and, to a lesser extent, was used in analysing the influence of participants’ work tasks on their data exploration activities.
4.2.4 Observation
Gaining insight into the behavioural processes followed by the participants as they explore data and retrieve information necessitated a case study research strategy, as it allows in-depth study of the phenomenon with all variables taken into consideration, and within their natural setting of occurrence (Oates, 2006, p.141). Basing the case study on a regular task makes it representative and thus generalisable to marketing professionals performing similar tasks while providing an opportunity to test the applicability of established theories (Oates, 2006, p.146), specifically the Information Journey model (Attfield and Blandford, 2010) and Russell-Rose and Tate's (2013) framework for search and discovery.
Data was gathered by observing the participants while performing their tasks which, as a “practitioner-researcher” working in the organisation, I was able to carry out after obtaining necessary permissions without the need for familiarisation or access negotiations (Oates, 2006, p.209). However, challenges of balancing the study with my work responsibilities as well as minimising interference with participants’ work schedule had to be managed. Despite drawbacks of being relatively time-consuming and generating large amounts of unstructured data (Rogers et al., 2011, p.261), as well as the risk of participants modifying their normal behaviour, observation was selected over other data gathering methods such as interviews and questionnaires because it captures what is being done and how; this is more reliable than potentially selective verbal descriptions (Oates, 2006, p.198). Furthermore, observing participants in their natural work environment rather than in a controlled environment avoided the probable effects of artificial work conditions limiting the relevance of the results (Rogers et al., 2011, p.261).
Observation sessions were scheduled for a maximum of 60 minutes but flexibility was allowed to address concerns about the potential impact on participants’ work schedule and productivity. Ten minutes were scheduled for introduction and equipment set up, followed by up to 50 minutes of direct observation at their workstations. Participants were observed while performing a regular task during their normal work schedule, using their normal data sets and analytics tools. Participants were required to think aloud as they performed their tasks so as to verbalise their thought processes (Rogers et al., 2011, p.256) and questions were asked for clarification during the observation session rather than in a follow-up interview in order not to depend on memory recall.
Table 5: Job Roles of Participants
4.2.4.1 Introduction
At the beginning of each session, I explained:
• The participant’s role in the study;
• My role as the moderator;
• The format of the study;
• Think-aloud technique.
4.2.4.2 Equipment Set Up
The sessions were conducted in the natural work environment of the participants, using their regular furniture and Windows laptop computer as well as data sources and data analysis software routinely used to complete tasks. The Screencast-O-Matic (n.d.) screen capture Java applet was used to record on-screen activities, with participants’ voices recorded simultaneously by the applet using the computer’s built-in microphone. The audio-visual recordings were exported in MP4 file format at the end of the sessions.
4.2.4.3 Observation Tasks
At the beginning of each session, participants gave a description of their goals regarding:
• The specific information needed;
• The trigger of the need;
• The data set to be explored to meet the need;
• The software application the data set is contained in;
• The software application the data set will be analysed in;
• How the information will be acted on once it is retrieved.
Participants were requested beforehand to select a task that was performed regularly – defined as at least once a week – for the observation sessions. Nine participants selected one task while one participant selected two tasks. Of the eleven tasks observed, seven met this criterion. One, which was performed once a month at the time of the observation, was due to move to a weekly schedule. Two tasks were performed regularly but without a fixed schedule; the last task, although performed on an ad-hoc basis, was considered to be representative by the participant, who otherwise did not perform regular data exploration tasks.
Seven of the observed tasks involved the creation of a report on a spreadsheet, of which six were based on quantitative data extracted from web-based sources and one based on quantitative data in an existing spreadsheet. Two tasks involved the analysis of quantitative data on an existing spreadsheet and a further two tasks involved the analysis of quantitative and qualitative data on web-based platforms. The shortest observation session lasted for almost 7 minutes while the longest lasted for just over 80 minutes. The total length of the observation sessions was approximately 4 hours and 50 minutes.
A task was deemed completed when:
• The initially identified information need had been satisfied; and/or
• Subsequent information needs triggered by the exploration activity had been satisfied; or
• The exploration was terminated because the information need could not be satisfied.
Nine participants satisfied the identified information needs; Participant I was unable to retrieve all the required information due to technical problems with the data source but gave a verbal description of how the task would have been performed while providing on-screen demonstrations with the available data. A summary of the observation sessions is shown in Table 6:
Table 6: Summary of Observation Sessions
4.3 Data Preparation
4.3.1 Tools
The Screencast-O-Matic Java applet was used to record the computer screen and sound during each observation, and each recording was saved in MP4 file format. All but one of the observations were of good audio-visual quality, enabling accurate transcriptions. The exception was the session with Participant F, which suffered from poor audio quality and hence had gaps in the transcript, although that was partially mitigated by interpreting on-screen actions.
The transcription of each recording was produced in HyperTranscribe 1.6 (Researchware, n.d.) and exported as a Microsoft Word (Microsoft, n.d.) document (Appendix O). The transcripts consisted of the vocal dialogues of participants, supplemented where appropriate by descriptions of on-screen events in a different-coloured text to differentiate them from participants’ own words. Transcribed sentences were timestamped to facilitate referencing.
The transcripts were imported into QDA Miner 4.1 (Provalis Research, n.d.) for coding and software-assisted analysis, with the results exported as Microsoft Excel documents for in-depth analysis.
4.3.2 Coding
A prerequisite for analysing the qualitative data gathered in the observations was to look for themes and patterns that emerge from them. The transcripts were read, interpreted and coded with the themes reflected in each passage (Appendices I and N). Duplicate coding was avoided by interpreting the passages and coding each unique occurrence of a theme once, regardless of how many times it may have been referred to. However, where a passage was perceived to reflect more than one theme, it was coded with all applicable themes. Additional steps were taken to ensure themes were coded in their correct sequence by re-arranging the subject and predicate where appropriate. Descriptions of on-screen actions (in red text) were added to complement verbal descriptions, where deemed necessary.
A combination of descriptive and analytic coding techniques (Gibbs and Taylor, 2010) was used in identifying the methods and reasoning processes followed by the participants in their selection and interpretation of information. Efforts were made to ensure consistency of coding by comparing similarly coded passages within each transcript and with transcripts of other participants (Gibbs and Taylor, 2010). Explanatory comments to justify coding decisions were also added where deemed necessary. However it is acknowledged that coding decisions are subjective and based on the interpretations of a single coder, without the benefit of checking coding agreements with other coders.
The transcripts were coded separately for User Goals and Search Modes and both dimensions were analysed independently.
4.3.2.1 User Goals
The Information Journey Model (Attfield and Blandford, 2010) provided the underlying theory for identifying the participants’ goals (Table 7). The four steps of the model support a dynamic as opposed to static nature of information seeking (Russell-Rose and Tate, 2013, p.26) and activities of participants were interpreted within the framework.
Table 7: Codebook of User Goals
Code                      Description
Recognising Need          Recognising an information need
Acquiring Information     Acquiring information to fulfil the need, whether an entire data set or an item within a data set
Interpreting Information  Interpreting and validating the acquired information
Using Information         Using the information to fulfil the need
4.3.2.2 Search Modes
The nine search modes developed by Russell-Rose et al. (2011), distilled from over one hundred conceptual user scenarios of search and business intelligence applications, provided the “a priori codes” (Gibbs and Taylor, 2010) used for the coding. These include all the codes in Table 8 except Forecast, which was added from Marchionini (2006), and Collaborate, Measure, Recognise and Update, which I developed. The latter four are “grounded codes” (Gibbs and Taylor, 2010) that emerged from the observation data, independent of pre-existing theories. Forecast, though identified as a search activity (Marchionini, 2006), is not a distinct search mode under Russell-Rose et al.’s classification.
Russell-Rose et al. classified each Search Mode according to the three types of search activities developed by Marchionini, that is, Lookup, Learn and Investigate. Search Modes that entail finding information are classified under Lookup; those that involve knowledge acquisition and development are classified under Learn while others concerned with dissecting and combining information or using judgement are classified under Investigate. Lookup searches retrieve precise information to meet clearly defined needs whereas Learn and Investigate searches are exploratory in nature, with users analysing information and making sense of it (Russell-Rose and Tate, 2013, p.72). These three top-level categories were in turn used in classifying the “grounded codes” with the exception of Update, as it does not fit precisely into any category.
Table 8: Codebook of Search modes (those highlighted in green are my own “grounded codes”)
Code         Description
Locate       LOOKUP: To find a specific (possibly known) item
Verify       LOOKUP: To confirm that an item meets some specific, objective criterion
Monitor      LOOKUP: To maintain awareness of the status of an item for the purposes of management or control
Recognise    LOOKUP: To identify an item based on prior knowledge
Compare      LEARN: To examine two or more items to identify similarities and differences
Comprehend   LEARN: To generate independent insight by interpreting patterns within a data set
Explore      LEARN: To investigate an item or data set for the purposes of knowledge discovery
Measure      LEARN: To determine the quantitative value of an item
Analyse      INVESTIGATE: To examine an item or data set to identify patterns and relationships
Evaluate     INVESTIGATE: To use judgement to determine the value of an item with respect to a specific goal
Synthesise   INVESTIGATE: To create a novel or composite artefact from diverse inputs
Forecast     INVESTIGATE: To extrapolate a future value of an item from current data (added from Marchionini's taxonomy)
Collaborate  INVESTIGATE: To seek the input of others in interpreting an item
Update       Editing the value of an item
The “grounded codes” were deemed to be of a similar level of abstraction to, and conceptually different from, the “a priori codes” to merit inclusion. Forecast – classified under Investigate by Marchionini – was added to describe the derivation of future values from existing data. Collaborate stood out as an Investigate search mode that draws on the expertise and professional judgement of colleagues to gain insight about an item. Measure was identified as a Learn search mode peculiar to quantitative data, typically involving performing calculations on a given data set.
Recognise, classified under Lookup, was considered to be sufficiently different from Locate as it emphasises the use of prior knowledge gained through experience in identifying an item. Update, perhaps technically not a search activity as it involves adding or changing items, was considered important as it describes a distinct activity within data analysis and interpretation.
Russell-Rose et al. (2011) proposed three properties that a Search Mode should possess: ‘consistency’; ‘orthogonality’; and ‘comprehensiveness’. While ‘consistency’ and ‘orthogonality’ can be evaluated without any data, ‘comprehensiveness’ can only be assessed after data has been analysed and as such, the degree to which the “grounded codes” possess the three characteristics is evaluated in Chapter 5.2.1.3.
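The Search Mode codebook can also be written down as a small lookup structure, which makes the category and origin of each code explicit. The sketch below is only an illustration: the dictionary layout, the origin labels and the helper names are assumptions of this example, while the codes and their categories are transcribed from Table 8.

```python
from collections import Counter
from typing import Iterable, Optional

# code -> (Marchionini category, origin). Update is left uncategorised because
# the text states it does not fit precisely into any category.
CODEBOOK = {
    "Locate": ("Lookup", "russell-rose"),
    "Verify": ("Lookup", "russell-rose"),
    "Monitor": ("Lookup", "russell-rose"),
    "Recognise": ("Lookup", "grounded"),
    "Compare": ("Learn", "russell-rose"),
    "Comprehend": ("Learn", "russell-rose"),
    "Explore": ("Learn", "russell-rose"),
    "Measure": ("Learn", "grounded"),
    "Analyse": ("Investigate", "russell-rose"),
    "Evaluate": ("Investigate", "russell-rose"),
    "Synthesise": ("Investigate", "russell-rose"),
    "Forecast": ("Investigate", "marchionini"),
    "Collaborate": ("Investigate", "grounded"),
    "Update": (None, "grounded"),
}


def category_of(code: str) -> Optional[str]:
    """Return Lookup, Learn, Investigate, or None for an uncategorised code."""
    return CODEBOOK[code][0]


def is_exploratory(code: str) -> bool:
    """Learn and Investigate modes are the exploratory ones, as in the text."""
    return category_of(code) in {"Learn", "Investigate"}


def tally_by_category(codes: Iterable[str]) -> Counter:
    """Count coded occurrences per top-level category."""
    return Counter(category_of(code) for code in codes)
```

Passing an ordered list of codes from a transcript to `tally_by_category` would give the split between Lookup, Learn and Investigate activity, with Update counted under `None`.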
5 RESULTS AND DISCUSSION
The participants were observed as they performed a regular task in their natural work settings. The observation data was coded separately for their User Goals and Search Modes, with coding themes derived from Attfield and Blandford’s Information Journey Model and Russell-Rose and Tate’s search and discovery framework respectively. The individually coded transcripts were combined within QDA Miner to derive an aggregate view of the data exploration and information retrieval behaviour patterns of the marketing professionals. Using the built-in tools of QDA Miner, the number of occurrences (frequency) and order of occurrences (sequence) of the codes were computed and the results exported in Microsoft Excel file format for analysis.
5.1 User Goal
The User Goal provides insight into the need that motivates information-seeking activity and the different stages that lead to the fulfilment of the need. Such understanding can be leveraged by designing analytics tools to support users at each stage of their information seeking process as well as their transition between stages.
5.1.1 Code Frequency
The code frequency conveys the total number of times each code occurred during the information seeking process, and is indicative of the relative significance of each User Goal to the process. Of the combined 491 occurrences of all goals, the predominant goal of the participants was Interpreting Information, occurring 161 times, followed by Acquiring Information at 155, Using Information at 101 and Recognising Need at 74 (Figure 3). This partially aligns with the assertion that in the Information Journey Model, information interpretation and use are the key drivers of information seeking behaviour (Russell-Rose and Tate, 2013, p.26), although information acquisition seems to play a comparably dominant role.
Figure 3: Frequency of User Goals for All Participants (Recognising Need 74; Acquiring Information 155; Interpreting Information 161; Using Information 101)
In fulfilling recognised needs, the participants were roughly three times as engaged in information acquisition and interpretation activities as they were in using information. There is a range of probable reasons for the higher frequency of these activities. It is theoretically possible that the process of searching, retrieving and analysing data is demanding and labour-intensive relative to the process of using it once it is prepared. Neither time nor effort was measured however, so this cannot be corroborated by evidence. It could also be due to difficulty in locating the required data or understanding its meaning. This is unlikely however, given the regularity of the observed tasks and “domain expertise” of the participants.
Another plausible factor could be relative difficulty in using the data sources and analytic tools, making the process less efficient. A distinguishing feature of ‘technical novices’ compared to experts is a higher tendency of the former to reformulate queries, resulting in more time and effort expended in acquiring information (Russell-Rose and Tate, 2013, pp.4, 6). The sole ‘technical novice’ in the group, Participant A, whose technical proficiency self-assessment was ‘Basic’, had a ratio of information acquisition to information usage of 2 (Figure 4), in line with the group average (Appendix H). Participant I, whose self-assessment was ‘Advanced’, had a higher ratio of 3, suggesting lower efficiency. On the other hand Participant K, similarly self-assessed as ‘Advanced’, had a below-average ratio of 1, suggesting higher efficiency. The results are thus insufficient to demonstrate a link to “technical expertise”. The possible effects of the relative complexity of the tasks have not been investigated, however.
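The ratios quoted above follow directly from the aggregate frequencies reported for Figure 3. The short Python sketch below reproduces the arithmetic; the helper name is hypothetical, and rounding the acquisition-to-usage ratio to a whole number is an assumption made here to match the figures quoted in the text.

```python
# Aggregate User Goal frequencies as reported for Figure 3.
FREQUENCY = {
    "Recognising Need": 74,
    "Acquiring Information": 155,
    "Interpreting Information": 161,
    "Using Information": 101,
}


def ratio(numerator_goals, denominator_goal, counts=FREQUENCY):
    """Occurrences of the numerator goals per occurrence of the denominator goal."""
    return sum(counts[goal] for goal in numerator_goals) / counts[denominator_goal]


# Acquisition plus interpretation, relative to usage.
combined = ratio(["Acquiring Information", "Interpreting Information"], "Using Information")
# Acquisition alone, relative to usage (the per-participant ratio in the text).
acquisition_only = ratio(["Acquiring Information"], "Using Information")

print(sum(FREQUENCY.values()))     # 491
print(round(combined, 2))          # 3.13
print(round(acquisition_only, 2))  # 1.53
print(round(acquisition_only))     # 2
```

On these totals the combined ratio is about 3.1, and the group-level acquisition-to-usage ratio of about 1.5 rounds to the value of 2 quoted for the group average.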
Alternatively, the higher proportion of engagement in acquiring and interpreting information could suggest that the participants are comfortable with both the content and the tools used in extracting and analysing it, hence they are willing and able to explore the data in enough detail to gain insight. The regularity of the tasks and the participants’ “domain expertise” lend some credence to this hypothesis but perhaps more important is the nature of the tasks – the performance of the marketing campaigns and factors that influenced them needed to be understood and explained for monitoring and reporting purposes, placing greater emphasis on information acquisition and interpretation. Thus the majority of activities involved exporting data from its source into a separate analytics tool as well as identifying specific items within a data set for examination (Appendix I). Conversely, using the results for insight or decision making (Attfield and Blandford, 2010, p.33), or condensing them into a report, involved fewer iterations.
Figure 4: Frequency of User Goals per Participant (bar chart of Recognising Need, Acquiring Information, Interpreting Information and Using Information for each participant)
It is striking that unlike other participants, the predominant goal observed in Participant H was Using Information, occurring 11 times compared to 9 and 8 for Acquiring and Interpreting Information respectively (Figure 4). The participant’s information seeking path is revealing, as seen in the code sequences heat map which uses brighter colours to indicate higher occurrences (Figure 5). Out of 35 total activities, 6 information acquisition activities were followed by information usage, without an apparent interpretation process. The nature of the task performed offers a possible explanation, as the “information seeking process” involved iteratively acquiring data from one source and inputting it into another before analysing the combined data. This is evident in statements such as “…and just go through, find the actual SegID and then input the numbers here…”, “…and then just put that number in here…”, “…and put it in here…” and “…so then basically once I fill in everything, I'm done with this data source…”. Other participants demonstrated similar information seeking behaviour albeit to a lesser extent (Appendix J) and this may be a reasonably pervasive process, as noted by Pirolli and Card (2005) in their study of intelligence analysis: “much day-to-day intelligence mainly consists of extracting information and repackaging it without much actual analysis”.
Figure 5: Code Sequences - Participant H
5.1.2 Code Sequences
The code sequences convey the number of times one code is followed by another and are indicative of the path followed in seeking information, that is, how users transition from one goal to another until their information need is met. Further analysis of the aggregated code sequences of all participants (Figure 6) yields results that are in line with the progressive steps of the Information Journey Model (Attfield and Blandford, 2010, p.30).
The total number of times one code was followed by another is 485, with Recognising Need followed by Acquiring Information 62 times, which was in turn followed by Interpreting Information 100 times and the latter followed by Using Information 68 times. This suggests that when seeking information, the participants are likely to follow a linear path of need recognition – information acquisition – information interpretation – information usage; hence analytics tools could be optimised to smooth their journey along the path.
[Figure 5 heat map, Participant H: number of times that the User Goal in a row (1) is immediately followed by the User Goal in the intersecting column (2). Columns: Recognising Need, Acquiring Information, Interpreting Information, Using Information. Non-empty cell values as extracted, by row (column positions are not recoverable): Recognising Need 5, 3; Acquiring Information 3, 6; Interpreting Information 3, 1, 4; Using Information 4, 3, 2, 1.]
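A code-sequence count of this kind can be sketched as a tally of adjacent pairs in an ordered list of codes. The example below is an illustration only: how QDA Miner computes its sequences is not described in the study, the function name is hypothetical, and the sample sequence is invented for the purpose of the example.

```python
from collections import Counter
from typing import List, Tuple


def code_sequences(codes: List[str]) -> Counter:
    """Count how often each code is immediately followed by each other code."""
    return Counter(zip(codes, codes[1:]))


# Hypothetical coded transcript (not taken from the study data).
sample = [
    "Recognising Need",
    "Acquiring Information",
    "Using Information",
    "Acquiring Information",
    "Using Information",
    "Acquiring Information",
    "Interpreting Information",
    "Using Information",
]

transitions = code_sequences(sample)
for (first, second), count in sorted(transitions.items()):
    print(f"{first} -> {second}: {count}")
```

A sequence of n codes always yields n - 1 transitions, so the transition total for a single transcript is one less than its code frequency total.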
Figure 6: Code Sequences - All Participants
Considering the theory of the dynamic model of information seeking which the Information Journey Model is built on (Russell-Rose and Tate, 2013, p.26), it is interesting that Interpreting Information was followed by Acquiring Information 38 times. Examples of information acquisition following interpretation without the explicit recognition of a need can be seen in Participants A (“but then if you look at this, this is showing that clicks on this tweet…saying we had 7000 clicks and then when you look at the Bitly one, so that records the actual clicks on link so 598 clicks, but then when you go back and look at this MatterSight report, it only actually delivers 7 registrations”) and D (“so this gives us averages, so weekly increase or decrease, so we can see that was a particularly high week, so you get a flow of...subs, we should have acquisitions in here as well, that is the dark blue so we can see we've kind of been hovering around the 1000 mark recently which is really nice”).
Similarly, Using Information was followed by Acquiring Information 37 times, bypassing Recognising Need. This is apparent in Participant G (“the last thing we need to do here is add these numbers in here so I can see the UK split so now that I've done that, the UK numbers will change for the week just so I can have the trial and sub numbers together… so the next bit, I'll just then take the end figures here as well as this graph”) and Participant I (“that I would try and answer with the data in a certain view and I'll be breaking that out further, but also I'd try and take qualitative data to answer that”). These seem to support the standard and cognitive models of information seeking that emphasise the iterative acquisition and interpretation of information until an initially recognised need is fulfilled (Russell-Rose and Tate, 2013, p.24).
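The transition counts quoted in this section can be checked against the full Figure 6 matrix. The sketch below transcribes the sixteen cell values from the figure and recomputes the totals; the variable and function names are hypothetical, and the share of transitions along the linear path is derived here purely as an illustration, since it is not a statistic reported in the study.

```python
GOALS = [
    "Recognising Need",
    "Acquiring Information",
    "Interpreting Information",
    "Using Information",
]

# Rows: the preceding goal; columns: the goal that immediately follows it.
FIGURE_6 = [
    [3, 62, 6, 3],
    [15, 19, 100, 21],
    [24, 38, 31, 68],
    [23, 37, 25, 10],
]


def followed_by(first: str, second: str) -> int:
    """Number of times `first` is immediately followed by `second`."""
    return FIGURE_6[GOALS.index(first)][GOALS.index(second)]


total = sum(sum(row) for row in FIGURE_6)

# Transitions along the linear path: need -> acquire -> interpret -> use.
linear_path = sum(followed_by(a, b) for a, b in zip(GOALS, GOALS[1:]))

# Returns to acquisition that bypass the recognition of a need.
back_to_acquiring = (
    followed_by("Interpreting Information", "Acquiring Information")
    + followed_by("Using Information", "Acquiring Information")
)

print(total)                           # 485
print(linear_path)                     # 230
print(round(linear_path / total, 2))   # 0.47
print(back_to_acquiring)               # 75
```

The three linear-path cells are the largest entries in their respective rows, which is the pattern the text reads as a predominantly linear journey.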
The lower instances of acquiring and interpreting information leading to the recognition of a new need suggest that the participants’ goals were mostly static in nature, in contrast to goals that are dynamically modified as information is acquired and interpreted. Although information acquisition that does not directly proceed from a recognised need could alternatively suggest serendipitous discovery, where “[information is encountered] without explicitly looking for it” (Blandford, featured in Russell-Rose and Tate, 2013, p.43), this could not be confidently inferred from the data available.
Figure 6 data: number of times that the User Goal in a row (1) is immediately followed by the User Goal in the intersecting column (2)
(1) \ (2)                 Recognising Need  Acquiring Information  Interpreting Information  Using Information
Recognising Need          3                 62                     6                         3
Acquiring Information     15                19                     100                       21
Interpreting Information  24                38                     31                        68
Using Information         23                37                     25                        10
5.1.3 Designing for User Goals
Implementing specific features in business analytics software can support the observed information seeking processes, thereby improving the efficiency of users.
5.1.3.1 Acquiring and Interpreting Information
User Goals that involve using information that has been acquired without need for interpretation, as demonstrated by Participant H, are viable candidates for automation. Standardised data that are regularly exported into business analytics software would benefit from direct connection to the data source, which could be refreshed with a single click. Marketing professionals could thus invest a greater proportion of their effort in using the information.
Participants did not merely acquire information for usage however; monitoring and reporting on the performance of marketing campaigns involved a high proportion of interpretation in order to make sense of the data. Multiple data sets from one or several data sources were typically acquired for interpretation, a practice described as building a “shoebox” collection of all data items before they are interpreted for relevance and meaning (Pirolli and Card, 2005). Russell-Rose and Tate (2013, p.34) assert that enabling rapid population of the “shoebox” can support this process and this reinforces the utility of a seamless integration between analytics software and data sources, empowering users to acquire and interpret data in a single workflow.
Moreover, judging the relevance and meaning of data items is an internal mental process that benefits from externalisation to prevent memory overload and potentially allow collaboration (Russell-Rose and Tate, 2013, p.34). One way of facilitating this is to reorganise and reformat the data to make it suitable for interpretation (Pirolli and Card, 2005) and simple-to-use tools in Microsoft Excel for filtering and sorting partly fulfil the requirement. On the other hand, more complex processing such as charts for visualisation or pivot tables for sub-selection and aggregation require more technical nous and could benefit from simplification.
The relative ease of switching between tabular and graphical representation of data in Tableau, for instance, demonstrates a more refined implementation that could be adopted more widely.
5.2 Search Mode

Search Mode describes the different strategies employed by users in exploring data during the full cycle of their information seeking activity. A clearer understanding of such strategies, and of how they tend to be combined, could inform the design of analytics tools that enhance data exploration and information retrieval processes.

5.2.1 Code Frequency

The code frequency conveys the total number of times each code occurred during the information seeking process, and is indicative of the relative significance of each Search Mode to the process. The number of times each code was used and the number of participants in which it appeared were computed to identify recurring themes that could be suitable candidates for generalisation.

There was a late discovery of an inconsistency with respect to Recognise, which was initially named Recall. The code was renamed part way through the coding process but was not updated in previously coded participants. Although the figures for Recognise and Recall could have been added together to derive the total frequency, this might not be a suitable solution for deriving the code sequences (section 4.2.2), as the computation process used in QDA Miner is unknown. Since the error could not be rectified due to the expiry of the trial licence of QDA Miner, the frequency of Recognise was deemed unreliable and thus discounted from the results.

The remaining thirteen codes had a total frequency of 433, ranging between 9 and 61 (Figure 7), and each code could feature in a maximum of 10 participants (Figure 8).

Figure 7: Number of Occurrences per Code
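The two measures described above (occurrences per code and participants per code) amount to simple tallies over the coded segments. The following sketch shows one way they could be computed. The `(participant, code)` pairs are hypothetical placeholders, and the approach is an assumption rather than a description of how QDA Miner derives its figures.

```python
from collections import Counter, defaultdict

# Hypothetical coded segments as (participant, code) pairs; not the study's data.
coded_segments = [
    ("A", "Compare"), ("A", "Evaluate"), ("A", "Compare"),
    ("B", "Locate"), ("B", "Compare"),
    ("C", "Evaluate"), ("C", "Explore"),
]


def code_frequencies(segments):
    """Return (occurrences per code, number of distinct participants per code)."""
    occurrences = Counter(code for _, code in segments)
    participants = defaultdict(set)
    for participant, code in segments:
        participants[code].add(participant)
    coverage = {code: len(people) for code, people in participants.items()}
    return occurrences, coverage


occurrences, coverage = code_frequencies(coded_segments)
for code, count in occurrences.most_common():
    print(f"{code}: {count} occurrences in {coverage[code]} participant(s)")
```

Had the Recall and Recognise labels been merged before tallying (for example by mapping both to a single name), the frequency counts could have been recovered, although, as noted above, this would not necessarily have repaired the code sequences.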
Overall, participants performed more ‘exploratory searches’ than Lookup searches, with 313 and 96 occurrences respectively. 160 of the ‘exploratory searches’ were of a learning nature while 153 were investigative. A further 24 instances of updating items were conducted. The higher occurrence of ‘exploratory search’ activities was reflected in all ten participants, with no significant variations between those performing analysis on web-based platforms and those importing data into Microsoft Excel for analysis (Appendix L). Moreover, the highest occurring search activity in each participant was exploratory, with the exception of Participant G (Appendix L). The prevalence of Locate in Participant G is not correlated with the nature of the observation task, as this was not repeated in other participants who extracted weekly customer acquisition data from web-based platforms and analysed them in Microsoft Excel for management reports, including Participants F and H (Chapter 4.2.4.3: Table 6). No explanation for the divergence can be inferred from the data.

The results suggest that finding information is a minor component of the marketing professionals’ data exploration and information retrieval patterns of behaviour; the majority of their activities involve analysing and making sense of information once it has been located. The implication for data sources and business analytics software used by such professionals is the provision of improved features to enhance analytical and sensemaking processes rather than focusing on finding information (Russell-Rose and Tate, 2013, p.72).

5.2.1.1 Analyses of Most Frequent Priori Codes

The ‘priori codes’ (those derived from Russell-Rose et al.’s Search Modes) with the highest number of occurrences offer a relatively high degree of confidence for generalisation, since they occur in reasonable quantities to suggest commonality.
The most frequently occurring code was Compare at 61, appearing in all observed tasks except that of Participant J. As a Learn activity concerned with common and/or contrasting qualities between two or more items, it appears to be a vital element in the interpretation of quantitative data. It recurred mainly in three contexts:

• Assessing periodic performance against set targets, as seen in such statements as “so at the moment we're 500 short of that target, actually we're about 400 short of that target cuz that's minus 100 (Participant F)”, “so we've got a column to tell us what we should be achieving and we can look to see if that's happening or not (Participant C)” and “so again that's not quite hitting target but it's in line with weekly average (Participant D)”;
• Assessing performance across different periods, for example “so we're seeing there were 1 or 2% of percentage point drop on a weekly basis then we get to week 10 and all of a sudden it drops by 40 percentage points (Participant I)”, “and then on week 29, right around here, we switched the sign up forms to just the new version of the sign up forms that performed better across the board on other sources, and you can see that on week 29 there was a jump (Participant H)” and “September got higher click-through (Participant A)”;
• Assessing the validity of data sources, evidenced by comments like “what we've done is really said, if iJento is showing a huge number and MatterSight isn't, and it's big, we'll take some action in looking into it (Participant H)”, “and then I'll have a call on Thursday with the search agency and then we discuss the figures that they're getting through their tracking, Google Analytics, match it up against what we've got here, first to see how accurate it is (Participant G)” and “but then if you look at this, this is showing that clicks on this tweet, so it could be our handle or anywhere around it, saying we had 7000 clicks and then when you look at the Bitly one, so that records the actual clicks on link, so 598 clicks (Participant A)”.

Figure 8: Number of Participants per Code

Evaluate also featured highly, garnering 60 occurrences across all 10 participants.
There were two contexts in which this Investigate activity was repeatedly identified:

• Drawing an objective conclusion as a follow-up to a comparison, which can be seen in extracts such as “so we've got a column to tell us what we should be achieving and we can look to see if that's happening or not. So, we can see that we're generally doing pretty well on registration barriers; on subscription barriers we're struggling a bit (Participant C)”, “so clearly Friday is our poor one, a poor day for click-through and open rate, and consistently I'm seeing here Wednesday and Thursday actually which is quite interesting, consistently we've found Monday is a good day to send news on, and Sundays people are on their iPads and they're highly engaged as well (Participant K)” and “so for example for the last few weeks, SEO has delivered quite a phenomenal amount of subs which is great, but we're trying to understand why it's been delivering more subs than usual...so that said to me it wasn't necessarily an increase it was just more accurate attribution, so that could be important because we've had quite good acquisition results over the last few weeks, and it's quite important to know why we've been up… so that was quite good cuz then we could say there hadn't been a big increase in SEO, it's down to more the programmes that we've run (Participant D)”;
• Making subjective professional judgements, as demonstrated by “the US one is exactly the same or we might see it on the site and think that might work well (Participant A)”, “maybe we can modify that by including live Twitter content now or advertising particular Twitter niche fields based on their industry... that would do better, but just to say follow us on Twitter, it's a bit ambiguous, there's no real drive, call to action is not very strong (Participant K)” and “this one I probably wouldn't use because it's probably out of date, with the exception if I'm doing something with comment pieces, they tend to last for a bit longer (Participant C)”.
With a frequency of 57, Explore had the third highest occurrence and featured in 8 of the 10 participants. Two common contexts were reflected in this Learn activity:

• Sifting through a data set to gain insight about a specific item, illustrated by “has there been any sort of changes in the traffic, have we put more money towards something, have we done any optimisation, because we're seeing an increase in subscriptions, why is that the case (Participant H)”, “so we're seeing there were 1 or 2% of percentage point drop on a weekly basis then we get to week 10 and all of a sudden it drops by 40 percentage points, that then gives me to say why, and then I'd try and drill down further and say ok, what happened at that week (Participant I)” and “so from there I would just dig down and I can see where is this problem (Participant L)”;
• Scanning through a data set for items that stand out without seeking a specific outcome, exemplified by “interestingly 6% have forgotten their password…the upgrade to standard and premium didn't get a click, then you've got the email briefings' done quite well, 11% (Participant K)”, “I just look at that as a reference, to see if I can see anything coming out, in terms of content (Participant C)” and “there are other bits, so this is quite interesting to see number of followers, so that would be how many people have started to follow @FT, which obviously is good because it means they'll then be getting the updates in their feed as well which might help with engagement; retweets are interesting because people are obviously interested in the content (Participant A)”.
Locate, a Lookup activity, occurred 39 times across all 10 participants in two recurrent contexts:

• Finding the value of an item, including “so I'll be looking at this section here and I'll be looking at what the total subs is, FT.com, how many individual subs we've acquired (Participant D)”, “I would start with this, so a visualisation, how are print subs doing (Participant I)” and “so looking at this grand totals for Welcome for registered users, we've got 28% open rate, 3.64% click through (Participant K)”;
• Identifying a specific item within a data set, such as “so, I know pretty much off by heart the SegIDs in mind, so I can select them (Participant C)”, “this [Sub Source] is what I'm interested in (Participant F)” and “yeah do it side-by-side, and just go through, find the actual SegID (Participant H)”.

Another Lookup activity, Monitor, occurred 39 times in 9 participants. Two recurring contexts were identified in the occurrences:

• Maintaining passive awareness of the status of an item to provide a backdrop to assessed items, as seen in “the Asia and the US I'm not so worried about cuz they're nice to know but they're not for us to manage, therefore they're managed from those regions but I do take the number out of MatterSight just so I can see how they're going, just to see cross-region comparisons (Participant G)”, “then obviously we'll look at the number of subscribers that have cancelled, and payment failure, and then look at the net increase for the week so that's sort of quite good to know about but for me it's really how many acquisitions we've achieved (Participant D)” and “we do take insight from that, it's a temperature check more than anything else (Participant K)”;
• Maintaining active awareness of the status of an item in order to assess it, demonstrated by “so within Splunk we have a payment dashboard, so we can see for different time spans - go to 7 days - the new payment flow, and this will populate, so we have debit successes, debit failures and debit errors, so this is when there's a technical error. So we can see that at a glance at any point, and then we've got the reasons and the error messages that are coming through here, and then some more data, and then the same for the old flow (Participant L)”, “this was the SegID we put against this tweet, that was a registration push, so I started to look at this to see how many did it actually deliver (Participant A)” and “so we track these every week, so that we can see any fluctuation, try and identify where any marketing activity that's being done is affecting these SegIDs (Participant C)”.

Analyse occurred 37 times in 9 participants, with two contexts encountered repeatedly in this Investigate activity:

• Determining factors contributing to the value of an item, exemplified by “hey we've achieved what looks like an extra 20, 30 subs, and in this case she came back and said that we had sent out, we had communicated with a lot more people because there'd been a glitch in previous weeks, and so that explains the large number (Participant D)”, “what channel did they come in from, why are they doing well...all of them came in from email and what letter series did they have, they didn't have a letter, they had emails so we then have a hypothesis or something to answer which is, we think people that come in through email and only respond by email are better, are retained better (Participant I)” and “try and identify where any marketing activity that's being done is affecting these SegIDs (Participant C)”;
• Discovering patterns within a data set, illustrated by “I'd then go back to the pivot and see, I'd then try and see if I can break it down by currency and see, maybe put some, it's mostly ad-hoc, and say is it affecting one currency more than another (Participant L)”, “we can chart over time to see trends regionally and see whether the growth..., just by looking at their email behaviour, whether regions are growing in volume and in engagement levels as well, this is from January to August, I'd have to export this into a spreadsheet and chart it, that's what I would generally do, I take this data, export it and then play with it (Participant K)” and “the time that we push it out we might think about differently, so should we push it out more on a weekend because it's more of a video, and, it's different, apparently 9 o'clock on a Sunday night works quite well for Twitter as well (Participant A)”.

Synthesise, although only occurring 18 times, was the only code other than Evaluate and Locate to feature in all 10 participants. One recurring context was identified in this Investigate activity:

• Condensing multiple items into a management report, for example “so this is the full report that I send to [Senior Manager]...a much more condensed version (Participant F)”, “I'll just then take the end figures here as well as this graph, and include it in my weekly report which is in the form of an email (Participant G)” and “so out of this I write a weekly report for the Optimisation team, which looks at these SegIDs and any barriers and numbers and any reasons why that might be (Participant C)”.

Russell-Rose et al.’s Search Modes were based on user scenarios derived from customers during the development of search and business intelligence applications on Endeca Latitude (Oracle, n.d.), an enterprise data discovery platform. Future work suggested by the researchers
included “empirical research and observation of knowledge workers in context to validate and refine the discovery modes and triggers that give rise to the observed patterns of usage”. In this research, the ‘priori codes’ occurred within the observation data of marketing professionals, who were using a mixture of data sources and analytics tools to perform regular work tasks under normal working conditions. Hence the data supports the applicability of the Search Modes within a live work environment.

The implications for marketing professionals include the design of data and analytics software for optimal user experience, based on their data exploration and information retrieval patterns of behaviour (Russell-Rose and Tate, 2013, p.76). By having a clearer understanding of the Search Modes of business users, interaction designers of analytics software would be better equipped to enhance the UI with relevant usability, functionality and content features (Rogers et al., 2011, p.15).

5.2.1.2 Designing Information Systems for Search Modes

The observation tasks were highly concentrated on assessing the performance of marketing campaigns for monitoring and reporting purposes, which is the raison d'être for business analytics within the marketing function (SAS, n.d.). Therefore, using the three highest occurring codes – Compare, Evaluate and Explore – some insight can be gained into ways business analytics tools might be designed to support the data exploration and information retrieval processes of marketing professionals.

5.2.1.2.1 Compare – Assessing periodic performance against set targets / Assessing performance across different periods

Given that all participants described themselves as visual learners, it might be helpful to present the target and actual data in Microsoft Excel as column or bar charts to facilitate comparisons.
The process could be automated by designing a dashboard-style worksheet containing the charts, which are linked by formulae to a table on a different worksheet where the data is updated periodically. Likewise, the dashboard could include a line chart to show trends over time, enabling the spotting of sharp changes at a glance. Building pivots for the tables and charts would facilitate deeper exploration and analysis, as double-clicking on any point of interest on the chart can access the underlying data. Other analytics tools such as QlikView might offer a richer user experience since the design of dashboards and exploratory capabilities available can be more intuitive than in Excel. That said, PowerPivot – available in Excel 2010 and above – offers more advanced features to compete more strongly in the business analytics market. The usability would be further enhanced by connecting the analytics tool to web-based data sources where applicable, enabling direct retrieval of data from within the tool using single-click refresh (Figure 9).
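The two Compare contexts discussed above (actual against target, and one period against the previous one) reduce to simple arithmetic that a dashboard could compute on each refresh. The sketch below is illustrative only: the function name `compare_to_target`, the 25% drop threshold and the weekly figures are all assumptions, not values taken from the study.

```python
def compare_to_target(weeks, drop_threshold=0.25):
    """Summarise each week against its target and against the previous week.

    `weeks` is a list of (week_number, target, actual) tuples in date order.
    `drop_threshold` is an assumed cut-off for flagging a sharp fall.
    """
    rows = []
    previous = None
    for week, target, actual in weeks:
        # Relative change from the previous week; undefined for the first week
        # or when the previous value is zero.
        change = (actual - previous) / previous if previous else None
        rows.append({
            "week": week,
            "shortfall": target - actual,
            "on_target": actual >= target,
            "change": change,
            "sharp_drop": change is not None and change <= -drop_threshold,
        })
        previous = actual
    return rows


# Hypothetical weekly acquisition figures (illustrative only).
weekly = [(8, 1000, 1020), (9, 1000, 990), (10, 1000, 600)]
report = compare_to_target(weekly)
for row in report:
    flag = " <- sharp drop" if row["sharp_drop"] else ""
    print(f"week {row['week']}: shortfall {row['shortfall']}{flag}")
```

In a spreadsheet the same result would typically come from a formula column and conditional formatting. The point of the sketch is only that flagging sharp changes does not have to depend on the user spotting them on a chart.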
5.2.1.2.2 Evaluate – Drawing an objective conclusion as a follow up to a comparison / Making subjective professional judgements

Evaluation is a sensemaking activity engaged in by participants, following the retrieval and analysis of an item. It may be useful to capture the objective or subjective conclusions drawn for future reference, and the ability to annotate data enables this, reducing reliance on memory recall (Russell-Rose and Tate, 2013, p.39). Excel provides the ability to add a comment to a cell (Figure 10) and, although comments can be viewed easily, the process of editing, extracting or collating them is arguably unwieldy. An added functionality to view all comments on a spreadsheet might be beneficial for retrieving evaluations without having to comb through individual cells.

Figure 9: Workbook Connections

Figure 10: Evaluation Comments
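The suggested 'view all comments' feature could work along the following lines. This is a sketch over a purely hypothetical in-memory model: the `Annotation` class, the `collate` function and the sample notes are assumptions for illustration, and no Excel API is used or implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A hypothetical evaluation note attached to one cell."""
    sheet: str
    cell: str
    author: str
    text: str


def collate(annotations, keyword=None):
    """List annotations in one place, optionally filtered by a keyword.

    Results are ordered by sheet name, then by cell reference (plain
    lexicographic order, which is enough for this illustration).
    """
    selected = [
        a for a in annotations
        if keyword is None or keyword.lower() in a.text.lower()
    ]
    return sorted(selected, key=lambda a: (a.sheet, a.cell))


# Illustrative notes only; not taken from the observation data.
notes = [
    Annotation("Weekly", "C7", "Analyst 1", "Below target but in line with weekly average"),
    Annotation("Weekly", "C3", "Analyst 1", "Jump follows change of sign-up form"),
    Annotation("Email", "B2", "Analyst 2", "Friday is a poor day for open rate"),
]

for note in collate(notes):
    print(f"{note.sheet}!{note.cell} ({note.author}): {note.text}")
```

Gathering evaluations into a single list in this way would also make them easier to reuse when condensing findings into a management report.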
5.2.1.2.3 Explore – Sifting through a data set to gain insight about a specific item

The ease with which a data set can be explored depends on both the analytics tool deployed and the structure of the data. Building a relational data table that models the relationships between different sets of data would expedite investigations into how other items might have affected the item of interest. QlikView’s associative data model (QlikTech, n.d.) and Excel’s PowerPivot linked tables (Microsoft, n.d.) are some of the tools that support such features. By displaying related data items on an interactive dashboard that only requires clicking and selecting rather than writing queries in a technical language, serendipitous discovery of knowledge is facilitated (Russell-Rose et al., 2011, p.38). QlikView has the added benefit of displaying a ‘breadcrumb’ of selected fields in its ‘Current Selections’ pane, which enables users to keep track of the layers of selections and retrace their steps if needed (Figure 11), further encouraging exploration (Russell-Rose et al., 2011, p.79).

5.2.1.2.4 Explore – Scanning through a data set for items that stand out without seeking a specific outcome

The visual learning preference of the participants suggests that a dashboard using charts or other graphical formats to visualise the data would expedite such exploration activities. This is supported by the observed behaviours of Participants K and L, who made use of the visualisation features of their respective web-based data sources to identify items that stood out, and Participant I, who used the chart on an Excel-based data cube to identify areas of interest for further investigation.

Figure 11: Current Selections
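The idea behind a ‘breadcrumb’ of selections, as described for the ‘Current Selections’ pane above, can be illustrated with a small sketch. It is not QlikView’s implementation: the `SelectionTrail` class, its method names and the sample records are hypothetical, and serve only to show how layered selections can be tracked and retraced.

```python
class SelectionTrail:
    """Keep an ordered trail of field selections over a list of records."""

    def __init__(self, records):
        self._records = list(records)
        self._selections = []  # ordered (field, value) pairs, oldest first

    def select(self, field, value):
        """Add a selection layer and return the records that still match."""
        self._selections.append((field, value))
        return self.current()

    def back(self):
        """Undo the most recent selection, retracing one step."""
        if self._selections:
            self._selections.pop()
        return self.current()

    def breadcrumb(self):
        """Describe the active selections in the order they were made."""
        if not self._selections:
            return "(no selections)"
        return " > ".join(f"{f} = {v}" for f, v in self._selections)

    def current(self):
        """Return the records matching every active selection."""
        return [
            r for r in self._records
            if all(r.get(f) == v for f, v in self._selections)
        ]


# Hypothetical subscription records (illustrative only).
records = [
    {"region": "UK", "channel": "Email", "subs": 120},
    {"region": "UK", "channel": "SEO", "subs": 80},
    {"region": "US", "channel": "Email", "subs": 60},
]

trail = SelectionTrail(records)
trail.select("region", "UK")
trail.select("channel", "Email")
print(trail.breadcrumb())
print(trail.current())
```

Because each selection is stored as a separate layer, stepping back removes only the latest filter. This is the property that lets a user drill down into a point of interest and return without having to rebuild the earlier view.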