Consistency, Clarity & Control:
Development of a new approach to
WWW image retrieval
Trystan Upstill
A subthesis submitted in partial fulfillment of the degree of
Bachelor of Information Technology (Honours) at
The Department of Computer Science
Australian National University
November 2000
© Trystan Upstill
Typeset in Palatino by TeX and LaTeX 2ε.
Except where otherwise indicated, this thesis is my own original work.
Trystan Upstill
24 November 2000
Acknowledgements
I would like to thank the ANU for providing financial support for my honours year
through the Paul Thistlewaite memorial scholarship. Paul was an inspiring lecturer
and I am privileged to have received a scholarship in his honour.
Thanks to my supervisors, Raj Nagappan, Nick Craswell and Chris Johnson, for the
continual flow of great ideas and support throughout the year.
Thank you, AltaVista, for not banning my IP address following my constant and
unrelenting barrage on your image search engine.
Thanks to the honours gang, Vij, Nige, Matt, Derek, Mick, Tom, Mel, Pete & Jason,1
for a fun and eventful time during a long and taxing year. I wish you all the best for
the future and hope to keep in touch.
Thanks to all those from 5263, Bodhi, Nick, Andy, Andy, Ben, Jake, Josh, Josh & Jonno,
for making my life marginally less 5263.
Thanks to my other fellow compatriots, Carla, Jenny, Fiona, Tam & Nils for constantly
reminding me what a geek I am, and reminding me that some members of the human
race are female.
Thanks to my family, Mum, Dad and Detts, who somehow managed to put up with
me all year. Your support during my education has been immeasurable and my
achievements owe a lot to you.
And finally, last but not least, thank you Beth. Your tremendous support and
understanding have allowed me to maintain a degree of sanity throughout the year —
now let's go to the beach.
1
Honorary Member
Abstract
The number of digital images is expanding rapidly and the World-Wide Web (WWW)
has become the predominant medium for their transferral. Consequently, there ex-
ists a requirement for effective WWW image retrieval. While several systems exist,
they lack the facility for expressive queries and provide an uninformative and non-
interactive grid interface.
This thesis surveys image retrieval techniques and identifies three problem areas in
current systems: consistency, clarity and control. A novel WWW image retrieval ap-
proach is presented which addresses these problems. This approach incorporates
client-side image analysis, visualisation of results and an interactive interface. The
implementation of this approach, the VISR (Visualisation of Image Search Results) tool,
is then discussed and evaluated using new effectiveness measures.
VISR offers several improvements over current systems. Consistency is aided through
consistent image analysis and result visualisation. Clarity is improved through a
visualisation that makes it clear why images were returned and how they matched
the query. Control is improved by allowing users to specify expressive queries and
enhancing system interaction.
The new effectiveness measures include a measure of visualisation precision and vi-
sualisation entropy. The visualisation precision measure illustrates how VISR clusters
images more effectively than a thumbnail grid. The visualisation entropy measure
demonstrates the stability of VISR over changing data sets. In addition to these mea-
sures, a small user study is performed. It shows that the spring-based visualisation
metaphor, upon which VISR’s display is based, can generally be easily understood.
Contents
Acknowledgements v
Abstract vii
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Domain 5
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Glossary of Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.4 Information Need . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 Query Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6 Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.7 Document Analysis and Retrieval . . . . . . . . . . . . . . . . . . . . . . . 11
2.7.1 Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.8 Result Visualisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.8.1 Linear Lists and Thumbnail Grids . . . . . . . . . . . . . . . . . . 15
2.8.1.1 Image Representation . . . . . . . . . . . . . . . . . . . . 19
2.8.2 Information Visualisations . . . . . . . . . . . . . . . . . . . . . . . 19
2.8.2.1 Example Information Visualisation Systems . . . . . . . 21
2.9 Relevance Judgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9.1 Information Foraging . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.10 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 Survey of Image Retrieval Techniques 25
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 WWW Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 WWW Image Retrieval Problems . . . . . . . . . . . . . . . . . . . 26
3.2.2 Differences between WWW Image Retrieval and Traditional Im-
age Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Lessons to Learn: Previous Approaches to Image Retrieval . . . . . . . . 28
3.3.1 Phase 1: Early Image Retrieval . . . . . . . . . . . . . . . . . . . . 28
3.3.2 Phase 2: Expressive Query Languages . . . . . . . . . . . . . . . . 30
3.3.2.1 Content-Based Image Retrieval Systems . . . . . . . . . 32
3.3.2.2 Phase 2 Summary . . . . . . . . . . . . . . . . . . . . . . 34
3.3.3 Phase 3: Scalability through the Combination of Techniques . . . 35
3.3.3.1 Text and Content-Based Image Retrieval Systems . . . . 37
3.3.3.2 Phase 3 Summary . . . . . . . . . . . . . . . . . . . . . . 37
3.3.4 Phase 4: Clarity through User Understanding and Interaction . . 38
3.3.4.1 Image Retrieval Information Visualisation Systems . . . 38
3.3.4.2 Phase 4 Summary . . . . . . . . . . . . . . . . . . . . . . 39
3.3.5 Other Approaches to WWW Image Retrieval . . . . . . . . . . . . 40
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4 Improving the WWW Image Searching Process 43
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Flexible Image Retrieval and Analysis Module . . . . . . . . . . . . . . . 46
4.3 Transparent Cluster Visualisation Module . . . . . . . . . . . . . . . . . . 46
4.4 Dynamic Query Modification Module . . . . . . . . . . . . . . . . . . . . 46
4.5 Proposed Solutions to Consistency, Clarity and Control . . . . . . . . . . 47
4.5.1 Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5.2 Clarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5.3 Control: Inexpressive Query Language . . . . . . . . . . . . . . . 48
4.5.4 Control: Coarse Grained Interaction . . . . . . . . . . . . . . . . . 48
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5 VISR 51
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 Flexible Image Retrieval and Analysis Module . . . . . . . . . . . . . . . 55
5.2.1 Retrieval Plugin Manager . . . . . . . . . . . . . . . . . . . . . . . 55
5.2.1.1 Retrieval Plugin Stack . . . . . . . . . . . . . . . . . . . . 55
5.2.2 Analysis Plugin Manager . . . . . . . . . . . . . . . . . . . . . . . 55
5.2.2.1 Analysis Plugin Stack . . . . . . . . . . . . . . . . . . . . 55
5.2.3 Web Document Retriever . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.4 Adjustment Translator . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.3 Transparent Cluster Visualisation Module . . . . . . . . . . . . . . . . . . 60
5.3.1 Spring-based Image Position Calculator . . . . . . . . . . . . . . . 60
5.3.1.1 Vector Sum vs. Spring Metaphor . . . . . . . . . . . . . 60
5.3.2 Image Location Conflict Resolver . . . . . . . . . . . . . . . . . . . 63
5.3.2.1 Jittering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3.2.2 Animation . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3.3 Display Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4 Dynamic Query Modification Module . . . . . . . . . . . . . . . . . . . . 66
5.4.1 Process Query Term Addition . . . . . . . . . . . . . . . . . . . . . 66
5.4.2 Process Analysis Modifications . . . . . . . . . . . . . . . . . . . . 66
5.4.3 Process Filter Modifications . . . . . . . . . . . . . . . . . . . . . . 69
5.4.4 Process Query Term Location Modification . . . . . . . . . . . . . 69
5.4.5 Process Zoom Modification . . . . . . . . . . . . . . . . . . . . . . 69
5.5 Example Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5.1 Example Query One: “Eiffel ‘Object Oriented’ Book” . . . . . . 72
5.5.2 Example Query Two: “Clown Circus Tent” . . . . . . . . . . . 75
5.5.3 Example Query Three: “Soccer Fifa Fair Play Yellow” . . . . . 77
5.5.4 Example Query Four: “‘All Black’ Haka Rugby” . . . . . . . . 79
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6 Experiments & Results 83
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 Evaluation Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2.1 Visualisation Entropy . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2.2 Visualisation Precision . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.2.3 User Study Framework . . . . . . . . . . . . . . . . . . . . . . . . 87
6.3 VISR Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . 87
6.3.1 Visualisation Entropy Experiment . . . . . . . . . . . . . . . . . . 87
6.3.2 Visualisation Precision Experiments . . . . . . . . . . . . . . . . . 90
6.3.2.1 Most Relevant Cluster Evaluation . . . . . . . . . . . . . 90
6.3.2.2 Multiple Cluster Evaluation . . . . . . . . . . . . . . . . 92
6.3.3 Visualisation User Study . . . . . . . . . . . . . . . . . . . . . . . . 97
6.3.4 Combined Evidence Image Retrieval Experiments . . . . . . . . . 97
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7 Discussion 101
7.1 Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.2 Clarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.3 Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.3.1 Inexpressive Query Language . . . . . . . . . . . . . . . . . . . . 103
7.3.2 Coarse Grained Interaction . . . . . . . . . . . . . . . . . . . . . . 103
8 Conclusion 105
8.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8.2 Further Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.2.1 Further Evaluations . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A Example Information Visualisation Systems 109
A.1 Spring-based Information Visualisations . . . . . . . . . . . . . . . . . . . 109
A.2 Venn-diagram based Information Visualisations . . . . . . . . . . . . . . 111
A.3 Terrain-based Information Visualisations . . . . . . . . . . . . . . . . . . 112
A.4 Other Information Visualisations . . . . . . . . . . . . . . . . . . . . . . . 112
B Numerical Test Results 115
B.1 Visualisation Entropy Test Results . . . . . . . . . . . . . . . . . . . . . . 115
B.2 Visualisation User Study Test Results . . . . . . . . . . . . . . . . . . . . . 116
B.3 Multiple Cluster Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
C Sample Visualisation User Study 121
Bibliography 129
Chapter 1
Introduction
“What information consumes is rather obvious: it consumes the attention of its
recipients. Hence a wealth of information creates a poverty of attention, and a
need to allocate that attention efficiently among the overabundance of information
sources that might consume it.”
– H. A. Simon
1.1 Motivation
Recently, there has been a huge increase in the number of images available on-line.
This can be attributed, in part, to the popularity of digital imaging technologies and
the growing importance of the World-Wide Web in today’s society. The WWW pro-
vides a platform for users to share millions of files with a global audience. Further-
more, digital imaging is becoming widespread through burgeoning consumer usage
of digital cameras, scanners and clip-art libraries [16]. As a consequence of these de-
velopments, there has been a surge of interest in new methods for the archiving and
retrieval of digital images.
While retrieving text documents presents its own problems, finding and retrieving
images adds a layer of complexity. The image retrieval process is hindered by dif-
ficulties involved with image description. When outlining image needs, users may
provide subjective, associative1 or incomplete descriptions. For example, figure 1.1
may be described objectively as “a cat” or “a cat with a bird on its head”. It could be
described bibliographically, as “Paul Klee”, the painter. Alternatively, it could be de-
scribed subjectively as “a happy colourful picture” or “a naughty cat”. It could also be
described associatively as “find the bird” or “the new cat-food commercial”. Each of
these queries arguably provides an equally valid image description. However, Web
page authors, when describing images, generally provide just a few of these possible
descriptions of image content.
1
describing an action portrayed by the image, rather than image content
Figure 1.1: Example Image: “cat and bird” by Paul Klee.
Current commercial WWW image search engines provide a limited facility for image
retrieval. These engines are based on existing document retrieval infrastructure, with
minor modifications to the underlying architecture. An example of a current approach
to WWW image retrieval is the AltaVista [3] image search engine. AltaVista incorpo-
rates a text-based image search, allowing users to enter textual criteria for an image.
The retrieved results are then displayed in a thumbnail grid as shown in figure 1.2.
However, there is scope for improvement. Current WWW image retrieval systems
are limited to using textual descriptions of image content to retrieve images, with no
capabilities for retrieving images using visual features. Further, the image search re-
sults are presented in an uninformative and non-interactive thumbnail grid.
Figure 1.2: AltaVista example grid, for the query “Trystan Upstill”.
1.2 Approach
This dissertation presents a new approach to resolve weaknesses observed in current
WWW image retrieval systems. This new approach is implemented in the VISR (Vi-
sualisation of Image Search Results) tool.
A survey of current image retrieval systems reveals three key problem areas: consis-
tency, clarity and control. This thesis aims to find solutions to these problems through
a new architecture:
• consistency: through client-side image analysis and result visualisation.
• clarity: through a visualisation that makes it clear why images were returned
and how they matched the query.
• control: by allowing users to specify expressive queries and enhancing system
interaction.
Using new effectiveness measures, the resulting architecture is compared against tra-
ditional approaches to WWW image retrieval.
1.3 Contribution
This thesis contributes knowledge to several domains: WWW information retrieval,
image retrieval, information visualisation and information foraging.
Contributions are made through:
1. The identification of the problem areas of consistency, clarity and control, from
current literature.
2. The creation of a new approach to WWW image retrieval and an effectiveness
comparison with the existing approach.
3. The implementation of a tool based on the new approach, VISR.
4. The proposal of two new evaluation measures: visualisation precision and visu-
alisation entropy.
5. The analysis of the VISR tool with respect to consistency, clarity and control and
the effectiveness measures.
1.4 Organisation
Chapter 2 introduces the domain of information retrieval. A framework that describes
traditional information retrieval is presented. A glossary of terms is provided.
Chapter 3 presents a survey of current image retrieval systems. It contains an overview
of WWW image retrieval problems organised into logical phases.
Chapter 4 outlines novel modifications to the information retrieval process model.
This chapter introduces new system modules, their purposes and how they address
limitations outlined in chapter 3.
Chapter 5 describes the VISR tool. Example use cases are explored.
Chapter 6 presents evaluation criteria to measure the effectiveness of the VISR tool.
New evaluation techniques are presented, and an evaluation of system effectiveness
is performed.
Chapter 7 discusses the implications of the experimental results in Chapter 6 with
respect to WWW image retrieval problems.
Chapter 8 contains the conclusion. Contributions are described and future work is
proposed.
Appendix A contains a discussion of surveyed information visualisation systems.
Appendix B provides tables containing the full numerical results from the experi-
ments performed.
Appendix C contains a sample user study, used during the evaluation of the VISR
tool.
Chapter 2
Domain
“To look backward for a while is to refresh the eye, to restore it, and to render it
more fit for its prime function of looking forward.”
– Margaret Fairless Barber
2.1 Overview
This dissertation is based in the domain of information retrieval. The process of com-
puter based information retrieval is complex and has been the focus of much research
over the last 50 years. This chapter contains a summary of this research as it relates to
this thesis, and a conceptual framework for the analysis of the information retrieval
process.
2.2 Glossary of Terms
document: any form of stored encapsulated data.
user: a person wishing to retrieve documents.
expert user: a professional information retriever wishing to retrieve documents (e.g.
a librarian).
visualisation: the process of representing data graphically.
Information Visualisation: the visualisation of document information.
cognitive process: thinking or conscious mental processing in a user. It relates
specifically to our ability to think, learn and comprehend.
information need: the requirement to find information in response to a current prob-
lem [35].
query: an articulation of an information need [35].
Information Retrieval: the process of finding and presenting documents deduced
from a query.
relevance: a user’s judgement of whether an information need has been satisfied.
match: the system’s concept of document-query similarity.
professional description: a well-described document, with thorough, complete and
correct textual meta-data.
layperson description: a non-professionally described document, potentially sub-
jective, incomplete or incorrect; this can be attributed to a lack of knowledge of
the retrieval process.
Information Foraging: a theory developed to understand the usage of strategies
and technologies for information seeking, gathering, and consumption in a fluid
information environment [51]. See section 2.9.1 for a concrete description.
recall: the proportion of all relevant documents that are retrieved.
precision: the proportion of all retrieved documents that are relevant.
clustering: partitioning data into a number of groups, in which each group collects
together elements with similar properties [18].
image: a document containing visual information.
image data: the actual image.
image meta-data: text associated with an image.
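The recall and precision definitions above can be illustrated with a short sketch (the document identifiers and counts below are hypothetical, chosen only for illustration):

```python
def precision_recall(retrieved, relevant):
    """precision = fraction of retrieved documents that are relevant;
    recall = fraction of all relevant documents that are retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# Hypothetical run: 10 documents retrieved, of which 7 are among
# the 8 relevant documents in the collection.
retrieved = [f"doc{i}" for i in range(10)]    # doc0 .. doc9
relevant = [f"doc{i}" for i in range(3, 11)]  # doc3 .. doc10
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.7 0.875
```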
2.3 Information Retrieval
This thesis’ depiction of the traditional information retrieval model is given in figure 2.1.
In the initial stage of the retrieval process, the user has some information need. The user
then formalises this information need, through query creation. The query is submitted
to the system for query processing, where it is parsed by the system to deduce the doc-
ument requirements. Document index analysis and retrieval then begins, with the goal
of retrieving documents of relevance to the query. The documents are subsequently
presented to the user in a result visualisation, aiming to facilitate user identification of
relevant documents. The user then performs a relevance judgement as to whether the
retrieved document collection contains relevant documents. If the user’s information
need is satisfied, the retrieval process is finished. Conversely, if the user is not satis-
fied with the retrieved document collection, they may refine their original information
need, and the entire process is re-executed.
Figure 2.1: The traditional information retrieval process. The information flow, depicted
by directed lines, describes communication between system and user processes. System pro-
cesses are operations performed by the information retrieval system. User processes are the
user’s cognitive operations during information retrieval.
2.4 Information Need
Figure 2.2: Information Need Analysis.
An information need occurs when a user desires information. To characterise poten-
tial information needs, we must appreciate why users are searching for documents,
what use they are making of these documents and how they make decisions on which
documents are relevant [16].
This thesis identifies several example information needs:
Specific need (answer or document): where one result will do.
Spread of documents: a collection of documents related to a specific purpose.
All documents in an area: a collection of all documents that match the criteria.
Clip need: a less specific need, where users desire a document that somehow relates
to a passage of text.
Specific needs
Example: ‘I want a map of Sydney’
In this situation a single comprehensive map of Sydney will do. If the retrieval en-
gine is accurate, the first document will fulfill the information need. Therefore, the
emphasis is on having the correct answer as the first retrieved result — high precision
at position 1.
Spread of Documents
Example: ‘I want some Sydney attractions’
In this situation the user desires a collection of Sydney attractions, potentially in clus-
tered groups for quick browsing. The emphasis is on both high recall, to try and
present the user with all Sydney attractions, and clustering, to relate similar images.
All documents in an area
Example: ‘Give me all your documents concerning the Sydney Opera House’
In this situation the user wants the entire collection of documents containing the Syd-
ney Opera House. The emphasis in this case is on high recall, potentially sacrificing
precision.
Clip need
Example: ‘I want a picture for my story about Sydney Opera House being a model anti-racism
employer’
In this situation the user desires something to do with the Sydney Opera House and
race issues as an insert for their story. In this case, users are not necessarily interested
in relevance, but rather fringe documents that may catch a reader’s eye.
2.5 Query Creation
Figure 2.3: Query Creation.
Following the formation of an information need, the user must express this need as a
query. A query may contain several query terms, where each term represents criteria
for the target documents. Web search engine users generally do not provide detailed
queries, with average queries containing 2.4 terms [30].
If a user is looking for documents regarding petroleum refining on the Falkland Is-
lands, they may express their information need as:
Falkland Islands petrol
An expert user, by contrast, may have a better understanding of how the retrieval
system works and thus express their query as:
+“Falkland Islands” petroleum oil refining
Query processing must take these factors into account and cater to both groups of
users.
2.6 Query Processing
Figure 2.4: Query Processing.
System query processing is the parsing and encoding of a user’s query into a system-
compatible form. At this stage, common words may be stripped out and the query
expanded, adding term synonyms.
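As a minimal sketch of this stage (the stopword list and synonym table below are illustrative assumptions, not taken from any particular system):

```python
# Hypothetical stopword list and synonym table, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "on", "in"}
SYNONYMS = {"petrol": ["petroleum", "gasoline"]}

def process_query(query):
    """Strip common words, then expand each surviving term with
    its synonyms."""
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(process_query("petrol on the Falkland Islands"))
# ['petrol', 'petroleum', 'gasoline', 'falkland', 'islands']
```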
Figure 2.5: Document Analysis and Retrieval.
2.7 Document Analysis and Retrieval
Document Analysis and Retrieval is the stage at which the user’s query is compared
against the document collection index. It is typically the most computationally expen-
sive stage in the information retrieval process.
Common words, termed stopwords, may be removed prior to document indexing or
matching. Since stopwords occur in a large percentage of documents they are poor
discriminators, with little ability to differentiate documents in the collection. Fol-
lowing stopword elimination, document terms may be collapsed using stemming or
thesauri. These techniques are used to minimise the size of the document collection
index, and allow for the querying of all conjugates and synonyms of a term.
The terms are then indexed according to their frequencies both in each document and the
entire document collection. The two statistics most commonly stored in the docu-
ment collection index are Term Frequency and Document Frequency. Term Frequency
is a measure of the number of times a term appears in a document, while Document
Frequency measures the number of indexed documents containing a term.
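These two statistics can be sketched as follows (the three-document collection is hypothetical, and terms are simply split on whitespace; a real index would first apply the stopping and stemming described above):

```python
from collections import Counter

docs = {
    "d1": "robot dogs bark",
    "d2": "robot dogs bite",
    "d3": "subdued robot",
}

def term_frequency(text):
    """Term Frequency: times each term appears in one document."""
    return Counter(text.split())

def document_frequency(collection):
    """Document Frequency: number of documents containing each term."""
    df = Counter()
    for text in collection.values():
        df.update(set(text.split()))
    return df

df = document_frequency(docs)
# "robot" occurs in every document, so it discriminates poorly;
# "bark" occurs in only one, so it discriminates well.
print(df["robot"], df["bark"])  # 3 1
```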
2.7.1 Ranking
The vector space model is the ranking model of concern in this thesis. The vector
space is defined by basis vectors which represent all possible terms. Documents and
queries are then represented by vectors in this space.
For example, if we have three very short documents:
Document 1: ‘Robot dogs’
Document 2: ‘Robot dog ankle-biting’
Document 3: ‘Subdued robot dogs’
Using the basis vectors:
‘Robot dog’ [1, 0, 0]
‘ankle-biting’ [0, 1, 0]
‘Subdued’ [0, 0, 1]
We can create three document vectors weighted by term frequency:
Document 1 = [1, 0, 0]
Document 2 = [1, 1, 0]
Document 3 = [1, 0, 1]
The vector space for these documents is depicted in figure 2.6.
Figure 2.6: Unweighted Vector Space. Since document 1 only contains “robot dog”, its vector
lies on the “robot dog” axis. Document 2 contains both “robot dog” and “ankle-biting”, so
its vector lies between those two axes. Document 3 contains “subdued” and “robot dog”, so its
vector lies between those axes.
The alternative TF/DF weighting of the vector space is:
Document 1 = [1/3, 0 , 0]
Document 2 = [1/3, 1/1, 0]
Document 3 = [1/3, 0 , 1/1]
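These weights can be reproduced with a small sketch (treating “robot dog”, “ankle-biting” and “subdued” as three already-collapsed index terms):

```python
# Each vector component is the term's frequency in the document (TF)
# divided by the number of documents containing the term (DF).
TERMS = ["robot dog", "ankle-biting", "subdued"]  # the basis vectors
docs = {
    1: ["robot dog"],
    2: ["robot dog", "ankle-biting"],
    3: ["robot dog", "subdued"],
}

df = {t: sum(t in d for d in docs.values()) for t in TERMS}

def tf_df_vector(doc_terms):
    return [doc_terms.count(t) / df[t] for t in TERMS]

print(tf_df_vector(docs[2]))  # Document 2's vector: [1/3, 1.0, 0.0]
```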
Figure 2.7: TF/DF weighted Vector Space. This differs from figure 2.6 by using term and
document frequencies to weight vector attraction. Since document 1 only contains “robot dog”, its
vector lies on the “robot dog” axis. Document 2 contains both “robot dog” and “ankle-biting”;
“ankle-biting” only appears in one document while “robot dog” appears in all three. This
results in the document vector having a higher attraction to the “ankle-biting” axis. Likewise,
document 3 contains “subdued” and “robot dog”, where “subdued” is less common than
“robot dog”, so its vector has a higher attraction to the “subdued” axis.
The TF/DF weighted vector space for these documents is depicted in figure 2.7.
In the vector space model, document similarity is measured by calculating the degree
of separation between documents. The degree of separation is measured by calculat-
ing the angle difference, usually using the cosine rule. In these calculations a smaller
angle implies a higher degree of relevance. As such, similar documents are co-located
in the space, as shown in figure 2.8. Conceptually this leads to a clustering of inter-
related documents in the vector space [55].
Figure 2.8: Vector Space Document Similarity Ranking. The vector space model implies that
document 1 is the most similar to the source document, while document 2 is the next most
similar, and document 3 the least. When querying a vector space model, the query becomes
the source document vector and documents with similar vectors are retrieved.
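Querying can be sketched as follows, reusing the TF/DF-weighted vectors of section 2.7.1 and measuring the angle via the cosine (a minimal illustration of the vector space model, not VISR's implementation):

```python
import math

doc_vectors = {
    1: [1/3, 0.0, 0.0],
    2: [1/3, 1.0, 0.0],
    3: [1/3, 0.0, 1.0],
}

def cosine(u, v):
    """Cosine of the angle between two vectors; a value nearer 1
    means a smaller angle, i.e. a closer match."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(a * a for a in w))
    return dot / (norm(u) * norm(v))

# The query "robot dog" becomes a source vector on the first axis.
query = [1.0, 0.0, 0.0]
ranking = sorted(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]),
                 reverse=True)
print(ranking)  # [1, 2, 3] -- documents 2 and 3 tie behind document 1
```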
Basis vectors need not be generated directly from all unique document terms;
documents can instead be indexed against a small number of basis vectors. This
is an application of synonym matching in which partial synonyms are admitted. An
example of this is to index document 2 on the basis vectors ‘Irritating’ and ‘Friendly’,
as is depicted in figure 2.9.
One of the difficulties involved in vector space ranking is that it can be unclear which
terms matched the document and the extent of the matching. In image retrieval this
drawback, combined with the fact that images are associated with potentially arbi-
trary text, can lead to user confusion regarding why images were retrieved (see section 3.2.1).
Figure 2.9: Vector Space with basis vectors ‘Friendly’ and ‘Irritating’. In this example,
prior to ranking, we know that “robot dog”s are moderately friendly and ankle-biting is
extremely irritating. Query terms are ranked in the vector space against partial synonyms.
Other Models
Other models, which are not within the scope of this thesis, are thoroughly described
in general information retrieval literature [55, 5, 20, 35]. These include Boolean, Ex-
tended Boolean and Probabilistic models.
2.8 Result Visualisation
Result visualisation in information retrieval is often overlooked in favour of improv-
ing document analysis and retrieval techniques. It is, however, an integral part of the
information retrieval process [7]. Information retrieval systems typically use linear list
result visualisations.
2.8.1 Linear Lists and Thumbnail Grids
Linear lists present retrieved documents ranked from most to least matching. Thumbnail grids are often used for viewing retrieved image collections. A thumbnail grid is a linear list wrapped across rows, a process analogous to words wrapping on a page of text. This representation is used to maximise screen real-estate. Images positioned horizontally next to each other are adjacent in the ranking, while vertically adjacent images are separated by N ranks (where N is the width of the grid). Thus, although the grid is a two-dimensional construct, thumbnail grids represent only a single dimension: the system's ranking of the images.
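The relationship between a grid cell and its rank reduces to simple integer arithmetic; a minimal sketch (the function names are illustrative):

```python
def grid_position(rank, width):
    """Map a 0-based linear-list rank to its (row, column) cell in a thumbnail grid."""
    return divmod(rank, width)

def rank_below(rank, width):
    """Rank of the vertically adjacent image below: always exactly `width` ranks later."""
    return rank + width

# In a grid five thumbnails wide, rank 7 sits at row 1, column 2,
# and the image directly beneath it holds rank 12.
print(grid_position(7, 5))  # (1, 2)
print(rank_below(7, 5))     # 12
```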
Figure 2.10: Result Visualisation.
Later it is shown that the lack of any relationship between sequential images, together with the absence of query transparency, causes problems in current image retrieval systems (section 3.2.1).
To further maximise screen real-estate, zooming image browsers can be used. Combs
and Bederson’s [12] zooming image browser incorporates a thumbnail grid with a
large number of images at a low resolution. Users select interesting areas of the grid
and zoom in to find relevant images. In evaluation, however, the zooming image browser did not outperform other image browsers: users frequently selected incorrect images at the highest level of zoom, and were not prepared to zoom in to verify their selections and incur the zooming time penalty.
When using a vector space model with a thumbnail grid visualisation, vector evidence
is discarded. Figure 2.11 depicts a hypothetical thumbnail grid retrieved by an image
retrieval engine for the query “clown, circus, tent”. In this grid, black images are pic-
tures of “circus clown”s, dark grey images are pictures of “circus tent”s and light grey
images with borders are pictures of “clown tent”s. Figure 2.12 depicts the vector space
from which the images were taken. There are three clusters, each containing multiple
images, located at angles of equal distance from the query vector. When compressing
this evidence the ranking algorithm selects images in order of their proximity until
the linear list is full. This discards image vector details, and leads to a thumbnail grid
where similar images are not adjacent.
Figure 2.11: Example image grid. This example image grid is generated for the query “clown;
circus; tent”. Black images contain pictures of “circus clown”s, dark grey images contain
pictures of “circus tent”s and light grey bordered images contain pictures of “clown tent”s.
Similar images are not adjacent in the thumbnail grid.
Figure 2.12: Vector space for example images. This vector space corresponds to the image grid in figure 2.11. Image collection 1 contains the black images, image collection 2 the dark grey images, and image collection 3 the light grey bordered images. This vector evidence is lost when compressing the ranking into a grid.
2.8.1.1 Image Representation
Humans process objects and shapes at a much greater speed than text. Exploitation
of this capability can facilitate the identification of relevant images. Further, when
presenting images for inspection there is no substitute for the images themselves. As
such, it is important, when using an information visualisation for image search results,
to summarise images using their thumbnails.
2.8.2 Information Visualisations
Information visualisations are intended to strengthen the relationship between the
user and the system during the information retrieval process. They attempt to over-
come the limitations of linear rankings by providing further attributes to facilitate user
determination of relevant documents.
As Stuart Card observed in 1996, “If information access is a ‘killer app’ for the 1990s [and 2000s], information visualisation will play an important role in its success”.
The traditional information retrieval process model (figure 2.1) is revised for information visualisation; the adapted model is shown in figure 2.13. This model creates a new loop between the result
visualisation, relevance judgement and query creation. This enables users to swiftly
refine their query and receive immediate feedback from the result visualisation. This
new interaction loop can provide improved clarity and system-user interaction during
searching.
Displaying Multi-dimensional data
When representing multi-dimensional data, such as search results, it is desirable to
maximise the data dimensions displayed without confusing the user. Typically, vi-
sualisations are required to handle over three dimensions of data. This requires the
flattening of the data to a two or three dimensional graphical display.
The LyberWorld system [25] suggests that information visualisations created prior to
its inception, in 1994, were ‘limited’ to 2D graphics, as computer graphics systems
could not cope with 3D graphics. Hemmje argued that 3D graphics allow for “the
highest degree of freedom to visually communicate information” and that such vi-
sualisations are “highly demanded”. Indeed, recent research into visualisation has
adopted the development of 3D interfaces. However, problems have arisen from this practice, due in part to the requirement that users have the spatial abilities needed to interpret a 3D display. Another drawback is the user's inability to view the entire visualisation at once: the graphics at the front of the visualisation often obscure the data at the back.
NIST [58] recently conducted a study into the time it takes users to retrieve documents
Figure 2.13: Information Visualisation Modifications to Traditional Information Retrieval.
This diagram shows the modifications to the traditional information retrieval process used in
information visualisations. A new loop is added to allow users to refine or query the visuali-
sation, thereby avoiding a re-execution of the entire retrieval process.
from equivalent text, 2D and 3D systems. Results from this experiment illustrate that
there is a significant learning curve for users starting with a 3D interface. During the
experiment the 3D interface proved the slowest method for users accessing the data.
Swan et al. [63] also had problems with their 3D interface, citing that “[they] found
no evidence of usefulness for the[ir] 3-D visualisation”. The argument for and against
the use of 3 dimensions in information visualisations is not within the scope of this
thesis.
Interactive Interfaces
A dynamic visualisation interface can be used to aid in the comprehension of the in-
formation presented in a visualisation. Dynamic Queries and Filters are two ways of
achieving such an interface.
Dynamic Queries [1, 69] allow users to change parameters in a visualisation, with immediate updates to reflect the changes. This direct-manipulation interface to queries can be seen as an adoption of the WYSIWYG (what you see is what you get) model, in which a tight coupling between user actions and the displayed documents exists.
Filters are similar to Dynamic Queries; they allow users to provide extra document
criteria to the information visualisation. Documents that fulfill the criteria are then
highlighted.
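In essence, a filter is a predicate re-applied to the displayed documents on every interface change; a minimal sketch, with hypothetical document attributes:

```python
def apply_filter(documents, predicate):
    """Return the documents satisfying the extra criteria; the interface highlights these."""
    return [d for d in documents if predicate(d)]

docs = [
    {"title": "circus clown", "year": 1998},
    {"title": "circus tent", "year": 2000},
]
# A dynamic interface re-runs the predicate immediately on every
# slider or checkbox change, rather than waiting for a new query.
highlighted = apply_filter(docs, lambda d: d["year"] >= 2000)
print([d["title"] for d in highlighted])  # ['circus tent']
```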
2.8.2.1 Example Information Visualisation Systems
While there are many differing information visualisations for information retrieval
results, there are three prominent models: spring-based, Venn-based and terrain map
based. These models are described below.
Spring-based models separate documents using document discriminators [14]. Each
discriminator is attached to documents by springs which attract matching documents
— the degree of attraction is proportional to the degree of match. This clusters the
documents according to common discriminators. In this model the dimensions are
compressed using springs, with each spring representing a dimension. An in-depth description of spring-based models is given in section 5.3.1. An example is shown
in figure 2.14. Systems that use this model include the VIBE system [49, 15, 36, 23],
WebVIBE [45, 43, 44], LyberWorld [25, 24], Bead [9] and Mitre [33]. A survey of these
visualisations is provided in appendix A.1.
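At equilibrium, the spring forces place each document at the match-weighted centroid of its discriminator anchors. The following is a sketch of that placement rule only, not a reproduction of any particular system's implementation:

```python
def place_document(anchors, scores):
    """Position a document at the match-weighted centroid of its discriminators.

    anchors: {term: (x, y)} fixed screen positions of the discriminators
    scores:  {term: match strength in [0, 1]} spring strengths for this document
    """
    total = sum(scores.values())
    if total == 0:
        return None  # the document matches no discriminator and is not displayed
    x = sum(scores[t] * anchors[t][0] for t in scores) / total
    y = sum(scores[t] * anchors[t][1] for t in scores) / total
    return (x, y)

anchors = {"clown": (0.0, 0.0), "circus": (1.0, 0.0), "tent": (0.5, 1.0)}
# A document matching 'clown' and 'circus' equally, but not 'tent',
# settles midway between the first two anchors.
print(place_document(anchors, {"clown": 1.0, "circus": 1.0, "tent": 0.0}))  # (0.5, 0.0)
```

Because documents with similar match profiles settle at similar positions, this rule produces the clustering by common discriminators described above.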
Venn-based models are a class of information visualisations that allow users to in-
terpret or provide Boolean queries and results. In this model, the dimensions are
compressed using Venn diagram set relationships. Systems that use this model in-
clude InfoCrystal [61] and VQuery [31]. A survey of these visualisations is provided
in appendix A.2.
Terrain map models are information visualisations that illustrate the structure of the
document collection by showing different types of geography on a map. These visu-
alisations are based on Kohonen’s feature map algorithm [54]. Dimensions are com-
pressed into map features such as mountain ranges and valleys. An example visual-
isation is shown in figure 2.15. Two systems that use this model are: SOM [38] and
ThemeScapes [42]. A survey of these visualisations is provided in appendix A.3.
Other information visualisation models also exist:
• Clustering Models: depict relationships between clusters of documents [58, 13].
• Histographic Models: seek to visualise a large number of document attributes at once [22, 68, 67].
• Graphical Plot Models: allow for a comparison of two document attributes [47, 62].
Systems that illustrate these visualisation properties can be found in appendix A.4.
Figure 2.14: Spring-based Example: The VIBE System. In this example VIBE is being used
to visualise the “president; europe; student; children; economy” query. Documents are rep-
resented by different sized rectangles, with high concentration clusters in the visualisation
represented by large rectangles.
2.9 Relevance Judgements
Only a user can judge the relevance of images in the retrieved document collection.
Document analysis and retrieval systems do not understand relevance; they only match documents to a request. Therefore, the final stage of information retrieval is the
cognitive user process of discovering relevant documents in the retrieved document
collection. The cognitive knowledge derived from searching through the retrieved
document collection for relevant documents can lead to a refinement of the visual-
isation, or to a refinement of the original information need. This demonstrates the
Figure 2.15: Terrain Map Example: The ThemeScapes system. In this example ThemeScapes
is being used to generate the geography of a document collection. The peaks represent topics
contained in many documents. Conversely, valleys represent topics contained in only a few
documents.
iterative nature of information retrieval — the process is repeated until the user is sat-
isfied with the retrieved document collection.
Information foraging theory, developed by Pirolli et al. [50, 51], is a new approach
to examining the synergy between a user and a visualisation during relevance judge-
ment.
2.9.1 Information Foraging
Humans display foraging behaviour when looking for information. Information foraging behaviour is used to study how users invest time to retrieve information.
Information foraging theory suggests that information foraging is analogous to food
foraging. The optimal information forager is the forager that achieves the best ratio of
benefits to cost [51]. Thus, it is important to allow the user to allocate their time to the
most relevant documents [50].
Foraging activity is broken up into two types of interaction: within-patch and between-
patch. Patches are sources of co-related information. Conceptually patches could be
piles of papers on a desk or clustered collections of documents. Between-patch anal-
ysis examines how users navigate from one source of information to another, while
within-patch analysis examines how users maximize the use of relevant information
within a pile.
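The cost-benefit ratio at the heart of the theory can be written as a rate of gain over total foraging time; a simplified sketch of that calculation (the numbers are purely illustrative):

```python
def foraging_rate(gain, between_patch_time, within_patch_time):
    """Rate of information gain: benefit per unit of total (travel + in-patch) time."""
    return gain / (between_patch_time + within_patch_time)

# A richer patch that costs more time to reach can still beat
# rapid hopping between poor patches.
rich = foraging_rate(gain=12, between_patch_time=2, within_patch_time=3)
poor = foraging_rate(gain=4, between_patch_time=1, within_patch_time=1)
print(rich > poor)  # True
```

The optimal forager of [51] is the one maximising this rate, which is why a visualisation should let users spend their time on the most relevant patches.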
Chapter 3
Survey of Image Retrieval
Techniques
“Those who do not remember the past are condemned to repeat it.”
– George Santayana
3.1 Overview
Image retrieval is a specialisation of the information retrieval process, outlined in
chapter 2. This chapter presents a survey of current approaches to image retrieval.
This analysis enables an identification of core problems in current WWW image re-
trieval systems.
3.2 WWW Image Retrieval
Three of the large commercial WWW search engines, AltaVista, Yahoo! and Lycos, have recently introduced text-based image search engines. The following observations are based on direct experience with these engines.
• AltaVista [3] has developed the AltaVista Photo and Media Finder. This image retrieval engine provides a simple text-based interface (section 3.3.1) to an image
collection indexed from the general WWW community and AltaVista’s image
database partners. Their retrieval engine is based on the technology incorpo-
rated into their text document search engine. Modifications to this architecture
have been made to associate sections of Web page text to images, in order to
obtain image descriptions.
• Yahoo! [70] has developed the Image Surfer. This image retrieval engine contains
images categorised into a topic hierarchy. To retrieve images, users can navigate
this topic hierarchy, or perform find similar content-based (section 3.3.2) searches.
As with Yahoo!’s text document topic hierarchy, all images in the system are cat-
egorised manually. This reliance on image classification makes extensive WWW
image indexing intractable.
• Lycos [40] has incorporated image retrieval through a simple extension to their
text document retrieval engine. Following a user query, Lycos checks to see
whether retrieved pages contain image references. If so, the images are retrieved
and displayed to the user.
3.2.1 WWW Image Retrieval Problems
The WWW image retrieval problems have been grouped into three key areas: consis-
tency, clarity and control.
The citations in this section are to papers in the fields of image retrieval, information
visualisation and information foraging. The problems this thesis identifies in WWW
image retrieval are similar to problems in these fields.
• Consistency:
– System Heterogeneity
When executing a query over multiple search engines, or repeatedly over
the same search engine, users typically retrieve differing search results.
This is due to continual changes in the image collections and ranking al-
gorithms used. All WWW search engines use differing, confidential algo-
rithms to rank images. Further, these algorithms sometimes vary according
to image collection properties or system load. These continual changes can
lead to confusing inconsistencies in image search results.
– Unstructured and Uncoordinated Data
The image meta-data used by WWW image retrieval engines to perform
text-based image retrieval is unreliable. Most WWW meta-data is not pro-
fessionally described, and as such, may be incomplete, subjective or incor-
rect.
• Clarity:
– No Transparency
The linear result visualisations used by WWW image retrieval engines do
not transparently reveal why images are being retrieved [34, 28]. This limits
the user’s ability to refine their query expression. This situation is amplified
if the meta-data upon which the ranking takes place is misleading.
– No Relationships
The linear result visualisations provide no relationship between sequentially ranked images: similar images are not grouped together (section 2.8.1).
– Reliance on Ranking Algorithms
WWW image retrieval systems incorporate confidential algorithms to com-
press multi-dimensional query-document relationship information (section
2.8.1) into a linear list. These algorithms are not well understood by users,
particularly algorithms that incorporate different types of evidence, e.g. a
combination of text and content analysis [2, 34, 28].
• Control:
– Inexpressive Query Language
∗ Lack of Data Scalability
The large number of images indexed by WWW image retrieval engines
makes content-based image analysis techniques (section 3.3.2) difficult
to apply. Advanced image analysis techniques are computationally ex-
pensive to run. Further, the effectiveness of these algorithms declines
when used over a collection with a large breadth of content [56].
∗ Lack of Expression
Existing infrastructure used by WWW search engines to perform im-
age retrieval provides a limited capacity for users to specify their pre-
cise image needs. Current systems allow only for text-based image
queries [2, 28].
– Coarse Grained Interaction:
∗ Coarse Grained Interaction
In providing a search service over a high latency network, current
WWW image retrieval systems are limited to providing coarse grained
interaction. In current systems, users must submit a query, retrieve
results and then choose either to restate the query or perform a find
similar search. Searching is an iterative process, requiring continual re-
finement and feedback [28, 16]. These interfaces do not facilitate the
high degrees of user interaction required during the image retrieval
process.
∗ Lack of Foraging Interaction
To enable effective information foraging, a result visualisation must al-
low users to locate patches of relevant information and then perform
detailed analysis of the information contained within a patch [51]. In
current WWW image retrieval engines, there is no grouping of like images, which prohibits any between-patch foraging. Further, there is no way for users to view a subset of the retrieved information. Thus information foraging (see section 2.9.1) is not encouraged through the visualisation.
3.2.2 Differences between WWW Image Retrieval and Traditional Image
Retrieval
There are several differences between image retrieval on the WWW and traditional
image retrieval systems. As opposed to WWW systems, in traditional systems:
• Consistency is a lesser concern
All systems incorporate an internally consistent matching algorithm, and re-
trieve images from a controlled image collection. Since a user interacting with
the system is always dealing with the same image matching tools, consistency
is a lesser concern.
• Quality descriptions are assured
As the retrieval system retrieves images from a controlled database, meta-data
quality is assured.
• No Communication Latencies
As the retrieval systems are generally co-located with the images and the user,
there is no penalty associated with search iterations.
3.3 Lessons to Learn: Previous Approaches to Image Retrieval
It is convenient for the analysis to group the progress of image retrieval into logical
phases. The phases of image retrieval development are shown in figure 3.1. Although
the progression is not entirely linear, the phases do represent distinct stages in the
evolution of image retrieval.
3.3.1 Phase 1: Early Image Retrieval
The earliest form of image retrieval is Text-Based Image Retrieval. These engines rely
solely on image meta-data to retrieve images, e.g. current WWW image search en-
gines [3, 40]. Traditional document retrieval techniques, such as vector space ranking,
are used to determine matching meta-data, and hence find images. For more informa-
tion on database text-based image retrieval systems refer to [10].
Examples of text-based queries are:
‘Sydney Olympic Games’
‘Sir William Deane opening the Sydney Olympic Games’
‘Torch relay running in front of the ANU’
‘Happy Olympic Punters’
‘Pictures of Trystan Upstill, by the Honours Gang, taken during the Olympic Games’
Figure 3.1: The development of image retrieval. This diagram shows the logical phases in the development of image retrieval. The section is structured according to these phases.
Although text-based image retrieval is the most primitive of all retrieval techniques, it does possess useful traits. If professionally described image meta-data is available
during retrieval and analysis it can provide a comprehensive abstraction of a scene.
Additionally, since text-based image retrieval uses existing document retrieval tech-
niques, many different ranking and indexing models are already available. Further,
existing infrastructure can be used to perform image indexing and retrieval — an at-
tractive proposition for current WWW search engines.
Improvements
• Ability to Retrieve Images: provides a simple mechanism for image access and
retrieval.
Further Problems
• Consistency:
– Unstructured and Uncoordinated data: image retrieval effectiveness relies
on the quality of image descriptions [48]. Further, as it can be unclear which
sections of a WWW page are related to an image’s contents, problems arise
when trying to associate meta-data to images on WWW pages.
• Control:
– Inexpressive Query Language:
∗ Lack of Expression: text-based querying may not allow the user to
specify a precise image need. There is no way to convey visual image
features to the image search engine.
3.3.2 Phase 2: Expressive Query Languages
Content-Based Image Retrieval enables users to specify graphical queries. The theory
behind its inception is that users have a precise mental picture of a desired image,
and as such, they should be able to accurately express this need [52]. Further, it is hypothesised that removing the reliance on image meta-data avoids retrieval based on potentially incorrect, incomplete or subjective data.
Examples of content-based queries are:
Image properties: ‘Red Pictures’, ‘Pictures with this texture’
Image shapes: ‘Arched doorway’, ‘Shaped like an elephant’
Objects in image: ‘Pictures of elephants’, ‘Generic elephants’
Image sections: ‘Red section in top corner’, ‘Elephant shape in centre’
The six most frequently used query types in content-based image retrieval are:
Colour allows users to query an image’s global colour features. An example of
colour-based content querying is shown in figure 3.2. According to Rui et al.
[28], colour histograms are the most commonly used feature representation.
Other methods include Colour Sets which facilitate fast searching with an ap-
proximation to Histograms, and Colour Moments, to overcome the quantization
effects in Colour Histograms. To improve Colour Histograms, Ioka and Niblack
et al. provide methods for evaluating similar but not exact colours and Stricker
and Orengo propose cumulative colour histograms to reduce noise [28].
Texture is a visual pattern that approximates the appearance of a tactile surface. This allows the user to specify whether an image appears rough and how much segmentation it exhibits. An example of texture-based content querying is shown in figure 3.3. According to Rui et al. [28], texture recognition can be
achieved using Haralick et al.’s co-occurrence matrix representations, Tamura et
al.’s computational approximations to visual texture properties or Simon and
Chang’s Wavelet transforms.
Colour Layout is a more advanced colour measurement, whereby users can show how colours are related to each other in a scene [48]. For example, a
query containing a gradient from orange to yellow could be used to retrieve a
sunset.
Figure 3.2: Example of a colour query match. This diagram demonstrates colour-based
content querying. In this case the user query is the text criteria “fifa; fair; play; logo” and the
colour “yellow”.
Figure 3.3: Example of a texture query match. This diagram demonstrates texture-based
content querying. In this case the user desires more pictures on the same playing field. The
grass texture is used to retrieve images from the same soccer match.
Shape allows users to query image shapes. An example of shape-based content
querying is shown in figure 3.4.
Figure 3.4: Example of a shape query match. This diagram demonstrates shape-based content
querying. In this case the user sketches a drawing containing a mountain.
Region-Based allows users to outline what types of properties they want in each area
of an image, thereby making the image analysis process recursive. An example
of simple region-based content querying is shown in figure 3.5.
Figure 3.5: Example of a region-based query match. This diagram demonstrates region based
content querying. In this case the user submits a query for an image containing trees on either
side of a mountain and a stream.
Object is a model where an object is deduced from a user-supplied shape and angle. This enables the retrieval of images that contain the specified shape in any
orientation.
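Of these query types, colour is the simplest to illustrate in code. The sketch below uses histogram intersection over a coarsely quantised RGB space, one common formulation [28]; real engines use finer quantisation and perceptual colour spaces:

```python
def colour_histogram(pixels, bins=4):
    """Quantise 8-bit RGB pixels into a coarse bins x bins x bins colour histogram."""
    hist = {}
    for r, g, b in pixels:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        hist[key] = hist.get(key, 0) + 1
    return hist

def intersection(h1, h2):
    """Histogram intersection similarity, normalised to [0, 1] by the first histogram."""
    shared = sum(min(count, h2.get(key, 0)) for key, count in h1.items())
    total = sum(h1.values())
    return shared / total if total else 0.0

yellowish = [(250, 240, 10)] * 8   # mostly yellow pixels
reddish = [(250, 10, 10)] * 8      # mostly red pixels
print(intersection(colour_histogram(yellowish), colour_histogram(reddish)))  # 0.0
```

A query such as the “yellow” criterion of figure 3.2 would then rank images by the intersection of their histograms with the query colour's histogram.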
3.3.2.1 Content-Based Image Retrieval Systems
QBIC (Query by Image Content)1 uses colour, shape and texture to match images
to user queries. The user can provide simple or advanced analytic criteria. Simple
criteria are requirements such as colour or texture, while advanced criteria can incor-
porate query-by-example, with “find more images like this”, or “find images like my
sketch”. To avoid difficulties involved in user descriptions of colours and textures
1 Demo online at http://wwwqbic.almaden.ibm.com/cgi-bin/stamps-demo
QBIC contains a texture and colour library. This enables users to select colours, colour
distributions or choose desired textures as queries [19, 29].
NETRA allows users to navigate through categories of images. The query is refined
through a user selection of relevant image content properties [16, 28, 41].
Excalibur is a query-by-example system. Users provide candidate images which are
matched using pattern recognition technology. Excalibur is a commercial application
development tool rather than a complete retrieval application. The Yahoo! web search
engine uses this technology to find similar images (section 3.2) [16, 28, 17].
Blobworld breaks images into blobs (see figure 3.6). By browsing a thumbnail grid
and specifying which blobs of images to keep, the user identifies blobs of interest and
areas of disinterest. This is used to refine the query [8, 66].
Figure 3.6: The Blobworld System. This screenshot from the Blobworld system illustrates the
process of picking relevant image blobs.
EPIC allows users to draw rectangles and label what they would like in each section
of the image, as shown in figure 3.7 [32].
Figure 3.7: The EPIC System. This screenshot illustrates the EPIC system’s query process.
Users describe their image need through labelled rectangles in the query window on the left.
ImageSearch allows users to place icons representing objects in regions of an im-
age. Users can also sketch pictures if they want a higher degree of control [37]. See
figure 3.8.
3.3.2.2 Phase 2 Summary
Improvements
• Consistency:
– Discard unstructured and uncoordinated data: since image meta-data
is never used to index or retrieve the images, problems relating to incom-
plete, incorrect or subjective descriptions are avoided. Further enrichment
is obtained through the ability to use content-based image analysis to query
many differing artifacts in an image.
• Control:
– Inexpressive Query Language:
∗ New Expression through Content-based Image Retrieval: through
the expressive nature of content-based image retrieval, more thorough
image criteria can be gained from the user. This provides the system
with more information with which to judge image relevance.
Further Problems
• Clarity:
Figure 3.8: The ImageSearch system. This screenshot illustrates the ImageSearch system’s
query process. The user positions icons symbolising what they would like in that region of an
image.
– Complex Interfaces: there is a comparatively large user cost incurred with
the creation of content-based queries. If users are required to produce a
sketch or an outline of the desired images, the time or skill required can
prove prohibitive.
• Control:
– Inexpressive Query Language:
∗ Content-based Image Retrieval algorithms do not scale well: content-based image retrieval is less effective on large-breadth collections. Since there are many definitions of similarity and discrimination, their power degrades when used over large-breadth image collections, as shown in figure 3.9 [2, 28, 16].
3.3.3 Phase 3: Scalability through the Combination of Techniques
Bearing in mind the limitations of content-based image retrieval on large breadth im-
age collections, several systems have combined both text and content-based image
retrieval. It is hypothesised that content-based analysis can be used on larger image
collections when combined with text-based analysis. The rationale for this is that text-
based techniques can be used to specify a general abstraction of image contents, while
content-based criteria can be used to identify relevant images in the domain.
Figure 3.9: Misleading shape and texture. The first image in this example is the query-by-
example image used as a content-based query. The other images in the grid were retrieved
through matching of shape, texture and colour (image from [56]).
3.3.3.1 Text and Content-Based Image Retrieval Systems
The combination of analysis techniques can either occur during initial query creation,
allowing users to initially specify both text and content-based image criteria, or after
retrieving a collection of images, allowing users to refine the image collection.
Text with Content Relevance Feedback: in these systems, the user initially provides
a text query. Using content-based image retrieval, they then tag relevant images
to retrieve more images like them.
Text and Content Searching: in these systems, both text and content retrieval occurs
at the same time. The user may express both text and content criteria in their
initial query.
Text with Content Relevance Feedback
Chabot,2 developed by Ogle and Stonebraker, uses simplistic content and text analysis to retrieve images. Text criteria are used to retrieve an initial collection of images, followed by content criteria to refine the image collection [48].
MARS is a system that learns from user interactions. The user begins by issuing a
text-based query, and then marks images in the retrieved thumbnail grid as either
relevant or irrelevant. The system uses these image judgements to find more relevant
images. The benefit of this approach is that it relieves the user from having to describe
desirable image features. Users only have to pick interesting image features [27].
Text and Content Searching
Virage incorporates plugin primitives that allow the system to be adapted to specific
image searching requirements. The Virage plugin creation engine is open-source, so
plugins can be created by end-users to suit their domain. The Virage en-
gine includes several “universal primitives” that perform colour, texture and shape
matching [16, 28].
Lu and Williams have incorporated both basic colour and text analysis into their im-
age retrieval system with encouraging results using a small database. One of their
major problems was in finding methods to combine evidence from colour and text
matching [39].
3.3.3.2 Phase 3 Summary
Improvements
² This system has recently been renamed Cypress.
• Consistency:
  – Reduce effects of Unstructured and Uncoordinated data: the image meta-data
    is only partially used to retrieve the images, with content-based image
    retrieval used as a second criterion for the image analysis.
• Control:
  – Inexpressive Query Language:
    * Improved Expression: users can enter criteria for images through textual
      descriptions and visual appearance. Incorporating both text and
      content-based image analysis allows for the consideration of all image
      data during retrieval.
    * Improving the scalability of Content-based Image Retrieval: when
      combining text-based analysis with content-based analysis, difficulties
      involved in performing content-based image retrieval on large breadth
      image collections are partially alleviated.
Further Problems
• Clarity:
  – Reliance on Ranking Algorithms: combining rankings from several different
    types of analysis engines into a thumbnail grid can be difficult [2, 16, 4, 27].
  – No Transparency: when using several analysis techniques it can be hard
    for users to understand why images were matched. Without this evidence,
    it may be difficult for users to ascertain faults in their query.
3.3.4 Phase 4: Clarity through User Understanding and Interaction
In response to the problems associated with the user understanding of retrieved im-
age collections, several systems have attempted to improve the clarity of the image re-
trieval process. These systems have incorporated information visualisations, outlined
in section 2.8.2, to convey image matching. It is in this light that phase 4 attempts to
improve system transparency and relationship maintenance, and to reduce the reliance
on ranking algorithms.
3.3.4.1 Image Retrieval Information Visualisation Systems
The two projects examined in this section provide spring-based visualisations, similar
to the VIBE system in section A.1.
MageVIBE: uses a simplistic approach to image retrieval, implementing text-based
only querying of a medical database. Images in this visualisation are represented by
dots. The full image can be displayed by selecting a dot [36].
Figure 3.10: The ImageVIBE system. This screenshot illustrates the ImageVIBE visualisation
for a user query for an aeroplane in flight. Several modification query terms, such as vertical
and horizontal, are used to describe the orientation of the plane.
ImageVIBE: uses text-based and shape-based querying, but otherwise does not differ
from the original VIBE. ImageVIBE allows users to refine their text queries using con-
tent criteria, such as shapes, orientation and colour [11]. An ImageVIBE screenshot
depicting a search for an aircraft image is shown in figure 3.10.
There has as yet been no evaluation of the effectiveness of these systems.
3.3.4.2 Phase 4 Summary
Improvements
• Improved Transparency: providing a dimension for each aspect of the ranking
  enables users to deduce how the image matching occurred.
• Relationship Maintenance: the query term relationships between images are
  maintained — images that are related to the same query terms, by the same
  magnitude, are co-located.
• User Relevance Judgements: users select relevant images from the retrieved
  image collection, rather than relying on a combination of evidence algorithm to
  determine the best match.
Further Problems
• Complex Interfaces: systems must be simple. It has been shown that the tradi-
  tional VIBE interface is too complex for general users [45, 43, 44].
3.3.5 Other Approaches to WWW Image Retrieval
The WWW has recently become the focus of phase 2 research in image retrieval. Two
such research systems are ImageRover and WebSEEK.
ImageRover is a system that spiders and indexes WWW images. A vector space
model of image features is created from the retrieved images [64, 57]. In this system
users browse topic hierarchies and can perform content-based “find similar” searches.
The system has encountered index size and retrieval speed difficulties.
WebSEEK searches the Web for images and videos by extracting keywords from the
URL and associated image text, and generating a colour histogram. Category trees
are created using all rare keywords indexed in the system. Users can query the sys-
tem using colour requirements, providing keywords or by navigating a category tree
[59, 60].
3.4 Summary
Figure 3.11: Development of WWW Image Retrieval Problems. This diagram illustrates the
development of the WWW Image Retrieval problems as covered in this chapter. The problems
from each phase, and extra WWW retrieval issues must be addressed to create an effective
WWW image retrieval system.
This chapter traced the development of the WWW image retrieval problems, as
shown in figure 3.11. The full list of problems requiring consideration during the
creation of a new approach to WWW image retrieval is then:
• Consistency:
  – System Heterogeneity
  – Unstructured and Uncoordinated Data
• Clarity:
  – No Transparency
  – No Relationships
  – Reliance on Ranking Algorithms
• Control:
  – Inexpressive Query Language:
    * Lack of Expression
    * Lack of Data Scalability
  – Coarse Grained Interaction:
    * Coarse Grained Interaction
    * Lack of Foraging Interaction
This chapter has provided a list of current WWW image retrieval problems and pre-
viously proposed solutions. These issues were decomposed into the three key problem
areas of consistency, clarity and control. Following the identification of these problems,
a survey of previous image retrieval systems, sorted into logical phases of development,
was presented. Each phase was viewed in the context of WWW image retrieval,
examining how it dealt with the WWW image retrieval problems.
A new approach to WWW image retrieval is now presented. This approach attempts
to alleviate these problems to improve WWW image retrieval. In the chapter follow-
ing this discussion this thesis presents the VISR tool, an implementation of the new
approach to WWW image retrieval.
Chapter 4
Improving the WWW Image
Searching Process
“Although men flatter themselves with their great actions, they are not so often
the result of great design as of chance.”
– Francis, Duc de La Rochefoucauld: Maxim 57
4.1 Overview
Having outlined the conceptual framework for an information retrieval study in chap-
ter 2, and then presented a survey of image retrieval techniques in chapter 3, this thesis
now addresses the problem at hand — the creation of a new approach to WWW image
retrieval.
The traditional model of the information retrieval process, figure 2.1, must be revised
for the retrieval of images from the WWW. The new approach to WWW image re-
trieval is shown in figure 4.1.
Section a of figure 4.1 is the Flexible Image Retrieval and Analysis Module (section 4.2).
This module incorporates retrieval and analysis plugins used during image retrieval.
Section b of figure 4.1 is the Transparent Cluster Visualisation Module (section 4.3). A
visualisation is incorporated to facilitate user comprehension of the retrieved image
collection’s characteristics.
Section c of figure 4.1 is the Dynamic Querying Module (section 4.4). Through this
module the user is able to tweak their query and get immediate feedback from the
visualisation.
Figure 4.1: Decomposition of Research Model of Information Retrieval. The new informa-
tion flows are depicted by dashed lines. This diagram can be compared with figure 2.1, the
traditional information retrieval process model. Section a of this diagram depicts the Flexible
Image Retrieval and Analysis Module. Section b depicts the Transparent Cluster Visualisation
Module. Section c depicts the Dynamic Query Modification Module.
Figure 4.2: Research Model with Process Locations. The flexible image retrieval and analysis
module resides on the client-side. To retrieve images, this module connects to several WWW
image search servers, via retrieval plugins, and downloads retrieved image collections. The
images are then pooled prior to analysis. This pool of images forms the image domain. The
transparent cluster visualisation and dynamic query modification modules also reside on the
client-side. This improves interaction available with current non-distributed visualisations,
where the whole information retrieval process has to be re-executed before the image collec-
tion is updated with user modifications.
4.2 Flexible Image Retrieval and Analysis Module
This module separates the retrieval and analysis responsibilities, thereby allowing for
more flexible and consistent image analysis.
This module resides on the client-side (see figure 4.2). A retrieval plugin is used to
retrieve an initial collection of images from a WWW image search engine. These im-
ages are downloaded to the client machine and form the image domain. The image
domain is then analysed by user-specified analysis plugins. This pluggable interface
allows for any number of specified retrieval or analysis engines to be used during the
image retrieval and analysis phase. For example, a collection of image meta-data and
image content analysis techniques may be provided.
The design of this module in the VISR tool implementation is provided in section 5.2.
4.3 Transparent Cluster Visualisation Module
This module visualises the relationships between retrieved images and their corre-
sponding search terms. This removes the requirement for the combination of evidence
by providing a transparent visualisation. Furthermore, to allow for easy identification
of images, thumbnails are used to provide image overviews. Users click on the thumb-
nails to view the full image. To alleviate visualisation latencies, this module resides
on the client-side (see figure 4.2).
The design of this module in the VISR tool implementation is provided in section 5.3.
Screenshots of the VISR transparent cluster visualisation are provided in section 5.5.
4.4 Dynamic Query Modification Module
The dynamic query module allows users to modify queries and immediately view the
resulting changes in the visualisation. This provides a facility for the re-weighting of
query terms, the tweaking of analysis parameters, the zooming of the visualisation
and the application of filters to the image collection.
Experiments have shown that users will only continue to forage for data if the search
continues to be profitable [51]. Thus it is important to have low latencies for query
modifications and system interaction. WWW image retrieval system interaction suf-
fers from high latencies. Distributing the system as shown in figure 4.2 provides lower
interaction latencies.
The design of this module in the VISR tool implementation is provided in section 5.4.
4.5 Proposed Solutions to Consistency, Clarity and Control
4.5.1 Consistency
Current WWW search engines use varied ranking techniques on meta-data which is
often incomplete or incorrect. This can confuse users.
System Heterogeneity
The flexible image retrieval and analysis module provides a consistent well-understood
set of tools for image analysis. When results from these tools are incorporated into the
transparent cluster visualisation, images are always displayed in the same manner.
This implies that if two search engines returned the same image, the images would be
co-located in the display.
Unstructured and Uncoordinated data
The flexible image retrieval and analysis module does not accommodate noisy meta-
data. It does, however, deal with it in a consistent fashion. The use of consistent
plugins and the transparent cluster visualisation may allow for swift identification of
noise in the image collection.
4.5.2 Clarity
Current WWW search engines provide thumbnail grid result visualisations. Thumb-
nail grids do not express why images were retrieved or how retrieved images are
related and thereby make it harder to find relevant images [34, 15].
No Transparency
The transparent cluster visualisation facilitates user understanding of why images are
retrieved and which query terms matched which documents. This assists the user in
deciphering the rationale for the retrieved image collection and avoids user frustra-
tion by facilitating the “what to do next” decision. A key issue in image retrieval is how
images are perceived by users [28]. Educating users about the retrieval process assists
them to understand how the system is matching their queries, and thereby how they
should form and refine their queries.
No Relationships
The maintenance of image relationships enables the clustering of related images. This
allows users to find similar images quickly.
Reliance on Ranking Algorithms
The maintenance of per-term ranking information reduces the reliance on ranking
algorithms. When using the transparent cluster visualisation there is no combination
of evidence except in the search engine, which is only required to derive an initial
quality rating: matching or not.
4.5.3 Control: Inexpressive Query Language
Current WWW search engines limit the user’s ability to specify their exact image need.
For example, because image analysis is costly, most systems do not allow users to
specify image content criteria. Further, a reduction of effectiveness is observed during
the scaling of these techniques across large breadth collections [56].
Lack of Expression
The client-side distribution of the analysis task in the flexible retrieval and analysis
module reduces WWW search engine analysis costs. Through the use of the image
domain, expensive content-based image retrieval techniques and other analysis is per-
formed over a smaller image collection. Further, the use of these techniques does not
require modifications to the underlying WWW search engine infrastructure.
Lack of Data Scalability
In the proposed flexible analysis module, the user is able to nominate several analysis
techniques that operate concurrently during image matching. Through third-party
analysis plugins, users can perform any type of analysis.
4.5.4 Control: Coarse Grained Interaction
Current WWW search engines provide non-interactive interfaces to the retrieval pro-
cess. This provides users with minimal insight into how the retrieval process occurs
and renders them unable to focus a search on an interesting area of the result visuali-
sation.
Coarse Grained Interaction
New modes of interaction and lower latencies are achieved through the use of client-
side analysis, visualisation and interface. When interacting with the dynamic query
modification module the user’s changes are reflected immediately in the visualisation.
All tasks that do not require new documents to be retrieved are completed with low
latencies. Thus, features such as dynamic filters, query re-weighting and zooming can
be implemented effectively.
Lack of Foraging Interaction
Foraging interaction is encouraged through the transparent cluster visualisation's abil-
ity to cluster and zoom. Between-patch foraging is aided through the grouping of
similar images. Within-patch foraging is facilitated through the ability to examine a
single cluster in greater detail. Through zooming, users are able to perform a more
thorough investigation of the images contained within a cluster. An example of this
practice is shown in figure 4.3.
Figure 4.3: Foraging Concentration. The user scans all clusters of images to locate the rel-
evant image cluster. In this case the black, light grey and dark grey squares are all checked
for relevance. This process is termed between-patch foraging. Following the selection of a po-
tentially relevant patch, the user begins within-patch foraging. This is shown in the zoomed
window. Through within-patch foraging the user is able to locate the relevant image.
4.6 Summary
This chapter proposed a new approach to WWW image retrieval. Using the frame-
work outlined in chapter 2, solutions were proposed to the image retrieval problems
identified in chapter 3. These solutions shape the new approach to WWW image
retrieval. The new approach contained three theoretical modules: flexible image re-
trieval and analysis, transparent cluster visualisation and the dynamic query modifi-
cation. The flexible image retrieval and analysis module provided a new mechanism
for comprehensive, extensible image retrieval on the WWW. The transparent cluster
visualisation provided a new approach to visualising retrieved document collections.
The dynamic query modification module provided new mechanisms for user inter-
action during the retrieval process. Following the description of these modules, this
section presented theoretical evidence to support the use of these modules to alleviate
the WWW image retrieval problems.
The next chapters cover the implementation of these modules in the VISR tool and
effectiveness evaluation experiments.
Chapter 5
VISR
“Always design a thing by considering it in its next larger context — a chair in
a room, a room in a house, a house in an environment, an environment in a city
plan.”
– Eliel Saarinen
5.1 Overview
This chapter introduces the architecture of the VISR tool. The three conceptual mod-
ules described in chapter 4 are now implemented. This chapter is broken down into
the design of each of these modules: the flexible image retrieval and analysis mod-
ule is section 5.2, the transparent cluster visualisation module is section 5.3 and the
dynamic query modification module is section 5.4. Following the description of the
module designs, a series of use cases demonstrate the functionality of the VISR tool.
The figures in this chapter follow the conventions outlined in the diagrams below.
Figure 5.1 is the legend for the information flow diagrams and figure 5.2 is the legend
for the state transition diagrams.
Figure 5.1: Information Flow Diagram Legend.
Figure 5.2: State Transition Diagram Legend.
The information flow of the VISR tool is shown in figure 5.3, while the state transition
diagram, figure 5.4, describes the flow of system execution.
Figure 5.3: VISR Architecture Information Flow Diagram. This figure illustrates the data
flow between modules in the VISR tool. The section numbers marked in the figure repre-
sent sections in this chapter discussing those processes. Note: no link is required from the
dynamic query module to the query processor because all input into the dynamic query
module is in a machine-readable form.
Figure 5.4: VISR Architecture State Transition Diagram. This figure illustrates the flow of
execution of top-level tasks in the VISR tool. VISR is initialised when a search request is
received. The query is processed and image retrieval and analysis occurs. This is the process
of retrieving and analysing an image collection using query criteria. Following the completion
of retrieval and analysis, the transparent cluster visualisation is created. After the visualisation
is displayed, the system enters dynamic query mode where the user may choose to modify the
visualisation or the retrieval and analysis criteria. When the user is satisfied with the results,
VISR terminates.
5.2 Flexible Image Retrieval and Analysis Module
The information flow diagram for the Flexible Image Retrieval and Analysis Module
is shown in figure 5.5, while the state transition diagram is shown in figure 5.6. The
structure of this section is illustrated by the information flow diagram, while the state
transition diagram illustrates the flow of execution.
5.2.1 Retrieval Plugin Manager
The Retrieval Plugin Manager manages all system retrieval plugins. Upon a search
request, the plugin manager determines which retrieval plugins are able to fulfill the
request, either in whole or in part, and sends the appropriate query terms to the re-
trieval engines. Following the completion of retrieval, the retrieved image collection
is pooled. This pool of images forms the image domain.
5.2.1.1 Retrieval Plugin Stack
The plugins connect to their corresponding retrieval engine, translate queries into a
format acceptable to the engine and submit the query. The links retrieved from the
engines are pooled by the plugin, and sent to the Web document retriever for retrieval.
This uses existing Web search infrastructure to retrieve from a large collection of im-
ages.
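The plugin contract just described might be sketched as a small interface. This is an illustrative sketch, not the VISR source; the class and method names are assumptions.

```python
from abc import ABC, abstractmethod

class RetrievalPlugin(ABC):
    """A retrieval plugin wraps one WWW image search engine."""

    @abstractmethod
    def can_handle(self, query_terms):
        """Return the subset of query terms this engine can service."""

    @abstractmethod
    def translate(self, query_terms):
        """Translate query terms into the engine's native query format."""

    @abstractmethod
    def search(self, native_query):
        """Submit the query and return a list of image-page URLs."""

def pool_links(plugins, query_terms):
    """Fan a query out to every capable plugin and pool the links.

    The pooled links, once downloaded, form the image domain.
    """
    pooled = []
    for plugin in plugins:
        supported = plugin.can_handle(query_terms)
        if supported:
            pooled.extend(plugin.search(plugin.translate(supported)))
    # Deduplicate while preserving retrieval order.
    seen, unique = set(), []
    for url in pooled:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

A concrete plugin would implement `translate` for one engine's query syntax and `search` for its result pages.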
Implemented Retrieval Plugins
VISR contains a WWW retrieval plugin for the AltaVista image search engine [3]. Al-
taVista supports only text-based image retrieval; as such, queries must contain at least
one text analysis criterion, which may, however, be accompanied by multiple content
criteria.
5.2.2 Analysis Plugin Manager
The Analysis Plugin Manager manages all the analysis plugins in the system. The
query terms are analysed by their corresponding analysis plugins.
If there is no plugin for a given query type, the system can be set to default to text, or
to ignore the query term. If one plugin services multiple query terms, they are queued
at the desired analysis plugin.
5.2.2.1 Analysis Plugin Stack
The plugins access the search document repository and retrieve the document collec-
tion stored by the Web document retriever. The documents are analysed on a per query-
Figure 5.5: Flexible Image Retrieval & Analysis Module Information Flow Diagram. This
figure illustrates the data flow between processes in the VISR Flexible Image Retrieval and
Analysis Module. This figure is a detailed illustration of this module. Its relation to the rest
of the VISR tool, figure 5.3, is illustrated in the top left hand corner.
Figure 5.6: Flexible Image Retrieval & Analysis Module State Transition Diagram. This
figure illustrates the flow of execution of the Flexible Image Retrieval and Analysis tasks.
Following query processing, the Image Retrieval and Analysis task is called. This stage
executes the retrieval plugins; following the completion of retrieval, the analysis plugins are
executed. Following the computation of analysis rankings, the result visualisation is notified.
If the user selects to modify the analysis through the dynamic query module, the new analysis
requirements are analysed. If the modification requires a new image domain, the retrieval
plugins are re-executed with the new query terms. If the modification does not require a new
image domain, the analysis plugin is re-executed with different analysis settings.
Source              Quality
Image URL             34%
Image Name            50%
Title                 62%
Alt text              86%
Anchor text           87%
Heading               54%
Surrounding text      34%
Entire text           33%
Table 5.1: Keyword source qualities from [46]
term basis, with each query term ranked individually and stored in the analysis data
repository.
One of the key problems in performing text-based image analysis on the WWW is
how to associate Web page text with images. The association of HTML meta-data to
images retrieved from Web pages is a complex problem, made more arduous because
HTML meta-data can be incomplete or incorrect. When using multiple tags in HTML
documents to rank images, it is important to take the quality of each source into
account when indexing an image.
Lu and Williams [39] use bibliographic data from HTML documents to derive im-
age text relevance. They use a simple product based on unfounded quality measures
to calculate the relevance of document sections to an image. They provide no experi-
mental evidence to support their rankings.
Mukherjea and Cho [46] use a combination of bibliographic and structural informa-
tion embedded in the HTML document to find image relevant text. They then ex-
perimentally determine the quality of each image source. The ratings they found are
presented in table 5.1.
The text-based analysis plugin in the VISR tool uses all sections of the HTML docu-
ment to associate meta-data. Mukherjea and Cho’s text quality measures are used to
scale document section meta-data relevance.
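One way the quality measures of table 5.1 might be applied is sketched below. The weighted-sum combination and the source names are assumptions for illustration; the thesis states only that the measures are used to scale section relevance.

```python
# Keyword source qualities from Mukherjea and Cho (table 5.1),
# expressed as weights in [0, 1].
SOURCE_QUALITY = {
    "image_url": 0.34,
    "image_name": 0.50,
    "title": 0.62,
    "alt_text": 0.86,
    "anchor_text": 0.87,
    "heading": 0.54,
    "surrounding_text": 0.34,
    "entire_text": 0.33,
}

def text_relevance(query_term, sections):
    """Score an image for one query term.

    `sections` maps a source name to its extracted text. Each source that
    contains the keyword contributes that source's quality weight; summing
    the contributions is an illustrative combination rule, not VISR's exact one.
    """
    term = query_term.lower()
    score = 0.0
    for source, text in sections.items():
        if term in text.lower():
            score += SOURCE_QUALITY.get(source, 0.0)
    return score
```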
Content-based Analysis Plugin
VISR contains a colour content-based image analysis plugin. This plugin performs a
simple colour analysis of images, given a user specified colour. This plugin provides
proof-of-concept content-based analysis. Other content-based analysis plugins to per-
form more advanced analysis can be incorporated into the system.
Colour analysis is performed using basic histographic analysis, where image colour
components are separated into a specified number of buckets. The higher the number
of buckets, the more accurate the colour comparison. The ranking algorithm matches
red, green and blue levels between images. The retrieved image with the highest
number of pixels of the specified colour is used to normalise the ranking for all other
images.
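The bucket-based colour matching described above might be sketched as follows; the pixel representation and function names are illustrative assumptions.

```python
def colour_match_counts(images, target_rgb, buckets=8):
    """Count, per image, the pixels falling in the target colour's bucket.

    Each R, G and B level (0-255) is quantised into `buckets` buckets; a
    higher bucket count gives a stricter colour comparison. `images` maps
    an image id to an iterable of (r, g, b) pixels.
    """
    width = 256 // buckets
    target = tuple(c // width for c in target_rgb)
    counts = {}
    for image_id, pixels in images.items():
        counts[image_id] = sum(
            1 for (r, g, b) in pixels
            if (r // width, g // width, b // width) == target
        )
    return counts

def colour_rankings(images, target_rgb, buckets=8):
    """Normalise each count by the best-matching image's count, as described."""
    counts = colour_match_counts(images, target_rgb, buckets)
    best = max(counts.values(), default=0)
    if best == 0:
        return {image_id: 0.0 for image_id in counts}
    return {image_id: c / best for image_id, c in counts.items()}
```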
5.2.3 Web Document Retriever
Given a URL, the Web document retriever downloads Web pages using a utility called
GNU wget. Prior to downloading, the locally cached Web page and image library is
checked to see whether the pages have been previously retrieved; if not, downloading
begins. After the Web pages are downloaded, they are parsed to find image URLs. If
the image or the Web page no longer exists, the Web document retriever discards the
page information. If the image link exists in the page, the Web document retriever
downloads the image for further analysis.
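The retrieval step might be sketched as below. The use of GNU wget and the cache check follow the description; the cache layout, file naming and the image-link regular expression are illustrative assumptions.

```python
import hashlib
import os
import re
import subprocess

CACHE_DIR = "cache"  # illustrative local cache location

def fetch_page(url):
    """Download a page with GNU wget, consulting the local cache first."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest())
    if not os.path.exists(path):  # cache miss: download
        result = subprocess.run(["wget", "-q", "-O", path, url])
        if result.returncode != 0:  # page no longer exists: discard it
            if os.path.exists(path):
                os.remove(path)
            return None
    with open(path, errors="replace") as fh:
        return fh.read()

def image_urls(html):
    """Parse <img src="..."> links out of a downloaded page."""
    return re.findall(r'<img[^>]+src=["\']([^"\']+)["\']', html, re.IGNORECASE)
```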
5.2.4 Adjustment Translator
The Adjustment Translator takes incoming adjustment requests and determines whether
the adjustment requires a re-retrieval of documents or the re-analysis of the image col-
lection.
5.3 Transparent Cluster Visualisation Module
The information flow diagram for the Transparent Cluster Visualisation module is
shown in figure 5.7, while the state transition diagram is shown in figure 5.8. The
structure of this section is illustrated by the information flow diagram, while the state
transition diagram illustrates the flow of execution.
5.3.1 Spring-based Image Position Calculator
Given query term matching analysis data, the spring-based image position calculator
positions images in the visualisation. The visualisation is based on a spring model
developed by Olsen and Korfhage [49] for the original VIBE. This was formalised by
Hoffman to produce the Radial Visualization (RadViz) [26]. In RadViz, reference
points are equally spaced around the perimeter of a circle. The data set is then dis-
tributed in the circle according to its attraction to the reference points.
In VISR, the distribution occurs through query terms applying forces to the images in
the collection. Springs are attached such that each image is connected to every query
term, and images are independent of each other. The query terms remain static while
the images are pulled towards the query terms according to how relevant the query
terms are to the image. When these forces reach an equilibrium, the images are in their
final positions. The conceptual model of this visualisation can be seen in figure 5.9.
Figure 5.7: Transparent Cluster Visualisation Module Information Flow Diagram. This
figure illustrates the data flow between processes in the VISR Transparent Cluster Visuali-
sation Module. This figure is a detailed look at this module. Its relation to the rest of the
VISR tool, figure 5.3, is illustrated in the top left hand corner.
Figure 5.8: Transparent Cluster Visualisation Module State Transition Diagram. This figure illustrates the flow of execution of the Transparent Cluster Visualisation Module tasks. Following the completion of retrieval and analysis, the image locations are determined. Following the calculation of image locations, overlapping images are resolved and the display is generated. If the user chooses to modify the visualisation in dynamic query mode, the visualisation must re-calculate image positions.
ondly, the spring metaphor, where images have no attraction to the centre of the visualisation and are pulled freely towards whatever query terms they contain. The query terms can be represented as vectors leaving the centre of the circle.
Vector Sum Metaphor:

$\vec{p}_{vs} = \sum_{i=1}^{n} \frac{a_i}{a_{total}} \, \vec{q}_i$   (5.1)

Where
$\vec{p}_{vs}$ is the vector position of an image
$n$ is the number of query terms
$a_i$ is the scalar attraction to query term $i$
$\vec{q}_i$ is the vector position of query term $i$
$a_{total}$ is the total attraction the image has to the query terms

Spring Metaphor:

$\vec{p}_s$ such that $\sum_{i=1}^{n} a_i \, (\vec{p}_s - \vec{q}_i) = \vec{0}$   (5.2)

Where
$\vec{p}_s$ is the vector position of an image.
$\sum_{i=1}^{n} a_i \, (\vec{p}_s - \vec{q}_i)$ is the net force $\vec{F}$. This force moves $\vec{p}_s$ until $\vec{F}$ converges to $\vec{0}$, giving the final value of $\vec{p}_s$.
The system can be configured to use either the spring or the vector sum metaphor. The vector sum metaphor is less useful than the spring metaphor because there are fewer unique positions for images, and a large cluster of images tends to form near the centre of the display. Vector sum visualisations are better suited to picking out interesting query terms or outlying images in the image collection than to revealing clusters of images.
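The equilibrium demanded by equation 5.2 can also be reached iteratively, nudging an image along the net spring force until it vanishes. A sketch assuming linear springs and a fixed step size (names are illustrative, not the VISR code):

```python
def spring_equilibrium(attractions, term_positions, step=0.1, tol=1e-6):
    """Move an image along the net spring force F = sum(a_i * (q_i - p))
    until F converges to zero, satisfying equation 5.2."""
    p = (0.0, 0.0)  # start at the centre of the visualisation
    while True:
        fx = sum(a * (qx - p[0])
                 for a, (qx, _) in zip(attractions, term_positions))
        fy = sum(a * (qy - p[1])
                 for a, (_, qy) in zip(attractions, term_positions))
        if fx * fx + fy * fy < tol:  # forces in equilibrium
            return p
        p = (p[0] + step * fx, p[1] + step * fy)
```

The step size must be small relative to the total attraction, otherwise the iteration oscillates instead of converging; the value used here is adequate for small attraction scores.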
5.3.2 Image Location Conflict Resolver
The image location conflict resolver incorporates techniques that allow the user to view all images, even if they overlap. This process examines the visualisation context, checking for overlapping images. Overlapping images are indicated by a blue border, as shown in figure 5.11. This thesis presents two techniques for dealing with overlapping images: Jittering, where overlapping images are separated from each other, and Animation, where overlapping images are animated, with a specified delay, cycling from one overlapping image to the next.
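A minimal sketch of the jittering idea, repeatedly pushing apart any pair of thumbnails whose centres are closer than a threshold (an illustrative reconstruction, not the VISR implementation; all names and parameters are assumptions):

```python
import math
import random

def jitter(positions, min_dist=0.05, push=0.01, max_rounds=200):
    """Separate overlapping thumbnails by nudging each too-close pair
    apart until no two centres are closer than `min_dist`."""
    pts = [list(p) for p in positions]
    for _ in range(max_rounds):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                d = math.hypot(dx, dy)
                if d >= min_dist:
                    continue  # this pair does not overlap
                if d == 0:  # identical positions: pick a random direction
                    dx = random.random() + 0.01
                    dy = random.random() + 0.01
                    d = math.hypot(dx, dy)
                # Push the pair apart along the line joining their centres.
                pts[i][0] -= push * dx / d
                pts[i][1] -= push * dy / d
                pts[j][0] += push * dx / d
                pts[j][1] += push * dy / d
                moved = True
        if not moved:
            break  # no overlaps remain
    return [tuple(p) for p in pts]
```

Jittering trades positional accuracy for visibility: the separated thumbnails no longer sit exactly at their spring-equilibrium positions, which is why the animation alternative keeps positions exact at the cost of showing only one overlapping image at a time.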
  • 1. Consistency, Clarity & Control: Development of a new approach to WWW image retrieval Trystan Upstill A subthesis submitted in partial fulfillment of the degree of Bachelor of Information Technology (Honours) at The Department of Computer Science Australian National University November 2000
  • 2. c­ Trystan Upstill Typeset in Palatino by TEX and LATEX 2 .
  • 3. Except where otherwise indicated, this thesis is my own original work. Trystan Upstill 24 November 2000
  • 4.
  • 5. Acknowledgements I would like to thank the ANU for providing financial support for my honours year through the Paul Thistlewaite memorial scholarship. Paul was an inspiring lecturer and I am privileged to have received a scholarship in his honour. Thanks to my supervisors, Raj Nagappan, Nick Craswell and Chris Johnson, for the continual flow of great ideas and support throughout the year. Thankyou AltaVista, for not banning my IP address, following my constant and un- relenting barrage on your image search engine. Thanks to the honours gang, Vij, Nige, Matt, Derek, Mick, Tom, Mel, Pete & Jason,1 for a fun and eventful time during a long and taxing year. I wish you all the best for the future and hope to keep in touch. Thanks to all those from 5263, Bodhi, Nick, Andy, Andy, Ben, Jake, Josh, Josh & Jonno, for making my life marginally less 5263. Thanks to my other fellow compatriots, Carla, Jenny, Fiona, Tam & Nils for constantly reminding me what a geek I am, and reminding me that some members of the human race are female. Thanks to my family, Mum, Dad and Detts, who somehow managed to put up with me all year. Your support during my education has been immeasurable and my achievements owe a lot to you. And finally, last but not least, thankyou Beth. Your tremendous support and under- standing has allowed me to maintain a degree of sanity throughout the year — now lets go to the beach. 1 Honourary Member v
  • 6.
  • 7. Abstract The number of digital images is expanding rapidly and the World-Wide Web (WWW) has become the predominant medium for their transferral. Consequently, there ex- ists a requirement for effective WWW image retrieval. While several systems exist, they lack the facility for expressive queries and provide an uninformative and non- interactive grid interface. This thesis surveys image retrieval techniques and identifies three problem areas in current systems: consistency, clarity and control. A novel WWW image retrieval ap- proach is presented which addresses these problems. This approach incorporates client-side image analysis, visualisation of results and an interactive interface. The implementation of this approach, the VISR or Visualisation of Image Search Results tool is then discussed and evaluated using new effectiveness measures. VISR offers several improvements over current systems. Consistency is aided through consistent image analysis and result visualisation. Clarity is improved through a vi- sualisation, which makes it clear why images were returned and how they matched the query. Control is improved by allowing users to specify expressive queries and enhancing system interaction. The new effectiveness measures include a measure of visualisation precision and vi- sualisation entropy. The visualisation precision measure illustrates how VISR clusters images more effectively than a thumbnail grid. The visualisation entropy measure demonstrates the stability of VISR over changing data sets. In addition to these mea- sures, a small user study is performed. It shows that the spring-based visualisation metaphor, upon which VISR’s display is based, can generally be easily understood. vii
  • 9. Contents Acknowledgements v Abstract vii 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.4 Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2 Domain 5 2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Glossary of Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.4 Information Need . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.5 Query Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.6 Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.7 Document Analysis and Retrieval . . . . . . . . . . . . . . . . . . . . . . . 11 2.7.1 Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.8 Result Visualisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.8.1 Linear Lists and Thumbnail Grids . . . . . . . . . . . . . . . . . . 15 2.8.1.1 Image Representation . . . . . . . . . . . . . . . . . . . . 19 2.8.2 Information Visualisations . . . . . . . . . . . . . . . . . . . . . . . 19 2.8.2.1 Example Information Visualisation Systems . . . . . . . 21 2.9 Relevance Judgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.9.1 Information Foraging . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.10 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3 Survey of Image Retrieval Techniques 25 3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2 WWW Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
25 3.2.1 WWW Image Retrieval Problems . . . . . . . . . . . . . . . . . . . 26 3.2.2 Differences between WWW Image Retrieval and Traditional Im- age Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.3 Lessons to Learn: Previous Approaches to Image Retrieval . . . . . . . . 28 3.3.1 Phase 1: Early Image Retrieval . . . . . . . . . . . . . . . . . . . . 28 3.3.2 Phase 2: Expressive Query Languages . . . . . . . . . . . . . . . . 30 ix
  • 10. x Contents 3.3.2.1 Content-Based Image Retrieval Systems . . . . . . . . . 32 3.3.2.2 Phase 2 Summary . . . . . . . . . . . . . . . . . . . . . . 34 3.3.3 Phase 3: Scalability through the Combination of Techniques . . . 35 3.3.3.1 Text and Content-Based Image Retrieval Systems . . . . 37 3.3.3.2 Phase 3 Summary . . . . . . . . . . . . . . . . . . . . . . 37 3.3.4 Phase 4: Clarity through User Understanding and Interaction . . 38 3.3.4.1 Image Retrieval Information Visualisation Systems . . . 38 3.3.4.2 Phase 4 Summary . . . . . . . . . . . . . . . . . . . . . . 39 3.3.5 Other Approaches to WWW Image Retrieval . . . . . . . . . . . . 40 3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4 Improving the WWW Image Searching Process 43 4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.2 Flexible Image Retrieval and Analysis Module . . . . . . . . . . . . . . . 46 4.3 Transparent Cluster Visualisation Module . . . . . . . . . . . . . . . . . . 46 4.4 Dynamic Query Modification Module . . . . . . . . . . . . . . . . . . . . 46 4.5 Proposed Solutions to Consistency, Clarity and Control . . . . . . . . . . 47 4.5.1 Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.5.2 Clarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.5.3 Control: Inexpressive Query Language . . . . . . . . . . . . . . . 48 4.5.4 Control: Coarse Grained Interaction . . . . . . . . . . . . . . . . . 48 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 5 VISR 51 5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 5.2 Flexible Image Retrieval and Analysis Module . . . . . . . . . . . . . . . 55 5.2.1 Retrieval Plugin Manager . . . . . . . . . . . . . . . . . . . . . . . 55 5.2.1.1 Retrieval Plugin Stack . . . . . . . . . . . . . . . . . . . . 55 5.2.2 Analysis Plugin Manager . . . . . . . 
. . . . . . . . . . . . . . . . 55 5.2.2.1 Analysis Plugin Stack . . . . . . . . . . . . . . . . . . . . 55 5.2.3 Web Document Retriever . . . . . . . . . . . . . . . . . . . . . . . 59 5.2.4 Adjustment Translator . . . . . . . . . . . . . . . . . . . . . . . . . 59 5.3 Transparent Cluster Visualisation Module . . . . . . . . . . . . . . . . . . 60 5.3.1 Spring-based Image Position Calculator . . . . . . . . . . . . . . . 60 5.3.1.1 Vector Sum vs. Spring Metaphor . . . . . . . . . . . . . 60 5.3.2 Image Location Conflict Resolver . . . . . . . . . . . . . . . . . . . 63 5.3.2.1 Jittering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 5.3.2.2 Animation . . . . . . . . . . . . . . . . . . . . . . . . . . 65 5.3.3 Display Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 5.4 Dynamic Query Modification Module . . . . . . . . . . . . . . . . . . . . 66 5.4.1 Process Query Term Addition . . . . . . . . . . . . . . . . . . . . . 66 5.4.2 Process Analysis Modifications . . . . . . . . . . . . . . . . . . . . 66 5.4.3 Process Filter Modifications . . . . . . . . . . . . . . . . . . . . . . 69 5.4.4 Process Query Term Location Modification . . . . . . . . . . . . . 69
5.4.5 Process Zoom Modification
5.5 Example Queries
5.5.1 Example Query One: "Eiffel 'Object Oriented' Book"
5.5.2 Example Query Two: "Clown Circus Tent"
5.5.3 Example Query Three: "Soccer Fifa Fair Play Yellow"
5.5.4 Example Query Four: "'All Black' Haka Rugby"
5.6 Summary
6 Experiments & Results
6.1 Overview
6.2 Evaluation Framework
6.2.1 Visualisation Entropy
6.2.2 Visualisation Precision
6.2.3 User Study Framework
6.3 VISR Experiments and Results
6.3.1 Visualisation Entropy Experiment
6.3.2 Visualisation Precision Experiments
6.3.2.1 Most Relevant Cluster Evaluation
6.3.2.2 Multiple Cluster Evaluation
6.3.3 Visualisation User Study
6.3.4 Combined Evidence Image Retrieval Experiments
6.4 Summary
7 Discussion
7.1 Consistency
7.2 Clarity
7.3 Control
7.3.1 Inexpressive Query Language
7.3.2 Coarse Grained Interaction
8 Conclusion
8.1 Contributions
8.2 Further Work
8.2.1 Further Evaluations
A Example Information Visualisation Systems
A.1 Spring-based Information Visualisations
A.2 Venn-diagram based Information Visualisations
A.3 Terrain-based Information Visualisations
A.4 Other Information Visualisations
B Numerical Test Results
B.1 Visualisation Entropy Test Results
B.2 Visualisation User Study Test Results
B.3 Multiple Cluster Results
C Sample Visualisation User Study
Bibliography
Chapter 1

Introduction

"What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it." – H. A. Simon

1.1 Motivation

Recently, there has been a huge increase in the number of images available on-line. This can be attributed, in part, to the popularity of digital imaging technologies and the growing importance of the World-Wide Web in today's society. The WWW provides a platform for users to share millions of files with a global audience. Furthermore, digital imaging is becoming widespread through burgeoning consumer usage of digital cameras, scanners and clip-art libraries [16]. As a consequence of these developments, there has been a surge of interest in new methods for the archiving and retrieval of digital images.

While retrieving text documents presents its own problems, finding and retrieving images adds a layer of complexity. The image retrieval process is hindered by difficulties involved with image description. When outlining image needs, users may provide subjective, associative (describing an action portrayed by the image, rather than image content) or incomplete descriptions. For example, figure 1.1 may be described objectively as "a cat", or "a cat with a bird on its head". It could be described bibliographically, as "Paul Klee", the painter. Alternatively, it could be described subjectively as "a happy colourful picture" or "a naughty cat". It could also be described associatively as "find the bird" or "the new cat-food commercial". Each of these queries arguably provides an equally valid image description. However, Web page authors, when describing images, generally provide only a few of the many possible descriptions of image content.
Figure 1.1: Example Image: "cat and bird" by Paul Klee.

Current commercial WWW image search engines provide a limited facility for image retrieval. These engines are based on existing document retrieval infrastructure, with minor modifications to the underlying architecture. An example of a current approach to WWW image retrieval is the AltaVista [3] image search engine. AltaVista incorporates a text-based image search, allowing users to enter textual criteria for an image. The retrieved results are then displayed in a thumbnail grid, as shown in figure 1.2.

However, there is scope for improvement. Current WWW image retrieval systems are limited to using textual descriptions of image content to retrieve images, with no capability for retrieving images using visual features. Further, the image search results are presented in an uninformative and non-interactive thumbnail grid.

Figure 1.2: AltaVista example grid, for the query "Trystan Upstill".
1.2 Approach

This dissertation presents a new approach to resolving weaknesses observed in current WWW image retrieval systems. This new approach is implemented in the VISR (Visualisation of Image Search Results) tool.

A survey of current image retrieval systems reveals three key problem areas: consistency, clarity and control. This thesis aims to find solutions to these problems through a new architecture:

- consistency: through client-side image analysis and result visualisation.
- clarity: through a visualisation which makes it clear why images were returned and how they matched the query.
- control: by allowing users to specify expressive queries and enhancing system interaction.

Using new effectiveness measures, the resulting architecture is compared against traditional approaches to WWW image retrieval.

1.3 Contribution

This thesis contributes knowledge to several domains: WWW information retrieval, image retrieval, information visualisation and information foraging. Contributions are made through:

1. The identification of the problem areas of consistency, clarity and control, from current literature.
2. The creation of a new approach to WWW image retrieval and an effectiveness comparison with the existing approach.
3. The implementation of a tool based on the new approach, VISR.
4. The proposal of two new evaluation measures: visualisation precision and visualisation entropy.
5. The analysis of the VISR tool with respect to consistency, clarity and control, and the effectiveness measures.

1.4 Organisation

Chapter 2 introduces the domain of information retrieval. A framework that describes traditional information retrieval is presented, and a glossary of terms is provided.
Chapter 3 presents a survey of current image retrieval systems. It contains an overview of WWW image retrieval problems, organised into logical phases.

Chapter 4 outlines novel modifications to the information retrieval process model. This chapter introduces new system modules, their purposes, and how they address the limitations outlined in chapter 3.

Chapter 5 describes the VISR tool. Example use cases are explored.

Chapter 6 presents evaluation criteria to measure the effectiveness of the VISR tool. New evaluation techniques are presented, and an evaluation of system effectiveness is performed.

Chapter 7 discusses the implications of the experimental results in chapter 6 with respect to WWW image retrieval problems.

Chapter 8 contains the conclusion. Contributions are described and future work is proposed.

Appendix A contains a discussion of surveyed information visualisation systems.

Appendix B provides tables containing the full numerical results from the experiments performed.

Appendix C contains a sample user study, used during the evaluation of the VISR tool.
Chapter 2

Domain

"To look backward for a while is to refresh the eye, to restore it, and to render it more fit for its prime function of looking forward." – Margaret Fairless Barber

2.1 Overview

This dissertation is based in the domain of information retrieval. The process of computer-based information retrieval is complex and has been the focus of much research over the last 50 years. This chapter contains a summary of this research as it relates to this thesis, and a conceptual framework for the analysis of the information retrieval process.

2.2 Glossary of Terms

document: any form of stored encapsulated data.
user: a person wishing to retrieve documents.
expert user: a professional information retriever wishing to retrieve documents (e.g. a librarian).
visualisation: the process of representing data graphically.
Information Visualisation: the visualisation of document information.
cognitive process: thinking or conscious mental processing in a user. It relates specifically to our ability to think, learn and comprehend.
information need: the requirement to find information in response to a current problem [35].
query: an articulation of an information need [35].
Information Retrieval: the process of finding and presenting documents deduced from a query.
relevance: the user's judgement of the satisfaction of an information need.
match: the system's concept of document-query similarity.
professional description: a well-described document, with thorough, complete and correct textual meta-data.
layperson description: a non-professionally described document, potentially subjective, incomplete or incorrect; this can be attributed to a lack of knowledge of the retrieval process.
Information Foraging: a theory developed to understand the usage of strategies and technologies for information seeking, gathering and consumption in a fluid information environment [51]. See section 2.9.1 for a concrete description.
recall: the proportion of all relevant documents that are retrieved.
precision: the proportion of all retrieved documents that are relevant.
clustering: partitioning data into a number of groups in which each group collects together elements with similar properties [18].
image: a document containing visual information.
image data: the actual image.
image meta-data: text which is associated with an image.

2.3 Information Retrieval

This thesis' depiction of the traditional information retrieval model is given in figure 2.1. In the initial stage of the retrieval process, the user has some information need. The user then formalises this information need through query creation. The query is submitted to the system for query processing, where it is parsed to deduce the document requirements. Document index analysis and retrieval then begins, with the goal of retrieving documents of relevance to the query. The documents are subsequently presented to the user in a result visualisation, aiming to facilitate user identification of relevant documents. The user then performs a relevance judgement as to whether the retrieved document collection contains relevant documents. If the user's information need is satisfied, the retrieval process is finished.
Conversely, if the user is not satisfied with the retrieved document collection, they may refine their original information need, and the entire process is re-executed.
Figure 2.1: The traditional information retrieval process. The information flow, depicted by directed lines, describes communication between system and user processes. System processes are operations performed by the information retrieval system. User processes are the user's cognitive operations during information retrieval.
2.4 Information Need

Figure 2.2: Information Need Analysis.

An information need occurs when a user desires information. To characterise potential information needs, we must appreciate why users are searching for documents, what use they are making of these documents, and how they make decisions on which documents are relevant [16]. This thesis identifies several example information needs:

- Specific need (answer or document): where one result will do.
- Spread of documents: a collection of documents related to a specific purpose.
- All documents in an area: a collection of all documents that match the criteria.
- Clip need: a less specific need, where users desire a document that somehow relates to a passage of text.

Specific needs

Example: 'I want a map of Sydney'

In this situation a single comprehensive map of Sydney will do. If the retrieval engine is accurate, the first document will fulfill the information need. Therefore, the emphasis is on having the correct answer as the first retrieved result — high precision at position 1.
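The recall and precision measures from the glossary, and the "precision at position 1" emphasised above, can be computed directly. A minimal sketch — the result list and relevant set below are hypothetical:

```python
def precision(retrieved, relevant):
    """Proportion of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def recall(retrieved, relevant):
    """Proportion of all relevant documents that are retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Precision measured over only the first k ranked results."""
    return precision(retrieved[:k], relevant)

# Hypothetical ranked results for the 'map of Sydney' need:
retrieved = ["sydney_map", "sydney_zoo", "opera_house"]
relevant = {"sydney_map"}
print(precision_at_k(retrieved, relevant, 1))  # 1.0: correct answer ranked first
print(recall(retrieved, relevant))             # 1.0
```

For the "specific need" case only precision at position 1 matters; for the "all documents in an area" case, recall over the whole result list is the measure of interest.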
Spread of documents

Example: 'I want some Sydney attractions'

In this situation the user desires a collection of Sydney attractions, potentially in clustered groups for quick browsing. The emphasis is on both high recall, to try and present the user with all Sydney attractions, and clustering, to relate similar images.

All documents in an area

Example: 'Give me all your documents concerning the Sydney Opera House'

In this situation the user wants the entire collection of documents concerning the Sydney Opera House. The emphasis in this case is on high recall, potentially sacrificing precision.

Clip need

Example: 'I want a picture for my story about the Sydney Opera House being a model anti-racism employer'

In this situation the user desires something to do with the Sydney Opera House and race issues as an insert for their story. In this case, users are not necessarily interested in relevance, but rather in fringe documents that may catch a reader's eye.

2.5 Query Creation

Figure 2.3: Query Creation.
Following the formation of an information need, the user must express this need as a query. A query may contain several query terms, where each term represents a criterion for the target documents. Web search engine users generally do not provide detailed queries, with average queries containing 2.4 terms [30].

If a user is looking for documents regarding petroleum refining on the Falkland Islands, they may express their information need as:

  Falkland Islands petrol

while an expert user may have a better understanding of how the retrieval system works and thus express their query as:

  +"Falkland Islands" petroleum oil refining

Query processing must take these factors into account and cater to both groups of users.

2.6 Query Processing

Figure 2.4: Query Processing.

System query processing is the parsing and encoding of a user's query into a system-compatible form. At this stage, common words may be stripped out and the query expanded by adding term synonyms.
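The two steps just described — stripping common words and expanding terms with synonyms — can be sketched as follows. The stopword list and synonym table here are illustrative stand-ins, not the ones any particular engine uses:

```python
STOPWORDS = {"the", "a", "of", "on", "in", "i", "want"}   # illustrative list
SYNONYMS = {"petrol": ["petroleum", "oil"]}               # illustrative table

def process_query(query):
    """Strip common words, then expand each remaining term with its synonyms."""
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    expanded = []
    for t in terms:
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return expanded

print(process_query("petrol on the Falkland Islands"))
# ['petrol', 'petroleum', 'oil', 'falkland', 'islands']
```

With expansion in place, the layperson's query "Falkland Islands petrol" reaches the index with roughly the same vocabulary as the expert's hand-expanded query.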
Figure 2.5: Document Analysis and Retrieval.

2.7 Document Analysis and Retrieval

Document analysis and retrieval is the stage at which the user's query is compared against the document collection index. It is typically the most computationally expensive stage in the information retrieval process.

Common words, termed stopwords, may be removed prior to document indexing or matching. Since stopwords occur in a large percentage of documents they are poor discriminators, with little ability to differentiate documents in the collection. Following stopword elimination, document terms may be collapsed using stemming or thesauri. These techniques are used to minimise the size of the document collection index, and allow for the querying of all conjugates and synonyms of a term.

The terms are then indexed according to their frequencies in both the query and the entire document collection. The two statistics most commonly stored in the document collection index are Term Frequency and Document Frequency. Term Frequency is a measure of the number of times a term appears in a document, while Document Frequency measures the number of indexed documents containing a term.

2.7.1 Ranking

The vector space model is the ranking model of concern in this thesis. The vector space is defined by basis vectors which represent all possible terms. Documents and queries are then represented by vectors in this space.
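The Term Frequency and Document Frequency statistics described above can be gathered in a single pass over the collection. A minimal sketch — the document identifiers and texts are hypothetical, and tokenisation is reduced to whitespace splitting:

```python
from collections import Counter

def build_index(docs):
    """Index a collection, recording term frequency (per document)
    and document frequency (number of documents containing each term)."""
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each document counts once per distinct term
    return tf, df

docs = {
    1: "robot dog robot",
    2: "robot dog ankle-biting",
}
tf, df = build_index(docs)
print(tf[1]["robot"])  # 2: term frequency of "robot" in document 1
print(df["robot"])     # 2: "robot" appears in both documents
```

A production index would apply stopword elimination and stemming before counting, as described above.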
For example, if we have three very short documents:

  Document 1: 'Robot dogs'
  Document 2: 'Robot dog ankle-biting'
  Document 3: 'Subdued robot dogs'

using the basis vectors:

  'robot dog'    [1, 0, 0]
  'ankle-biting' [0, 1, 0]
  'subdued'      [0, 0, 1]

we can create three document vectors weighted by term frequency:

  Document 1 = [1, 0, 0]
  Document 2 = [1, 1, 0]
  Document 3 = [1, 0, 1]

The vector space for these documents is depicted in figure 2.6.

Figure 2.6: Unweighted Vector Space. Since document 1 only contains "robot dog", its vector lies on the "robot dog" axis. Document 2 contains both "robot dog" and "ankle-biting", so its vector lies between those axes. Document 3 contains "subdued" and "robot dog", so its vector lies between those axes.

The alternative TF/DF weighting of the vector space is:

  Document 1 = [1/3, 0, 0]
  Document 2 = [1/3, 1/1, 0]
  Document 3 = [1/3, 0, 1/1]
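The worked vectors above can be reproduced programmatically, weighting each component as term frequency divided by document frequency (the variable names and the tf/df weighting formula are my reading of the example, not code from the thesis):

```python
BASIS = ["robot dog", "ankle-biting", "subdued"]

DOCS = {
    1: ["robot dog"],
    2: ["robot dog", "ankle-biting"],
    3: ["robot dog", "subdued"],
}

# Document frequency of each basis term across the collection.
df = {term: sum(term in terms for terms in DOCS.values()) for term in BASIS}

def vector(terms, weighted=False):
    """Term-frequency vector over the basis; optionally TF/DF weighted."""
    tf = [terms.count(term) for term in BASIS]
    if not weighted:
        return tf
    return [f / df[term] for f, term in zip(tf, BASIS)]

print(vector(DOCS[2]))                 # [1, 1, 0]
print(vector(DOCS[2], weighted=True))  # "robot dog" damped by its df of 3
```

Because "robot dog" occurs in all three documents, its component shrinks to 1/3 under weighting, pulling document 2's vector toward the rarer "ankle-biting" axis exactly as the example describes.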
Figure 2.7: TF/DF weighted Vector Space. This differs from figure 2.6 by using term and document frequencies to weight vector attraction. Since document 1 only contains "robot dog", its vector lies on the "robot dog" axis. Document 2 contains both "robot dog" and "ankle-biting"; "ankle-biting" only appears in one document while "robot dog" appears in all three. This results in the document vector having a higher attraction to the "ankle-biting" axis. Likewise, document 3 contains "subdued" and "robot dog", where "subdued" is less common than "robot dog", so its vector has a higher attraction to the "subdued" axis.
The TF/DF weighted vector space for these documents is depicted in figure 2.7.

In the vector space model, document similarity is measured by calculating the degree of separation between documents. The degree of separation is measured by calculating the angle difference, usually using the cosine rule. In these calculations a smaller angle implies a higher degree of relevance. As such, similar documents are co-located in the space, as shown in figure 2.8. Conceptually this leads to a clustering of inter-related documents in the vector space [55].

Figure 2.8: Vector Space Document Similarity Ranking. The vector space model implies that document 1 is the most similar to the source document, document 2 the next most similar, and document 3 the least.

When querying a vector space model, the query becomes the source document vector and documents with similar vectors are retrieved.

It is also possible not to generate basis vectors directly from all unique document terms. Documents can instead be indexed according to a small number of basis vectors. This is an application of synonym matching, but one where partial synonyms are admitted. An example of this is to index document 2 on the basis vectors 'Irritating' and 'Friendly', as depicted in figure 2.9.

One of the difficulties involved in vector space ranking is that it can be unclear which terms matched the document, and to what extent. In image retrieval this drawback, combined with the fact that images are associated with potentially arbitrary text, can lead to user confusion regarding why images were retrieved (see section 3.2.1).
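Ranking by angle, as described above, reduces to comparing cosines: a larger cosine means a smaller angle and therefore a closer match. A self-contained sketch over the three example document vectors (the query vector chosen here is hypothetical):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, doc_vecs):
    """Order document ids by decreasing cosine (smallest angle first)."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                  reverse=True)

docs = {1: [1, 0, 0], 2: [1, 1, 0], 3: [1, 0, 1]}
print(rank([1, 0, 0], docs))  # [1, 2, 3]: document 1 lies on the query axis
```

Note that the cosine alone is all that survives into the final ranking; the per-axis components that explain why a document matched are discarded, which is the transparency problem raised above.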
Figure 2.9: Vector Space with basis vectors 'Friendly' and 'Irritating'. In this example, prior to the ranking we know that "robot dog"s are moderately friendly and ankle-biting is extremely irritating. Query terms are ranked in the vector space against partial synonyms.

Other Models

Other models, which are not within the scope of this thesis, are thoroughly described in general information retrieval literature [55, 5, 20, 35]. These include the Boolean, Extended Boolean and Probabilistic models.

2.8 Result Visualisation

Result visualisation in information retrieval is often overlooked in favour of improving document analysis and retrieval techniques. It is, however, an integral part of the information retrieval process [7]. Information retrieval systems typically use linear list result visualisations.

2.8.1 Linear Lists and Thumbnail Grids

Linear lists present a sorted list of retrieved documents, ranked from most to least matching. Thumbnail grids are often used for viewing retrieved image collections. Thumbnail grids are linear lists split horizontally between rows, a process which is analogous to words wrapping on a page of text. This representation is used to maximise screen real-estate. Images positioned horizontally next to each other are adjacent in the ranking, while vertically adjacent images are separated by N ranks (where N is the width of the grid). Thus, although the grid is a two-dimensional construct, thumbnail grids represent only a single dimension — the system's ranking of images.
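The wrapping just described is a pure rank-to-cell mapping; a short sketch makes the "vertically adjacent images are N ranks apart" point concrete (function name is mine):

```python
def grid_position(rank, width):
    """Map a linear 0-based rank to its (row, column) cell in a
    thumbnail grid that wraps every `width` images."""
    return divmod(rank, width)

# In a grid 5 thumbnails wide, vertically adjacent cells are 5 ranks apart:
print(grid_position(3, 5))  # (0, 3)
print(grid_position(8, 5))  # (1, 3): directly below rank 3
```

Since row and column are both derived from the single rank value, the grid's second dimension carries no extra information about the documents.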
Figure 2.10: Result Visualisation.

Later it is shown that having no relationship between sequential images, and no query transparency, causes problems in current image retrieval systems (section 3.2.1).

To further maximise screen real-estate, zooming image browsers can be used. Combs and Bederson's [12] zooming image browser incorporates a thumbnail grid with a large number of images at a low resolution. Users select interesting areas of the grid and zoom in to find relevant images. The zooming image browser did not outperform other image browsers in evaluation. Frequently, users selected incorrect images at the highest level of zoom. Users were not prepared to zoom in to verify selections and incur a zooming time penalty.

When using a vector space model with a thumbnail grid visualisation, vector evidence is discarded. Figure 2.11 depicts a hypothetical thumbnail grid retrieved by an image retrieval engine for the query "clown, circus, tent". In this grid, black images are pictures of "circus clown"s, dark grey images are pictures of "circus tent"s and light grey images with borders are pictures of "clown tent"s. Figure 2.12 depicts the vector space from which the images were taken. There are three clusters, each containing multiple images, located at angles of equal distance from the query vector. When compressing this evidence the ranking algorithm selects images in order of their proximity until the linear list is full. This discards image vector details, and leads to a thumbnail grid where similar images are not adjacent.
Figure 2.11: Example image grid, generated for the query "clown; circus; tent". Black images contain pictures of "circus clown"s, dark grey images contain pictures of "circus tent"s and light grey bordered images contain pictures of "clown tent"s. Similar images are not adjacent in the thumbnail grid.
Figure 2.12: Vector space for example images. This vector space corresponds to the image grid in figure 2.11. Image collection 1 contains the black images, image collection 2 contains the dark grey images and image collection 3 contains the light grey bordered images. This vector evidence is lost when compressing the ranking into a grid.
2.8.1.1 Image Representation

Humans process objects and shapes at a much greater speed than text. Exploiting this capability can facilitate the identification of relevant images. Further, when presenting images for inspection there is no substitute for the images themselves. As such, when using an information visualisation for image search results, it is important to summarise images using their thumbnails.

2.8.2 Information Visualisations

Information visualisations are intended to strengthen the relationship between the user and the system during the information retrieval process. They attempt to overcome the limitations of linear rankings by providing further attributes to facilitate user determination of relevant documents. As Stuart Card stated in 1996, "If Information access is a 'killer app' for the 1990s [and 2000s], Information Visualisation will play an important role in its success".

The traditional information retrieval process model (figure 2.1) is revised for information visualisation. The model of information retrieval adapted for information visualisation is shown in figure 2.13. This model creates a new loop between the result visualisation, relevance judgement and query creation. This enables users to swiftly refine their query and receive immediate feedback from the result visualisation. This new interaction loop can provide improved clarity and system-user interaction during searching.

Displaying Multi-dimensional Data

When representing multi-dimensional data, such as search results, it is desirable to maximise the number of data dimensions displayed without confusing the user. Typically, visualisations are required to handle more than three dimensions of data. This requires flattening the data to a two- or three-dimensional graphical display.
The LyberWorld system [25] suggests that information visualisations created prior to its inception, in 1994, were 'limited' to 2D graphics, as computer graphics systems could not cope with 3D graphics. Hemmje argued that 3D graphics allow for "the highest degree of freedom to visually communicate information" and that such visualisations are "highly demanded". Indeed, recent research into visualisation has adopted the development of 3D interfaces. However, problems have arisen from this practice. This is due, in part, to the requirement that users have the spatial abilities needed to interpret a 3D system. Another drawback is the user's inability to view the entire visualisation at once — the graphics at the front of the visualisation often obscure the data at the back.

NIST [58] recently conducted a study into the time it takes users to retrieve documents
Figure 2.13: Information Visualisation Modifications to Traditional Information Retrieval. This diagram shows the modifications to the traditional information retrieval process used in information visualisations. A new loop is added to allow users to refine or query the visualisation, thereby avoiding a re-execution of the entire retrieval process.
from equivalent text, 2D and 3D systems. Results from this experiment illustrate that there is a significant learning curve for users starting with a 3D interface. During the experiment the 3D interface proved the slowest method for users accessing the data. Swan et al. [63] also had problems with their 3D interface, citing that "[they] found no evidence of usefulness for the[ir] 3-D visualisation". The argument for and against the use of three dimensions in information visualisations is not within the scope of this thesis.

Interactive Interfaces

A dynamic visualisation interface can be used to aid the comprehension of the information presented in a visualisation. Dynamic Queries and Filters are two ways of achieving such an interface. Dynamic Queries [1, 69] allow users to change parameters in a visualisation, with immediate updates to reflect the changes. This direct-manipulation interface to queries can be seen as an adoption of the WYSIWYG (What You See Is What You Get) model, where a tight coupling exists between user action and displayed documents. Filters are similar to Dynamic Queries; they allow users to provide extra document criteria to the information visualisation. Documents that fulfill the criteria are then highlighted.

2.8.2.1 Example Information Visualisation Systems

While there are many differing information visualisations for information retrieval results, there are three prominent models: spring-based, Venn-based and terrain-map based. These models are described below.

Spring-based models separate documents using document discriminators [14]. Each discriminator is attached to documents by springs which attract matching documents — the degree of attraction is proportional to the degree of match. This clusters the documents according to common discriminators. In this model the dimensions are compressed using springs, with each spring representing a dimension.
An in-depth description of spring-based models is given in section 5.3.1. An example is shown in figure 2.14. Systems that use this model include the VIBE system [49, 15, 36, 23], WebVIBE [45, 43, 44], LyberWorld [25, 24], Bead [9] and Mitre [33]. A survey of these visualisations is provided in appendix A.1.

Venn-based models are a class of information visualisations that allow users to interpret or provide Boolean queries and results. In this model, the dimensions are compressed using Venn diagram set relationships. Systems that use this model include InfoCrystal [61] and VQuery [31]. A survey of these visualisations is provided in appendix A.2.
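The spring metaphor described for the spring-based models — discriminator anchors pulling each document with force proportional to its degree of match — can be sketched as a weighted centroid. This is an illustrative simplification, not the published VIBE or LyberWorld algorithm; the anchor coordinates and match scores below are invented:

```python
def spring_layout(anchors, matches):
    """Place a document at the equilibrium of springs pulling it toward
    each discriminator anchor, with spring strength equal to the degree
    of match (a weighted centroid of the anchor positions)."""
    total = sum(matches.values())
    x = sum(matches[t] * anchors[t][0] for t in matches) / total
    y = sum(matches[t] * anchors[t][1] for t in matches) / total
    return (x, y)

# Hypothetical screen positions for three discriminator anchors:
anchors = {"clown": (0.0, 0.0), "circus": (1.0, 0.0), "tent": (0.5, 1.0)}

# A document matching "circus" twice as strongly as "tent" settles
# closer to the "circus" anchor:
print(spring_layout(anchors, {"circus": 2.0, "tent": 1.0}))
```

Documents sharing the same discriminators settle at nearby equilibria, which is what produces the clustering behaviour these visualisations rely on.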
Terrain-map models are information visualisations that illustrate the structure of the document collection by showing different types of geography on a map. These visualisations are based on Kohonen's feature map algorithm [54]. Dimensions are compressed into map features such as mountain ranges and valleys. An example visualisation is shown in figure 2.15. Two systems that use this model are SOM [38] and ThemeScapes [42]. A survey of these visualisations is provided in appendix A.3.

Other information visualisation models also exist:

- Clustering models: depict relationships between clusters of documents [58, 13].
- Histographic models: seek to visualise a large number of document attributes at once [22, 68, 67].
- Graphical plot models: allow for a comparison of two document attributes [47, 62].

Systems that illustrate these visualisation properties can be found in appendix A.4.

Figure 2.14: Spring-based Example: The VIBE System. In this example VIBE is being used to visualise the "president; europe; student; children; economy" query. Documents are represented by different sized rectangles, with high-concentration clusters in the visualisation represented by large rectangles.

2.9 Relevance Judgements

Only a user can judge the relevance of images in the retrieved document collection. Document analysis and retrieval systems do not understand relevance, only the matching of documents to a request. Therefore, the final stage of information retrieval is the cognitive user process of discovering relevant documents in the retrieved document collection. The cognitive knowledge derived from searching through the retrieved document collection for relevant documents can lead to a refinement of the visualisation, or to a refinement of the original information need. This demonstrates the
Figure 2.15: Terrain Map Example: The ThemeScapes system. In this example ThemeScapes is being used to generate the geography of a document collection. The peaks represent topics contained in many documents; conversely, valleys represent topics contained in only a few documents.

iterative nature of information retrieval — the process is repeated until the user is satisfied with the retrieved document collection.

Information foraging theory, developed by Pirolli et al. [50, 51], is a new approach to examining the synergy between a user and a visualisation during relevance judgement.

2.9.1 Information Foraging

Humans display foraging behaviour when looking for information. Information foraging behaviour is used to study how users invest time to retrieve information. Information foraging theory suggests that information foraging is analogous to food foraging. The optimal information forager is the forager that achieves the best ratio of benefits to cost [51]. Thus, it is important to allow the user to allocate their time to the most relevant documents [50].

Foraging activity is broken up into two types of interaction: within-patch and between-patch. Patches are sources of co-related information. Conceptually, patches could be piles of papers on a desk or clustered collections of documents. Between-patch analysis examines how users navigate from one source of information to another, while within-patch analysis examines how users maximize the use of relevant information within a pile.
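The "ratio of benefits to cost" above is conventionally expressed in optimal-foraging terms as a rate of gain, R = G / (T_between + T_within): information gained divided by total time spent navigating between patches and working within them. A sketch under that assumption — the formulation is the standard foraging-theory one, and all numbers are illustrative:

```python
def rate_of_gain(gain, between_patch_time, within_patch_time):
    """Overall rate of information gain R = G / (T_between + T_within),
    the ratio an optimal information forager maximises."""
    return gain / (between_patch_time + within_patch_time)

# For the same total gain, reducing between-patch (navigation) time
# raises the overall rate of gain:
print(rate_of_gain(10.0, between_patch_time=5.0, within_patch_time=5.0))  # 1.0
print(rate_of_gain(10.0, between_patch_time=2.0, within_patch_time=5.0))
```

This is why a visualisation that clusters co-related images matters to a forager: tight patches cut between-patch navigation time, improving the overall rate at which relevant images are found.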
Chapter 3
Survey of Image Retrieval Techniques

“Those who do not remember the past are condemned to repeat it.” – George Santayana

3.1 Overview

Image retrieval is a specialisation of the information retrieval process outlined in chapter 2. This chapter presents a survey of current approaches to image retrieval. This analysis enables an identification of core problems in current WWW image retrieval systems.

3.2 WWW Image Retrieval

Three of the large commercial WWW search engines (AltaVista, Yahoo! and Lycos) have recently introduced text-based image search engines. The following observations are based on direct experience with these engines.

• AltaVista [3] has developed the AltaVista Photo and Media Finder. This image retrieval engine provides a simple text-based interface (section 3.3.1) to an image collection indexed from the general WWW community and AltaVista's image database partners. Their retrieval engine is based on the technology incorporated into their text document search engine. Modifications to this architecture have been made to associate sections of Web page text with images, in order to obtain image descriptions.

• Yahoo! [70] has developed the Image Surfer. This image retrieval engine contains images categorised into a topic hierarchy. To retrieve images, users can navigate this topic hierarchy, or perform find similar content-based (section 3.3.2) searches. As with Yahoo!'s text document topic hierarchy, all images in the system are categorised manually. This reliance on manual image classification makes extensive WWW image indexing intractable.
• Lycos [40] has incorporated image retrieval through a simple extension to their text document retrieval engine. Following a user query, Lycos checks to see whether retrieved pages contain image references. If so, the images are retrieved and displayed to the user.

3.2.1 WWW Image Retrieval Problems

The WWW image retrieval problems have been grouped into three key areas: consistency, clarity and control. The citations in this section are to papers in the fields of image retrieval, information visualisation and information foraging. The problems this thesis identifies in WWW image retrieval are similar to problems in these fields.

• Consistency:

– System Heterogeneity: When executing a query over multiple search engines, or repeatedly over the same search engine, users typically retrieve differing search results. This is due to continual changes in the image collections and ranking algorithms used. All WWW search engines use differing, confidential algorithms to rank images. Further, these algorithms sometimes vary according to image collection properties or system load. These continual changes can lead to confusing inconsistencies in image search results.

– Unstructured and Uncoordinated Data: The image meta-data used by WWW image retrieval engines to perform text-based image retrieval is unreliable. Most WWW meta-data is not professionally described, and as such, may be incomplete, subjective or incorrect.

• Clarity:

– No Transparency: The linear result visualisations used by WWW image retrieval engines do not transparently reveal why images are being retrieved [34, 28]. This limits the user's ability to refine their query expression. This situation is amplified if the meta-data upon which the ranking takes place is misleading.

– No Relationships: Result visualisations do not convey how retrieved images are related to one another, making it difficult to locate groups of similar images.
– Reliance on Ranking Algorithms: WWW image retrieval systems incorporate confidential algorithms to compress multi-dimensional query-document relationship information (section 2.8.1) into a linear list. These algorithms are not well understood by users, particularly algorithms that incorporate different types of evidence, e.g. a combination of text and content analysis [2, 34, 28].

• Control:

– Inexpressive Query Language:

* Lack of Data Scalability: The large number of images indexed by WWW image retrieval engines makes content-based image analysis techniques (section 3.3.2) difficult to apply. Advanced image analysis techniques are computationally expensive to run. Further, the effectiveness of these algorithms declines when used over a collection with a large breadth of content [56].

* Lack of Expression: Existing infrastructure used by WWW search engines to perform image retrieval provides a limited capacity for users to specify their precise image needs. Current systems allow only for text-based image queries [2, 28].

– Coarse Grained Interaction:

* Coarse Grained Interaction: In providing a search service over a high latency network, current WWW image retrieval systems are limited to providing coarse grained interaction. In current systems, users must submit a query, retrieve results and then choose either to restate the query or perform a find similar search. Searching is an iterative process, requiring continual refinement and feedback [28, 16]. These interfaces do not facilitate the high degrees of user interaction required during the image retrieval process.

* Lack of Foraging Interaction: To enable effective information foraging, a result visualisation must allow users to locate patches of relevant information and then perform detailed analysis of the information contained within a patch [51].
In current WWW image retrieval engines there is no grouping of like images; this prohibits any between-patch foraging. Further, there is no way for users to view a subset of the retrieved information. Thus information foraging (see section 2.9.1) is not encouraged through the visualisation.
3.2.2 Differences between WWW Image Retrieval and Traditional Image Retrieval

There are several differences between image retrieval on the WWW and traditional image retrieval systems. As opposed to WWW systems, in traditional systems:

• Consistency is a lesser concern: All systems incorporate an internally consistent matching algorithm, and retrieve images from a controlled image collection. Since a user interacting with the system is always dealing with the same image matching tools, consistency is a lesser concern.

• Quality descriptions are assured: As the retrieval system retrieves images from a controlled database, meta-data quality is assured.

• No communication latencies: As the retrieval systems are generally co-located with the images and the user, there is no penalty associated with search iterations.

3.3 Lessons to Learn: Previous Approaches to Image Retrieval

It is convenient for the analysis to group the progress of image retrieval into logical phases. The phases of image retrieval development are shown in figure 3.1. Although the progression is not entirely linear, the phases do represent distinct stages in the evolution of image retrieval.

3.3.1 Phase 1: Early Image Retrieval

The earliest form of image retrieval is Text-Based Image Retrieval. These engines rely solely on image meta-data to retrieve images, e.g. current WWW image search engines [3, 40]. Traditional document retrieval techniques, such as vector space ranking, are used to determine matching meta-data, and hence find images. For more information on database text-based image retrieval systems refer to [10].

Examples of text-based queries are:

‘Sydney Olympic Games’
‘Sir William Deane opening the Sydney Olympic Games’
‘Torch relay running in front of the ANU’
‘Happy Olympic Punters’
‘Pictures of Trystan Upstill, by the Honours Gang, taken during the Olympic Games’
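A minimal sketch of the vector space ranking mentioned above, applied to image meta-data: descriptions are weighted by TF-IDF and compared to the query by cosine similarity. This is an illustration of the general technique, not the ranker used by any of the engines surveyed.

```python
import math
from collections import Counter

def rank(query, descriptions):
    """Rank image descriptions against a text query using
    TF-IDF weights and cosine similarity (minimal sketch)."""
    docs = [d.lower().split() for d in descriptions]
    n = len(docs)
    # Document frequency of each term across the collection.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    scores = [(cosine(q, vec(doc)), desc)
              for doc, desc in zip(docs, descriptions)]
    return sorted(scores, reverse=True)

results = rank("sydney olympic games",
               ["opening of the sydney olympic games",
                "torch relay in front of the anu",
                "sydney harbour at dusk"])
# The description sharing the most query terms ranks first.
```

Note that the ranker sees only the description text: if the meta-data is wrong or incomplete, the ranking is wrong, which is exactly the Unstructured and Uncoordinated Data problem below.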
Figure 3.1: The development of image retrieval (phase 1: early image retrieval; phase 2: expressive query languages; phase 3: scalability through the combination of techniques; phase 4: clarity through user understanding and interaction). This diagram shows the logical phases in the development of image retrieval research, and asks whether image retrieval can be performed on the World-Wide Web. The section is structured according to these phases.

Although text-based image retrieval is the most primitive of all retrieval techniques, it does possess useful traits. If professionally described image meta-data is available during retrieval and analysis it can provide a comprehensive abstraction of a scene. Additionally, since text-based image retrieval uses existing document retrieval techniques, many different ranking and indexing models are already available. Further, existing infrastructure can be used to perform image indexing and retrieval — an attractive proposition for current WWW search engines.

Improvements

• Ability to Retrieve Images: provides a simple mechanism for image access and retrieval.

Further Problems

• Consistency:

– Unstructured and Uncoordinated Data: image retrieval effectiveness relies on the quality of image descriptions [48]. Further, as it can be unclear which sections of a WWW page are related to an image's contents, problems arise when trying to associate meta-data with images on WWW pages.

• Control:
– Inexpressive Query Language:

* Lack of Expression: text-based querying may not allow the user to specify a precise image need. There is no way to convey visual image features to the image search engine.

3.3.2 Phase 2: Expressive Query Languages

Content-Based Image Retrieval enables users to specify graphical queries. The theory behind its inception is that users have a precise mental picture of a desired image, and as such, they should be able to accurately express this need [52]. Further, it is hypothesised that this removed reliance on image meta-data minimises retrieval using potentially incorrect, incomplete or subjective data.

Examples of content-based queries are:

Image properties: ‘Red Pictures’, ‘Pictures with this texture’
Image shapes: ‘Arched doorway’, ‘Shaped like an elephant’
Objects in image: ‘Pictures of elephants’, ‘Generic elephants’
Image sections: ‘Red section in top corner’, ‘Elephant shape in centre’

The six most frequently used query types in content-based image retrieval are:

Colour allows users to query an image's global colour features. An example of colour-based content querying is shown in figure 3.2. According to Rui et al. [28], colour histograms are the most commonly used feature representation. Other methods include Colour Sets, which facilitate fast searching with an approximation to histograms, and Colour Moments, which overcome the quantisation effects of colour histograms. To improve colour histograms, Ioka and Niblack et al. provide methods for evaluating similar but not exact colours, and Stricker and Orengo propose cumulative colour histograms to reduce noise [28].

Texture is a visual pattern that approximates the appearance of a tactile surface. This allows the user to specify whether an image appears rough and how much segmentation the image exhibits. An example of texture-based content querying is shown in figure 3.3. According to Rui et al. [28], texture recognition can be achieved using Haralick et al.'s co-occurrence matrix representations, Tamura et al.'s computational approximations to visual texture properties or Smith and Chang's wavelet transforms.

Colour Layout is advanced colour measurement, whereby users are given the ability to show how colours are related to each other in a scene [48]. For example, a query containing a gradient from orange to yellow could be used to retrieve a sunset.
Figure 3.2: Example of a colour query match. This diagram demonstrates colour-based content querying. In this case the user query is the text criteria “fifa; fair; play; logo” and the colour “yellow”.

Figure 3.3: Example of a texture query match. This diagram demonstrates texture-based content querying. In this case the user desires more pictures on the same playing field. The grass texture is used to retrieve images from the same soccer match.
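The colour matching illustrated in figure 3.2 can be sketched as follows: each image is reduced to a normalised histogram over quantised colour bins, and histograms are compared with histogram intersection, one of several similarity measures surveyed in [28]. The bin count and pixel format here are illustrative assumptions.

```python
def colour_histogram(pixels, bins=4):
    """Normalised histogram over quantised RGB bins.
    pixels: iterable of (r, g, b) tuples with 0-255 channels."""
    pixels = list(pixels)
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    return [h / len(pixels) for h in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Synthetic swatches: a "yellow" query should match the sunset-like
# image more closely than the all-blue image.
sunset = colour_histogram([(250, 200, 40)] * 80 + [(40, 40, 200)] * 20)
sky    = colour_histogram([(40, 40, 200)] * 100)
query  = colour_histogram([(255, 210, 30)] * 100)
assert intersection(query, sunset) > intersection(query, sky)
```

Because a global histogram discards all spatial information, two images with the same colours in different arrangements score identically, which is what motivates the Colour Layout and Region-Based query types.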
Shape allows users to query image shapes. An example of shape-based content querying is shown in figure 3.4.

Figure 3.4: Example of a shape query match. This diagram demonstrates shape-based content querying. In this case the user sketches a drawing containing a mountain.

Region-Based allows users to outline what types of properties they want in each area of an image, thereby making the image analysis process recursive. An example of simple region-based content querying is shown in figure 3.5.

Figure 3.5: Example of a region-based query match. This diagram demonstrates region-based content querying. In this case the user submits a query for an image containing trees on either side of a mountain and a stream.

Object is a model where an object is deduced from a user-supplied shape and angle. This enables the retrieval of images that contain the specified shape in any orientation.

3.3.2.1 Content-Based Image Retrieval Systems

QBIC (Query by Image Content)¹ uses colour, shape and texture to match images to user queries. The user can provide simple or advanced analytic criteria. Simple criteria are requirements such as colour or texture, while advanced criteria can incorporate query-by-example, with “find more images like this”, or “find images like my sketch”. To avoid difficulties involved in user descriptions of colours and textures

¹ Demo online at http://wwwqbic.almaden.ibm.com/cgi-bin/stamps-demo
QBIC contains a texture and colour library. This enables users to select colours, colour distributions or desired textures as queries [19, 29].

NETRA allows users to navigate through categories of images. The query is refined through a user selection of relevant image content properties [16, 28, 41].

Excalibur is a query-by-example system. Users provide candidate images which are matched using pattern recognition technology. Excalibur is a commercial application development tool rather than a complete retrieval application. The Yahoo! web search engine uses this technology to find similar images (section 3.2) [16, 28, 17].

Blobworld breaks images into blobs (see figure 3.6). By browsing a thumbnail grid and specifying which blobs of images to keep, the user identifies blobs of interest and areas of disinterest. This is used to refine the query [8, 66].

Figure 3.6: The Blobworld System. This screenshot from the Blobworld system illustrates the process of picking relevant image blobs.

EPIC allows users to draw rectangles and label what they would like in each section of the image, as shown in figure 3.7 [32].
Figure 3.7: The EPIC System. This screenshot illustrates the EPIC system's query process. Users describe their image need through labelled rectangles in the query window on the left.

ImageSearch allows users to place icons representing objects in regions of an image. Users can also sketch pictures if they want a higher degree of control [37]. See figure 3.8.

3.3.2.2 Phase 2 Summary

Improvements

• Consistency:

– Discard unstructured and uncoordinated data: since image meta-data is never used to index or retrieve the images, problems relating to incomplete, incorrect or subjective descriptions are avoided. Further enrichment is obtained through the ability to use content-based image analysis to query many differing artifacts in an image.

• Control:

– Inexpressive Query Language:

* New Expression through Content-Based Image Retrieval: through the expressive nature of content-based image retrieval, more thorough image criteria can be gained from the user. This provides the system with more information with which to judge image relevance.

Further Problems

• Clarity:
Figure 3.8: The ImageSearch system. This screenshot illustrates the ImageSearch system's query process. The user positions icons symbolising what they would like in that region of an image.

– Complex Interfaces: there is a comparatively large user cost incurred with the creation of content-based queries. If users are required to produce a sketch or an outline of the desired images, the time or skill required can prove prohibitive.

• Control:

– Inexpressive Query Language:

* Content-based image retrieval algorithms do not scale well: content-based image retrieval is less effective on large-breadth collections. Since there are many definitions of similarity and discrimination, their power degrades when using large-breadth image collections, as shown in figure 3.9 [2, 28, 16].

3.3.3 Phase 3: Scalability through the Combination of Techniques

Bearing in mind the limitations of content-based image retrieval on large-breadth image collections, several systems have combined both text and content-based image retrieval. It is hypothesised that content-based analysis can be used on larger image collections when combined with text-based analysis. The rationale for this is that text-based techniques can be used to specify a general abstraction of image contents, while content-based criteria can be used to identify relevant images in the domain.
Figure 3.9: Misleading shape and texture. The first image in this example is the query-by-example image used as a content-based query. The other images in the grid were retrieved through matching of shape, texture and colour (image from [56]).
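The fusion of text and content evidence that phase 3 systems attempt can be sketched as a weighted linear combination of normalised scores. This is a common illustrative scheme, not the method of any particular system surveyed here; choosing the weights is precisely the combination-of-evidence difficulty discussed below.

```python
def normalise(scores):
    """Min-max normalise a list of raw scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine(text_scores, colour_scores, w_text=0.6, w_colour=0.4):
    """Weighted linear combination of two evidence sources.
    The weights here are arbitrary; tuning them is the hard part."""
    t = normalise(text_scores)
    c = normalise(colour_scores)
    return [w_text * a + w_colour * b for a, b in zip(t, c)]

# Three candidate images: strong text match, strong colour match, weak on both.
fused = combine([0.9, 0.2, 0.1], [0.3, 0.95, 0.1])
best = fused.index(max(fused))
```

The normalisation step matters because text and content engines produce scores on incommensurable scales; even with it, a fixed weighting encodes an assumption about which evidence to trust.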
3.3.3.1 Text and Content-Based Image Retrieval Systems

The combination of analysis techniques can occur either during initial query creation, allowing users to specify both text and content-based image criteria, or after retrieving a collection of images, allowing users to refine the image collection.

Text with Content Relevance Feedback: in these systems, the user initially provides a text query. Using content-based image retrieval, they then tag relevant images to retrieve more images like them.

Text and Content Searching: in these systems, both text and content retrieval occur at the same time. The user may express both text and content criteria in their initial query.

Text with Content Relevance Feedback

Chabot,² developed by Ogle and Stonebraker, uses simplistic content and text analysis to retrieve images. Text criteria are used to retrieve an initial collection of images, followed by content criteria to refine the image collection [48].

MARS is a system that learns from user interactions. The user begins by issuing a text-based query, and then marks images in the retrieved thumbnail grid as either relevant or irrelevant. The system uses these image judgements to find more relevant images. The benefit of this approach is that it relieves the user from having to describe desirable image features. Users only have to pick interesting image features [27].

Text and Content Searching

Virage incorporates plugin primitives that allow the system to be adapted to specific image searching requirements. The Virage plugin creation engine is open-source, therefore plugins can be created by end-users to suit their domain. The Virage engine includes several “universal primitives” that perform colour, texture and shape matching [16, 28].

Lu and Williams have incorporated both basic colour and text analysis into their image retrieval system, with encouraging results using a small database. One of their major problems was finding methods to combine evidence from colour and text matching [39].

3.3.3.2 Phase 3 Summary

Improvements

² This system has recently been renamed Cypress.
• Consistency:

– Reduce effects of Unstructured and Uncoordinated Data: the image meta-data is only partially used to retrieve the images, with content-based image retrieval used as a second criterion for the image analysis.

• Control:

– Inexpressive Query Language:

* Improved Expression: users can enter criteria for images through textual descriptions and visual appearance. Incorporating both text and content-based image analysis allows for the consideration of all image data during retrieval.

* Improving the scalability of Content-Based Image Retrieval: when combining text-based analysis with content-based analysis, difficulties involved in performing content-based image retrieval on large-breadth image collections are partially alleviated.

Further Problems

• Clarity:

– Reliance on Ranking Algorithms: combining rankings from several different types of analysis engines into a thumbnail grid can be difficult [2, 16, 4, 27].

– No Transparency: when using several analysis techniques it can be hard for users to understand why images were matched. Without this evidence, it may be difficult for users to ascertain faults in their query.

3.3.4 Phase 4: Clarity through User Understanding and Interaction

In response to the problems associated with user understanding of retrieved image collections, several systems have attempted to improve the clarity of the image retrieval process. These systems have incorporated information visualisations, outlined in section 2.8.2, to convey image matching. It is in this light that phase 4 attempts to improve system transparency and relationship maintenance, and to reduce the reliance on ranking algorithms.

3.3.4.1 Image Retrieval Information Visualisation Systems

The two projects examined in this section provide spring-based visualisations, similar to the VIBE system in section A.1.
MageVIBE uses a simplistic approach to image retrieval, implementing text-based-only querying of a medical database. Images in this visualisation are represented by dots. The full image can be displayed by selecting a dot [36].
Figure 3.10: The ImageVIBE system. This screenshot illustrates the ImageVIBE visualisation for a user query for an aeroplane in flight. Several modification query terms, such as vertical and horizontal, are used to describe the orientation of the plane.

ImageVIBE uses text-based and shape-based querying, but otherwise does not differ from the original VIBE. ImageVIBE allows users to refine their text queries using content criteria, such as shapes, orientation and colour [11]. An ImageVIBE screenshot depicting a search for an aircraft image is shown in figure 3.10. There has yet to be any evaluation of the effectiveness of these systems.

3.3.4.2 Phase 4 Summary

Improvements

• Improved Transparency: providing a dimension for each aspect of the ranking enables users to deduce how the image matching occurred.

• Relationship Maintenance: the query term relationships between images are maintained — images that are related to the same query terms, by the same magnitude, are co-located.

• User Relevance Judgements: users select relevant images from the retrieved image collection, rather than relying on a combination-of-evidence algorithm to determine the best match.

Further Problems

• Complex Interfaces: systems must be simple. It has been shown that the traditional VIBE interface is too complex for general users [45, 43, 44].
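The spring-based placement these VIBE-style systems rely on can be sketched simply: each query term is a fixed anchor point, and each image rests where the per-term "spring" forces balance, i.e. at the weighted average of the anchors. This is a sketch of the idea, not the actual VIBE implementation.

```python
def place_image(anchors, weights):
    """Position an image at the weighted centroid of query-term anchors.

    anchors: {term: (x, y)} fixed positions of the query terms
    weights: {term: score}  per-term relevance of this image
    """
    total = sum(weights.values())
    if total == 0:
        return (0.0, 0.0)  # no match at all: park at the origin
    x = sum(anchors[t][0] * w for t, w in weights.items()) / total
    y = sum(anchors[t][1] * w for t, w in weights.items()) / total
    return (x, y)

anchors = {"aeroplane": (0.0, 0.0), "flight": (1.0, 0.0), "vertical": (0.5, 1.0)}
# An image matching only "aeroplane" and "flight", equally, sits midway
# between those two anchors, so its position itself explains the match.
pos = place_image(anchors, {"aeroplane": 0.5, "flight": 0.5, "vertical": 0.0})
assert pos == (0.5, 0.0)
```

Because position is a pure function of the per-term scores, images with identical term relationships are co-located, which is exactly the relationship-maintenance property claimed above.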
3.3.5 Other Approaches to WWW Image Retrieval

The WWW has recently become the focus of phase 2 research in image retrieval. Two such research systems are ImageRover and WebSEEK.

ImageRover is a system that spiders and indexes WWW images. A vector space model of image features is created from the retrieved images [64, 57]. In this system users browse topic hierarchies and can perform content-based find similar searches. The system has encountered index size and retrieval speed difficulties.

WebSEEK searches the Web for images and videos by extracting keywords from the URL and associated image text, and generating a colour histogram. Category trees are created using all rare keywords indexed in the system. Users can query the system using colour requirements, by providing keywords or by navigating a category tree [59, 60].

3.4 Summary

Figure 3.11: Development of WWW Image Retrieval Problems. This diagram illustrates the development of the WWW image retrieval problems as covered in this chapter. The problems from each phase, and extra WWW retrieval issues, must be addressed to create an effective WWW image retrieval system.
This chapter contained the development of the WWW image retrieval problems, as shown in figure 3.11. The full list of problems requiring consideration during the creation of a new approach to WWW image retrieval is then:

• Consistency:

– System Heterogeneity
– Unstructured and Uncoordinated Data

• Clarity:

– No Transparency
– No Relationships
– Reliance on Ranking Algorithms

• Control:

– Inexpressive Query Language:
* Lack of Expression
* Lack of Data Scalability

– Coarse Grained Interaction:
* Coarse Grained Interaction
* Lack of Foraging Interaction

This chapter has provided a list of current WWW image retrieval problems and previously proposed solutions. These issues were decomposed into the three key problem areas of consistency, clarity and control. Following the identification of these problems, a survey of previous image retrieval systems, sorted into logical phases of development, was presented. Each phase was viewed in the context of WWW image retrieval, examining how the phase dealt with the WWW image retrieval problems.

A new approach to WWW image retrieval is now presented. This approach attempts to alleviate these problems to improve WWW image retrieval. In the chapter following this discussion, this thesis presents the VISR tool, an implementation of the new approach to WWW image retrieval.
Chapter 4
Improving the WWW Image Searching Process

“Although men flatter themselves with their great actions, they are not so often the result of great design as of chance.” – Francis, Duc de La Rochefoucauld: Maxim 57

4.1 Overview

Having outlined the conceptual framework for an information retrieval study in chapter 2, and then presented a survey of image retrieval techniques in chapter 3, this thesis now addresses the problem at hand: the creation of a new approach to WWW image retrieval.

The traditional model of the information retrieval process, figure 2.1, must be revised for the retrieval of images from the WWW. The new approach to WWW image retrieval is shown in figure 4.1.

Section a of figure 4.1 is the Flexible Image Retrieval and Analysis Module (section 4.2). This module incorporates retrieval and analysis plugins used during image retrieval.

Section b of figure 4.1 is the Transparent Cluster Visualisation Module (section 4.3). A visualisation is incorporated to facilitate user comprehension of the retrieved image collection's characteristics.

Section c of figure 4.1 is the Dynamic Querying Module (section 4.4). Through this module the user is able to tweak their query and get immediate feedback from the visualisation.
Figure 4.1: Decomposition of Research Model of Information Retrieval. The new information flows are depicted by dashed lines. This diagram can be compared with figure 2.1, the traditional information retrieval process model. Section a of this diagram depicts the Flexible Image Retrieval and Analysis Module. Section b depicts the Transparent Cluster Visualisation Module. Section c depicts the Dynamic Query Modification Module.
Figure 4.2: Research Model with Process Locations. The flexible image retrieval and analysis module resides on the client-side. To retrieve images, this module connects to several WWW image search servers, via retrieval plugins, and downloads retrieved image collections. The images are then pooled prior to analysis. This pool of images forms the image domain. The transparent cluster visualisation and dynamic query modification modules also reside on the client-side. This improves on the interaction available with current non-distributed visualisations, where the whole information retrieval process has to be re-executed before the image collection is updated with user modifications.
4.2 Flexible Image Retrieval and Analysis Module

This module separates the retrieval and analysis responsibilities, thereby allowing for more flexible and consistent image analysis.

This module resides on the client-side (see figure 4.2). A retrieval plugin is used to retrieve an initial collection of images from a WWW image search engine. These images are downloaded to the client machine and form the image domain. The image domain is then analysed by user-specified analysis plugins. This pluggable interface allows for any number of specified retrieval or analysis engines to be used during the image retrieval and analysis phase. For example, a collection of image meta-data and image content analysis techniques may be provided.

The design of this module in the VISR tool implementation is provided in section 5.2.

4.3 Transparent Cluster Visualisation Module

This module visualises the relationships between retrieved images and their corresponding search terms. This removes the requirement for the combination of evidence by providing a transparent visualisation. Furthermore, to allow for easy identification of images, thumbnails are used to provide image overviews. Users click on the thumbnails to view the full image. To alleviate visualisation latencies, this module resides on the client-side (see figure 4.2).

The design of this module in the VISR tool implementation is provided in section 5.3. Screenshots of the VISR transparent cluster visualisation are provided in section 5.5.

4.4 Dynamic Query Modification Module

The dynamic query module allows users to modify queries and immediately view the resulting changes in the visualisation. This provides a facility for the re-weighting of query terms, the tweaking of analysis parameters, the zooming of the visualisation and the application of filters to the image collection.
Experiments have shown that users will only continue to forage for data if the search continues to be profitable [51]. Thus it is important to have low latencies for query modifications and system interaction. WWW image retrieval system interaction suffers from high latencies; distributing the system as shown in figure 4.2 provides lower interaction latencies.

The design of this module in the VISR tool implementation is provided in section 5.4.
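Because the image domain and its per-term scores already live on the client, re-weighting and filtering reduce to local recomputation with no server round-trip. A minimal sketch of these two operations, with illustrative names rather than the VISR design:

```python
def reweight(per_term_scores, term_weights):
    """Recompute each image's overall score after the user adjusts
    a query-term weight -- purely local, no new retrieval needed."""
    return [
        sum(scores.get(t, 0.0) * w for t, w in term_weights.items())
        for scores in per_term_scores
    ]

def apply_filter(images, overall, threshold):
    """Dynamic filter: keep only images scoring above the threshold."""
    return [img for img, s in zip(images, overall) if s > threshold]

images = ["a.jpg", "b.jpg"]
per_term = [{"torch": 1.0, "relay": 0.0},
            {"torch": 0.2, "relay": 0.9}]

# User drags the "torch" slider up: scores and the visible set update
# immediately, without contacting the search engine again.
overall = reweight(per_term, {"torch": 0.8, "relay": 0.2})
visible = apply_filter(images, overall, threshold=0.5)
```

Both functions are linear scans over the domain, so for the collection sizes a client downloads they run well within interactive latencies.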
4.5 Proposed Solutions to Consistency, Clarity and Control

4.5.1 Consistency

Current WWW search engines use varied ranking techniques on meta-data which is often incomplete or incorrect. This can confuse users.

System Heterogeneity

The flexible image retrieval and analysis module provides a consistent, well-understood set of tools for image analysis. When results from these tools are incorporated into the transparent cluster visualisation, images are always displayed in the same manner. This implies that if two search engines returned the same image, the images would be co-located in the display.

Unstructured and Uncoordinated Data

The flexible image retrieval and analysis module does not accommodate noisy meta-data. It does, however, deal with it in a consistent fashion. The use of consistent plugins and the transparent cluster visualisation may allow for swift identification of noise in the image collection.

4.5.2 Clarity

Current WWW search engines provide thumbnail grid result visualisations. Thumbnail grids do not express why images were retrieved or how retrieved images are related, thereby making it harder to find relevant images [34, 15].

No Transparency

The transparent cluster visualisation facilitates user understanding of why images are retrieved and which query terms matched which documents. This assists the user in deciphering the rationale for the retrieved image collection and avoids user frustration by facilitating the “what to do next” decision. A key issue in image retrieval is how images are perceived by users [28]. Educating users about the retrieval process assists them to understand how the system is matching their queries, and thereby how they should form and refine their queries.

No Relationships

The maintenance of image relationships enables the clustering of related images. This allows users to find similar images quickly.
Reliance on Ranking Algorithms

The maintenance of per-term ranking information reduces the reliance on ranking algorithms. When using the transparent cluster visualisation there is no combination of evidence except in the search engine, which is only required to derive an initial quality rating: whether an image matches the query or not.
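Keeping per-term evidence amounts to storing one score per query term instead of collapsing everything into a single combined rank. A hypothetical record type illustrating the idea (the names are assumptions, not VISR's types):

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedImage:
    """One retrieved image with a separate relevance score per query term.

    No combined rank is computed client-side; the visualisation reads the
    raw per-term scores directly when positioning the image.
    """
    url: str
    term_scores: dict = field(default_factory=dict)  # term -> score in [0, 1]


img = RetrievedImage("http://example.org/sunset.jpg")
img.term_scores["sunset"] = 0.9
img.term_scores["beach"] = 0.4
# Every term's evidence is preserved; nothing is collapsed to one number.
```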
Improving the WWW Image Searching Process

4.5.3 Control: Inexpressive Query Language

Current WWW search engines limit the user's ability to specify their exact image need. For example, because image analysis is costly, most systems do not allow users to specify image content criteria. Further, these techniques lose effectiveness when scaled across large, broad collections [56].

Lack of Expression

The client-side distribution of the analysis task in the flexible retrieval and analysis module reduces WWW search engine analysis costs. Through the use of the image domain, expensive content-based image retrieval techniques and other analysis are performed over a smaller image collection. Further, the use of these techniques does not require modifications to the underlying WWW search engine infrastructure.

Lack of Data Scalability

In the proposed flexible analysis module, the user is able to nominate several analysis techniques that operate concurrently during image matching. Through third-party analysis plugins, users can perform any type of analysis.

4.5.4 Control: Coarse-Grained Interaction

Current WWW search engines provide non-interactive interfaces to the retrieval process. This gives users minimal insight into how the retrieval process occurs and renders them unable to focus a search on an interesting area of the result visualisation.

Coarse-Grained Interaction

New modes of interaction and lower latencies are achieved through the use of client-side analysis, visualisation and interface. When interacting with the dynamic query modification module, the user's changes are reflected immediately in the visualisation. All tasks that do not require new documents to be retrieved are completed with low latencies. Thus, features such as dynamic filters, query re-weighting and zooming can be implemented effectively.
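Because the analysis data is already held client-side, a dynamic filter is just a pass over cached per-term scores, with no round trip to the search engine. A hedged sketch of such a filter; the function name and score representation are illustrative assumptions:

```python
def apply_filters(images, min_scores):
    """Filter a retrieved collection client-side, without re-retrieval.

    images: list of (url, term_scores) pairs, where term_scores maps each
    query term to a relevance score in [0, 1].
    min_scores: per-term thresholds set by the user's dynamic filters.
    Returns the URLs of images meeting every threshold.
    """
    kept = []
    for url, scores in images:
        if all(scores.get(term, 0.0) >= t for term, t in min_scores.items()):
            kept.append(url)
    return kept


images = [
    ("a.jpg", {"sunset": 0.9, "beach": 0.2}),
    ("b.jpg", {"sunset": 0.7, "beach": 0.8}),
]
# Raising the "beach" filter re-evaluates cached scores only, so the
# visualisation can update immediately.
kept = apply_filters(images, {"beach": 0.5})
```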
Lack of Foraging Interaction

Foraging interaction is encouraged through the transparent cluster visualisation's ability to cluster and zoom. Between-patch foraging is aided through the grouping of similar images. Within-patch foraging is facilitated through the ability to examine a single cluster in greater detail. Through zooming, users are able to perform a more thorough investigation of the images contained within a cluster. An example of this practice is shown in figure 4.3.
Figure 4.3: Foraging Concentration. The user scans all clusters of images to locate the relevant image cluster. In this case the black, light grey and dark grey squares are all checked for relevance. This process is termed between-patch foraging. Following the selection of a potentially relevant patch, the user begins within-patch foraging. This is shown in the zoomed window. Through within-patch foraging the user is able to locate the relevant image.

4.6 Summary

This chapter proposed a new approach to WWW image retrieval. Using the framework outlined in chapter 2, solutions were proposed to the image retrieval problems identified in chapter 3. These solutions shape the new approach to WWW image retrieval. The new approach contains three theoretical modules: flexible image retrieval and analysis, transparent cluster visualisation and dynamic query modification. The flexible image retrieval and analysis module provides a new mechanism for comprehensive, extensible image retrieval on the WWW. The transparent cluster visualisation provides a new approach to visualising retrieved document collections. The dynamic query modification module provides new mechanisms for user interaction during the retrieval process. Following the description of these modules, the chapter presented theoretical evidence to support the use of these modules to alleviate the WWW image retrieval problems. The next chapters cover the implementation of these modules in the VISR tool and effectiveness evaluation experiments.
Chapter 5

VISR

"Always design a thing by considering it in its next larger context — a chair in a room, a room in a house, a house in an environment, an environment in a city plan." – Eliel Saarinen

5.1 Overview

This chapter introduces the architecture of the VISR tool. The three conceptual modules described in chapter 4 are now implemented. This chapter is broken down into the design of each of these modules: the flexible image retrieval and analysis module in section 5.2, the transparent cluster visualisation module in section 5.3 and the dynamic query modification module in section 5.4. Following the description of the module designs, a series of use cases demonstrate the functionality of the VISR tool. The figures in this chapter follow the conventions outlined in the diagrams below. Figure 5.1 is the legend for the information flow diagrams and figure 5.2 is the legend for the state transition diagrams.

Figure 5.1: Information Flow Diagram Legend.
Figure 5.2: State Transition Diagram Legend.

The information flow of the VISR tool is shown in figure 5.3, while the state transition diagram, figure 5.4, describes the flow of system execution.
Figure 5.3: VISR Architecture Information Flow Diagram. This figure illustrates the data flow between modules in the VISR tool. The section numbers marked in the figure represent sections in this chapter discussing those processes. Note: no link is required from the dynamic query module to the query processor because all input into the dynamic query module is in a machine-readable form.
Figure 5.4: VISR Architecture State Transition Diagram. This figure illustrates the flow of execution of top-level tasks in the VISR tool. VISR is initialised when a search request is received. The query is processed and image retrieval and analysis occurs. This is the process of retrieving and analysing an image collection using query criteria. Following the completion of retrieval and analysis, the transparent cluster visualisation is created. After the visualisation is displayed, the system enters dynamic query mode where the user may choose to modify the visualisation or the retrieval and analysis criteria. When the user is satisfied with the results, VISR terminates.
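The transitions in figure 5.4 can be modelled as a small table-driven state machine. The sketch below mirrors the figure's states and events; the encoding is illustrative, not VISR's actual implementation:

```python
from enum import Enum, auto


class State(Enum):
    QUERY_PROCESSING = auto()
    RETRIEVAL_AND_ANALYSIS = auto()
    VISUALISATION_CREATION = auto()
    DYNAMIC_QUERY_MODE = auto()
    TERMINATED = auto()


# Allowed (state, event) -> next-state transitions, taken from figure 5.4.
TRANSITIONS = {
    (State.QUERY_PROCESSING, "query processing complete"):
        State.RETRIEVAL_AND_ANALYSIS,
    (State.RETRIEVAL_AND_ANALYSIS, "retrieval and analysis complete"):
        State.VISUALISATION_CREATION,
    (State.VISUALISATION_CREATION, "visualisation displayed"):
        State.DYNAMIC_QUERY_MODE,
    (State.DYNAMIC_QUERY_MODE, "analysis modification request"):
        State.RETRIEVAL_AND_ANALYSIS,
    (State.DYNAMIC_QUERY_MODE, "visualisation modification request"):
        State.VISUALISATION_CREATION,
    (State.DYNAMIC_QUERY_MODE, "user satisfied"):
        State.TERMINATED,
}


def step(state, event):
    """Advance the system by one event; raises KeyError on illegal events."""
    return TRANSITIONS[(state, event)]
```

Note that dynamic query mode is the only state with multiple outgoing transitions, matching the figure: the user's next action decides whether re-analysis, re-visualisation, or termination follows.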
5.2 Flexible Image Retrieval and Analysis Module

The information flow diagram for the Flexible Image Retrieval and Analysis Module is shown in figure 5.5, while the state transition diagram is shown in figure 5.6. The structure of this section is illustrated by the information flow diagram, while the state transition diagram illustrates the flow of execution.

5.2.1 Retrieval Plugin Manager

The Retrieval Plugin Manager manages all system retrieval plugins. Upon a search request, the plugin manager determines which retrieval plugins are able to fulfill the request, either in whole or in part, and sends the appropriate query terms to the retrieval engines. Following the completion of retrieval, the retrieved image collection is pooled. This pool of images forms the image domain.

5.2.1.1 Retrieval Plugin Stack

The plugins connect to their corresponding retrieval engine, translate queries into a format acceptable to the engine and submit the query. The links retrieved from the engines are pooled by the plugin and sent to the Web document retriever for retrieval. This uses existing Web search infrastructure to retrieve from a large collection of images.

Implemented Retrieval Plugins

VISR contains a WWW retrieval plugin for the AltaVista image search engine [3]. AltaVista only supports text-based image retrieval; as such, queries must contain at least one text analysis criterion. This may, however, be accompanied by multiple content criteria.

5.2.2 Analysis Plugin Manager

The Analysis Plugin Manager manages all the analysis plugins in the system. The query terms are analysed by their corresponding analysis plugins. If there is no plugin for a given query type, the system can be set to default to text analysis, or to ignore the query term. If one plugin services multiple query terms, the terms are queued at the desired analysis plugin.
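The dispatch policy described above (route each term to the plugin handling its type, defaulting unserviceable terms to text analysis or ignoring them) might be sketched as follows; all names here are hypothetical:

```python
def dispatch(query_terms, plugins, default_to_text=True):
    """Queue each query term at the analysis plugin handling its type.

    query_terms: list of (term, term_type) pairs, e.g. ("red", "colour").
    plugins: mapping of term type -> plugin name.
    Terms with no matching plugin fall back to the text plugin, or are
    dropped when default_to_text is False.
    """
    queues = {}
    for term, term_type in query_terms:
        plugin = plugins.get(term_type)
        if plugin is None:
            if not default_to_text:
                continue  # ignore the unserviceable query term
            plugin = plugins["text"]
        queues.setdefault(plugin, []).append(term)
    return queues


# "loud" has no audio plugin, so it is queued at the text plugin by default.
queues = dispatch(
    [("sunset", "text"), ("red", "colour"), ("loud", "audio")],
    {"text": "TextPlugin", "colour": "ColourPlugin"},
)
```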
5.2.2.1 Analysis Plugin Stack

The plugins access the search document repository and retrieve the document collection stored by the Web document retriever. The documents are analysed on a per query-term basis, with each query term ranked individually and stored in the analysis data repository.

Figure 5.5: Flexible Image Retrieval & Analysis Module Information Flow Diagram. This figure illustrates the data flow between processes in the VISR Flexible Image Retrieval and Analysis Module. This figure is a detailed illustration of this module. Its relation to the rest of the VISR tool, figure 5.3, is illustrated in the top left hand corner.

Figure 5.6: Flexible Image Retrieval & Analysis Module State Transition Diagram. This figure illustrates the flow of execution of the Flexible Image Retrieval and Analysis tasks. Following query processing, the Image Retrieval and Analysis task is called. This stage executes the retrieval plugins; following the completion of retrieval, the analysis plugins are executed. Following the computation of analysis rankings, the result visualisation is notified. If the user chooses to modify the analysis through the dynamic query module, the new analysis requirements are analysed. If the modification requires a new image domain, the retrieval plugins are re-executed with the new query terms. If the modification does not require a new image domain, the analysis plugin is re-executed with different analysis settings.

Table 5.1: Keyword source qualities from [46].

    Source            Quality
    Image URL         34%
    Image Name        50%
    Title             62%
    Alt text          86%
    Anchor text       87%
    Heading           54%
    Surrounding text  34%
    Entire text       33%

One of the key problems in performing text-based image analysis on the WWW is how to associate Web page text with images. The association of HTML meta-data with images retrieved from Web pages is a complex problem. The task becomes even more arduous because HTML meta-data can be incomplete or incorrect. When using multiple tags in HTML documents to rank images, it is important to take the quality of each source into account when indexing an image.

Lu and Williams [39] use bibliographic data from HTML documents to derive image text relevance. They use a simple product based on unfounded quality measures to calculate the relevance of document sections to an image. They provide no experimental evidence to support their rankings.

Mukherjea and Cho [46] use a combination of bibliographic and structural information embedded in the HTML document to find image-relevant text. They then experimentally determine the quality of each image source. The ratings they found are presented in table 5.1.

The text-based analysis plugin in the VISR tool uses all sections of the HTML document to associate meta-data. Mukherjea and Cho's text quality measures are used to scale document section meta-data relevance.

Content-based Analysis Plugin

VISR contains a colour content-based image analysis plugin. This plugin performs a simple colour analysis of images, given a user-specified colour. This plugin provides proof-of-concept content-based analysis. Other content-based analysis plugins that perform more advanced analysis can be incorporated into the system. Colour analysis is performed using basic histogram analysis, where image colour
components are separated into a specified number of buckets. The higher the number of buckets, the more accurate the colour comparison. The ranking algorithm matches red, green and blue levels between images. The retrieved image with the highest number of pixels of the specified colour is used to normalise the ranking for all other images.

5.2.3 Web Document Retriever

Given a URL, the Web document retriever downloads Web pages using the GNU wget utility. Prior to downloading, the locally cached Web page and image library is checked to see whether the pages have been previously retrieved; if not, downloading begins. After the Web pages are downloaded, they are parsed to find image URLs. If the image or the Web page no longer exists, the Web document retriever discards the page information. If the image link exists in the page, the Web document retriever downloads the image for further analysis.

5.2.4 Adjustment Translator

The Adjustment Translator takes incoming adjustment requests and determines whether the adjustment requires a re-retrieval of documents or a re-analysis of the image collection.
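The colour plugin's histogram matching (section 5.2.2.1) can be approximated as follows: bucket each RGB component, count the pixels landing in the target colour's bucket, and normalise by the best-matching image. A sketch under those assumptions; the function names and the exact bucket mapping are illustrative, not VISR's implementation:

```python
def bucket(value, n_buckets):
    """Map an 8-bit colour component (0-255) into one of n_buckets buckets."""
    return min(value * n_buckets // 256, n_buckets - 1)


def colour_score(pixels, target, n_buckets=8):
    """Count pixels whose (R, G, B) buckets match the target colour's buckets."""
    t = tuple(bucket(c, n_buckets) for c in target)
    return sum(1 for p in pixels
               if tuple(bucket(c, n_buckets) for c in p) == t)


def rank_by_colour(images, target, n_buckets=8):
    """Rank images against a user-specified colour.

    images: mapping of url -> list of (r, g, b) pixel tuples.
    The image with the most matching pixels normalises all rankings to [0, 1].
    """
    raw = {url: colour_score(px, target, n_buckets)
           for url, px in images.items()}
    best = max(raw.values()) or 1  # avoid dividing by zero when nothing matches
    return {url: score / best for url, score in raw.items()}


ranks = rank_by_colour(
    {"a.jpg": [(250, 10, 10)] * 3,
     "b.jpg": [(250, 10, 10), (10, 250, 10)]},
    target=(255, 0, 0),
)
```

More buckets narrow each bucket's colour range, which is why the comparison becomes more accurate (and stricter) as the bucket count grows.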
5.3 Transparent Cluster Visualisation Module

The information flow diagram for the Transparent Cluster Visualisation Module is shown in figure 5.7, while the state transition diagram is shown in figure 5.8. The structure of this section is illustrated by the information flow diagram, while the state transition diagram illustrates the flow of execution.

5.3.1 Spring-based Image Position Calculator

Given query term matching analysis data, the spring-based image position calculator positions images in the visualisation. The visualisation is based on a spring model developed by Olsen and Korfhage [49] for the original VIBE. This was formalised by Hoffman to produce the Radial Visualization (RadViz) [26]. In RadViz, reference points are equally spaced around the perimeter of a circle. The data set is then distributed in the circle according to its attraction to the reference points. In VISR, the distribution occurs through query terms applying forces to the images in the collection. Springs are attached such that each image is connected to every query term, and images are independent of each other. The query terms remain static while the images are pulled towards the query terms according to how relevant the query terms are to the image. When these forces reach an equilibrium, the images are in their final positions. The conceptual model of this visualisation can be seen in figure 5.9.
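Because each image's springs are independent of every other image, the equilibrium can be read off in closed form: the net force sum_i a_i * (p - q_i) vanishes at the attraction-weighted centroid of the query term anchors. A sketch under the assumptions that attraction scores are non-negative scalars and that terms sit evenly on a unit circle as in RadViz (names are illustrative):

```python
import math


def term_anchors(n_terms, radius=1.0):
    """Place query terms evenly around the perimeter of a circle (RadViz)."""
    return [(radius * math.cos(2 * math.pi * i / n_terms),
             radius * math.sin(2 * math.pi * i / n_terms))
            for i in range(n_terms)]


def image_position(attractions, anchors):
    """Equilibrium position of one image under independent springs.

    attractions: scalar attraction a_i of the image to query term i.
    The position is the attraction-weighted centroid of the anchors.
    """
    total = sum(attractions) or 1.0  # unattracted images stay at the centre
    x = sum(a * qx for a, (qx, _) in zip(attractions, anchors)) / total
    y = sum(a * qy for a, (_, qy) in zip(attractions, anchors)) / total
    return x, y


anchors = term_anchors(2)  # two query terms on opposite sides of the circle
# An image equally attracted to both terms settles near the display's centre;
# an image attracted to only one term sits on that term's anchor.
centre = image_position([1.0, 1.0], anchors)
on_term = image_position([1.0, 0.0], anchors)
```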
Figure 5.7: Transparent Cluster Visualisation Module Information Flow Diagram. This figure illustrates the data flow between processes in the VISR Transparent Cluster Visualisation Module. This figure is a detailed look at this module. Its relation to the rest of the VISR tool, figure 5.3, is illustrated in the top left hand corner.
Figure 5.8: Transparent Cluster Visualisation Module State Transition Diagram. This figure illustrates the flow of execution of the Transparent Cluster Visualisation Module tasks. Following the completion of retrieval and analysis, the image locations are determined. Following the calculation of image locations, overlapping images are resolved and the display is generated. If the user chooses to modify the visualisation in dynamic query mode, the visualisation must re-calculate image positions.
Secondly, the spring metaphor, where images have no attraction to the centre of the visualisation, and are pulled freely towards whatever query terms they contain. The query terms can be represented as vectors leaving the centre of the circle.

Vector Sum Metaphor:

\[ \mathbf{p}_{vs} = \frac{1}{\mathrm{total}(a)} \sum_{i=1}^{n} a_i \mathbf{q}_i \tag{5.1} \]

where \(\mathbf{p}_{vs}\) is the vector position of an image, \(n\) is the number of query terms, \(a_i\) is the scalar attraction to query term \(i\), \(\mathbf{q}_i\) is the vector position of query term \(i\), and \(\mathrm{total}(a)\) is the total attraction the image has to query terms.

Spring Metaphor:

\[ \mathbf{p}_s \text{ such that } \sum_{i=1}^{n} a_i (\mathbf{p}_s - \mathbf{q}_i) = \mathbf{0} \tag{5.2} \]

where \(\mathbf{p}_s\) is the vector position of an image and \(a_i(\mathbf{p}_s - \mathbf{q}_i)\) is the force applied by query term \(i\). The net force moves \(\mathbf{p}_s\) until it converges to \(\mathbf{0}\), giving the final value of \(\mathbf{p}_s\).

The system can be configured to use either the spring or the vector sum metaphor. The vector sum metaphor is less useful than the spring metaphor because there are fewer unique positions for images and there tends to be a large cluster of images located near the centre of the display. Vector sum visualisations are more useful for picking out interesting query terms or outlying images in the image collection, rather than clusters of images.

5.3.2 Image Location Conflict Resolver

The image location conflict resolver incorporates techniques that allow the user to view all images, even if they overlap. This process examines the visualisation context, checking for overlapping images. Overlapping images are indicated by a blue border, as shown in figure 5.11. This thesis presents two techniques to deal with overlapping images: jittering, where images are separated from each other, and animation, where overlapping images are animated, with a specified delay, from one overlapping image