SlideShare a Scribd company logo
First Experiments Searching Spontaneous Czech Speech
Pavel Ircing
Department of Cybernetics
University of West Bohemia
Plzen, Czech Republic
ircing@kky.zcu.cz
Douglas W. Oard
College of Information
Studies/UMIACS
University of Maryland
College Park, Maryland
oard@glue.umd.edu
Jan Hoidekr
Department of Cybernetics
University of West Bohemia
Plzen, Czech Republic
hoidekr@kky.zcu.cz
Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Con-
tent Analysis and Indexing
General Terms
Experimentation
Keywords
Speech retrieval; Spontaneous speech
1. INTRODUCTION
This paper reports on experiments with the ļ¬rst available
Czech IR test collection. The collection consists of a con-
tinuous stream from automatic transcription of spontaneous
speech (see [3] for details) and the task of the IR system is
to identify appropriate replay points where the discussion
about the queried topic starts. The collection thus lacks
clearly deļ¬ned document boundaries. Moreover, the accu-
racy of the transcription is limited (around 35% word error
rate), mostly due to the nature of the speechā€”interviews
with Holocaust survivors, which are sometimes emotional,
accented, and exhibiting age-related speech impediments.
This collection therefore oļ¬€ers an excellent opportunity to
explore both eļ¬€ects present in Czech (e.g., morphology) and
eļ¬€ects that result from processing spontaneous speech. It
was also used in the CL-SR track at the CLEF 2006 evalu-
ation campaign (http://www.clef-campaign.org/).
2. METHODS
Retrieval from a speech stream with unknown topic bound-
aries is an interesting challenge, but that is not our princi-
pal focus in these experiments. We therefore transformed
the collection into artiļ¬cially deļ¬ned set of ā€œdocumentsā€
by removing all recognized pauses between words and then
sliding a 3-minute window over the transcripts with a 1-
minute step size. This resulted in a collection of 11,377
overlapping passages, each containing an average of 390 rec-
ognized words (denoted as the asr ļ¬eld) and a set of au-
tomatically produced Czech translations (using techniques
described in [2]) for 20 automatically assigned thesaurus key-
words (using techniques described in [4]) (the ak ļ¬eld). Each
Copyright is held by the author/owner(s).
SIGIRā€™07, July 23ā€“27, 2007, Amsterdam, The Netherlands.
ACM 978-1-59593-597-7/07/0007.
word stem lemma
asr 0.0256 0.0494 0.0506
ak 0.0018 0.0022 0.0023
asr.ak 0.0241 0.0447 0.0467
Table 1: Mean GAP, long queries.
ļ¬eld was indexed separately, and a uniļ¬ed index (asr.ak)
was also constructed.
Twenty-nine topics were initially created in English in
the usual TREC-style format (<title>, <desc> and <narr>
ļ¬elds), translated into Czech by a native speaker, and then
checked for natural expression by a second native speaker.
We performed monolingual experiments with ā€œlongā€ queries
constructed by concatenating the words from all three topic
ļ¬elds.
A morphological analyser was used to obtain the infor-
mation about the lemma (linguistic root form), stem (ap-
proximation to that root form using truncation alone) and
part-of-speech for each Czech word [1] . Three variants of
the collection were indexed, one with only words, one with
only lemmas and one with only stems. Part-of-speech tags
were used as a basis for stopword removalā€”as we could not
ļ¬nd any decent stoplist for Czech, we simply removed all
words that were tagged as preposition, conjunction, particle
or interjection. In each case, identical processing was done
for the queries. We used Lemur to implement a simple tf.idf
model with blind feedback (using Lemurā€™s standard parame-
ters). Length normalization was not performed because the
collection preprocessing resulted in documents with nearly
identical lengths.
3. EVALUATION
Relevance assessors identiļ¬ed appropriate start times by
interactively searching using manually assigned English the-
saurus terms and the same automatically transcribed con-
tent, ultimately conļ¬rming their decisions by listening to the
audio when the automatically produced transcripts were not
sufficiently accurate to make a deļ¬nitive judgment. Table 1
reports the mean Generalized Average Precision (mGAP),
which is computed in a manner similar to mean average pre-
cision (for details see [3]).
Indexing the ak ļ¬eld, alone or in combination with asr,
proved not to be helpful (although the apparent reduction
when indexed together is not statistically signiļ¬cant (p >
0.05)). Manual examination of a few ak ļ¬elds indeed indi-
cates a low density of terms that appear as if they match
SIGIR 2007 Proceedings Poster
835
0
0.05
0.1
0.15
0.2
0.25
0.3
1166
1181
1185
1187
1225
1286
1288
1310
1311
1321
1508
1620
1630
1663
1843
word
stem
lemma
0
0.05
0.1
0.15
0.2
0.25
0.3
2198
2253
3004
3005
3009
3014
3015
3017
3018
3020
3025
3033
4000
14312
word
stem
lemma
Figure 1: GAP by topic, asr field, long queries.
the content of the passage, but additional analysis will be
needed before we can ascribe blame between the transcrip-
tion, classiļ¬cation and translation stages in the cascade that
produced those keyword assignments. We therefore focus on
results obtained using the asr ļ¬eld alone for the remainder
of our analysis.
It is apparent that some form of linguistic preprocessing
is indeed crucial for Czech. Both lemmatization and stem-
ming boosted the performance almost by a factor of two
in comparison with the word runs, and a Wilcoxon signed-
rank test shows that diļ¬€erence to be statistically signiļ¬cant
(p < 0.005). The slight apparent advantage of the lemma run
over the stem run is not statistically signiļ¬cant (p > 0.05).
As Figure 1 shows, substantial variation in GAP is evident
across topics. The four topics with the highest GAP values
(1225, 1630, 2198, 3014) each contain highly discriminative
terms that were correctly transcribed. Topic 1630 exhibits
an enormous diļ¬€erence between word matching and match-
ing either stems or lemmas, a vivid reminder of how the
recall-enhancing eļ¬€ect of linguistic analysis can dominate
averaged measures (a similar eļ¬€ect is also apparent for topic
1310). While a few cases of adverse eļ¬€ects from linguis-
tic analysis are visible (most notably with topics 1225 and
1181), these eļ¬€ects are generally relatively small. The occa-
sional diļ¬€erences between stems and lemmas suggests that
combining evidence from both might help in some cases.
Unsuccessful topics generally either asked about abstract
concepts without using many discriminative terms (e.g., topic
1288: ā€œstrengthening faith during the Holocaustā€), or the
discriminative terms for the topic happened to be missing
from the collection. For example, topic 3018 contained a sin-
gle discriminative term that was simply spelled diļ¬€erently
in the ASR lexicon (and consequently in the transcripts).
Manually conforming the spelling in the topic to that found
in the lexicon would have increased the GAP for that topic
(with lemma) from 0.0026 to 0.1175.
Interestingly, it turned out that every term that we (man-
ually) judged to be highly discriminating in our analysis of
successful and unsuccessful topics was a named entity (NE).
This prompted us to perform a more systematic analysis of
the vocabulary coverage for the NEs present in all 29 topics.
If we leave out the NEs that are widespread in the collection
and thus useless for IR (Jew, Holocaust, Hitler, etc.), there
are 42 NEs in the topic set; only 13 of them are present in
the ASR lexicon, only 11 of those 13 actually appeared any-
where in the transcripts, and only 5 of those 11 substantially
contributed to successful IR (or, if we manually conform the
spelling in topic 3018, 6 of 12). The overall ā€œquery rare
named entity error rateā€ for this collection is therefore (42-
5)/42=88%, more than double the overall word error rate.
Rare NEs are quite naturally not well represented in the
materials from which ASR systems are trained; integrating
phone-lattice term detection with large-vocabulary recogni-
tion oļ¬€ers one promising research direction. Inconsistent
spelling is probably the more easily rectiļ¬ed problem; anno-
tators of ASR training materials are typically not domain
experts, and in some cases valid alternate transliterations
(e.g., from Yiddish roots) result in disagreement even among
experts. One useful approach would be to adjust the top-
ics to conform to the ASR lexicon, thus simulating a similar
process an interactive searcher could perform if notiļ¬ed that
one of their query terms is outside the known vocabulary.
4. NEXT STEPS
In addition to the ideas above for dealing with rare terms,
another obvious next step would be to optimize our system
design to better reļ¬‚ect the task characteristics that moti-
vated the design of the mean GAP measure. We have shown
that passage retrieval can indeed sometimes get us in the
right neighborhood, but overlapping passages may not be
the best way of identifying optimal replay start times. An-
other question that we need to explore is whether some other
retrieval model might be more eļ¬€ective. Finally, extending
our work to include on the far larger CLEF 2007 Czech
news test collection will allow us to enrich our comparison
between lemmas and stems for Czech indexing.
5. ACKNOWLEDGMENTS
This work was supported in part by projects MSMT LC536,
GACR 1ET101470416 and NSF IIS-0122466.
6. REFERENCES
[1] J. HajicĢŒ. Disambiguation of Rich Inflection.
(Computational Morphology of Czech). Karolinum,
Prague, 2004.
[2] C. Murray et al. Leveraging Reusability: Cost-eļ¬€ective
Lexical Acquisition for Large-scale Ontology
Translation. In Proceeding of ACL 2006, pages 945ā€“952,
Sydney, Australia, 2006.
[3] D. Oard et al. Overview of the CLEF-2006
Cross-Language Speech Retrieval Track. In CLEF 2006
- revised selected papers - Springer LNCS, 2007.
[4] S. Olsson, D. Oard, and J. HajicĢŒ. Cross-Language Text
Classiļ¬cation. In Proceedings of SIGIR 2005, pages
645ā€“646, Salvador, Brazil, 2005.
SIGIR 2007 Proceedings Poster
836

More Related Content

Similar to Analysis And Indexing General Terms Experimentation

Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Waqas Tariq
Ā 
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESA REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESIJCSES Journal
Ā 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment andrefsantos
Ā 
AN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONS
AN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONSAN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONS
AN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONSijaia
Ā 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERijnlc
Ā 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERkevig
Ā 
Comparing Forgetting Heuristics For Complexity Reduction Of Justifications
Comparing Forgetting Heuristics For Complexity Reduction Of JustificationsComparing Forgetting Heuristics For Complexity Reduction Of Justifications
Comparing Forgetting Heuristics For Complexity Reduction Of JustificationsTimdeBoer16
Ā 
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESNAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESacijjournal
Ā 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...cseij
Ā 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
Ā 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
Ā 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations onijistjournal
Ā 
2014-mo444-practical-assignment-02-paulo_faria
2014-mo444-practical-assignment-02-paulo_faria2014-mo444-practical-assignment-02-paulo_faria
2014-mo444-practical-assignment-02-paulo_fariaPaulo Faria
Ā 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003Ajay Ohri
Ā 
A Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationA Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationDarian Pruitt
Ā 
Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...NALESVPMEngg
Ā 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modelingHiroyuki Kuromiya
Ā 
Chapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdfChapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdfJemalNesre1
Ā 
05 handbook summ-hovy
05 handbook summ-hovy05 handbook summ-hovy
05 handbook summ-hovySagar Dabhi
Ā 
Automatic Speech Recognition and Machine Learning for Robotic Arm in Surgery
Automatic Speech Recognition and Machine Learning for Robotic Arm in SurgeryAutomatic Speech Recognition and Machine Learning for Robotic Arm in Surgery
Automatic Speech Recognition and Machine Learning for Robotic Arm in SurgeryDR.P.S.JAGADEESH KUMAR
Ā 

Similar to Analysis And Indexing General Terms Experimentation (20)

Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Ā 
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESA REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
Ā 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
Ā 
AN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONS
AN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONSAN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONS
AN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONS
Ā 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
Ā 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
Ā 
Comparing Forgetting Heuristics For Complexity Reduction Of Justifications
Comparing Forgetting Heuristics For Complexity Reduction Of JustificationsComparing Forgetting Heuristics For Complexity Reduction Of Justifications
Comparing Forgetting Heuristics For Complexity Reduction Of Justifications
Ā 
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESNAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
Ā 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
Ā 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
Ā 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
Ā 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on
Ā 
2014-mo444-practical-assignment-02-paulo_faria
2014-mo444-practical-assignment-02-paulo_faria2014-mo444-practical-assignment-02-paulo_faria
2014-mo444-practical-assignment-02-paulo_faria
Ā 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
Ā 
A Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationA Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference Annotation
Ā 
Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...
Ā 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
Ā 
Chapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdfChapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdf
Ā 
05 handbook summ-hovy
05 handbook summ-hovy05 handbook summ-hovy
05 handbook summ-hovy
Ā 
Automatic Speech Recognition and Machine Learning for Robotic Arm in Surgery
Automatic Speech Recognition and Machine Learning for Robotic Arm in SurgeryAutomatic Speech Recognition and Machine Learning for Robotic Arm in Surgery
Automatic Speech Recognition and Machine Learning for Robotic Arm in Surgery
Ā 

More from Ashley Hernandez

5 Paragraph Essay Closing Paragraph. Online assignment writing service.
5 Paragraph Essay Closing Paragraph. Online assignment writing service.5 Paragraph Essay Closing Paragraph. Online assignment writing service.
5 Paragraph Essay Closing Paragraph. Online assignment writing service.Ashley Hernandez
Ā 
A Level Economics Essay Guide. Online assignment writing service.
A Level Economics Essay Guide. Online assignment writing service.A Level Economics Essay Guide. Online assignment writing service.
A Level Economics Essay Guide. Online assignment writing service.Ashley Hernandez
Ā 
500 Word Descriptive Essay Example. Online assignment writing service.
500 Word Descriptive Essay Example. Online assignment writing service.500 Word Descriptive Essay Example. Online assignment writing service.
500 Word Descriptive Essay Example. Online assignment writing service.Ashley Hernandez
Ā 
2010 Fifa World Cup Essay. Online assignment writing service.
2010 Fifa World Cup Essay. Online assignment writing service.2010 Fifa World Cup Essay. Online assignment writing service.
2010 Fifa World Cup Essay. Online assignment writing service.Ashley Hernandez
Ā 
250 Word Scholarship Essay Sample. Online assignment writing service.
250 Word Scholarship Essay Sample. Online assignment writing service.250 Word Scholarship Essay Sample. Online assignment writing service.
250 Word Scholarship Essay Sample. Online assignment writing service.Ashley Hernandez
Ā 
12 Essay Template. Online assignment writing service.
12 Essay Template. Online assignment writing service.12 Essay Template. Online assignment writing service.
12 Essay Template. Online assignment writing service.Ashley Hernandez
Ā 
7Th Grade Persuasive Essay Writing Prompts
7Th Grade Persuasive Essay Writing Prompts7Th Grade Persuasive Essay Writing Prompts
7Th Grade Persuasive Essay Writing PromptsAshley Hernandez
Ā 
300-500 Word Essay Is How Many Pages. Online assignment writing service.
300-500 Word Essay Is How Many Pages. Online assignment writing service.300-500 Word Essay Is How Many Pages. Online assignment writing service.
300-500 Word Essay Is How Many Pages. Online assignment writing service.Ashley Hernandez
Ā 
3 Essays Of Jose Rizal. Online assignment writing service.
3 Essays Of Jose Rizal. Online assignment writing service.3 Essays Of Jose Rizal. Online assignment writing service.
3 Essays Of Jose Rizal. Online assignment writing service.Ashley Hernandez
Ā 
A2 Media Essay Examples. Online assignment writing service.
A2 Media Essay Examples. Online assignment writing service.A2 Media Essay Examples. Online assignment writing service.
A2 Media Essay Examples. Online assignment writing service.Ashley Hernandez
Ā 
4Th Grade Expository Essay Samples. Online assignment writing service.
4Th Grade Expository Essay Samples. Online assignment writing service.4Th Grade Expository Essay Samples. Online assignment writing service.
4Th Grade Expository Essay Samples. Online assignment writing service.Ashley Hernandez
Ā 
2009 Ap Literature Sample Essay. Online assignment writing service.
2009 Ap Literature Sample Essay. Online assignment writing service.2009 Ap Literature Sample Essay. Online assignment writing service.
2009 Ap Literature Sample Essay. Online assignment writing service.Ashley Hernandez
Ā 
1012 Sat Essay. Online assignment writing service.
1012 Sat Essay. Online assignment writing service.1012 Sat Essay. Online assignment writing service.
1012 Sat Essay. Online assignment writing service.Ashley Hernandez
Ā 
A Descriptive Essay About Christmas Day. Online assignment writing service.
A Descriptive Essay About Christmas Day. Online assignment writing service.A Descriptive Essay About Christmas Day. Online assignment writing service.
A Descriptive Essay About Christmas Day. Online assignment writing service.Ashley Hernandez
Ā 
A Conclusion Paragraph For A Compare And Contrast Essay
A Conclusion Paragraph For A Compare And Contrast EssayA Conclusion Paragraph For A Compare And Contrast Essay
A Conclusion Paragraph For A Compare And Contrast EssayAshley Hernandez
Ā 
A College Admission Essay Examples. Online assignment writing service.
A College Admission Essay Examples. Online assignment writing service.A College Admission Essay Examples. Online assignment writing service.
A College Admission Essay Examples. Online assignment writing service.Ashley Hernandez
Ā 
06.02 Essay Analysis Prezi. Online assignment writing service.
06.02 Essay Analysis Prezi. Online assignment writing service.06.02 Essay Analysis Prezi. Online assignment writing service.
06.02 Essay Analysis Prezi. Online assignment writing service.Ashley Hernandez
Ā 
7Th Grade Science Essay Questions. Online assignment writing service.
7Th Grade Science Essay Questions. Online assignment writing service.7Th Grade Science Essay Questions. Online assignment writing service.
7Th Grade Science Essay Questions. Online assignment writing service.Ashley Hernandez
Ā 
5 Page Essay Examples. Online assignment writing service.
5 Page Essay Examples. Online assignment writing service.5 Page Essay Examples. Online assignment writing service.
5 Page Essay Examples. Online assignment writing service.Ashley Hernandez
Ā 
A Day At The Beach Essay For Grade 2. Online assignment writing service.
A Day At The Beach Essay For Grade 2. Online assignment writing service.A Day At The Beach Essay For Grade 2. Online assignment writing service.
A Day At The Beach Essay For Grade 2. Online assignment writing service.Ashley Hernandez
Ā 

More from Ashley Hernandez (20)

5 Paragraph Essay Closing Paragraph. Online assignment writing service.
5 Paragraph Essay Closing Paragraph. Online assignment writing service.5 Paragraph Essay Closing Paragraph. Online assignment writing service.
5 Paragraph Essay Closing Paragraph. Online assignment writing service.
Ā 
A Level Economics Essay Guide. Online assignment writing service.
A Level Economics Essay Guide. Online assignment writing service.A Level Economics Essay Guide. Online assignment writing service.
A Level Economics Essay Guide. Online assignment writing service.
Ā 
500 Word Descriptive Essay Example. Online assignment writing service.
500 Word Descriptive Essay Example. Online assignment writing service.500 Word Descriptive Essay Example. Online assignment writing service.
500 Word Descriptive Essay Example. Online assignment writing service.
Ā 
2010 Fifa World Cup Essay. Online assignment writing service.
2010 Fifa World Cup Essay. Online assignment writing service.2010 Fifa World Cup Essay. Online assignment writing service.
2010 Fifa World Cup Essay. Online assignment writing service.
Ā 
250 Word Scholarship Essay Sample. Online assignment writing service.
250 Word Scholarship Essay Sample. Online assignment writing service.250 Word Scholarship Essay Sample. Online assignment writing service.
250 Word Scholarship Essay Sample. Online assignment writing service.
Ā 
12 Essay Template. Online assignment writing service.
12 Essay Template. Online assignment writing service.12 Essay Template. Online assignment writing service.
12 Essay Template. Online assignment writing service.
Ā 
7Th Grade Persuasive Essay Writing Prompts
7Th Grade Persuasive Essay Writing Prompts7Th Grade Persuasive Essay Writing Prompts
7Th Grade Persuasive Essay Writing Prompts
Ā 
300-500 Word Essay Is How Many Pages. Online assignment writing service.
300-500 Word Essay Is How Many Pages. Online assignment writing service.300-500 Word Essay Is How Many Pages. Online assignment writing service.
300-500 Word Essay Is How Many Pages. Online assignment writing service.
Ā 
3 Essays Of Jose Rizal. Online assignment writing service.
3 Essays Of Jose Rizal. Online assignment writing service.3 Essays Of Jose Rizal. Online assignment writing service.
3 Essays Of Jose Rizal. Online assignment writing service.
Ā 
A2 Media Essay Examples. Online assignment writing service.
A2 Media Essay Examples. Online assignment writing service.A2 Media Essay Examples. Online assignment writing service.
A2 Media Essay Examples. Online assignment writing service.
Ā 
4Th Grade Expository Essay Samples. Online assignment writing service.
4Th Grade Expository Essay Samples. Online assignment writing service.4Th Grade Expository Essay Samples. Online assignment writing service.
4Th Grade Expository Essay Samples. Online assignment writing service.
Ā 
2009 Ap Literature Sample Essay. Online assignment writing service.
2009 Ap Literature Sample Essay. Online assignment writing service.2009 Ap Literature Sample Essay. Online assignment writing service.
2009 Ap Literature Sample Essay. Online assignment writing service.
Ā 
1012 Sat Essay. Online assignment writing service.
1012 Sat Essay. Online assignment writing service.1012 Sat Essay. Online assignment writing service.
1012 Sat Essay. Online assignment writing service.
Ā 
A Descriptive Essay About Christmas Day. Online assignment writing service.
A Descriptive Essay About Christmas Day. Online assignment writing service.A Descriptive Essay About Christmas Day. Online assignment writing service.
A Descriptive Essay About Christmas Day. Online assignment writing service.
Ā 
A Conclusion Paragraph For A Compare And Contrast Essay
A Conclusion Paragraph For A Compare And Contrast EssayA Conclusion Paragraph For A Compare And Contrast Essay
A Conclusion Paragraph For A Compare And Contrast Essay
Ā 
A College Admission Essay Examples. Online assignment writing service.
A College Admission Essay Examples. Online assignment writing service.A College Admission Essay Examples. Online assignment writing service.
A College Admission Essay Examples. Online assignment writing service.
Ā 
06.02 Essay Analysis Prezi. Online assignment writing service.
06.02 Essay Analysis Prezi. Online assignment writing service.06.02 Essay Analysis Prezi. Online assignment writing service.
06.02 Essay Analysis Prezi. Online assignment writing service.
Ā 
7Th Grade Science Essay Questions. Online assignment writing service.
7Th Grade Science Essay Questions. Online assignment writing service.7Th Grade Science Essay Questions. Online assignment writing service.
7Th Grade Science Essay Questions. Online assignment writing service.
Ā 
5 Page Essay Examples. Online assignment writing service.
5 Page Essay Examples. Online assignment writing service.5 Page Essay Examples. Online assignment writing service.
5 Page Essay Examples. Online assignment writing service.
Ā 
A Day At The Beach Essay For Grade 2. Online assignment writing service.
A Day At The Beach Essay For Grade 2. Online assignment writing service.A Day At The Beach Essay For Grade 2. Online assignment writing service.
A Day At The Beach Essay For Grade 2. Online assignment writing service.
Ā 

Recently uploaded

Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17Celine George
Ā 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePedroFerreira53928
Ā 
Morse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptxMorse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptxjmorse8
Ā 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryEugene Lysak
Ā 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345beazzy04
Ā 
Application of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matricesApplication of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matricesRased Khan
Ā 
An Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxAn Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxCeline George
Ā 
NCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfNCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfVivekanand Anglo Vedic Academy
Ā 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17Celine George
Ā 
2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptx2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptxmansk2
Ā 
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya - UEM Kolkata Quiz Club
Ā 
GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...
GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...
GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...Nguyen Thanh Tu Collection
Ā 
50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...
50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...
50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...Nguyen Thanh Tu Collection
Ā 
Industrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportIndustrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportAvinash Rai
Ā 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPCeline George
Ā 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
Ā 
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdfTelling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdfTechSoup
Ā 
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptSourabh Kumar
Ā 
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxMatatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxJenilouCasareno
Ā 

Recently uploaded (20)

Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Ā 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
Ā 
Morse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptxMorse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptx
Ā 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. Henry
Ā 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
Ā 
Application of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matricesApplication of Matrices in real life. Presentation on application of matrices
Application of Matrices in real life. Presentation on application of matrices
Ā 
An Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxAn Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptx
Ā 
NCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfNCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdf
Ā 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17
Ā 
2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptx2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptx
Ā 
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Ā 
GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...
GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...
GIƁO ƁN Dįŗ Y THƊM (Kįŗ¾ HOįŗ CH BƀI BUį»”I 2) - TIįŗ¾NG ANH 8 GLOBAL SUCCESS (2 Cį»˜T) N...
Ā 
50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...
50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...
50 Đį»€ LUYį»†N THI IOE Lį»šP 9 - NĂM Hį»ŒC 2022-2023 (CƓ LINK HƌNH, FILE AUDIO Vƀ ĐƁ...
Ā 
Industrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportIndustrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training Report
Ā 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Ā 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Ā 
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdfTelling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Telling Your Story_ Simple Steps to Build Your Nonprofit's Brand Webinar.pdf
Ā 
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Ā 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Ā 
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxMatatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
Ā 

Analysis And Indexing General Terms Experimentation

  • 1. First Experiments Searching Spontaneous Czech Speech Pavel Ircing Department of Cybernetics University of West Bohemia Plzen, Czech Republic ircing@kky.zcu.cz Douglas W. Oard College of Information Studies/UMIACS University of Maryland College Park, Maryland oard@glue.umd.edu Jan Hoidekr Department of Cybernetics University of West Bohemia Plzen, Czech Republic hoidekr@kky.zcu.cz Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: H.3.1 Con- tent Analysis and Indexing General Terms Experimentation Keywords Speech retrieval; Spontaneous speech 1. INTRODUCTION This paper reports on experiments with the ļ¬rst available Czech IR test collection. The collection consists of a con- tinuous stream from automatic transcription of spontaneous speech (see [3] for details) and the task of the IR system is to identify appropriate replay points where the discussion about the queried topic starts. The collection thus lacks clearly deļ¬ned document boundaries. Moreover, the accu- racy of the transcription is limited (around 35% word error rate), mostly due to the nature of the speechā€”interviews with Holocaust survivors, which are sometimes emotional, accented, and exhibiting age-related speech impediments. This collection therefore oļ¬€ers an excellent opportunity to explore both eļ¬€ects present in Czech (e.g., morphology) and eļ¬€ects that result from processing spontaneous speech. It was also used in the CL-SR track at the CLEF 2006 evalu- ation campaign (http://www.clef-campaign.org/). 2. METHODS Retrieval from a speech stream with unknown topic bound- aries is an interesting challenge, but that is not our princi- pal focus in these experiments. We therefore transformed the collection into artiļ¬cially deļ¬ned set of ā€œdocumentsā€ by removing all recognized pauses between words and then sliding a 3-minute window over the transcripts with a 1- minute step size. This resulted in a collection of 11,377 overlapping passages, each containing an average of 390 rec- ognized words (denoted as the asr ļ¬eld) and a set of au- tomatically produced Czech translations (using techniques described in [2]) for 20 automatically assigned thesaurus key- words (using techniques described in [4]) (the ak ļ¬eld). Each Copyright is held by the author/owner(s). SIGIRā€™07, July 23ā€“27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007. word stem lemma asr 0.0256 0.0494 0.0506 ak 0.0018 0.0022 0.0023 asr.ak 0.0241 0.0447 0.0467 Table 1: Mean GAP, long queries. ļ¬eld was indexed separately, and a uniļ¬ed index (asr.ak) was also constructed. Twenty-nine topics were initially created in English in the usual TREC-style format (<title>, <desc> and <narr> ļ¬elds), translated into Czech by a native speaker, and then checked for natural expression by a second native speaker. We performed monolingual experiments with ā€œlongā€ queries constructed by concatenating the words from all three topic ļ¬elds. A morphological analyser was used to obtain the infor- mation about the lemma (linguistic root form), stem (ap- proximation to that root form using truncation alone) and part-of-speech for each Czech word [1] . Three variants of the collection were indexed, one with only words, one with only lemmas and one with only stems. Part-of-speech tags were used as a basis for stopword removalā€”as we could not ļ¬nd any decent stoplist for Czech, we simply removed all words that were tagged as preposition, conjunction, particle or interjection. In each case, identical processing was done for the queries. We used Lemur to implement a simple tf.idf model with blind feedback (using Lemurā€™s standard parame- ters). Length normalization was not performed because the collection preprocessing resulted in documents with nearly identical lengths. 3. EVALUATION Relevance assessors identiļ¬ed appropriate start times by interactively searching using manually assigned English the- saurus terms and the same automatically transcribed con- tent, ultimately conļ¬rming their decisions by listening to the audio when the automatically produced transcripts were not sufficiently accurate to make a deļ¬nitive judgment. Table 1 reports the mean Generalized Average Precision (mGAP), which is computed in a manner similar to mean average pre- cision (for details see [3]). Indexing the ak ļ¬eld, alone or in combination with asr, proved not to be helpful (although the apparent reduction when indexed together is not statistically signiļ¬cant (p > 0.05)). Manual examination of a few ak ļ¬elds indeed indi- cates a low density of terms that appear as if they match SIGIR 2007 Proceedings Poster 835
  • 2. 0 0.05 0.1 0.15 0.2 0.25 0.3 1166 1181 1185 1187 1225 1286 1288 1310 1311 1321 1508 1620 1630 1663 1843 word stem lemma 0 0.05 0.1 0.15 0.2 0.25 0.3 2198 2253 3004 3005 3009 3014 3015 3017 3018 3020 3025 3033 4000 14312 word stem lemma Figure 1: GAP by topic, asr field, long queries. the content of the passage, but additional analysis will be needed before we can ascribe blame between the transcrip- tion, classiļ¬cation and translation stages in the cascade that produced those keyword assignments. We therefore focus on results obtained using the asr ļ¬eld alone for the remainder of our analysis. It is apparent that some form of linguistic preprocessing is indeed crucial for Czech. Both lemmatization and stem- ming boosted the performance almost by a factor of two in comparison with the word runs, and a Wilcoxon signed- rank test shows that diļ¬€erence to be statistically signiļ¬cant (p < 0.005). The slight apparent advantage of the lemma run over the stem run is not statistically signiļ¬cant (p > 0.05). As Figure 1 shows, substantial variation in GAP is evident across topics. The four topics with the highest GAP values (1225, 1630, 2198, 3014) each contain highly discriminative terms that were correctly transcribed. Topic 1630 exhibits an enormous diļ¬€erence between word matching and match- ing either stems or lemmas, a vivid reminder of how the recall-enhancing eļ¬€ect of linguistic analysis can dominate averaged measures (a similar eļ¬€ect is also apparent for topic 1310). While a few cases of adverse eļ¬€ects from linguis- tic analysis are visible (most notably with topics 1225 and 1181), these eļ¬€ects are generally relatively small. The occa- sional diļ¬€erences between stems and lemmas suggests that combining evidence from both might help in some cases. Unsuccessful topics generally either asked about abstract concepts without using many discriminative terms (e.g., topic 1288: ā€œstrengthening faith during the Holocaustā€), or the discriminative terms for the topic happened to be missing from the collection. For example, topic 3018 contained a sin- gle discriminative term that was simply spelled diļ¬€erently in the ASR lexicon (and consequently in the transcripts). Manually conforming the spelling in the topic to that found in the lexicon would have increased the GAP for that topic (with lemma) from 0.0026 to 0.1175. Interestingly, it turned out that every term that we (man- ually) judged to be highly discriminating in our analysis of successful and unsuccessful topics was a named entity (NE). This prompted us to perform a more systematic analysis of the vocabulary coverage for the NEs present in all 29 topics. If we leave out the NEs that are widespread in the collection and thus useless for IR (Jew, Holocaust, Hitler, etc.), there are 42 NEs in the topic set; only 13 of them are present in the ASR lexicon, only 11 of those 13 actually appeared any- where in the transcripts, and only 5 of those 11 substantially contributed to successful IR (or, if we manually conform the spelling in topic 3018, 6 of 12). The overall ā€œquery rare named entity error rateā€ for this collection is therefore (42- 5)/42=88%, more than double the overall word error rate. Rare NEs are quite naturally not well represented in the materials from which ASR systems are trained; integrating phone-lattice term detection with large-vocabulary recogni- tion oļ¬€ers one promising research direction. Inconsistent spelling is probably the more easily rectiļ¬ed problem; anno- tators of ASR training materials are typically not domain experts, and in some cases valid alternate transliterations (e.g., from Yiddish roots) result in disagreement even among experts. One useful approach would be to adjust the top- ics to conform to the ASR lexicon, thus simulating a similar process an interactive searcher could perform if notiļ¬ed that one of their query terms is outside the known vocabulary. 4. NEXT STEPS In addition to the ideas above for dealing with rare terms, another obvious next step would be to optimize our system design to better reļ¬‚ect the task characteristics that moti- vated the design of the mean GAP measure. We have shown that passage retrieval can indeed sometimes get us in the right neighborhood, but overlapping passages may not be the best way of identifying optimal replay start times. An- other question that we need to explore is whether some other retrieval model might be more eļ¬€ective. Finally, extending our work to include on the far larger CLEF 2007 Czech news test collection will allow us to enrich our comparison between lemmas and stems for Czech indexing. 5. ACKNOWLEDGMENTS This work was supported in part by projects MSMT LC536, GACR 1ET101470416 and NSF IIS-0122466. 6. REFERENCES [1] J. HajicĢŒ. Disambiguation of Rich Inflection. (Computational Morphology of Czech). Karolinum, Prague, 2004. [2] C. Murray et al. Leveraging Reusability: Cost-eļ¬€ective Lexical Acquisition for Large-scale Ontology Translation. In Proceeding of ACL 2006, pages 945ā€“952, Sydney, Australia, 2006. [3] D. Oard et al. Overview of the CLEF-2006 Cross-Language Speech Retrieval Track. In CLEF 2006 - revised selected papers - Springer LNCS, 2007. [4] S. Olsson, D. Oard, and J. HajicĢŒ. Cross-Language Text Classiļ¬cation. In Proceedings of SIGIR 2005, pages 645ā€“646, Salvador, Brazil, 2005. SIGIR 2007 Proceedings Poster 836