SlideShare une entreprise Scribd logo
1  sur  66
Télécharger pour lire hors ligne
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
1
Carolin Odebrecht &
Florian Zipser
Humboldt-Universität zu Berlin
ANNIS workshop
2014-08-26
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
2
A brief introduction
● Search and Visualization in Multilayer Linguistic
Corpora
– Imports existing corpora
● Corpora already have to be annotated, ANNIS only uses
what's there
● No NLP!
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
3
A brief introduction
● Search and Visualization in Multilayer Linguistic
Corpora
– Makes corpora searchable
● One query language for all corpora (AQL)
● Abstraction over linguistic data necessary
● But: Corpora have different annotations → query has to
match the annotations
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
4
A brief introduction
● Search and Visualization in Multilayer Linguistic
Corpora
– Displays corpora
● Many visualizations available
● Corresponding to type of annotation (syntactic trees,
phrase trees (RST), grids, coreferences ...)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
5
A brief introduction
● What ANNIS cannot do
– Does not know how to speak natural language
→ so you have to learn AQL
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
6
A brief introduction
● What ANNIS cannot do
– Does not know how to speak natural language
→ so you have to learn AQL
– ANNIS does not know any semantics
→ „NN“, „NP“, „sentence“, „word“, „my favorite
annotation“ … are just sequences of characters
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
7
A brief introduction
● What ANNIS cannot do
– Does not know how to speak natural language
→ so you have to learn AQL
– ANNIS does not know any semantics
→ „NN“, „NP“, „sentence“, „word“, „my favorite
annotation“ … are just sequences of characters
– You need to be exact
→ e.g. „POS“ != „pos“ and „NN“ != „NN “ (regard the
blank)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
8
ANNIS basics
ANNIS basics
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
9
Enter query
Corpus list
Previous
queries
Virtual
Keyboard
(e.g. arabic)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
10
Sample queries
(corresponding
to corpus)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
11
Query result
Visualizations
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
12
Corpus
metadata
Corpus
metadata
window
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
13
Document
metadata
Document
metadata
window
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
14
ANNIS basics
● Basic principles of AQL (ANNIS Query
Language)
– Attributes and values
● Searching for exact character sequences
● Searching for patterns
– Combinatory search
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
15
Demo corpus
● Corpus for demonstration: pcc2 (a sub corpus
of pcc)
https://korpling.german.hu-berlin.de/annis3/#_c=cGNjMg
● Potsdam Commentary Corpus
– German Newspaper commentaries
'Märkische Allgemeine Zeitung'
https://www.ling.uni-potsdam.de/acl-lab/Forsch/pcc/pcc.html
– Multiple annotations
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
16
ANNIS basics
● Different types of annotations
– Token annotation
– Span annotation
– Pointing relation
– Hierarchy annotation
(trees)
To ke n To ke n To ke n To ke n To ke n To ke n
Sp a n Sp a n
Sp a n
N o d e
Ed ge
K e y
K e y
K e y
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
17
ANNIS basics
● Different types of annotations
– Token annotation
– Span annotation
– Pointing relation
– Hierarchy annotation
(trees)
To ke n To ke n To ke n To ke n To ke n To ke n
Sp a n Sp a n
Sp a n
N o d e
Ed ge
K e y
K e y
K e y To ke n To ke n To ke n To ke n To ke n To ke n
Sp a n Sp a n
Sp a n
N o d e
Ed ge
K e y
K e y
K e y
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
18
Exact word forms
● Token annotation
– Exact sequence
searching for a word form
"Jugendlichen"
"jugendlichen"
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
19
Exact word forms
● Token annotation
– Exact sequence
searching for a word form
"Jugendlichen" 3 hits
"jugendlichen" 0 hits
→ tok="jugendlichen"
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
20
Exact token
annotation
● Token annotation
– Exact sequence
searching for an exact part of speech tag
pos = "NN"
attribute value
– Attributes can have more than one value
– Searching for all values of an attribute
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
21
Exact token
annotation
● Token annotation
– Exact sequence
searching for an exact part of speech tag
pos="NN"
pos="ADJA"
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
22
Exact token
annotation
● Token annotation
– Exact sequence
searching for an exact part of speech tag
pos="NN" 62 hits
pos="ADJA" 18 hits
searching for all values of an attribute
pos 399 hits
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
23
Exact span
annotation
● Span annotation
– Exact sequence
searching for sentences
Sent="s"
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
24
Exact span
annotation
● Span annotation
– Exact sequence
searching for sentences
Sent="s" 28 hits
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
25
Metadata
● Sent="s" 28 hits
– necessary to know which annotations are in a
corpus
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
26
Pattern
● Token annotation
– Patterns
. matches any single character
* zero or more of the preceding element
searching for the beginning a of word
/Jugend.*/
/jugend.*/
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
27
Pattern
● Token annotation
– Patterns
. matches any single character
* zero or more of the preceding element
searching for the beginning a of word
/Jugend.*/ 5 hits ("Jugendlichen" 3 hits)
Jugendlichen Jugendliche
/jugend.*/ 0 hits ("jugendlichen" 0 hits)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
28
Pattern
● Token annotation
– patterns
searching for all nouns
pos=/N./ includes NN & NE
searching for all adjectives
pos=/ADJ./ includes ADJA & ADJD
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
29
Pattern
● Token annotation
– patterns
searching for all nouns
pos=/N./ 73 hits (pos="NN" 62 hits)
searching for all adjectives
pos=/ADJ./ 32 hits (pos="ADJA" 18 hits)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
30
Relations between
annotations
● Span annotation
searching for all NPs
cat="NP" 41 hits (pos="NN" 62 hits)
e.g. Die Jugendlichen in Zossen
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
31
Relations between
annotations
● Relations between attributes
searching for all NPs which contain a preposition
cat="NP" 41 hits
pos="APPR" 19 hits
e.g. Die Jugendlichen in Zossen
→ no relation between the two information!
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
32
Relations between
annotations
● Relations between attributes
searching for all NPs which contain a preposition
cat="NP" #1
pos="APPR" #2
e.g. Die Jugendlichen in Zossen
→ NP includes APPR
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
33
Relations between
annotations
● Relations between attributes
searching for all NPs which contain a preposition
cat="NP" &
pos="APPR" &
#1_i_#2
e.g. Die Jugendlichen in Zossen
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
34
Hierarchy relations
● Relations between attributes
searching for all NPs which are objects
cat="NP"
e.g. Die Jugendlichen in Zossen -->subject!
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
35
Hierarchy relations
● Relations between attributes
searching all NPs which are objects
– NP → node annotation
– OA → edge annotation
To ke n To ke n To ke n
Sp a n
N o d e
Ed ge
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
36
Hierarchy relations
● Relations between attributes
searching all NPs which are objects
cat="NP"
the syntactic function in the tree
func="OA"
→ Note: At least there are two elements which
relate in a way to each other!
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
37
Hierarchy relations
● Relations between attributes
searching all NPs which are objects
node & cat="NP" & #1 >[func="OA"] #2
e.g. ein Musikcafé -->object!
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
38
Used Relations
● Relations we used:
A _i_ B A includes B
A > B A dominates B
A >[func=“OA“] B A dominates B and B is an
object
The full list of relations can be found in ANNIS
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
39
What's new in
ANNIS
What's new in ANNIS
version 3.1.7
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
40
What's new in
ANNIS
●
Simplified syntax (AQL)
●
Frequency analysis (Visualisierung)
●
Expand match context (Visualisierung)
●
Equality and Inequality (AQL)
●
Variables (AQL)
●
Complex OR expression (AQL)
●
Document browser (Visualisierung)
●
CSV export (Visualisierung)
●
Tooltip for corpus names (Visualisierung)
●
Report problem (Visualisierung)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
41
Simplified syntax
●
Question:
„Die“ followed by „Jugendlichen“ both being dominated
by a prepositional phrase which is dominated by a
sentence
So far:
cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 >
#4 & #3 . #4
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
42
Simplified syntax
●
Question:
„Die“ followed by „Jugendlichen“ both being dominated
by a prepositional phrase which is dominated by a
sentence
So far:
cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 >
#4 & #3 . #4
Simplified:
cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
43
Frequency analysis
●
Question:
– How many words tagged as „NN“, „ADJA“ or „ADV“
does a corpus contain?
– What are the most frequent part-of-speech tags
followed by a noun?
– What are the most frequent part-of-speech tags in a
prepositional phrase, which is in a sentence?
– ...
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
44
Frequency analysis
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
45
Frequency analysis
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
46
Frequency analysis
Attention:
A frequency analysis has to be bound to a query!
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
47
Frequency analysis
● What are the most
frequent part-of-speech
tags followed by a noun?
● What are the most frequent
part-of-speech tags in a
prepositional phrase,
which is in a sentence?
pos . pos="NN"
cat="S" > cat="PP" > pos
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
48
Expand match
context
● Even more than 25 is possible, it's a free text
field
● Sometimes the context is too small
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
49
Equality and
Inequality
●
Equality „==“ and inequality „!=“ for attributes
Question (inequality):
two different part-of-speech tags, one directly following
the other
pos . pos & #1 != #2
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
50
Equality and
Inequality
● Equality „==“ and inequality „!=“ for attributes
● Question (equality):
two same part-of-speech tags, one directly following the
other
pos . pos & #1 == #2
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
51
Equality and
Inequality
● Equality „==“ and inequality „!=“ for attributes
Question (inequality):
two different part-of-speech tags, one directly following the
other
pos . pos & #1 != #2
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
52
Variables
● Question:
„Die“ followed by „Jugendlichen“ both being dominated by
a prepositional phrase which is dominated by a sentence
Simplified:
cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
53
Variables
● Question:
„Die“ followed by „Jugendlichen“ both being dominated by
a prepositional phrase which is dominated by a sentence
Simplified:
cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
54
Variables
● Question:
„Die“ followed by „Jugendlichen“ both being dominated by
a prepositional phrase which is dominated by a sentence
Simplified:
cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug
Variables and numbers can be mixed:
cat="S" > np#cat="NP" > "Die" . "Jugendlichen" & #np > #4
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
55
Complex OR
expression
● Question (simple OR):
A part-of-speech tag which is a noun, an attributive
adjective or an article
pos=/(NN)|(ADJA)|(ART)/ (in pattern search)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
56
Complex OR
expression
pos="NN" | pos="ADJA" | pos= "ART"
● Question (simple OR):
A part-of-speech tag which is a noun, an attributive
adjective or an article
● OR for expressions
pos=/(NN)|(ADJA)|(ART)/ (in pattern search)
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
57
Complex OR
expression
(cat="S" > cat="PP") | cat="NP"
● Question (complex OR):
A prepositional phrase, which is dominated by a sentence,
or just a nominal phrase
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
58
Complex OR
expression
a#cat="PP" &
(b#pos="NN" | b#pos="ADJA" | b#pos= "ART") &
#a > #b
● Question (nested OR):
A prepositional phrase, which dominates a noun, an
attributive adjective or an article
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
59
Complex OR
expression
a#cat="PP" &
(b#pos="NN" | b#pos="ADJA" | b#pos= "ART") &
#a > #b
● Question (nested OR):
A prepositional phrase, which dominates a noun, an
attributive adjective or an article
Attention:
All expressions in brackets have to use the same variable
… & (b#pos="NN" | b#pos="ADJA" | b#pos= "ART") & ...
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
60
Document browser
● Displays the entire text of a document
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
61
Document browser
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
62
CSV export
● Export data for futher processing
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
63
Tooltips for corpus
names
● Sometimes corpus names can get very long
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
64
Report problem
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
65
Get ANNIS
● ANNIS comes in two flavors
– A server version
– A desktop version (ANNIS kickstarter)
– Both are downloadable at:
http://www.sfb632.uni-potsdam.de/annis/
● ANNIS is open source (Apache license 2.0) and
hosted on github
– https://github.com/korpling/ANNIS
ANNIS workshopCarolin Odebrecht & Florian Zipser
ANNIS: Search and
Visualization in
Multilayer Linguistic
Corpora
66
Thanks for your attention!
Any questions?
carolin.odebrecht@hu-berlin.de,
f.zipser@gmx.de

Contenu connexe

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

En vedette

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

ANNIS workshop sfb 2014

  • 1. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 1 Carolin Odebrecht & Florian Zipser Humboldt-Universität zu Berlin ANNIS workshop 2014-08-26
  • 2. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 2 A brief introduction ● Search and Visualization in Multilayer Linguistic Corpora – Imports existing corpora ● Corpora already have to be annotated, ANNIS only uses what's there ● No NLP!
  • 3. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 3 A brief introduction ● Search and Visualization in Multilayer Linguistic Corpora – Makes corpora searchable ● One query language for all corpora (AQL) ● Abstraction over linguistic data necessary ● But: Corpora have different annotations → query has to match the annotations
  • 4. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 4 A brief introduction ● Search and Visualization in Multilayer Linguistic Corpora – Displays corpora ● Many visualizations available ● Corresponding to type of annotation (syntactic trees, phrase trees (RST), grids, coreferences ...)
  • 5. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 5 A brief introduction ● What ANNIS cannot do – Does not know how to speak natural language → so you have to learn AQL
  • 6. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 6 A brief introduction ● What ANNIS cannot do – Does not know how to speak natural language → so you have to learn AQL – ANNIS does not know any semantics → „NN“, „NP“, „sentence“, „word“, „my favorite annotation“ … are just sequences of characters
  • 7. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 7 A brief introduction ● What ANNIS cannot do – Does not know how to speak natural language → so you have to learn AQL – ANNIS does not know any semantics → „NN“, „NP“, „sentence“, „word“, „my favorite annotation“ … are just sequences of characters – You need to be exact → e.g. „POS“ != „pos“ and „NN“ != „NN “ (regard the blank)
  • 8. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 8 ANNIS basics ANNIS basics
  • 9. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 9 Enter query Corpus list Previous queries Virtual Keyboard (e.g. arabic)
  • 10. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 10 Sample queries (corresponding to corpus)
  • 11. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 11 Query result Visualizations
  • 12. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 12 Corpus metadata Corpus metadata window
  • 13. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 13 Document metadata Document metadata window
  • 14. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 14 ANNIS basics ● Basic principles of AQL (ANNIS Query Language) – Attributes and values ● Searching for exact character sequences ● Searching for patterns – Combinatory search
  • 15. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 15 Demo corpus ● Corpus for demonstration: pcc2 (a sub corpus of pcc) https://korpling.german.hu-berlin.de/annis3/#_c=cGNjMg ● Potsdam Commentary Corpus – German Newspaper commentaries 'Märkische Allgemeine Zeitung' https://www.ling.uni-potsdam.de/acl-lab/Forsch/pcc/pcc.html – Multiple annotations
  • 16. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 16 ANNIS basics ● Different types of annotations – Token annotation – Span annotation – Pointing relation – Hierarchy annotation (trees) To ke n To ke n To ke n To ke n To ke n To ke n Sp a n Sp a n Sp a n N o d e Ed ge K e y K e y K e y
  • 17. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 17 ANNIS basics ● Different types of annotations – Token annotation – Span annotation – Pointing relation – Hierarchy annotation (trees) To ke n To ke n To ke n To ke n To ke n To ke n Sp a n Sp a n Sp a n N o d e Ed ge K e y K e y K e y To ke n To ke n To ke n To ke n To ke n To ke n Sp a n Sp a n Sp a n N o d e Ed ge K e y K e y K e y
  • 18. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 18 Exact word forms ● Token annotation – Exact sequence searching for a word form "Jugendlichen" "jugendlichen"
  • 19. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 19 Exact word forms ● Token annotation – Exact sequence searching for a word form "Jugendlichen" 3 hits "jugendlichen" 0 hits → tok="jugendlichen"
  • 20. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 20 Exact token annotation ● Token annotation – Exact sequence searching for an exact part of speech tag pos = "NN" attribute value – Attributes can have more than one value – Searching for all values of an attribute
  • 21. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 21 Exact token annotation ● Token annotation – Exact sequence searching for an exact part of speech tag pos="NN" pos="ADJA"
  • 22. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 22 Exact token annotation ● Token annotation – Exact sequence searching for an exact part of speech tag pos="NN" 62 hits pos="ADJA" 18 hits searching for all values of an attribute pos 399 hits
  • 23. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 23 Exact span annotation ● Span annotation – Exact sequence searching for sentences Sent="s"
  • 24. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 24 Exact span annotation ● Span annotation – Exact sequence searching for sentences Sent="s" 28 hits
  • 25. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 25 Metadata ● Sent="s" 28 hits – necessary to know which annotations are in a corpus
  • 26. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 26 Pattern ● Token annotation – Patterns . matches any single character * zero or more of the preceding element searching for the beginning a of word /Jugend.*/ /jugend.*/
  • 27. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 27 Pattern ● Token annotation – Patterns . matches any single character * zero or more of the preceding element searching for the beginning a of word /Jugend.*/ 5 hits ("Jugendlichen" 3 hits) Jugendlichen Jugendliche /jugend.*/ 0 hits ("jugendlichen" 0 hits)
  • 28. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 28 Pattern ● Token annotation – patterns searching for all nouns pos=/N./ includes NN & NE searching for all adjectives pos=/ADJ./ includes ADJA & ADJD
  • 29. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 29 Pattern ● Token annotation – patterns searching for all nouns pos=/N./ 73 hits (pos="NN" 62 hits) searching for all adjectives pos=/ADJ./ 32 hits (pos="ADJA" 18 hits)
  • 30. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 30 Relations between annotations ● Span annotation searching for all NPs cat="NP" 41 hits (pos="NN" 62 hits) e.g. Die Jugendlichen in Zossen
  • 31. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 31 Relations between annotations ● Relations between attributes searching for all NPs which contain a preposition cat="NP" 41 hits pos="APPR" 19 hits e.g. Die Jugendlichen in Zossen → no relation between the two information!
  • 32. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 32 Relations between annotations ● Relations between attributes searching for all NPs which contain a preposition cat="NP" #1 pos="APPR" #2 e.g. Die Jugendlichen in Zossen → NP includes APPR
  • 33. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 33 Relations between annotations ● Relations between attributes searching for all NPs which contain a preposition cat="NP" & pos="APPR" & #1_i_#2 e.g. Die Jugendlichen in Zossen
  • 34. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 34 Hierarchy relations ● Relations between attributes searching for all NPs which are objects cat="NP" e.g. Die Jugendlichen in Zossen -->subject!
  • 35. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 35 Hierarchy relations ● Relations between attributes searching all NPs which are objects – NP → node annotation – OA → edge annotation To ke n To ke n To ke n Sp a n N o d e Ed ge
  • 36. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 36 Hierarchy relations ● Relations between attributes searching all NPs which are objects cat="NP" the syntactic function in the tree func="OA" → Note: At least there are two elements which relate in a way to each other!
  • 37. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 37 Hierarchy relations ● Relations between attributes searching all NPs which are objects node & cat="NP" & #1 >[func="OA"] #2 e.g. ein Musikcafé -->object!
  • 38. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 38 Used Relations ● Relations we used: A _i_ B A includes B A > B A dominates B A >[func=“OA“] B A dominates B and B is an object The full list of relations can be found in ANNIS
  • 39. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 39 What's new in ANNIS What's new in ANNIS version 3.1.7
  • 40. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 40 What's new in ANNIS ● Simplified syntax (AQL) ● Frequency analysis (Visualisierung) ● Expand match context (Visualisierung) ● Equality and Inequality (AQL) ● Variables (AQL) ● Complex OR expression (AQL) ● Document browser (Visualisierung) ● CSV export (Visualisierung) ● Tooltip for corpus names (Visualisierung) ● Report problem (Visualisierung)
  • 41. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 41 Simplified syntax ● Question: „Die“ followed by „Jugendlichen“ both being dominated by a prepositional phrase which is dominated by a sentence So far: cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4
  • 42. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 42 Simplified syntax ● Question: „Die“ followed by „Jugendlichen“ both being dominated by a prepositional phrase which is dominated by a sentence So far: cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4 Simplified: cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
  • 43. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 43 Frequency analysis ● Question: – How many words tagged as „NN“, „ADJA“ or „ADV“ does a corpus contain? – What are the most frequent part-of-speech tags followed by a noun? – What are the most frequent part-of-speech tags in a prepositional phrase, which is in a sentence? – ...
  • 44. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 44 Frequency analysis
  • 45. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 45 Frequency analysis
  • 46. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 46 Frequency analysis Attention: A frequency analysis has to be bound to a query!
  • 47. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 47 Frequency analysis ● What are the most frequent part-of-speech tags followed by a noun? ● What are the most frequent part-of-speech tags in a prepositional phrase, which is in a sentence? pos . pos="NN" cat="S" > cat="PP" > pos
  • 48. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 48 Expand match context ● Even more than 25 is possible, it's a free text field ● Sometimes the context is too small
  • 49. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 49 Equality and Inequality ● Equality „==“ and inequality „!=“ for attributes Question (inequality): two different part-of-speech tags, one directly following the other pos . pos & #1 != #2
  • 50. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 50 Equality and Inequality ● Equality „==“ and inequality „!=“ for attributes ● Question (equality): two same part-of-speech tags, one directly following the other pos . pos & #1 == #2
  • 51. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 51 Equality and Inequality ● Equality „==“ and inequality „!=“ for attributes Question (inequality): two different part-of-speech tags, one directly following the other pos . pos & #1 != #2
  • 52. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 52 Variables ● Question: „Die“ followed by „Jugendlichen“ both being dominated by a prepositional phrase which is dominated by a sentence Simplified: cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
  • 53. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 53 Variables ● Question: „Die“ followed by „Jugendlichen“ both being dominated by a prepositional phrase which is dominated by a sentence Simplified: cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug
  • 54. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 54 Variables ● Question: „Die“ followed by „Jugendlichen“ both being dominated by a prepositional phrase which is dominated by a sentence Simplified: cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug Variables and numbers can be mixed: cat="S" > np#cat="NP" > "Die" . "Jugendlichen" & #np > #4
  • 55. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 55 Complex OR expression ● Question (simple OR): A part-of-speech tag which is a noun, an attributive adjective or an article pos=/(NN)|(ADJA)|(ART)/ (in pattern search)
  • 56. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 56 Complex OR expression pos="NN" | pos="ADJA" | pos= "ART" ● Question (simple OR): A part-of-speech tag which is a noun, an attributive adjective or an article ● OR for expressions pos=/(NN)|(ADJA)|(ART)/ (in pattern search)
  • 57. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 57 Complex OR expression (cat="S" > cat="PP") | cat="NP" ● Question (complex OR): A prepositional phrase, which is dominated by a sentence, or just a nominal phrase
  • 58. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 58 Complex OR expression a#cat="PP" & (b#pos="NN" | b#pos="ADJA" | b#pos= "ART") & #a > #b ● Question (nested OR): A prepositional phrase, which dominates a noun, an attributive adjective or an article
  • 59. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 59 Complex OR expression a#cat="PP" & (b#pos="NN" | b#pos="ADJA" | b#pos= "ART") & #a > #b ● Question (nested OR): A prepositional phrase, which dominates a noun, an attributive adjective or an article Attention: All expressions in brackets have to use the same variable … & (b#pos="NN" | b#pos="ADJA" | b#pos= "ART") & ...
  • 60. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 60 Document browser ● Displays the entire text of a document
  • 61. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 61 Document browser
  • 62. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 62 CSV export ● Export data for futher processing
  • 63. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 63 Tooltips for corpus names ● Sometimes corpus names can get very long
  • 64. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 64 Report problem
  • 65. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 65 Get ANNIS ● ANNIS comes in two flavors – A server version – A desktop version (ANNIS kickstarter) – Both are downloadable at: http://www.sfb632.uni-potsdam.de/annis/ ● ANNIS is open source (Apache license 2.0) and hosted on github – https://github.com/korpling/ANNIS
  • 66. ANNIS workshopCarolin Odebrecht & Florian Zipser ANNIS: Search and Visualization in Multilayer Linguistic Corpora 66 Thanks for your attention! Any questions? carolin.odebrecht@hu-berlin.de, f.zipser@gmx.de