Nif practical
1. 1
The NIF format (hands on)
Annotating Strings and Documents using the
NLP Interchange Format
2. 2
Practical session outcomes
• Participants will learn to use the NIF API to
annotate strings and documents using
the following wrappers:
–OpenNLP
–Stanford Core NLP
–Snowball Stemmer
–DBpedia Spotlight
• Query your corpus using SPARQL
4. 4
Snowball Stemmer Wrapper
• A stemming algorithm removes suffixes from words,
reducing related word forms to a common stem.
–CONNECT
• CONNECTED
• CONNECTION
• CONNECTING
• CONNECTIONS
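The idea of suffix removal can be sketched in a few lines of Python. This is only a toy illustration, not the Snowball algorithm itself: the suffix list and the `strip_suffix` helper are hypothetical and chosen just to cover the CONNECT example above.

```python
# Toy suffix stripper: NOT the Snowball algorithm, only an illustration.
# The suffix list is a hypothetical choice, ordered longest first.
SUFFIXES = ("IONS", "ION", "ING", "ED", "S")


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Remove the first matching suffix, keeping at least min_stem characters."""
    upper = word.upper()
    for suffix in SUFFIXES:
        if upper.endswith(suffix) and len(upper) - len(suffix) >= min_stem:
            return upper[: -len(suffix)]
    return upper


words = ["CONNECTED", "CONNECTION", "CONNECTING", "CONNECTIONS"]
stems = {word: strip_suffix(word) for word in words}
print(stems)
```

All four forms reduce to CONNECT here. A real stemmer applies many more rules and conditions, which is why the wrapper is used in the session.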
5. 5
Snowball Stemmer Wrapper
java -jar snowball.jar -f text -i 'I am connected.'
• -f defines the input format (here, plain text)
• -i defines the input (here, the string to annotate)
10. 10
Annotating Strings: Step-by-step
• 1. Open the USB stick folder
• 2. Decompress the “session-nif.zip” folder
• 3. Open the “NIF_DATATHON” folder and
decompress
“NIF_tutorial_hands_on_jars.zip”
• 4. Open the command prompt and run the
commands from the next slide in the “jar”
folder.
11. 11
Available Wrappers
• To annotate documents, use the local wrappers (USB Stick)
java -jar opennlp.jar -f text -i 'This is a test.' -modelFolder ../model/
java -jar stanford.jar -f text -i 'This is a test.'
java -jar snowball.jar -f text -i 'This is my favorite test.'
java -jar spotlight.jar -f text -i 'Welcome to Germany.' -confidence 0.2
• To annotate small strings, you can try the online services:
– http://spotlight.nlp2rdf.aksw.org/spotlight?f=text&i=Welcome+to+Germany.&t=direct&confidence=0.3&prefix=http://yourDomain.org/
– http://snowball.nlp2rdf.aksw.org/snowball?f=text&i=This+is+my+favorite+test.&t=direct&prefix=http://yourDomain.org/
– http://stanford.nlp2rdf.aksw.org/stanfordcorenlpn?f=text&i=This+is+a+test.&t=direct&prefix=http://yourDomain.org/
– http://opennlp.nlp2rdf.aksw.org/opennlp?f=text&i=This+is+a+test.&t=direct&modelFolder=model&prefix=http://yourDomain.org
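The online service URLs all follow the same pattern: an endpoint followed by the query parameters f, i, t and prefix (plus service-specific ones such as confidence). A minimal sketch of building such a URL in Python, assuming the service accepts standard percent-encoded values; the `build_service_url` helper is hypothetical and not part of the NIF tooling:

```python
from urllib.parse import urlencode


def build_service_url(endpoint: str, text: str, **extra: str) -> str:
    """Build a request URL using the parameter names seen in the slide URLs."""
    # f = input format, i = input string, t = input type (direct string).
    params = {"f": "text", "i": text, "t": "direct"}
    params.update(extra)  # e.g. confidence, prefix, modelFolder
    # urlencode turns spaces into '+' and percent-encodes ':' and '/'.
    return endpoint + "?" + urlencode(params)


url = build_service_url(
    "http://spotlight.nlp2rdf.aksw.org/spotlight",
    "Welcome to Germany.",
    confidence="0.3",
    prefix="http://yourDomain.org/",
)
print(url)
```

The printed URL can be opened in a browser, provided the service is reachable. Unlike the hand-written URLs, the prefix value is percent-encoded here, which is standard query-string encoding.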
12. 12
Reading and Writing Files
• Write results to a file:
“--outfile myAnnotatedFile.ttl”
• Read a document as input:
“--intype file -i /path/myDoc”
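The two options can presumably be combined in one call, reading a document and writing the annotations to a file. The sketch below assembles such a command in Python. It assumes that snowball.jar is in the current folder, that "-f text" still applies when reading from a file, and that both options may be used together; the `build_command` helper is hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(jar: str, input_path: str, outfile: str) -> list[str]:
    """Assemble a wrapper call that reads a file and writes the result to a file."""
    return [
        "java", "-jar", jar,
        "-f", "text",           # input format, as in the earlier examples
        "--intype", "file",     # read the input from a file ...
        "-i", input_path,       # ... located at this path
        "--outfile", outfile,   # write the annotations here
    ]


command = build_command("snowball.jar", "/path/myDoc", "myAnnotatedFile.ttl")
print(" ".join(command))

# Run the wrapper only if the jar is actually present in this folder.
if Path("snowball.jar").exists():
    subprocess.run(command, check=True)
```

The same command line can of course be typed directly into the command prompt.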
13. 13
POS tagger for multiple languages
• The -modelFolder parameter sets the folder
that contains the trained OpenNLP models
for POS tagging and tokenization.
• Models for other languages can be found on
the OpenNLP website:
http://opennlp.sourceforge.net/models-1.5/
30. 30
Querying your own NIF-annotated corpus
1. Annotate your string using one of the
wrappers
2. Save your annotated sentence to a file
(using “--outfile”)
3. Open Twinkle
4. Query your corpus using Twinkle
31. 31
• Query your annotated corpus:
– nif:Context
– nif:Sentence
– nif:anchorOf
– nif:oliaCategory
– nif:oliaLink
… or practice with the Brown Corpus!
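As a starting point, a query over the terms listed above might look like the following sketch. The SPARQL text can be pasted into Twinkle directly. The Python wrapper is optional and assumes the third-party rdflib package is installed and that the wrapper output attaches nif:anchorOf and nif:oliaLink to the same word resources; the `run_query` helper is hypothetical.

```python
# SPARQL query listing each annotated string with its OLiA tag link.
QUERY = """
PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>

SELECT ?word ?text ?tag
WHERE {
  ?word nif:anchorOf ?text ;
        nif:oliaLink ?tag .
}
ORDER BY ?text
"""


def run_query(path: str) -> list[tuple[str, str]]:
    """Load a Turtle file and return (text, tag) pairs; requires rdflib."""
    from rdflib import Graph  # third-party; imported lazily on purpose

    graph = Graph()
    graph.parse(path, format="turtle")
    return [(str(text), str(tag)) for _word, text, tag in graph.query(QUERY)]


print(QUERY)
```

With rdflib available, `run_query("myAnnotatedFile.ttl")` would return the pairs for the file written with “--outfile”. Replacing nif:oliaLink with nif:oliaCategory, or selecting resources of type nif:Sentence or nif:Context, follows the same pattern.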