Exploring Higher Order Dependency Parsers

             Pranava Swaroop Madhyastha

   Supervised by: Prof. Michael Rosner & RNDr. Daniel Zeman


                   September 6, 2011
Introduction

     ◮   Dependency Grammar.
           ◮   Binary asymmetric relations - Head and Modifier - Highly
               lexical relationships.
     ◮   A quick example:




     ◮   Projective Constraint
     ◮   Graph Based Dependency Parsing
           ◮   Arc-Factored Parsing
Problem Description?



    ◮   Augmentation of Features
          ◮   Semantic features
          ◮   Morpho-syntactic features
    ◮   Higher order parsing
          ◮   Context availability
          ◮   horizontal and vertical context availability

    ◮   Motivation
          ◮   Semi-supervised dependency parsing and improvements.
          ◮   Using well defined linguistic components.
What is Higher Order Dependency Parsing
    ◮   First-order model - decomposition of the tree into individual
        head-modifier dependencies.
    ◮   Second-order models - parts made of two dependencies: the
        head and modifier together with a sibling of the modifier, or
        the head and modifier together with a child of the modifier.
    ◮   Third-order models - one level up: parts made of three
        dependencies, such as grand-siblings.




    ◮   An illustration
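
The factorizations listed on this slide can be sketched in a few lines of Python. This is only an illustrative enumeration of parts, not the parsing algorithm itself. The head-array encoding (index 0 is an artificial root, heads[m] is the head of token m) and the function names are assumptions made for this example.

```python
from typing import List, Tuple

def first_order_parts(heads: List[int]) -> List[Tuple[int, int]]:
    """(head, modifier) pairs for every token except the artificial root."""
    return [(h, m) for m, h in enumerate(heads) if m > 0]

def sibling_parts(heads: List[int]) -> List[Tuple[int, int, int]]:
    """(head, sibling, modifier) triples for adjacent modifiers on the same side."""
    parts = []
    for h in range(len(heads)):
        mods = [m for m in range(1, len(heads)) if heads[m] == h]
        left = [m for m in mods if m < h][::-1]  # closest to the head first
        right = [m for m in mods if m > h]
        for side in (left, right):
            parts.extend((h, s, m) for s, m in zip(side, side[1:]))
    return parts

def grandchild_parts(heads: List[int]) -> List[Tuple[int, int, int]]:
    """(grandparent, head, modifier) triples; the head must not be the root."""
    return [(heads[h], h, m) for m, h in enumerate(heads) if m > 0 and h > 0]

# Toy sentence: 0=<root> 1=John 2=saw 3=a 4=dog 5=yesterday
heads = [-1, 2, 0, 4, 2, 2]
print(first_order_parts(heads))
print(sibling_parts(heads))
print(grandchild_parts(heads))
```

Sibling parts add horizontal context (neighbouring modifiers of the same head), while grandchild parts add vertical context (one more level of the tree).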
Still Why?
Features


    ◮   Given a feature vector φ and the corresponding parameter
        (weight) vector w, each part p of a sentence x is scored as

                              Part(x, p) = w · φ(x, p)                  (1)
    ◮   Each feature vector is built from individual feature
        templates of this kind:
           ◮   dir.pos(h).pos(m)
           ◮   dir.form(h).pos(m)
           ◮   and so on ...
    ◮   The most basic feature patterns consider the surface form,
        part-of-speech, lemma and other morphosyntactic attributes
        of the head or the modifier of a dependency.
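
As a minimal sketch of equation (1), the two templates named above can be instantiated as sparse binary features and scored against a weight table. The token representation (dicts with "form" and "pos" keys), the feature-string format, and the weight values are assumptions for illustration only.

```python
from typing import Dict, List

def arc_features(sent: List[Dict[str, str]], h: int, m: int) -> Dict[str, float]:
    """Instantiate two first-order templates for the dependency h -> m."""
    direction = "R" if h < m else "L"  # modifier to the right or left of the head
    return {
        f"dir.pos(h).pos(m)={direction}.{sent[h]['pos']}.{sent[m]['pos']}": 1.0,
        f"dir.form(h).pos(m)={direction}.{sent[h]['form']}.{sent[m]['pos']}": 1.0,
    }

def score_part(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Sparse dot product w . phi(x, p); unseen features contribute zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

sent = [
    {"form": "<root>", "pos": "ROOT"},
    {"form": "John", "pos": "NNP"},
    {"form": "saw", "pos": "VBD"},
]
# Hypothetical weights; a trained model would learn these.
weights = {
    "dir.pos(h).pos(m)=L.VBD.NNP": 1.5,
    "dir.form(h).pos(m)=L.saw.NNP": 0.5,
}
print(score_part(weights, arc_features(sent, h=2, m=1)))
```

Higher-order parts are scored the same way; only the templates change, since they can also look at the sibling or grandparent token.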
Experimentation done with:

    ◮   English - Penn Treebank
          ◮   Sections 2 to 10 as the training set - 15,000 sentences.
          ◮   Random sets of sentences from Sections 15, 17, 19, 25 of the
              Penn Treebank as development data - 1,000 sentences.
          ◮   The test set was chosen from Sections 0, 1, 21, 23 of the
              Penn Treebank - 2,000 sentences.
    ◮   Czech - Prague Dependency Treebank
          ◮   The sentences were chosen from the pdt2-full-automorph dataset.
          ◮   The training set consisted of the train1 - train5 splits - a set of
              15,000 sentences.
          ◮   The development set consisted of train6 and train7 splits - a
              set of 1000 sentences.
          ◮   The test set was made up of dtest and etest parts - a set of
              2000 sentences.
Experimentation

    ◮   Fine and Coarse Grained Wordsenses
    ◮   Approximation
    ◮   For English:
          ◮   Both fine and coarse grained wordsense extraction make use
              of the WordNet::SenseRelate package.
          ◮   A fine grained wordsense restricts a word to one particular
              sense, e.g. word - noun - first sense (extracted from
              WordNet).
          ◮   A coarse grained wordsense is a more generic description:
              word - the semantic file to which the word belongs.
    ◮   For Czech:
          ◮   Only fine grained wordsense extraction (approximately).
          ◮   Extracted using the sempos attribute, which is already
              annotated in the Prague Dependency Treebank.
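
One way to picture the augmentation is to let a sense label stand in for a token attribute inside an existing template. The sketch below is an assumption about how such features could look, not the templates used in the experiments. The lookup tables are hypothetical stand-ins for the output of a sense tagger, and the template names are invented for the example.

```python
from typing import Dict, List

# Hypothetical toy lexicons; real labels would come from a sense tagger.
FINE_SENSES: Dict[str, str] = {"dog": "dog#n#1"}        # word, noun, first sense
COARSE_SENSES: Dict[str, str] = {"dog": "noun.animal"}  # semantic file of the word

def sense_of(form: str, lexicon: Dict[str, str]) -> str:
    """Return the sense label of a word form, or NONE when it is not covered."""
    return lexicon.get(form.lower(), "NONE")

def sense_features(forms: List[str], pos: List[str], h: int, m: int,
                   lexicon: Dict[str, str]) -> List[str]:
    """Sense-augmented variants of first-order templates for the dependency h -> m."""
    direction = "R" if h < m else "L"
    return [
        f"dir.sense(h).pos(m)={direction}.{sense_of(forms[h], lexicon)}.{pos[m]}",
        f"dir.pos(h).sense(m)={direction}.{pos[h]}.{sense_of(forms[m], lexicon)}",
    ]

forms = ["<root>", "The", "dog", "barked"]
pos = ["ROOT", "DT", "NN", "VBD"]
print(sense_features(forms, pos, h=3, m=2, lexicon=FINE_SENSES))
print(sense_features(forms, pos, h=3, m=2, lexicon=COARSE_SENSES))
```

The coarse labels are shared by many words, so the corresponding features are less sparse than the fine grained ones.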
Results for the Wordsense augmentation experiment

    ◮   Sibling based parsers show a statistically significant
        improvement.
    ◮   For English with Fine Grained wordsense addition - Third
        order grand-sibling based parser gives an improvement of
        +0.81 percent (Unlabeled Accuracy Score). A closer
        statistical examination showed that sibling based interactions
        which are close to each other have better precision.
    ◮   For English with Coarse Grained wordsense addition - the
        second order sibling based parser gives an improvement of
        approximately +1.09 percent.
    ◮   Again for Czech with fine grained wordsense augmentation,
        the 3rd order sibling based parser gives an improvement of
        approximately +1.20 percent.
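
For reference, the unlabeled score reported here (commonly also called the unlabeled attachment score) is the fraction of tokens that receive the correct head. A minimal sketch follows, assuming gold and predicted heads are given as equal-length lists with one entry per token. The function name and the toy data are illustrative.

```python
from typing import List

def unlabeled_accuracy(gold_heads: List[int], pred_heads: List[int]) -> float:
    """Fraction of tokens whose predicted head matches the gold head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

gold = [2, 0, 4, 2, 2]
pred = [2, 0, 2, 2, 2]  # one wrong head (token 3)
print(f"{100 * unlabeled_accuracy(gold, pred):.2f} percent")
```

An improvement such as +0.81 percent is then the difference between two such scores, computed over all tokens of the test set.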
Results for Morphosyntactic augmentation experiment




    ◮   Morphosyntactic features were taken directly from the tags
        already present in the corpus.
    ◮   For Czech, instead of the full 15-letter tagset, we tried out a
        subset (Person, Number, PossGender, Tense, Voice and
        Case).
    ◮   For English we integrated the fine grained part-of-speech tags.
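
Extracting such a subset from a 15-character positional tag amounts to picking fixed character positions. The sketch below assumes the usual layout of the Prague positional tags (Number at position 4, Case at 5, PossGender at 6, Person at 8, Tense at 9, Voice at 12); these positions should be checked against the treebank documentation before use. The function name is hypothetical.

```python
from typing import Dict

# 1-based positions in the 15-character tag (assumed layout, see the note above).
POSITIONS: Dict[str, int] = {
    "Person": 8,
    "Number": 4,
    "PossGender": 6,
    "Tense": 9,
    "Voice": 12,
    "Case": 5,
}

def morph_subset(tag: str) -> Dict[str, str]:
    """Pick the selected categories from a positional tag; '-' means not applicable."""
    if len(tag) != 15:
        raise ValueError(f"expected a 15-character tag, got {len(tag)}")
    return {name: tag[pos - 1] for name, pos in POSITIONS.items()}

print(morph_subset("VB-S---3P-AA---"))  # a finite verb form
print(morph_subset("NNFS1-----A----"))  # a noun form
```

Each extracted value can then be used in a feature template in place of, or alongside, the part-of-speech.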
Results




     ◮    For both English and Czech, parsing accuracy improves
          significantly with the grandchild based algorithms.
     ◮    For Czech, the third order grand sibling based algorithm
          shows an improvement of +1.72 percent.
     ◮    For English, the third order grand sibling based algorithm
          shows an improvement of +1.21 percent.
Conclusion



    ◮   Semantic features work better with sibling based parsers
        (larger horizontal contexts).
    ◮   Morpho-syntactic features work better with grandchild based
        parsers (larger vertical contexts).
    ◮   Features can be instrumental in several tasks, which include
        accurate labeling of semantic roles and other related tasks.
    ◮   Linguistic information can be better handled by a higher order
        parsing algorithm.
Future Work




    ◮   Higher order parsers with labels (we have not yet tested
        labeled accuracy scores).
    ◮   Joint extraction of word-senses and semantic roles.
    ◮   Experimentation with lexical clusters.
    ◮   Thorough experimentation with several features.
    ◮   Maximum and Minimum order requirements.
Thanks
