Discourse Annotation for Arabic
Arwa Al-Zammam, Ruba Al-Homaid, Eman Al-Badr
Supervisor: Amal Al-Saif
Natural Language Processing - CS465
11-6-1434 H
Outline
• Leeds Arabic Discourse Treebank
• Discourse Annotation
• Arabic language characteristics
• Discourse relations
• Characteristics of Modern Standard Arabic
• Arabic Discourse Connectives
• Agreement Studies
• Discourse Connective Recognition
• Result of Discourse Connective Recognition
• Discourse Relation Recognition
• Result of Discourse Relation Recognition
• Conclusion
Leeds Arabic Discourse Treebank
• The Leeds Arabic Discourse Treebank (LADTB v1) is the first discourse treebank for Modern Standard Arabic (MSA).
• LADTB follows annotation principles similar to those of the Penn Discourse Treebank (PDTB) for English and of the discourse treebanks for Turkish, Hindi and Chinese.
• LADTB was built to serve as a gold standard for automatic discourse processing studies.
Discourse Annotation
• Discourse relations such as CAUSAL or CONTRAST
relations between textual units play an important role in
producing a coherent discourse.
• Discourse connectives are lexical expressions that relate two text segments (arguments) expressing abstract entities such as events, beliefs, facts or propositions (e.g. /lkn/ 'but', /Aw/ 'or').
[Figure: Contrast and Causal relations]
Discourse Annotation
• Applications using discourse annotation:
• Automatic summarization
• Question answering
• Sentiment analysis
• Readability assessment
Arabic Language Characteristics
• Arabic discourse connectives are ambiguous.
• Explicit discourse connectives.
• The variety of Arabic discourse connectives.
• The annotation principles designed to annotate English discourse connectives in the PDTB 2.0 can be applied to reliably annotate discourse connectives in Arabic newswire.
• Machine learning models can be used to identify discourse connectives and relations in Arabic newswire.
• Supervised machine learning models can identify Arabic discourse connectives and their relations with high reliability.
Discourse Relations
• Explicit discourse relations:
[He took my photo,]Arg1 [while]DC [I was having dinner]Arg2
• Implicit discourse relations:
[He has to stay in bed.]Arg1 [He has the flu.]Arg2
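To make the Arg1 / DC / Arg2 notation concrete, here is a minimal Python sketch of a PDTB-style relation record. The class name `DiscourseRelation`, its fields and the `bracketed` helper are hypothetical; an implicit relation is modelled simply as one with no connective, and the CAUSAL label on the implicit example is our own reading, since the slide gives no sense label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscourseRelation:
    """One discourse relation between two text spans (hypothetical schema)."""
    arg1: str
    arg2: str
    connective: Optional[str] = None  # None means the relation is implicit
    sense: Optional[str] = None       # e.g. "CAUSAL" or "CONTRAST"

    @property
    def is_explicit(self) -> bool:
        return self.connective is not None

    def bracketed(self) -> str:
        """Render the relation in the [..]Arg1 [..]DC [..]Arg2 notation."""
        parts = [f"[{self.arg1}]Arg1"]
        if self.is_explicit:
            parts.append(f"[{self.connective}]DC")
        parts.append(f"[{self.arg2}]Arg2")
        return " ".join(parts)


explicit = DiscourseRelation("He took my photo,", "I was having dinner", connective="while")
# The sense below is our reading of the example; the slide gives no label.
implicit = DiscourseRelation("He has to stay in bed.", "He has the flu.", sense="CAUSAL")
print(explicit.bracketed())
print(implicit.bracketed())
```

The only structural difference between the two relation types in this sketch is the presence of the connective, which mirrors the explicit/implicit distinction on the slide.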
Characteristics of Modern Standard Arabic
Al-maSdar noun (verbal noun), with examples by root:
Root | English
Sbh | swimming
Eks | reflection
Jrb | experiment
Hrb | war
Dfe | defence
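Al-maSdar nouns follow Arabic root-and-pattern morphology: the consonants of a root are slotted into a fixed vowel-and-affix pattern. The toy Python sketch below shows only that interdigitation step. The digit-based template notation and the function name `interdigitate` are assumptions made for illustration; the two templates stand for the faEl and fiEAl patterns in the slide's loose transliteration (E for the letter ʿayn) and are not taken from the slide's table.

```python
def interdigitate(root: str, template: str) -> str:
    """Fill the consonants of a triliteral root into a pattern template.

    In this hypothetical notation the digits 1, 2 and 3 mark the slots of
    the first, second and third root consonant; every other character is
    a vowel or affix copied unchanged.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in template)


print(interdigitate("Hrb", "1a23"))   # Harb  ('war', faEl pattern)
print(interdigitate("Dfe", "1i2A3"))  # DifAe ('defence', fiEAl pattern)
```

A real morphological analyser would also handle weak roots, quadriliteral roots and orthographic changes, none of which this sketch attempts.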
Characteristics of Modern Standard Arabic
• Word order in Arabic (verb-subject-object).
• Punctuation in Arabic.
Arabic Discourse Connectives
• Conjunctions (/lkn/ 'but', /Aw/ 'or', /w/ 'and')
• Adverbials (/TAlmA.. f../ 'as long as')
• Prepositional phrases: prepositions can also link discourse segments when one or both arguments are al-maSdar nouns.
Some nouns, such as /ntyjp/ 'result', /ks.yp/ 'fear' and /bqyp/ 'desire', are also used as discourse connectives in Arabic.
Arabic discourse connectives may occur:
• Individually, e.g. /lkn/ 'however'.
• In combination with other connectives via the coordinating conjunction /w/ 'and', e.g. /lkn w qbl/ 'however and before'.
• As multiple connectives without a conjunction, e.g. /AlA bEd/ 'except after'.
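These three occurrence patterns suggest a simple first step before any classifier runs: find runs of adjacent connective tokens and treat each run as one candidate. The Python sketch below assumes whitespace-separated transliterated tokens and a hypothetical mini-lexicon drawn only from this slide; the function name `candidate_connectives` is also hypothetical. Real Arabic text would first need clitic segmentation, since /w/ is normally written attached to the following word.

```python
def candidate_connectives(tokens, lexicon):
    """Group runs of adjacent lexicon tokens into candidate connective spans.

    Returns (start, end) index pairs with end exclusive. A run covers the
    three cases above: a single connective (lkn), connectives joined by w
    (lkn w qbl) and stacked connectives without a conjunction (AlA bEd).
    """
    spans = []
    i = 0
    while i < len(tokens):
        if tokens[i] not in lexicon:
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j] in lexicon:
            j += 1
        spans.append((i, j))
        i = j
    return spans


# Hypothetical mini-lexicon built only from the connectives on this slide;
# X1, X2, ... stand for ordinary (non-connective) tokens.
LEXICON = {"lkn", "Aw", "w", "qbl", "AlA", "bEd"}
print(candidate_connectives(["X1", "lkn", "w", "qbl", "X2"], LEXICON))  # [(1, 4)]
print(candidate_connectives(["X1", "AlA", "bEd", "X2", "Aw", "X3"], LEXICON))  # [(1, 3), (4, 5)]
```

The grouping is deliberately greedy, so two unrelated connectives that happen to be adjacent are merged, and the discontinuous /TAlmA.. f../ is not covered. Every span returned is only a candidate: deciding between discourse and non-discourse usage is the separate connective recognition task.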
Agreement Studies
• Task 1: measures whether annotators agree on the binary decision of whether an item is a discourse connective in context.
• Task 2: measures whether annotators agree on which discourse relation an identified connective expresses.
Agreement was measured for the discourse vs. non-discourse distinction, relation assignment and argument assignment:
$\mathrm{agr}(ann_1 \parallel ann_2) = \frac{|ann_1 \text{ matching } ann_2|}{|ann_1|}$
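The directional agreement measure can be sketched in Python as follows. Treating "matching" as exact match of items (for example argument spans) is an assumption, and so is reading the K column in the result tables as Cohen's kappa, which the second function computes for binary Task 1 labels. Both function names are hypothetical.

```python
from collections import Counter


def directional_agreement(ann1, ann2):
    """agr(ann1 || ann2): fraction of ann1's items that ann2 also marked.

    Items are compared by exact match (an assumption). The measure is
    asymmetric, so agr(ann1 || ann2) and agr(ann2 || ann1) may differ.
    """
    ann1, ann2 = set(ann1), set(ann2)
    if not ann1:
        raise ValueError("ann1 must contain at least one item")
    return len(ann1 & ann2) / len(ann1)


def cohen_kappa(labels1, labels2):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels1) != len(labels2) or not labels1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels1)
    observed = sum(a == b for a, b in zip(labels1, labels2)) / n
    c1, c2 = Counter(labels1), Counter(labels2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data: argument spans as (start, end) offsets, and Task 1 labels.
spans1 = {(0, 5), (7, 12), (20, 25)}
spans2 = {(0, 5), (7, 12), (30, 35), (40, 45)}
print(directional_agreement(spans1, spans2))  # 2 of 3 spans in spans1 matched
print(directional_agreement(spans2, spans1))  # 2 of 4 spans in spans2 matched
print(cohen_kappa(["DC", "DC", "non", "non"], ["DC", "non", "non", "non"]))  # 0.5
```

The two calls on the same span sets give different values, which is why the formula names which annotator's items form the denominator.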
Discourse Connective Recognition
• Surface features (SConn).
• Lexical features of surrounding words (Lex).
[Figure: Arg1 DC Arg2]
• Part-of-speech features (POS).
• Syntactic category of related phrases (Syn), e.g. the non-discourse usage of /w/ 'and' in /ālmdrsh kbyrh w ǧmylh/ 'the school is very large and beautiful', where w joins two adjectives rather than two clauses.
• Al-maSdar feature.
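The feature groups above can be pictured as a dictionary built for each candidate connective. This is a minimal Python sketch under stated assumptions: the function name `connective_features`, the feature names, the sentence-initial surface cue and the POS tags are all hypothetical, and Syn features are omitted because they require a syntactic parser.

```python
def connective_features(tokens, pos, start, end, is_masdar):
    """Feature dict for the candidate connective tokens[start:end].

    tokens, pos and is_masdar are parallel lists (word, POS tag, al-maSdar
    flag). Feature names are hypothetical.
    """
    def at(seq, i, pad):
        return seq[i] if 0 <= i < len(seq) else pad

    return {
        "conn": " ".join(tokens[start:end]),       # Conn: the connective string
        "sconn_sent_initial": start == 0,          # SConn: assumed surface cue
        "lex_prev": at(tokens, start - 1, "<S>"),  # Lex: word before
        "lex_next": at(tokens, end, "</S>"),       # Lex: word after
        "pos_conn": "_".join(pos[start:end]),      # POS of the connective
        "pos_prev": at(pos, start - 1, "<S>"),     # POS: tag before
        "pos_next": at(pos, end, "</S>"),          # POS: tag after
        "masdar_next": at(is_masdar, end, False),  # Al-maSdar: next word is a maSdar
    }


# The slide's non-discourse example; the POS tags are illustrative only.
tokens = ["ālmdrsh", "kbyrh", "w", "ǧmylh"]
pos = ["NOUN", "ADJ", "CONJ", "ADJ"]
feats = connective_features(tokens, pos, 2, 3, [False] * 4)
print(feats["pos_prev"], feats["pos_next"])  # ADJ ADJ
```

On this example /w/ sits between two adjectives, which is the kind of local cue that lets a classifier favour the non-discourse reading.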
Result of Discourse Connective Recognition
Accuracy (%) | K | Features | Ref
68.9 | 0 | Baseline (not conn) |
75.7 | 0.48 | Conn only | M1
Tokenization by white space + auto tagger:
85.6 | 0.62 | Conn + SConn + Lex | M2
87.6 | 0.69 | Conn + SConn + Lex + POS | M3
88.5 | 0.70 | Conn + SConn + Lex + POS + Masdar | M4
ATB-based features:
86.2 | 0.65 | Conn + SConn + Lex | M5
91.2 | 0.79 | Conn + SConn + Lex + Syn/POS | M6
92.4 | 0.82 | Conn + SConn + Lex + Syn/POS + Masdar | M7
91.2 | 0.79 | Conn + SConn + Syn | M8
91.2 | 0.79 | SConn + Lex + Syn + Masdar | M9
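One way to read the connective recognition table is to express each model's gain as the share of the baseline's errors it removes. For example, the baseline errs on 31.1% of items and M7 on 7.6%, so M7 removes 23.5 / 31.1, roughly 76%, of the baseline's errors. A small Python helper (the name `relative_error_reduction` is hypothetical) makes the arithmetic explicit:

```python
def relative_error_reduction(baseline_acc: float, model_acc: float) -> float:
    """Fraction of the baseline's errors that a model removes (accuracies in %)."""
    baseline_err = 100.0 - baseline_acc
    if baseline_err <= 0:
        raise ValueError("the baseline has no errors to reduce")
    return (baseline_err - (100.0 - model_acc)) / baseline_err


# Accuracies copied from the connective recognition table.
print(round(relative_error_reduction(68.9, 75.7), 3))  # M1
print(round(relative_error_reduction(68.9, 92.4), 3))  # M7
```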
Discourse Relation Recognition
• Words and POS of arguments.
• Masdar.
• Tense and Negation.
• Length, Distance and Order Features.
• Production Rules.
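The slide lists production rules without further detail. One common way to use them, assumed here rather than taken from the slides, is to collect every grammar rule from the parse tree of each argument and turn it into a binary feature. The Python sketch uses a hand-built tuple tree with English words and Penn-style labels purely for readability; the names `production_rules` and `rule_features` and the `arg1:`/`arg2:`/`both:` prefixes are hypothetical.

```python
def production_rules(tree):
    """Collect the CFG productions used in a parse tree.

    A tree is a tuple (label, child, ...); a leaf is a plain string (a word).
    Each internal node contributes the rule 'LABEL -> child labels/words'.
    """
    rules = set()

    def visit(node):
        if isinstance(node, str):
            return
        label, *children = node
        rhs = [c if isinstance(c, str) else c[0] for c in children]
        rules.add(f"{label} -> {' '.join(rhs)}")
        for child in children:
            visit(child)

    visit(tree)
    return rules


def rule_features(arg1_tree, arg2_tree):
    """Binary features: rules seen in Arg1, in Arg2, and in both."""
    r1, r2 = production_rules(arg1_tree), production_rules(arg2_tree)
    return ({f"arg1:{r}" for r in r1} | {f"arg2:{r}" for r in r2}
            | {f"both:{r}" for r in r1 & r2})


# Hand-built parse of the implicit example's Arg2 ("He has the flu.").
arg2 = ("S", ("NP", ("PRP", "He")),
        ("VP", ("VBZ", "has"), ("NP", ("DT", "the"), ("NN", "flu"))))
print(sorted(production_rules(arg2)))
```

Each distinct rule becomes its own sparse binary feature, so a classifier can pick up structural contrasts between the two arguments without hand-written syntactic cues.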
Result of Discourse Relation Recognition
Accuracy (%) | K | Features | Ref
All connectives (6039):
52.5 | 0 | Baseline (CONJUNCTION) |
77.2 | 0.60 | Conn only (1) | M1
78.8 | 0.66 | Conn + Conn_f + Arg_f (37) | M2
78.3 | 0.65 | Conn + Conn_f + Arg_f + Production rules (1237) | M3
Excluding wa at BOP (3813):
35 | 0 | Baseline (CONJUNCTION) |
74.3 | 0.65 | Conn only (1) | M1
77 | 0.69 | Conn + Conn_f + Arg_f (37) | M2
76.7 | 0.69 | Conn + Conn_f + Arg_f + Production rules (1237) | M3
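The baseline rows presumably label every connective with the most frequent relation (CONJUNCTION here, EXPANSION in the following table). A minimal Python sketch of such a majority-class baseline is below; the function name and the toy labels are hypothetical. A constant predictor agrees with the gold labels exactly as often as chance would predict, so its Cohen's kappa is 0, which fits the K = 0 baseline rows if K is kappa.

```python
from collections import Counter


def majority_baseline(gold_labels):
    """Label every instance with the most frequent gold label.

    Returns (label, accuracy in %).
    """
    if not gold_labels:
        raise ValueError("need at least one gold label")
    label, count = Counter(gold_labels).most_common(1)[0]
    return label, 100.0 * count / len(gold_labels)


# Toy gold labels; the relation names are illustrative only.
gold = ["CONJUNCTION"] * 3 + ["CONTRAST", "CAUSE"]
print(majority_baseline(gold))  # ('CONJUNCTION', 60.0)
```

This also explains why the baseline falls when paragraph-initial wa is excluded: removing many instances of the majority relation lowers the accuracy that always guessing it can reach.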
Result
Accuracy (%) | K | Features | Ref
All connectives (6039):
62.4 | 0 | Baseline (EXPANSION) |
88.7 | 0.78 | Conn only (1) | M1
88.7 | 0.78 | Conn + Conn_f + Arg_f (37) | M2
Excluding wa at BOP (3813):
41.8 | 0 | Baseline (EXPANSION) |
82.7 | 0.74 | Conn only (1) | M1
83.5 | 0.75 | Conn + Conn_f + Arg_f (37) | M2
Conclusion:
We presented Arabic discourse annotation, covering discourse connectives and discourse relations. We also showed the characteristics of the Arabic language that are relevant to this subject, and the results of connective and relation recognition.
