Arabic
Most topic modeling techniques are language agnostic: the models can be trained on any vocabulary.
Once trained, however, a model can be used only with documents drawn from the same fixed
vocabulary it was trained on.
A trained model cannot handle unknown tokens and cannot easily be applied to other languages.
Moreover, for neural topic models that use embeddings to represent the input corpus,
performance depends on the quality of the embeddings available for the
training language.
This language dependence creates different challenges for each language, and each language
calls for its own handling.
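The fixed-vocabulary limitation can be illustrated with a minimal bag-of-words sketch. This is not taken from the slides: the function names `build_vocabulary` and `to_bow` and the toy documents are assumptions made for illustration, and real topic modeling libraries differ in detail.

```python
def build_vocabulary(documents):
    """Map every token seen during training to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def to_bow(document, vocabulary):
    """Return (count vector, unknown tokens) for one document."""
    vector = [0] * len(vocabulary)
    unknown = []
    for tok in document.split():
        if tok in vocabulary:
            vector[vocabulary[tok]] += 1
        else:
            unknown.append(tok)  # no column exists for this token
    return vector, unknown


# Toy training corpus (hypothetical).
train_docs = ["topic models learn topics", "models learn word counts"]
vocab = build_vocabulary(train_docs)

# A new document: only tokens seen in training contribute to the vector.
vector, unknown = to_bow("topic models for Arabic text", vocab)
```

Tokens outside the training vocabulary have no column in the count vector, so a model trained on it never sees them. The same thing happens to every token of a document written in a different language.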
Recent advancements in applying topic modeling to Arabic texts include:
• Improved pre-processing techniques
• Advanced models
• Incorporation of sentiment analysis
• Incorporation of named entity recognition
• Incorporation of word embeddings
• Handling unstructured data
• Handling dialectal Arabic
• The first stage is dataset acquisition.
• The second stage is preprocessing the datasets. Preprocessing includes tokenization,
removing punctuation, removing stopwords, tagging, and constructing n-grams. Text normalization
is essential.
• In this study, for simplicity, only unigram (1-gram) tokens were included, and all tokens
except nouns were removed.
• This preprocessing yields fewer tokens and slightly shorter documents, which
challenges neural models, whose performance is affected by the corpus's vocabulary size and
document length [19]. The preprocessing steps were implemented using CAMeL Tools
[15].
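The preprocessing stage described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: it does not call CAMeL Tools, the stopword list is a tiny hypothetical sample, the normalization rules shown (removing diacritics and tatweel, unifying alef variants, mapping alef maksura to yeh) are common choices rather than ones confirmed by the slides, and the noun-only filter is omitted because it needs a part-of-speech tagger.

```python
import re

# Arabic diacritics (U+064B to U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda, hamza above, or hamza below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
# Runs of Arabic letters; punctuation and digits are dropped.
TOKEN = re.compile("[\u0621-\u064A]+")


def normalize(text):
    """Apply simple orthographic normalization to Arabic text."""
    text = DIACRITICS.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef
    return text.replace("\u0649", "\u064A")   # alef maksura -> yeh


# Hypothetical, deliberately tiny stopword list, stored in normalized form.
STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن")}


def preprocess(text, stopwords=STOPWORDS):
    """Normalize, tokenize into unigrams, and remove stopwords."""
    tokens = TOKEN.findall(normalize(text))
    return [tok for tok in tokens if tok not in stopwords]
```

Normalizing the stopword list with the same function keeps stopword matching consistent with the normalized tokens. A noun filter would be applied to the output of `preprocess` using whatever tagger is available.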
Proposal.pptx

Editor's notes

  1. Recent advancements in topic modeling for Arabic texts have focused on several areas:
     • Improved pre-processing techniques: Researchers have developed pre-processing techniques designed specifically for Arabic text that can improve the effectiveness of topic modeling algorithms. These techniques include text normalization, stemming, and removing diacritical marks.
     • Advanced models: Researchers have developed topic modeling algorithms designed specifically for Arabic text, such as the Arabic Latent Dirichlet Allocation (LDA) model and BERT-based models, which can effectively identify topics in Arabic texts.
     • Incorporation of sentiment analysis: Some recent research incorporates sentiment analysis into topic modeling to better understand the topics and the authors' attitudes toward them.
     • Incorporation of named entity recognition: Named entity recognition lets researchers extract important entities from the text and use them in topic modeling for a more specific understanding of the topics.
     • Handling unstructured data: Some recent studies focus on unstructured data such as social media text, which is highly relevant to topic modeling of Arabic texts.
     • Handling dialectal Arabic: Some recent work focuses on dialectal Arabic, which differs from formal Arabic and has its own vocabulary, grammar, and syntax.
     • Incorporation of word embeddings: Researchers have used word embeddings to improve topic modeling performance. Word embeddings map words to dense numeric vectors that capture semantic and syntactic information about words, which can improve the accuracy of topic modeling algorithms.