SlideShare une entreprise Scribd logo
1  sur  24
Télécharger pour lire hors ligne
Text Summarization - Machine Learning
    TEXT SUMMARIZATION
1   Kareem El-Sayed Hashem
    Mohamed Mohsen Brary
TEXT SUMMARIZATION
   Goal: reducing a text with a computer program in
    order to create a summary that retains the most
    important points of the original text.




                                                           Text Summarization - Machine Learning
   Summarization Applications
     summaries of email threads
     action items from a meeting
     simplifying text by compressing sentences




                                                       2
WHAT TO SUMMARIZE?
SINGLE VS. MULTIPLE DOCUMENTS
   Single Document Summarization
       Given a single document produce




                                                               Text Summarization - Machine Learning
         Abstract
         Outline

         Headline




   Multiple Document Summarization
       Given a group of document produce a gist of the
        document
         A series of news stories of the same event
         A set of webpages about some topic or question


                                                           3
QUERY-FOCUSED SUMMARIZATION
& GENERIC SUMMARIZATION
   Generic Summarization
       Summarize the content of a document




                                                                       Text Summarization - Machine Learning
   Query-focused Summarization
     Summarize a document with respect to an
      information need expressed in a user query
     A kind of complex question answering
           Answer a question by summarizing a document that has
            the information to construct the answer




                                                                   4
SUMMARIZATION FOR QUESTION
ANSWERING:
   Snippets
       Create snippets summarizing a web page for a query




                                                                        Text Summarization - Machine Learning
   Multiple Documents
       Create answer to complex questions summarizing
        multiple documents.
         Instead of giving a snippet for each document
         Create a cohesive answer that combines information from

          each document




                                                                    5
EXTRACTIVE SUMMARIZATION
& ABSTRACTIVE SUMMARIZATION
   Extractive Summarization:
       Create the summary from phrases or sentences in the
        source document(s)




                                                                  Text Summarization - Machine Learning
   Abstractive Summarization
       Express the ideas in the source document using
        different words




                                                              6
SUMMARIZATION: THREE STAGES
 Content Selection: choose sentences to extract
  from the document




                                                       Text Summarization - Machine Learning
 Information Ordering: choose an order to place
  them in the summary
 Sentence Realization: clean up the sentence




                                                   7
UNSUPERVISED CONTENT SELECTION
   Intuition Dating Back to Luhn (1958):
       Choose sentences that have distinguished or
        informative words




                                                                   Text Summarization - Machine Learning
   Two Approaches to Define distinguished words
       tf-idf: weigh each word wi in document j by tf-idf



       Topic signature: choose smaller set of distinguished
        words
           Log-likelihood ratio (LLR)


                                                               8
TOPIC SIGNATURE-BASED CONTENT
SELECTION WITH QUERIES

   Choose words that are informative either
       By log-likelihood ratio (LLR)




                                                       Text Summarization - Machine Learning
       Or by appearing in the query




       Weigh a sentence by weight of its words:


                                                   9
SUPERVISED CONTENT SELECTION
   Given
       A labeled training set of good summaries for each
        document




                                                               Text Summarization - Machine Learning
   Align
       The sentences in the document with sentences in the
        summary
   Extract Features
     Position
     Length of sentence
     Word informativeness
     Cohesion
                                                              10
SUPERVISED CONTENT SELECTION
   Train
       A binary classifier (put sentence in summary? Yes or
        no)




                                                                Text Summarization - Machine Learning
   Problems
       Hard to get labeled training data
       Alignment is difficult
       Performance not better that unsupervised algorithm




                                                               11
EVALUATING SUMMARIES: ROUGE
   ROUGE “ Recall Oriented Understudy for
    Gisting Evaluation ”




                                                    Text Summarization - Machine Learning
   Internal metric for automatically evaluating
    summaries
     Based on BLEU (a metric used for machine
      translation)
     Not as good as human evaluation.
     But much more convenient

                                                   12
EVALUATING SUMMARIES: ROUGE
   Given a document D, and an automatic
    summary X:




                                                           Text Summarization - Machine Learning
     Have N humans produce a set of reference
      summaries of D
     Run System, giving automatic summary X
     What percentage of the bigrams from the reference
      summaries appear in X?




                                                          13
EXAMPLE
 Human 1: water spinach is a green leafy
  vegetable grown in the tropics.




                                                 Text Summarization - Machine Learning
 Human 2: water spinach is a semi-aquatic
  tropical plant grown as a vegetable.
 Human 3: water spinach is a commonly eaten
  leaf vegetable of Asia.

   System: water spinach is a leaf vegetable
    commonly eaten in tropical areas of Asia.

   ROUGE -2=                = 12/28 = 0.43     14
ANSWERING HARDER QUESTION:
QUERY-FOCUSED MULTI-DOCUMENT
SUMMARIZATION

   The (bottom-up) snippet method
       Find a set of relevant documents




                                                                 Text Summarization - Machine Learning
       Extract informative sentences form the documents
       Order and modify the sentences into an answer



   The(top-down) information extraction method
       Build specific answers for different questions types:
         Definition questions
         Biography questions

         Certain medical questions

                                                                15
QUERY-FOCUSED MULTI-DOCUMENT
SUMMARIZATION




                                Text Summarization - Machine Learning
                               16
MAXIMAL MARGINAL RELEVANCE (MMR)
 An iterative method for content selection from
  multiple documents




                                                          Text Summarization - Machine Learning
 Iteratively (greedily) choose the best sentence to
  insert in the summary/answer so far:
       Relevant: maximally relevant to the user query
           High cosine similarity to the query
       Novel: minimally redundant with the summary so
        far:
           Low cosine similarity to the summary




                                                         17
   Stop when desired length
LLR + MMR CHOOSING INFORMATIVE YET
NON-REDUNDANT SENTENCES

   One of many ways to combine the intuitions of
    LLR and MMR:




                                                           Text Summarization - Machine Learning
     Score each sentence based on LLR (including query
      words)
     Include the sentence with highest score in the
      summary
     Iteratively add into the summary high-scoring
      sentences that are not redundant with the summary
      so far.


                                                          18
INFORMATION ORDERING
   Chronological ordering:
       Order sentences by the date of the document “ for
        summarizing news”




                                                               Text Summarization - Machine Learning
   Coherence:
     Choose ordering that make neighboring sentences
      similar(by cosine similarity)
     Choose ordering in which neighboring sentences
      discuss the same entity


   Topical ordering
                                                              19
       Learn the ordering of topics in the source document
DOMAIN-SPECIFIC ANSWERING:
THE INFORMATION EXTRACTION METHOD

   A good biography of a person contains:
       A person’s birth/death, fame factor, education …etc




                                                               Text Summarization - Machine Learning
   A good definition contains
       Type or category “ The Hajj is a type of ritual ”
   A medical answer about a drug’s use contains:
     The problem : medical condition
     The intervention : drug or procedure
     The outcome : the result of the study




                                                              20
INFORMATION THAT SHOULD BE IN THE
ANSWER FOR 3 KINDS OF QUESTIONS




                                     Text Summarization - Machine Learning
                                    21
ARCHITECTURE FOR ANSWERING COMPLEX
QUESTIONS




                                      Text Summarization - Machine Learning
                                     22
Text Summarization - Machine Learning
                                                                             23
              NLP Stanford course.
REFERENCES:
                    
Text Summarization - Machine Learning
                        THANK YOU 
                                      24

Contenu connexe

Tendances

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Improving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeImproving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeGaetano Rossiello, PhD
 
Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text SummarizationTho Phan
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment AnalysisRebecca Williams
 
text summarization using amr
text summarization using amrtext summarization using amr
text summarization using amramit nagarkoti
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Text categorization
Text categorizationText categorization
Text categorizationKU Leuven
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Syntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address CodeSyntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address Codesanchi29
 
Introduction to Named Entity Recognition
Introduction to Named Entity RecognitionIntroduction to Named Entity Recognition
Introduction to Named Entity RecognitionTomer Lieber
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysisM. Atif Qureshi
 
Text summarization using deep learning
Text summarization using deep learningText summarization using deep learning
Text summarization using deep learningAbu Kaisar
 

Tendances (20)

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
NLP
NLPNLP
NLP
 
TEXT SUMMARIZATION
TEXT SUMMARIZATIONTEXT SUMMARIZATION
TEXT SUMMARIZATION
 
Improving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeImproving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior Knowledge
 
Text Classification
Text ClassificationText Classification
Text Classification
 
Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text Summarization
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment Analysis
 
text summarization using amr
text summarization using amrtext summarization using amr
text summarization using amr
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Text categorization
Text categorizationText categorization
Text categorization
 
Language models
Language modelsLanguage models
Language models
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Syntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address CodeSyntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address Code
 
Machine Tanslation
Machine TanslationMachine Tanslation
Machine Tanslation
 
Text mining
Text miningText mining
Text mining
 
Introduction to Named Entity Recognition
Introduction to Named Entity RecognitionIntroduction to Named Entity Recognition
Introduction to Named Entity Recognition
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Text summarization using deep learning
Text summarization using deep learningText summarization using deep learning
Text summarization using deep learning
 

Similaire à Text summarization

Summarization in Computational linguistics
Summarization in Computational linguisticsSummarization in Computational linguistics
Summarization in Computational linguisticsAhmad Mashhood
 
A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...IJECEIAES
 
Multi-Topic Multi-Document Summarizer
Multi-Topic Multi-Document SummarizerMulti-Topic Multi-Document Summarizer
Multi-Topic Multi-Document Summarizerijcsit
 
A statistical model for gist generation a case study on hindi news article
A statistical model for gist generation  a case study on hindi news articleA statistical model for gist generation  a case study on hindi news article
A statistical model for gist generation a case study on hindi news articleIJDKP
 
NLP Techniques for Text Summarization.docx
NLP Techniques for Text Summarization.docxNLP Techniques for Text Summarization.docx
NLP Techniques for Text Summarization.docxKevinSims18
 
Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...eSAT Publishing House
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Reviewchangedaeoh
 
Article Summarizer
Article SummarizerArticle Summarizer
Article SummarizerJose Katab
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...mlaij
 
Supporting program comprehension with source code summarization
Supporting program comprehension with source code summarizationSupporting program comprehension with source code summarization
Supporting program comprehension with source code summarizationMasud Rahman
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with WikipediaAshish Kulkarni
 
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...Association for Computational Linguistics
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)theijes
 
Side final 2
Side final 2Side final 2
Side final 2ARYA TM
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarizationdamom77
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarizationdamom77
 

Similaire à Text summarization (20)

Summarization in Computational linguistics
Summarization in Computational linguisticsSummarization in Computational linguistics
Summarization in Computational linguistics
 
A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...
 
Multi-Topic Multi-Document Summarizer
Multi-Topic Multi-Document SummarizerMulti-Topic Multi-Document Summarizer
Multi-Topic Multi-Document Summarizer
 
A statistical model for gist generation a case study on hindi news article
A statistical model for gist generation  a case study on hindi news articleA statistical model for gist generation  a case study on hindi news article
A statistical model for gist generation a case study on hindi news article
 
NLP Techniques for Text Summarization.docx
NLP Techniques for Text Summarization.docxNLP Techniques for Text Summarization.docx
NLP Techniques for Text Summarization.docx
 
K0936266
K0936266K0936266
K0936266
 
team10.ppt.pptx
team10.ppt.pptxteam10.ppt.pptx
team10.ppt.pptx
 
Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
 
Article Summarizer
Article SummarizerArticle Summarizer
Article Summarizer
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
 
Supporting program comprehension with source code summarization
Supporting program comprehension with source code summarizationSupporting program comprehension with source code summarization
Supporting program comprehension with source code summarization
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with Wikipedia
 
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)
 
Side final 2
Side final 2Side final 2
Side final 2
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Query Based Summarization
Query Based SummarizationQuery Based Summarization
Query Based Summarization
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
 

Dernier

Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Celine George
 

Dernier (20)

Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 

Text summarization

  • 1. Text Summarization - Machine Learning TEXT SUMMARIZATION 1 Kareem El-Sayed Hashem Mohamed Mohsen Brary
  • 2. TEXT SUMMARIZATION  Goal: reducing a text with a computer program in order to create a summary that retains the most important points of the original text. Text Summarization - Machine Learning  Summarization Applications  summaries of email threads  action items from a meeting  simplifying text by compressing sentences 2
  • 3. WHAT TO SUMMARIZE? SINGLE VS. MULTIPLE DOCUMENTS  Single Document Summarization  Given a single document produce Text Summarization - Machine Learning  Abstract  Outline  Headline  Multiple Document Summarization  Given a group of document produce a gist of the document  A series of news stories of the same event  A set of webpages about some topic or question 3
  • 4. QUERY-FOCUSED SUMMARIZATION & GENERIC SUMMARIZATION  Generic Summarization  Summarize the content of a document Text Summarization - Machine Learning  Query-focused Summarization  Summarize a document with respect to an information need expressed in a user query  A kind of complex question answering  Answer a question by summarizing a document that has the information to construct the answer 4
  • 5. SUMMARIZATION FOR QUESTION ANSWERING:  Snippets  Create snippets summarizing a web page for a query Text Summarization - Machine Learning  Multiple Documents  Create answer to complex questions summarizing multiple documents.  Instead of giving a snippet for each document  Create a cohesive answer that combines information from each document 5
  • 6. EXTRACTIVE SUMMARIZATION & ABSTRACTIVE SUMMARIZATION  Extractive Summarization:  Create the summary from phrases or sentences in the source document(s) Text Summarization - Machine Learning  Abstractive Summarization  Express the ideas in the source document using different words 6
  • 7. SUMMARIZATION: THREE STAGES  Content Selection: choose sentences to extract from the document Text Summarization - Machine Learning  Information Ordering: choose an order to place them in the summary  Sentence Realization: clean up the sentence 7
  • 8. UNSUPERVISED CONTENT SELECTION  Intuition Dating Back to Luhn (1958):  Choose sentences that have distinguished or informative words Text Summarization - Machine Learning  Two Approaches to Define distinguished words  tf-idf: weigh each word wi in document j by tf-idf  Topic signature: choose smaller set of distinguished words  Log-likelihood ratio (LLR) 8
  • 9. TOPIC SIGNATURE-BASED CONTENT SELECTION WITH QUERIES  Choose words that are informative either  By log-likelihood ratio (LLR) Text Summarization - Machine Learning  Or by appearing in the query  Weigh a sentence by weight of its words: 9
  • 10. SUPERVISED CONTENT SELECTION  Given  A labeled training set of good summaries for each document Text Summarization - Machine Learning  Align  The sentences in the document with sentences in the summary  Extract Features  Position  Length of sentence  Word informativeness  Cohesion 10
  • 11. SUPERVISED CONTENT SELECTION  Train  A binary classifier (put sentence in summary? Yes or no) Text Summarization - Machine Learning  Problems  Hard to get labeled training data  Alignment is difficult  Performance not better that unsupervised algorithm 11
  • 12. EVALUATING SUMMARIES: ROUGE  ROUGE “ Recall Oriented Understudy for Gisting Evaluation ” Text Summarization - Machine Learning  Internal metric for automatically evaluating summaries  Based on BLEU (a metric used for machine translation)  Not as good as human evaluation.  But much more convenient 12
  • 13. EVALUATING SUMMARIES: ROUGE  Given a document D, and an automatic summary X: Text Summarization - Machine Learning  Have N humans produce a set of reference summaries of D  Run System, giving automatic summary X  What percentage of the bigrams from the reference summaries appear in X? 13
  • 14. EXAMPLE  Human 1: water spinach is a green leafy vegetable grown in the tropics. Text Summarization - Machine Learning  Human 2: water spinach is a semi-aquatic tropical plant grown as a vegetable.  Human 3: water spinach is a commonly eaten leaf vegetable of Asia.  System: water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.  ROUGE -2= = 12/28 = 0.43 14
  • 15. ANSWERING HARDER QUESTION: QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION  The (bottom-up) snippet method  Find a set of relevant documents Text Summarization - Machine Learning  Extract informative sentences form the documents  Order and modify the sentences into an answer  The(top-down) information extraction method  Build specific answers for different questions types:  Definition questions  Biography questions  Certain medical questions 15
  • 16. QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION Text Summarization - Machine Learning 16
  • 17. MAXIMAL MARGINAL RELEVANCE (MMR)  An iterative method for content selection from multiple documents Text Summarization - Machine Learning  Iteratively (greedily) choose the best sentence to insert in the summary/answer so far:  Relevant: maximally relevant to the user query  High cosine similarity to the query  Novel: minimally redundant with the summary so far:  Low cosine similarity to the summary 17  Stop when desired length
  • 18. LLR + MMR CHOOSING INFORMATIVE YET NON-REDUNDANT SENTENCES  One of many ways to combine the intuitions of LLR and MMR: Text Summarization - Machine Learning  Score each sentence based on LLR (including query words)  Include the sentence with highest score in the summary  Iteratively add into the summary high-scoring sentences that are not redundant with the summary so far. 18
  • 19. INFORMATION ORDERING  Chronological ordering:  Order sentences by the date of the document “ for summarizing news” Text Summarization - Machine Learning  Coherence:  Choose ordering that make neighboring sentences similar(by cosine similarity)  Choose ordering in which neighboring sentences discuss the same entity  Topical ordering 19  Learn the ordering of topics in the source document
  • 20. DOMAIN-SPECIFIC ANSWERING: THE INFORMATION EXTRACTION METHOD  A good biography of a person contains:  A person’s birth/death, fame factor, education …etc Text Summarization - Machine Learning  A good definition contains  Type or category “ The Hajj is a type of ritual ”  A medical answer about a drug’s use contains:  The problem : medical condition  The intervention : drug or procedure  The outcome : the result of the study 20
  • 21. INFORMATION THAT SHOULD BE IN THE ANSWER FOR 3 KINDS OF QUESTIONS Text Summarization - Machine Learning 21
  • 22. ARCHITECTURE FOR ANSWERING COMPLEX QUESTIONS Text Summarization - Machine Learning 22
  • 23. Text Summarization - Machine Learning 23 NLP Stanford course. REFERENCES: 
  • 24. Text Summarization - Machine Learning THANK YOU  24