The document discusses summarization techniques, covering extractive and generative summarization: extractive summarization selects key words from the document, while generative summarization composes new sentences for the summary. Simple statistics-based methods are described, including tracking the most frequent words in a document and identifying collocations and word networks. Sentence generation can draw on statistical tables of word frequencies, Bayesian probabilities of words in certain positions, or N-grams and part-of-speech tags. The goal is to efficiently summarize large documents while retaining the important information.
2. Summarization
Given a document (a corpus), summarize it with
words that represent its content
Extractive summarization
key words
Generative summarization
summary sentences
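A toy sketch of the extractive idea (not from the slides): rank each sentence by the document-wide frequency of its words and pick the highest-scoring one.

```python
from collections import Counter

# Hypothetical three-sentence "document".
doc = [
    "information retrieval finds relevant documents",
    "summarization condenses a document",
    "retrieval of information matters",
]

# Word frequencies over the whole document.
freq = Counter(w for s in doc for w in s.split())

# Extract the sentence whose words are most frequent overall.
best = max(doc, key=lambda s: sum(freq[w] for w in s.split()))
print(best)
```

Generative summarization would instead compose a new sentence from such statistics, as sketched in the later slides.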
Information Retrieval – ISD312 Summarization 2
3. Simple statistics
Most frequent words

import nltk
from nltk.book import *
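The frequency count can be sketched without the NLTK book corpora using `collections.Counter`; `nltk.FreqDist` is a `Counter` subclass and offers the same `most_common` interface.

```python
from collections import Counter

# Toy token list; in the slides the tokens come from the nltk.book texts.
tokens = "the cat sat on the mat and the dog sat on the log".split()

freq = Counter(tokens)
print(freq.most_common(3))  # the three most frequent words with their counts
```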
4. import nltk
from nltk.book import *

def kataKunci(df, ambang):
    # df: word-frequency mapping (e.g. an nltk.FreqDist)
    # ambang: relative-frequency threshold in (0, 1]
    maks = max(df.values())  # renamed to avoid shadowing the built-in max()
    for vocab in df:
        if df[vocab] / maks > ambang:
            print(vocab, end=' ')
    print()
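As a minimal usage sketch, here is a hypothetical variant of kataKunci that returns the keywords instead of printing them, called with a plain dict of word counts (an nltk.FreqDist works the same way, since it behaves like a dict):

```python
def kataKunci(df, ambang):
    # Keep every word whose frequency, relative to the peak, exceeds the threshold.
    maks = max(df.values())
    return [w for w in df if df[w] / maks > ambang]

# Toy word counts standing in for a real frequency distribution.
counts = {'information': 9, 'document': 6, 'and': 2, 'the': 1}
print(kataKunci(counts, 0.5))  # words above half the peak frequency
```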
5. Phrases, word groups
Collocations
Word networks within a document
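Collocations can be approximated by counting adjacent word pairs, sketched here with plain `Counter`; NLTK also provides this directly via `nltk.Text.collocations()` and `nltk.collocations.BigramCollocationFinder`.

```python
from collections import Counter

# Toy token list; a recurring adjacent pair is a candidate collocation.
tokens = "new york is a big city and new york never sleeps".split()

bigrams = Counter(zip(tokens, tokens[1:]))
print(bigrams.most_common(1))  # the most frequent adjacent word pair
```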
6. Generating sentences
Simple statistics
Table of word-occurrence statistics
Bayesian statistics
Probability of a word at the beginning of a sentence
Probability of a word following another word
Other methods
N-grams
POS tags
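The "probability of a word following another" idea can be sketched as a bigram table: record which words were observed to follow each word, then generate a sentence by repeatedly sampling a successor. The corpus, start word, and end marker here are all toy assumptions.

```python
import random
from collections import defaultdict

# Toy corpus; '.' acts as the end-of-sentence marker.
corpus = "i like coffee . i like tea . she likes coffee .".split()

# Bigram table: for each word, the words observed to follow it.
# Sampling from this list approximates P(next word | current word).
successors = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    successors[a].append(b)

random.seed(0)        # reproducible sampling
word = 'i'            # assumed sentence-start word
sentence = [word]
while word != '.' and len(sentence) < 10:
    word = random.choice(successors[word])
    sentence.append(word)
print(' '.join(sentence))
```

Higher-order N-grams condition on more than one preceding word, and POS tags can constrain which successors are grammatically plausible.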
7. The rapid growth of the Internet has resulted in enormous
amounts of information that has become more difficult to access
efficiently. Internet users require tools to help manage this vast
quantity of information. The primary goal of this research is to
create an efficient and effective tool that is able to summarize
large documents quickly. This research presents a linear time
algorithm for calculating lexical chains which is a method of
capturing the “aboutness” of a document. This method is
compared to previous, less efficient methods of lexical chain
extraction. We also provide alternative methods for extracting
and scoring lexical chains. We show that our method provides
similar results to previous research, but is substantially more
efficient. This efficiency is necessary in Internet search
applications where many large documents may need to be
summarized at once, and where the response time to the end
user is extremely important.
9. import nltk

# Requires the 'punkt' tokenizer models: nltk.download('punkt')
data = 'Sebuah contoh kalimat yang ingin dianalisis menggunakan NLTK'
tokens = nltk.word_tokenize(data)
text = nltk.Text(tokens)