1




—                       —



    Hilofumi Yamamoto




    December 13, 2007
2




•
    –
    –
    –
    – 1000


•                    (Goodenough, 1981)
                 (
             )
3




•   —
•
•
•


•
4




•
•
•
•
5




•   (2005)—
•   (2006)—
•
•
•
•
6




•
•
•   (   , 1983;   , 1989)
•
7




[Figure: timeline of the eight imperial anthologies, from Kokinshū (905) to Shinkokinshū (1205); horizontal axis 900 to 1250. Intervals between successive anthologies: 46, 56, 79, 38, 20, 44, 17 years.]
8




1.
2.         (1976)
     •
     •
3.         (1991)


4.       (1998)
9




•


•
•
•
→
10




•
•             9484
    (                                        )
• kh              (β   )
•         (                     )      t2c


•
•       (48732)        (1408)       (49)
11




/ の / 内 / に / 春 / は / 来 / に / けり / 鴬 / の / 凍れ / る / 涙 / 今 / や / 解く / らむ




•     –            –            –        ...
                                                       ..
12




•
•
•
13




•
•
     (    , 1983)
•
     (    , 1996)


    idf (inverse document frequency)
                     (       )
14



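idf (Spärck Jones, 1972)

\[
\mathrm{idf}(t, N) = \log \frac{N}{\mathrm{df}(t)}
\]

\begin{align}
\mathrm{idf}(\mathit{ari}, N) &= \log \frac{N}{\mathrm{df}(\mathit{ari})} \tag{1} \\
&= \log \frac{9484}{1201} \tag{2} \\
&= \log 7.89\ldots \tag{3} \\
&= 2.07\ldots \tag{4}
\end{align}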
15



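idf (Spärck Jones, 1972)

\[
\mathrm{idf}(t, N) = \log \frac{N}{\mathrm{df}(t)}
\]

\begin{align}
\mathrm{idf}(\mathit{uguisu}, N) &= \log \frac{N}{\mathrm{df}(\mathit{uguisu})} \tag{5} \\
&= \log \frac{9484}{101} \tag{6} \\
&= \log 93.90\ldots \tag{7} \\
&= 4.54\ldots \tag{8}
\end{align}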
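
The two worked examples (ari and uguisu) can be checked numerically. The sketch below is a minimal illustration, assuming that log is the natural logarithm (which reproduces the values 2.07 and 4.54) and that N = 9484 is the number of documents. The function name `idf` and its argument order are chosen here for illustration only.

```python
import math


def idf(df_t: int, n_docs: int) -> float:
    """Inverse document frequency log(N / df(t)); natural log is an assumption."""
    if df_t <= 0 or n_docs <= 0:
        raise ValueError("df(t) and N must be positive")
    return math.log(n_docs / df_t)


N = 9484  # collection size quoted on the slides

idf_ari = idf(1201, N)     # df(ari) = 1201
idf_uguisu = idf(101, N)   # df(uguisu) = 101

print(f"idf(ari)    = {idf_ari:.2f}")
print(f"idf(uguisu) = {idf_uguisu:.2f}")
```

A frequent term such as ari receives a low idf, while a rarer term such as uguisu receives a higher one.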
16



[Figure: L-Shape Freq-Type. Horizontal axis: frequency (0 to 2000); vertical axis: number of types (0 to 3500).]
17



[Figure: J-Shape IDF-Type. Horizontal axis: inverse document frequency (idf), 1 to 9; vertical axis: number of types (0 to 1200).]
18




•                                (     )




•


• tfidf

\[
w(t, K, N) = (1 + \log \mathrm{tf}(t, K)) \, \mathrm{idf}(t, N)
\]
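
The tfidf weight can be sketched in a few lines. This is an illustration under stated assumptions, not the author's implementation: K is treated as a subset of documents in which the term frequency is counted, N as the whole collection, each document as a list of tokens, and log as the natural logarithm. The toy tokens and the helper names `tf`, `df` and `weight` are hypothetical.

```python
import math
from typing import List, Sequence

Doc = Sequence[str]


def tf(term: str, K: Sequence[Doc]) -> int:
    """Number of occurrences of term in the documents of K."""
    return sum(list(doc).count(term) for doc in K)


def df(term: str, N: Sequence[Doc]) -> int:
    """Number of documents of N that contain term."""
    return sum(1 for doc in N if term in doc)


def weight(term: str, K: Sequence[Doc], N: Sequence[Doc]) -> float:
    """w(t, K, N) = (1 + log tf(t, K)) * idf(t, N); 0.0 if the term is absent (assumption)."""
    tf_t, df_t = tf(term, K), df(term, N)
    if tf_t == 0 or df_t == 0:
        return 0.0
    return (1 + math.log(tf_t)) * math.log(len(N) / df_t)


# Hypothetical toy collection, for illustration only.
N: List[Doc] = [
    ["hana", "haru"],
    ["haru", "kaze"],
    ["tsuki"],
    ["hana", "hana", "yuki"],
]
K = N[:2]

w_haru = weight("haru", K, N)
print(w_haru)
```

Returning 0.0 for an absent term is a convenience chosen here so that log 0 is never evaluated.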
19



                                     (cw)


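\begin{align}
w(t, K, N) &= (1 + \log \mathrm{tf}(t, K)) \, \mathrm{idf}(t, N) \tag{9} \\
\mathrm{cidf}(t_1, t_2, N) &= \sqrt{\mathrm{idf}(t_1, N) \, \mathrm{idf}(t_2, N)} \tag{10} \\
\mathrm{ctf}(t_1, t_2, K) &= 1 + \log |\{k : t_1, t_2 \in k\}| \tag{11}
\end{align}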


• K

• (10)

• (11)     K

•
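
Equations (10) and (11) can be sketched as follows. This is a minimal illustration, assuming natural logarithms and representing each element k of K as a set of tokens. The toy tokens are hypothetical, and returning 0.0 when the two terms never co-occur is an assumption made here to avoid log 0.

```python
import math
from typing import Sequence, Set


def cidf(idf_t1: float, idf_t2: float) -> float:
    """Equation (10): geometric mean of the two idf values."""
    return math.sqrt(idf_t1 * idf_t2)


def ctf(t1: str, t2: str, K: Sequence[Set[str]]) -> float:
    """Equation (11): 1 + log of the number of k in K containing both terms."""
    count = sum(1 for k in K if t1 in k and t2 in k)
    return 1 + math.log(count) if count > 0 else 0.0


# idf values 3.18 and 4.63 appear together in the result tables of the slides.
example_cidf = cidf(3.18, 4.63)

# Hypothetical toy subset K, for illustration only.
K = [{"haru", "hana"}, {"haru", "hana", "kaze"}, {"haru"}]
example_ctf = ctf("haru", "hana", K)

print(round(example_cidf, 3), round(example_ctf, 3))
```

Because cidf is a geometric mean, a pair scores high only when both terms are individually rare in the collection.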
20



[Figure: distribution of cidf. Horizontal axis: cidf (0 to 9); vertical axis: frequency of patterns (0 to 1000).]
21



                                                     (cw)


\begin{align}
\mathrm{ictf}(t_1, t_2, N) &= 1 + \log \frac{|N|}{|\{n : t_1, t_2 \in n\}|} \tag{12} \\
\mathrm{cw}(t_1, t_2) &= \mathrm{ctf}(t_1, t_2, K) \, \mathrm{ictf}(t_1, t_2, N) \, \mathrm{cidf}(t_1, t_2, N) \tag{13}
\end{align}

         • K                                     N

         •

         •                       K

         •                       N
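
Putting equations (10) to (13) together, the co-occurrence weight can be sketched end to end. This is an illustration under stated assumptions: natural logarithms, documents represented as token sets, K a subset of interest and N the whole collection. The toy data and all function names are hypothetical, and pairs that never co-occur are given weight 0.0 by choice.

```python
import math
from typing import Sequence, Set

Doc = Set[str]


def idf(t: str, N: Sequence[Doc]) -> float:
    """log(|N| / df(t)), with df(t) the number of documents containing t."""
    df_t = sum(1 for n in N if t in n)
    return math.log(len(N) / df_t) if df_t else 0.0


def co_count(t1: str, t2: str, docs: Sequence[Doc]) -> int:
    """Number of documents that contain both t1 and t2."""
    return sum(1 for d in docs if t1 in d and t2 in d)


def cw(t1: str, t2: str, K: Sequence[Doc], N: Sequence[Doc]) -> float:
    """Equation (13): ctf * ictf * cidf."""
    in_k, in_n = co_count(t1, t2, K), co_count(t1, t2, N)
    if in_k == 0 or in_n == 0:
        return 0.0  # assumption: no co-occurrence means no weight
    ctf = 1 + math.log(in_k)                         # equation (11)
    ictf = 1 + math.log(len(N) / in_n)               # equation (12)
    cidf = math.sqrt(idf(t1, N) * idf(t2, N))        # equation (10)
    return ctf * ictf * cidf


# Hypothetical toy collection: N has four documents, K is the first two.
N = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
K = N[:2]

cw_ab = cw("a", "b", K, N)
print(round(cw_ab, 4))
```

In this form a pair is weighted up when it co-occurs often inside K, co-occurs rarely across N, and consists of terms that are individually rare in N.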
22



[Figure: cumulative distribution of cw, eight series labeled 1 to 8. Horizontal axis: co-occurrence weight (cw), 0 to 100; vertical axis: cumulative frequency of patterns (0 to 900).]
23



1σ




         16
     (        )
24
25
26
27
28
          {      |   }              (1)
                                     {    | }
        t1–t2    cw      z     ctf   idf(t1)   idf(t2)
(24)    –        86.06   3.33   10    3.18      4.63
        –        65.15   1.76    5    3.18      3.26
        –        64.32   1.70    2    3.43      4.69
        –        63.36   1.62    2    3.18      4.92
        –        61.87   1.51    2    3.18      4.69
        –        60.36   1.40    4    3.18      3.18
        –        55.34   1.02    2    3.18      4.37
(11)    –        54.69   1.33    3    3.18      4.63
        –        52.40   1.12    3    3.18      3.26
        –        51.40   1.03    1    3.18      8.06
        –        51.28   1.02    2    3.43      4.63
(15)    –        80.25   3.74    8    3.18      4.63
        –        55.90   1.54    2    3.18      3.83
        –        54.92   1.46    8    3.18      2.08
        –        54.35   1.40    2    3.18      3.95
        –        52.42   1.23    2    3.18      3.37
        –        50.48   1.05    1    3.18      7.77
(3)     N/A
29
           {      |   }              (2)
                                      {    | }
        t1–t2    cw      z     ctf   idf(t1)   idf(t2)
(5)     –        72.27   3.34    4    3.43      4.63
        –        52.17   1.44    2    3.43      3.95
        –        51.68   1.40    2    3.43      3.71
        –        51.00   1.33    2    3.43      3.43
        –        49.48   1.19    4    3.43      2.08
        –        48.33   1.08    1    3.43      6.59
        –        47.56   1.01    1    3.43      6.38
(6)     N/A
(9)     N/A
(24)    –        63.56   1.64    3    3.43      4.63
        –        62.38   1.55    3    3.43      3.14
        –        62.18   1.53    4    3.18      4.63
        –        56.96   1.14    1    3.43      9.16
30




•


•   (cw)   z       1σ

      1σ(16    )
•


•
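
The 1σ cut-off on z can be illustrated as follows. This sketch assumes that z is the standard score of cw over all patterns and that patterns with z of at least 1 are kept; for a normal distribution about 16% of values lie above one standard deviation. Whether the population or the sample standard deviation was used is not recoverable from the slides, so the population form is assumed here. The cw values are hypothetical.

```python
from statistics import NormalDist, mean, pstdev
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Standard scores using the population standard deviation (assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Share of a standard normal distribution lying above +1 sigma.
upper_tail = 1 - NormalDist().cdf(1.0)

# Hypothetical cw values, for illustration only.
cw_values = [20.0, 30.0, 40.0, 50.0, 90.0]
selected = [v for v, z in zip(cw_values, z_scores(cw_values)) if z >= 1.0]

print(f"upper tail above 1 sigma: {upper_tail:.3f}")
print("selected:", selected)
```

With real data the selected share will only be near 16% if the cw distribution is roughly normal.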
31




•
•
•


•
    http://etymology.jp/waka/poem.cgi
    XML(SVG)
•
