Casual-Talk #1: On the Ecology of the Green Caterpillar (aomushi)
(aomushi510)

2009-11-24

aomushi
• perl
• casual-perl IRC

aomushi
• JPA

• Algorithm::NaiveBayes: OK/NG

Algorithm::NaiveBayes
• http://search.cpan.org/~kwilliams/Algorithm-NaiveBayes-0.04/lib/Algorithm/NaiveBayes.pm
• AI::Categorizer, Lingua::JA::Categorize

aomushi
• MeCab

aomushi
• MeCab
[MeCab output: one line of comma-separated features per token]

aomushi
• NaiveBayes
[MeCab output: one line of comma-separated features per token]

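for (my $node = $self->mecab->parse($text); $node; $node = $node->next) {
    my $info = $node->feature;   # comma-separated features of the token
    my $word = $node->surface;   # the token as it appears in the text
    next unless $info;
    # Count the token only when its features start with the wanted pattern.
    if ( $info =~ /^   / ) {
        # Skip tokens whose features match any excluded pattern.
        next if $info =~ / |        |     |       |    |          /;
        # Skip words on the configured skip list.
        next if List::MoreUtils::any { $word eq $_ } @{ $self->_skip_word };
        $data->{$word}++;
    }
}
return $data;
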
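The counting loop above can be tried without MeCab installed. The following is a minimal sketch under stated assumptions: the `[surface, feature]` pairs are hypothetical stand-ins for MeCab nodes, the English part-of-speech names replace the Japanese ones MeCab actually emits, and `count_words` is an illustrative helper, not part of the talk's code. It uses a hash lookup for the skip list instead of `List::MoreUtils::any` so that only core Perl is needed.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for MeCab nodes: [surface, feature] pairs.
# Real MeCab features carry Japanese part-of-speech names.
my @nodes = (
    [ 'aomushi', 'noun,general,*,*,*,*' ],
    [ 'wa',      'particle,binding,*,*,*,*' ],
    [ 'happa',   'noun,general,*,*,*,*' ],
    [ 'taberu',  'verb,independent,*,*,*,*' ],
    [ 'aomushi', 'noun,general,*,*,*,*' ],
    [ '3',       'noun,number,*,*,*,*' ],
);

# Count how often each kept word occurs.
sub count_words {
    my ($nodes, $skip_words) = @_;
    my %skip = map { $_ => 1 } @$skip_words;
    my %count;
    for my $node (@$nodes) {
        my ($word, $info) = @$node;
        next unless $info;
        next unless $info =~ /^noun/;   # keep nouns only (assumed filter)
        next if $info =~ /number/;      # drop an excluded sub-category (assumed)
        next if $skip{$word};           # drop words on the skip list
        $count{$word}++;
    }
    return \%count;
}

my $data = count_words(\@nodes, ['happa']);
print join(', ', map { "$_=$data->{$_}" } sort keys %$data), "\n";   # aomushi=2
```

The resulting hash of word counts has the shape that `add_instance` and `predict` take as `attributes`.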
mecab
• naist-dic wikipedia
    • deepneko
    • http://deepneko.dyndns.org/kokotech/2009/06/mecabwikipedia.html
• NG

• NG
• OK NG
• 10000

while ( my ( $label, $ref ) = each %$categories ) {
    my $words = $self->_get_words($ref->{display});
    foreach (@$words) {
        my $tokenizer = MyFilter::Util::Tokenizer->new;
        my $word_set = $tokenizer->tokenize($_, $self->threshold);

        # Register one training instance: word counts plus the category label.
        $brain->add_instance(
            attributes => $word_set,
            label      => $label,
        );
    }
}
# Train once, after the instances for every label have been added.
$brain->train;
$brain->save_state($save_file) if $save_file;

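To make the add_instance / train / predict flow concrete, here is a toy multinomial naive Bayes in core Perl. It is a sketch under stated assumptions and not the Algorithm::NaiveBayes implementation: the `ToyBayes` package, the add-one smoothing, and the sample words are all made up for illustration. It also normalizes scores so they sum to 1, which the scores shown on the slides evidently do not.

```perl
use strict;
use warnings;

# Toy classifier mirroring the method names used on the slides.
package ToyBayes;

sub new {
    my ($class) = @_;
    return bless { docs => {}, words => {}, vocab => {}, total => 0, model => {} }, $class;
}

# Store one labelled bag of words.
sub add_instance {
    my ($self, %p) = @_;
    my ($label, $attrs) = @p{qw(label attributes)};
    $self->{docs}{$label}++;
    $self->{total}++;
    for my $word (keys %$attrs) {
        $self->{words}{$label}{$word} += $attrs->{$word};
        $self->{vocab}{$word} = 1;
    }
    return;
}

# Compute the log prior and the smoothing denominator for every label.
sub train {
    my ($self) = @_;
    my $vocab_size = keys %{ $self->{vocab} };
    for my $label (keys %{ $self->{docs} }) {
        my $sum = 0;
        $sum += $_ for values %{ $self->{words}{$label} || {} };
        $self->{model}{$label} = {
            prior => log( $self->{docs}{$label} / $self->{total} ),
            denom => $sum + $vocab_size,
        };
    }
    return;
}

# Return a hash reference of label => probability.
sub predict {
    my ($self, %p) = @_;
    my %log;
    for my $label (keys %{ $self->{model} }) {
        my $m     = $self->{model}{$label};
        my $score = $m->{prior};
        for my $word (keys %{ $p{attributes} }) {
            next unless $self->{vocab}{$word};   # ignore words never seen in training
            my $count = $self->{words}{$label}{$word} || 0;
            $score += $p{attributes}{$word} * log( ($count + 1) / $m->{denom} );
        }
        $log{$label} = $score;
    }
    # Shift by the maximum before exponentiating, then normalize to sum to 1.
    my ($max) = sort { $b <=> $a } values %log;
    my %prob = map { $_ => exp( $log{$_} - $max ) } keys %log;
    my $z = 0;
    $z += $_ for values %prob;
    $_ /= $z for values %prob;
    return \%prob;
}

package main;

my $brain = ToyBayes->new;
$brain->add_instance( attributes => { spam  => 2, offer => 1 }, label => 'bad' );
$brain->add_instance( attributes => { hello => 1, perl  => 2 }, label => 'good' );
$brain->train;

my $result = $brain->predict( attributes => { spam => 1 } );
printf "bad=%.2f good=%.2f\n", $result->{bad}, $result->{good};   # bad=0.75 good=0.25
```

Training here happens once, after both labels have been added, so the model covers every category.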
sub categorize {
    my ($self, $word_set) = @_;

    # Returns a hash reference mapping each label to its score.
    return $self->brain->predict( attributes => $word_set );
}

• bad

$result = {
    good => 0.092,
    bad  => 0.996,
};

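A score hash like the one above still has to be turned into a decision. The following is one possible sketch, not the talk's own code: `best_label` and the 0.5 cutoff are assumptions introduced here for illustration.

```perl
use strict;
use warnings;

# Scores copied from the slide.
my $result = { good => 0.092, bad => 0.996 };

# Hypothetical helper: return the highest-scoring label, or undef when
# even the best score stays below the (assumed) cutoff.
sub best_label {
    my ($scores, $min_score) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} } keys %$scores;
    return unless defined $best && $scores->{$best} >= $min_score;
    return $best;
}

my $label = best_label( $result, 0.5 );
print "$label\n";   # bad
```

Returning undef for low scores leaves room for a "not sure" outcome instead of forcing every input into OK or NG.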
• ao shi

P-1
• 200 NG

3

3

2

2

1

1

• Algorithm::NaiveBayes
• mecab
• yusukebe

• Algorithm::NaiveBayes
    • http://search.cpan.org/~kwilliams/Algorithm-NaiveBayes-0.04/lib/Algorithm/NaiveBayes.pm
• mecab wikipedia
    • http://deepneko.dyndns.org/kokotech/2009/06/mecabwikipedia.html
• Lingua::JA::Categorize
    • http://search.cpan.org/~miki/Lingua-JA-Categorize-0.01001/lib/Lingua/JA/Categorize.pm
