Yandex Academic Initiatives
Pavel Braslavski
Academic Initiatives
• School of Data Analysis
• Yandex Seminars
• Internet Mathematics (IMAT)
• ROMIP
• School on Information Retrieval (RuSSIR)
• The book "Introduction to Information Retrieval"
Yandex School of Data Analysis




  Two-year master's program, http://shad.yandex.ru
Teachers




Scientific seminars
Monthly seminars on data analysis and information retrieval

Organized by Microsoft Research + Yandex

  http://company.yandex.ru/public/seminars/schedule/
IMAT 2009
•   Learning to rank
•   245 features for query-document pairs
•   Graded relevance judgments (0..4)
•   Pure numeric data (i.e. no original queries, documents, or feature semantics)
•   Learning set: 97 290 feature vectors (9 124 queries)
•   Test set: 115 643 vectors (21 103 for public evaluation; 94 540 for final evaluation)
•   Evaluation measure: DCG
•   http://imat2009.yandex.ru
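
As a rough illustration of the evaluation measure, the sketch below computes DCG for one query from graded labels in the 0..4 range. The exponential gain and log2 discount used here are one common formulation and are an assumption; the slide does not state which DCG variant the contest used. The function name `dcg` is hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[int], k: Optional[int] = None) -> float:
    """DCG of one ranked list; relevances[i] is the grade (0..4) at rank i + 1.

    Assumed variant: gain 2**rel - 1, discount log2(rank + 1).
    """
    top = relevances if k is None else relevances[:k]
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(top, start=1))


# Made-up grades: the same documents in two different orders.
good = dcg([4, 2, 0])  # most relevant document ranked first
bad = dcg([0, 2, 4])   # most relevant document ranked last
```

With this variant, a ranking that puts the highly relevant document first receives the larger score, which is the behavior a learning-to-rank model is trained to produce.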

IMAT 2010
• Traffic congestion prediction
• (Rough) data:
  – Modified graph of Moscow streets
  – Observed traffic speed 4-10 pm (4-min intervals)
    for 30 consecutive days + 4-6 pm on the 31st day
• Task: predict traffic speed for 6-10 pm on the 31st day
• Public/final evaluation
• http://imat2010.yandex.ru
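
To make the time grid concrete: a 4-10 pm window sampled every 4 minutes gives 90 slots per day, and the 6-10 pm prediction window covers the last 60 of them. A minimal sketch of the slot arithmetic follows. It assumes slots are aligned to 16:00 and that times are available as hours and minutes; the helper name `slot_index` and that alignment are assumptions, not part of the contest description.

```python
WINDOW_START_MIN = 16 * 60   # 4 pm, in minutes since midnight
WINDOW_END_MIN = 22 * 60     # 10 pm
PREDICT_START_MIN = 18 * 60  # 6 pm
STEP_MIN = 4                 # interval length from the slide

SLOTS_PER_DAY = (WINDOW_END_MIN - WINDOW_START_MIN) // STEP_MIN   # 90
PREDICT_SLOTS = (WINDOW_END_MIN - PREDICT_START_MIN) // STEP_MIN  # 60


def slot_index(hour: int, minute: int) -> int:
    """Map a clock time inside the 4-10 pm window to a 0-based slot index."""
    minutes = hour * 60 + minute
    if not WINDOW_START_MIN <= minutes < WINDOW_END_MIN:
        raise ValueError("time outside the observed 4-10 pm window")
    return (minutes - WINDOW_START_MIN) // STEP_MIN
```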
Modified graph of streets




IMAT 2010 Data
• Graph: vertices (139 241/33 029) and edges (206 260/86 249)
   – <id_vertex> <id_group>
   – <id_edge> <id_edge_group> <start_vert> <end_vert>
   – <id_edge_group> <length> <avg_speed>
• Observations (learning set, 29 226 208 lines)
   – <id_edge_group> <day> <time> <speed>
• Task (691 641 lines)
   – <id_edge_group> <day> <time> ??
• Evaluation
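
The record layouts above suggest a simple baseline: average the observed speed for each edge group and time slot over the learning days, and use that average to fill in the `??` field. The sketch below is only an illustration, not the contest's method or a competitive solution. It assumes whitespace-separated fields and treats the `<time>` token as an opaque string; the function names and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, str]  # (id_edge_group, time)


def fit_mean_speeds(lines: Iterable[str]) -> Dict[Key, float]:
    """Average speed per (edge group, time) over all observed days."""
    total: Dict[Key, float] = defaultdict(float)
    count: Dict[Key, int] = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        group, _day, time, speed = parts
        total[(group, time)] += float(speed)
        count[(group, time)] += 1
    return {key: total[key] / count[key] for key in total}


def predict(task_line: str, means: Dict[Key, float]) -> Optional[float]:
    """Fill in the '??' of a task line; None if that slot was never observed."""
    group, _day, time, _unknown = task_line.split()
    return means.get((group, time))


# Made-up sample lines in the assumed <id_edge_group> <day> <time> <speed> layout.
observations = [
    "17 1 18:00 40",
    "17 2 18:00 50",
    "17 1 18:04 30",
]
means = fit_mean_speeds(observations)
```

A historical mean ignores the 4-6 pm observations available for the 31st day, so it serves only as a reference point against which a model that uses them can be compared.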



IMAT 2011
The contest starts in February 2011
An interesting task, and a prize for the winner ☺




ROMIP
•   TREC-like Russian initiative
•   Started in 2002
•   Several text and image collections
•   10-15 participants per year (50+ in total)
     • Academia and industry, students support
•   ~3 000 man-hours of evaluation (2009)
•   Remote participation + live meeting
•   Collections are freely available
•   Popular testbed for IR research in Russia


ROMIP largest text collections

Collection   Documents   Size (compressed)   Topics    Evaluated within ad-hoc search track
Legal        ~300 000    2 GB                14 794    220
By.Web       1 524 676   8 GB                ~60 000   1 500+
KM.RU        3 010 455   13 GB               ~60 000   ~250




Image collections
Photo collection: 20 000 images from Flickr
Dups collection: 15 hrs of video, 37 800 frames




RuSSIR
• Yekaterinburg, 5-12 September 2007
  http://romip.ru/russir2007

• Taganrog, 1-5 September 2008
  http://romip.ru/russir2008/

• Petrozavodsk, 11-16 September 2009
  http://romip.ru/russir2009/

• Voronezh, 13-18 September 2010
  http://romip.ru/russir2010/

• Saint Petersburg, 15-19 August 2011
  http://romip.ru/edbt-russir2011/
RuSSIR
•   Annual event
•   100+ participants
•   4th RuSSIR: Voronezh, 13-18 September 2010
•   http://romip.ru/russir2010/




Information retrieval in Russian

 Original English version: http://informationretrieval.org
Pavel Braslavski
pb@yandex-team.ru




