The Allen AI Science Challenge

7th place by Team Generation Gap


  1. The Allen AI Science Challenge & DeepHack.Q&A. St. Petersburg Data Science Meetup #6, Feb 19th, 2016
  2. Q: When athletes begin to exercise, their heart rates and respiration rates increase. At what level of organization does the human body coordinate these functions?
     A. at the tissue level
     B. at the organ level
     C. at the system level
     D. at the cellular level
     Wed 7 Oct 2015 – Sat 13 Feb 2016. Stage 1: 800 teams (>1000 participants); Stage 2: 170 teams.
     https://www.kaggle.com/c/the-allen-ai-science-challenge
     Train set: 2,700 questions; validation set: 8,132 questions; final test set: 21,298 questions.
  3. DeepHack Q&A: qa.deephack.me/
     Qualification round: the top 50 participants with the highest scores. Rough competition: roughly Kaggle Top-40 was needed to get into the Top-50 o_O
     Winter ML school + hackathon: Jan 31st - Feb 5th, 2016. Team Generation Gap was created on Jan 31st by merging four teams.
     The final 30 minutes of the hackathon: https://www.youtube.com/watch?v=tCKL5vbiHuo
  4. Team Generation Gap: Pavel Kalaidin (VK), Marat Zainutdinov (Quantbrothers), Roman Trusov (ITMO University), Artyom Korkhov (Zvooq), Igor Shilov (Zvooq), Timur Luguev (Clevapi), Ilyas Luguev (Clevapi)
     DeepHack: 1st place, ~0.556. Allen AI: 7th place, 0.55059.
  5. Datasets: ck12.org, wikipedia.org (science subset), flashcards from studystack.com and quizlet.com.
     Forum topic on external data: https://www.kaggle.com/c/the-allen-ai-science-challenge/forums/t/16877/external-data-repository
  6. Hail to Lucene. For each question with answers a) ans1 ... d) ans4, build the four queries "Question ans1" ... "Question ans4" and run them (with stemming and stopword removal) against a Lucene index over the Wiki, ck12 and quizlet data; the retrieval scores (e.g. 0.5, 0.4, 0.02, 0.01) rank the answers.
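A minimal sketch of this retrieval step, using the pure-Python Whoosh library as a stand-in for Lucene (the team used Lucene itself); the sample passages, the index directory and the `answer_scores` helper are ours:

```python
# Whoosh as a stand-in for Lucene: same bag-of-words retrieval idea,
# with stemming and stopword removal built into the analyzer.
from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT
from whoosh.index import create_in
from whoosh.qparser import OrGroup, QueryParser

schema = Schema(content=TEXT(analyzer=StemmingAnalyzer()))
ix = create_in("index_dir", schema)  # "index_dir" must already exist

writer = ix.writer()
for passage in ["Organ systems coordinate heart rate and breathing.",
                "A power plant burns fuel and emits pollution."]:
    writer.add_document(content=passage)
writer.commit()

def answer_scores(question, answers):
    """Query 'question + answer' for each option; the top-hit score ranks answers."""
    parser = QueryParser("content", ix.schema, group=OrGroup)  # OR over terms
    with ix.searcher() as searcher:
        scores = []
        for ans in answers:
            hits = searcher.search(parser.parse(f"{question} {ans}"), limit=1)
            scores.append(hits[0].score if hits else 0.0)
        return scores
```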
  7. Custom queries rule. How Lucene scores are computed: https://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/Similarity.html
  8. AdaGram (a.k.a. Reptil). Breaking Sticks and Ambiguities with Adaptive Skip-gram: http://arxiv.org/abs/1502.07257. Reference implementation in Julia: https://github.com/sbos/AdaGram.jl
  9. Sense prototypes learned for the (stemmed) word "reptil":
     - art cultur final play signific role folklor religion popular cultur moch peopl
     - noun coldblood anim scale general move stomach short leg exampl snake lizard turtl
     - noun aw person
  10. The model was trained like this (the number of prototypes is 5 by default):
      sh train.sh --min-freq 20 --window 5 --workers 40 --epochs 5 --dim 300 --alpha 0.1 corpus.txt adam.dict adam.model
  11. AdaGram (a.k.a. Reptil) approach: each question/answer pair is mapped to a vector of sense-similarity scores, e.g. [0.42, 0.55, 0.08, …]
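The slide only hints at how the sense vectors were used; below is a hypothetical NumPy illustration of scoring with multi-prototype vectors. The real model is AdaGram.jl in Julia, and `SENSE_VECS`, `best_sense` and `sense_score` are made-up stand-ins:

```python
# Hypothetical illustration of multi-prototype (AdaGram-style) scoring.
# SENSE_VECS stands in for the learned prototype vectors of each word.
import numpy as np

SENSE_VECS = {
    "reptil": np.random.randn(5, 300),  # up to 5 prototypes per word, dim=300
    # ... one entry per vocabulary word
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def best_sense(word, ctx):
    """Pick the prototype of `word` closest to the context vector."""
    return max(SENSE_VECS[word], key=lambda v: cos(v, ctx))

def sense_score(question_words, answer_words):
    """Disambiguate answer words against the question context, then average."""
    known = [w for w in question_words if w in SENSE_VECS]
    ctx = np.mean([SENSE_VECS[w][0] for w in known], axis=0)  # crude context: first prototype
    sims = [cos(best_sense(w, ctx), ctx) for w in answer_words if w in SENSE_VECS]
    return float(np.mean(sims)) if sims else 0.0
```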
  12. N-gram PMI between n-grams x and y: PMI(x, y) = log( p(x, y) / (p(x) p(y)) ).
      Example 1-gram -> 1-gram pairs: unit -> state, magnet -> field, carbon -> dioxid, million -> year, year -> ago, amino -> acid.
      Example 1-gram -> 3-gram pairs: around -> million year ago, period -> million year ago, forc -> van der waal, fossil -> million year ago, nobel -> prize physiolog medicin, date -> million year ago, mercuri -> venus earth mar.
  13. N-gram PMI scoring: the question "What is the greatest contributor to air pollution in the United States?" is stemmed to "greatest contributor air pollut unit state" and split into 1-grams (greatest, contributor, air, ...), 2-grams (greatest contributor, contributor air, air pollut, ...) and 3-grams; each answer, e.g. "Power plants" -> "power plant", is split the same way, and answers are scored by the PMI between question n-grams and answer n-grams.
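A minimal sketch of this scoring under our assumptions: co-occurrence is counted at the sentence level over a stemmed corpus, n-grams are contiguous, and all the helper names (`count_stats`, `answer_score`) are ours:

```python
# Sentence-level n-gram PMI: PMI(x, y) = log p(x, y) / (p(x) p(y)).
import math
from collections import Counter
from itertools import combinations

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def all_grams(tokens, max_n=3):
    return {g for n in range(1, max_n + 1) for g in ngrams(tokens, n)}

def count_stats(corpus, max_n=3):
    """Count n-gram and same-sentence pair frequencies over the corpus."""
    uni, pair, total = Counter(), Counter(), 0
    for sent in corpus:  # sent: list of stemmed tokens
        grams = all_grams(sent, max_n)
        uni.update(grams)
        pair.update(map(frozenset, combinations(grams, 2)))
        total += 1
    return uni, pair, total

def pmi(x, y, uni, pair, total):
    # p(x, y) = pair/total, p(x) = uni[x]/total, so the ratio simplifies:
    joint = pair[frozenset((x, y))]
    return math.log(joint * total / (uni[x] * uni[y])) if joint else float("-inf")

def answer_score(q_tokens, a_tokens, stats, max_n=3):
    """Average PMI over question-ngram / answer-ngram pairs seen in the corpus."""
    uni, pair, total = stats
    vals = [pmi(x, y, uni, pair, total)
            for x in all_grams(q_tokens, max_n) for y in all_grams(a_tokens, max_n)
            if pair[frozenset((x, y))]]
    return sum(vals) / len(vals) if vals else 0.0
```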
  14. Scores
  15. Fail Story. TL;DR: we wasted tons of time and got ~0.3 with almost all of the approaches below.
  16. LSA + Lucene: run LSA over the corpus to obtain topic indices TI_1 ... TI_n, query each QA pair against every topic index through Lucene, and take max(s1 ... sn) as the result for that QA pair.
      Gave a 1% improvement over basic Lucene, but took an EXTREMELY long time to process :(
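A sketch of the LSA step with scikit-learn; partitioning passages by their strongest topic to build per-topic indices is our reading of the slide, and `N_TOPICS` is illustrative:

```python
# LSA = TF-IDF followed by truncated SVD; each passage is then bucketed
# by its strongest topic, and each bucket gets its own search index.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

N_TOPICS = 50
docs = ["...corpus passages..."]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)                 # (n_docs, n_terms)
svd = TruncatedSVD(n_components=N_TOPICS)
topic_weights = svd.fit_transform(X)          # (n_docs, N_TOPICS)

buckets = topic_weights.argmax(axis=1)
by_topic = {t: [d for d, b in zip(docs, buckets) if b == t]
            for t in range(N_TOPICS)}
# A QA pair is then queried against every per-topic index and scored
# with max(s1 ... sn) over the per-topic retrieval scores.
```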
  17. Syntax co-occurrence: counts of syntactically linked word groups, e.g.
      nobel chemistry prize: 517
      national science academy: 445
      long time period: 340
      also role play: 306
      nobel physic prize: 279
      national medical library: 273
      carbon water dioxide: 261
      second thermodynamics law: 247
      speed sound (of_pobj), density population (compound), take place (dobj), link external (compound)
      Score: 0.3 :(
  18. word2vec combinations. We wanted to capture the intersection of meanings but didn't know how to combine word2vec representations: TF-IDF over the QA pairs, combinations of question tokens vs. combinations of answer tokens, cosine similarity, take the max score.
      ~0.3 :( even with careful keyword filtering; word2gauss didn't help either.
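A rough sketch of the combination idea as we read it: average the vectors of every two-token combination on each side and take the best cosine similarity. The TF-IDF weighting mentioned on the slide is omitted here for brevity:

```python
# Compare averaged vectors of token pairs from the question against
# token pairs from the answer; `w2v` is any word-vector lookup
# (e.g. a gensim KeyedVectors).
from itertools import combinations
import numpy as np

def pair_vectors(tokens, w2v):
    """Mean vector of every 2-token combination present in the vocabulary."""
    return [(w2v[a] + w2v[b]) / 2
            for a, b in combinations(tokens, 2) if a in w2v and b in w2v]

def combo_score(q_tokens, a_tokens, w2v):
    best = 0.0
    for q in pair_vectors(q_tokens, w2v):
        for a in pair_vectors(a_tokens, w2v):
            sim = float(q @ a / (np.linalg.norm(q) * np.linalg.norm(a) + 1e-9))
            best = max(best, sim)
    return best
```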
  19. Averaging Neural networks (1st encounter), w2v_dim = 300:
      vec_q = mean(w2v(Q))    # question
      vec_c = mean(w2v(Ac))   # correct answer
      vec_w = mean(w2v(Aw))   # wrong answer
      Train so that cos_sim(vec_q, vec_c) > cos_sim(vec_q, vec_w).
  20. Averaging Neural networks (1st encounter), with learned token weights:
      a = CNN(w2v(X))            # per-token weights for X in {Q, Ac, Aw}
      vec_x = mean(w2v(X) * a)
      Train so that cos_sim(vec_q, vec_c) > cos_sim(vec_q, vec_w).
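The uniform-averaging baseline from slide 19 fits in a few lines of NumPy; `w2v` is any 300-dim lookup such as a gensim KeyedVectors, and the CNN-weighted variant above would replace the plain mean with learned per-token weights:

```python
# Averaging baseline: mean word vector per sentence, cosine ranking.
import numpy as np

def sent_vec(tokens, w2v, dim=300):
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def pick_answer(question, answers, w2v):
    """Choose the answer whose averaged vector is closest to the question's."""
    q = sent_vec(question, w2v)
    return max(range(len(answers)),
               key=lambda i: cos_sim(q, sent_vec(answers[i], w2v)))
```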
  21. Semantic Neural networks (2nd encounter) + paragraphs:
      LSTM = LSTM(w2v); want LSTM(s1 | s2) > LSTM(s1 | s3) if s1 and s2 are from the same paragraph while s1 and s3 are not.
      If LSTM(a, b) is low, then a and b are from the same paragraph (energy-based learning).
      Loss = max(0, M - LSTM(s1, s2) + LSTM(s1, s3))
      Score: 0.26
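A minimal PyTorch sketch of this pairwise setup with the slide's loss, max(0, M - LSTM(s1, s2) + LSTM(s1, s3)); treating LSTM(a, b) as cosine similarity between LSTM encodings, and the layer sizes, are our assumptions:

```python
# Siamese LSTM with a margin (hinge) loss over same-paragraph vs.
# different-paragraph sentence pairs.
import torch
import torch.nn as nn

class SentEncoder(nn.Module):
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)

    def forward(self, x):            # x: (batch, seq_len, dim) of w2v vectors
        _, (h, _) = self.lstm(x)     # last hidden state as the sentence embedding
        return h[-1]

def margin_loss(enc, s1, s2, s3, margin=0.5):
    """s1/s2 are from the same paragraph, s1/s3 are not; separate by `margin`."""
    e1, e2, e3 = enc(s1), enc(s2), enc(s3)
    pos = torch.cosine_similarity(e1, e2)
    neg = torch.cosine_similarity(e1, e3)
    return torch.clamp(margin - pos + neg, min=0).mean()
```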
  22. Siamese architecture
  23. Hinge loss with margin
  24. Reading Neural networks (3rd encounter) + lots of paragraphs + a search engine.
      A survey of the shortcomings so far: bigrams are not accounted for; the main idea (keywords) of a sentence is not recognized.
  26. Reading Neural networks (3rd encounter). All we want to know is whether a sentence comes from a given paragraph, so that we can rerank the Lucene scores.
  27. Hinge loss with margin over LSTM(P), LSTM(s1) and LSTM(s2)
  28. Reading Neural networks (3rd encounter). Encoder variants tried:
      sentences -> LSTM -> Dense NN -> Embedding
      w2v -> LSTM -> Dense NN -> Embedding
      w2v -> Mean -> Dense NN -> Embedding
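A PyTorch sketch of the last two encoder variants from the slide; hidden sizes and the embedding dimension are made up:

```python
# Two sentence encoders mapping w2v inputs to a shared embedding space.
import torch
import torch.nn as nn

class LSTMEncoder(nn.Module):
    """w2v -> LSTM -> Dense NN -> Embedding"""
    def __init__(self, dim=300, hidden=128, emb=64):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.dense = nn.Sequential(nn.Linear(hidden, emb), nn.Tanh())

    def forward(self, x):                 # x: (batch, seq_len, dim)
        _, (h, _) = self.lstm(x)
        return self.dense(h[-1])

class MeanEncoder(nn.Module):
    """w2v -> Mean -> Dense NN -> Embedding"""
    def __init__(self, dim=300, emb=64):
        super().__init__()
        self.dense = nn.Sequential(nn.Linear(dim, emb), nn.Tanh())

    def forward(self, x):
        return self.dense(x.mean(dim=1))  # average the w2v vectors over time
```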
  29. Neural networks: lessons learned. Start as small as possible. Corruption (generating corrupted negative pairs) is important for siamese networks. The learning curve is misleading in NLP.
  30. Lessons learned:
      Start early - we wasted the first two months of the competition (but had a week of 24/7 hackathon at the end).
      No stickers in the team channel (except the one with Yann LeCun after a good submit).
      A common toolbox is nice.
      A dedicated server is a good thing to have (no need for AWS spot instances).
      Experiment fast, fail early.
      Teamwork means a lot.
