End-to-End Learning for Answering Structured Queries Directly over Text

Slides presented at the Deep Learning for Knowledge Graphs (DL4KG) workshop at ESWC 2019. Paper: https://arxiv.org/abs/1811.06303

  1. Faculty of Science. DL4KG – ESWC 2019, June 2, 2019. End-to-End Learning for Answering Structured Queries Directly over Text. Paul Groth (@pgroth), Antony Scerri, Ron Daniel, Jr., Bradley P. Allen. @INDE_LAB_AMS @ElsevierLabs
  2. Information needs: “An information need is the topic about which the user desires to know more” (Manning).
  3. Data as an information need. Researchers across communities need a diversity of observational data, requiring data of different types, from different sources and disciplines, and often collected at different scales. Integrating diverse data is a challenge. Reference: Gregory, K.; Cousijn, H.; Groth, P.; Scharnhorst, A.; Wyatt, S. (2019). "Searching data: A review of observational data retrieval practices in selected disciplines." Journal of the Association for Information Science and Technology. https://doi.org/10.1002/asi.24165
  4. Data search: is it just a regular search engine? Survey of research challenges: Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez-Gonzalez, Emilia Kacprzak, Paul Groth (Jan 2019). "Dataset search: a survey." https://arxiv.org/abs/1901.00735
  5. Constructive data search. Reference: S. Zhang, V. A. Zada, and K. Balog. "SmartTable: A Spreadsheet Program with Intelligent Assistance." In: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’18), July 2018.
  6. Integration of data into workflows. Reference: Chichester, Christine, Daniela Digles, Ronald Siebes, Antonis Loizou, Paul Groth, and Lee Harland. "Drug discovery FAQs: workflows for answering multidomain drug discovery questions." Drug Discovery Today 20, no. 4 (2015): 399-405.
  7. Run structured queries
  8. First: build a knowledge graph (https://kgtutorial.github.io)
  9. First: build a knowledge graph. Pipeline diagram with the stages Content, Triple Extraction (Open Information Extraction), Concept Resolution (Entity Resolution), Matrix Construction (universal schema over surface form relations and structured relations), Matrix Factorization (factorization model), Matrix Completion (predicted relations), Curation, Knowledge graph, and Taxonomy. Figures shown on the slide: 14M SD articles; 475M triples; 3.3 million relations; 49M relations; ~15k -> 1M entries. References: Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel. "Applying Universal Schemas for Domain Specific Ontology Expansion." 5th Workshop on Automated Knowledge Base Construction (AKBC), 2016. Michael Lauruhn and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016).
  10. Text databases. Reference: Schneider, Rudolf, et al. "Interactive Relation Extraction in Main Memory Database Systems." Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, 2016.
  11. Can you skip all that?
  12. Machine comprehension + question answering tasks (https://nlp.stanford.edu/software/sempre/wikitable/)
  13. What if we have parallel corpora?
  14. Triple Pattern Fragments (http://linkeddatafragments.org/concept/)
  15. Now we only need to answer slot-filling queries. References: Hewlett et al. "WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia." ACL 2016. Johannes Welbl, Pontus Stenetorp, Sebastian Riedel. "Constructing Datasets for Multi-hop Reading Comprehension Across Documents." Transactions of the Association for Computational Linguistics, 2018.
  16. Off-the-shelf QA architectures. Jack the Reader: framework for machine reading (https://github.com/uclmr/jack). FastQA: state-of-the-art baseline neural architecture. JackQA: architecture from the framework. References: Dirk Weissenborn, Georg Wiese, and Laura Seiffe. "Making Neural QA as Simple as Possible but not Simpler." In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 271–280, 2017. Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel, Dirk Weissenborn, Pasquale Minervini. "Jack the Reader – A Machine Reading Framework." In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL) System Demonstrations, July 2018. https://arxiv.org/abs/1806.08727
  17. Training data. Question: lexicalize(?city wdt:P131 wd:Q55) => “located in the administrative territorial entity of Netherlands”. Input text: “Amsterdam is the capital city and most populous municipality of the Netherlands. ….” Answer span: Amsterdam [0,9]. Starting point: 1150 predicates in Wikidata that link entities. Filters: the subject must have a Wikipedia page; more than 30 examples; the answer must be in the text. Result: 572 predicates, ~300 examples per predicate.
  18. Training: train a model per predicate; 2/3 training, 1/3 test; windowing scheme over the text of articles; EC2 p2.xlarge (1 virtual GPU, NVIDIA K80, 4 virtual CPUs, 61 GiB RAM); FastQA: 23 hours training time; JackQA: 81 hours; restarts with smaller batch sizes if model training failed.
  19. Results
  20. Training data size as a factor?
  23. A prototype
  24. Where to go: joint model; model architecture tuned to the task; performance on complex queries; accuracy; speed; other datasets; when to use what approach; …
  25. Conclusion: • Structured queries are important! • Can we do it on text? Looks like it … kind of. • Text as the KB (McCallum). • Interested in this kind of stuff? We’re hiring! Questions? Paul Groth | @pgroth | pgroth.com | indelab.org
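Slide 9's pipeline turns Open IE triples extracted from articles into a universal-schema matrix before factorizing it. A minimal sketch of the matrix construction step only, assuming a plain dictionary lookup in place of concept resolution. The concept IDs, relation names, and the `surface:`/`kb:` column prefixes are hypothetical, and the factorization model that scores the empty cells is not shown.

```python
from collections import defaultdict


def build_universal_schema_matrix(openie_triples, structured_triples, concept_lookup):
    """Build a binary entity-pair x relation matrix in the universal-schema style.

    openie_triples: (subject mention, relation phrase, object mention) from Open IE.
    structured_triples: (subject concept, relation, object concept) from a taxonomy.
    concept_lookup: hypothetical mention -> concept dict standing in for
    concept/entity resolution.
    """
    observed = defaultdict(set)  # (subject concept, object concept) -> relation columns
    for subj, rel, obj in openie_triples:
        s = concept_lookup.get(subj.lower())
        o = concept_lookup.get(obj.lower())
        if s is None or o is None:
            continue  # drop triples whose arguments do not resolve to concepts
        observed[(s, o)].add("surface:" + rel)
    for s, rel, o in structured_triples:
        observed[(s, o)].add("kb:" + rel)
    rows = sorted(observed)
    cols = sorted({r for rels in observed.values() for r in rels})
    # 1 = observed cell; 0 = unobserved cell that a factorization model would score.
    matrix = [[1 if c in observed[row] else 0 for c in cols] for row in rows]
    return rows, cols, matrix


# Hypothetical toy data.
concepts = {"aspirin": "C:aspirin", "headaches": "C:headache",
            "ibuprofen": "C:ibuprofen", "fever": "C:fever"}
openie = [("Aspirin", "is used to treat", "headaches"),
          ("Ibuprofen", "relieves", "fever"),
          ("The study", "reports", "results")]  # does not resolve, so it is dropped
kb = [("C:aspirin", "treats", "C:headache")]

rows, cols, matrix = build_universal_schema_matrix(openie, kb, concepts)
```

Here the `(C:ibuprofen, C:fever)` row has a 0 under `kb:treats`; scoring cells like that one from the co-occurring surface patterns is what the factorization stage is for.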
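Slide 14 points to Triple Pattern Fragments, where a server answers only single triple patterns and the client joins the results. A rough sketch of that client-side evaluation as a nested-loop join, assuming an in-memory list of triples instead of an HTTP interface, and ignoring the paging and count metadata that real TPF clients use to order patterns. All `ex:` terms are made up for illustration.

```python
def is_var(term):
    return term.startswith("?")


def fragment(store, pattern):
    """Server side: every triple in the store that matches one triple pattern."""
    return [t for t in store
            if all(is_var(p) or p == v for p, v in zip(pattern, t))]


def evaluate_bgp(store, patterns):
    """Client side: nested-loop join that binds variables pattern by pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            # Substitute already-bound variables before requesting the fragment.
            bound = tuple(binding.get(term, term) for term in pattern)
            for triple in fragment(store, bound):
                new = dict(binding)
                consistent = True
                for term, value in zip(bound, triple):
                    if is_var(term) and new.setdefault(term, value) != value:
                        consistent = False  # same variable, two different values
                        break
                if consistent:
                    extended.append(new)
        bindings = extended
    return bindings


# Hypothetical data: cities located in a province located in ex:Netherlands.
store = [
    ("ex:Amsterdam", "ex:locatedIn", "ex:NorthHolland"),
    ("ex:NorthHolland", "ex:locatedIn", "ex:Netherlands"),
    ("ex:Utrecht", "ex:locatedIn", "ex:UtrechtProvince"),
    ("ex:UtrechtProvince", "ex:locatedIn", "ex:Netherlands"),
    ("ex:Haarlem", "ex:locatedIn", "ex:NorthHolland"),
]
query = [("?city", "ex:locatedIn", "?province"),
         ("?province", "ex:locatedIn", "ex:Netherlands")]
results = evaluate_bgp(store, query)
```

Replacing `fragment` with a component that reads text instead of looking up stored triples is, roughly, the step that slides 15 to 17 take.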
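Slides 15 and 17 reduce the problem to slot filling: a triple pattern with one unbound position becomes a question over a text. A sketch of that routing with one reader per predicate (slide 18). The question template `"<predicate label> of <entity label>"` is an assumption, chosen only because it reproduces the slide 17 example. `toy_reader` is a hypothetical stand-in for FastQA or JackQA that ignores the question and does no actual reading.

```python
import re


def lexicalize(pattern, labels):
    """Turn a slot-filling triple pattern into a natural-language question."""
    s, p, o = pattern
    variables = [t for t in (s, o) if t.startswith("?")]
    if p.startswith("?") or len(variables) != 1:
        raise ValueError("expected a bound predicate and exactly one variable")
    entity = o if s.startswith("?") else s
    # Assumed template, chosen to reproduce the question on slide 17.
    return f"{labels[p]} of {labels[entity]}"


def toy_reader(question, context):
    """Hypothetical stand-in for a trained reader: ignores the question and
    returns the first capitalized word in the context with its character span."""
    match = re.search(r"\b[A-Z][a-z]+\b", context)
    return (match.group(0), match.start(), match.end()) if match else None


def answer_pattern(pattern, context, labels, readers):
    """Lexicalize the pattern, then call the reader trained for its predicate."""
    question = lexicalize(pattern, labels)
    reader = readers[pattern[1]]  # one reader per predicate
    return reader(question, context)


labels = {"wdt:P131": "located in the administrative territorial entity",
          "wd:Q55": "Netherlands"}
context = ("Amsterdam is the capital city and most populous municipality "
           "of the Netherlands.")
pattern = ("?city", "wdt:P131", "wd:Q55")
prediction = answer_pattern(pattern, context, labels, {"wdt:P131": toy_reader})
```

Keying readers by predicate mirrors the "train a model per predicate" choice on slide 18. A joint model (slide 24) would replace the dictionary with a single reader.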
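The filtering on slide 17 can be sketched as follows. Both the candidate record layout and the order of the filters are assumptions. Here the more-than-30-examples threshold is applied after dropping candidates whose subject has no Wikipedia text or whose answer does not occur in that text. The answer span is the first exact match, given as half-open character offsets, which is consistent with `Amsterdam [0,9]`.

```python
from collections import defaultdict


def build_dataset(candidates, min_count=30):
    """Group reading-comprehension examples by predicate and apply the filters.

    Each candidate is a dict with hypothetical keys: predicate, question,
    text (the subject's Wikipedia text, or None if there is no page) and answer.
    Predicates are kept only if they have strictly more than min_count examples.
    """
    per_predicate = defaultdict(list)
    for c in candidates:
        text = c["text"]
        if text is None:  # subject must have a Wikipedia page
            continue
        start = text.find(c["answer"])
        if start == -1:  # answer must be in the text
            continue
        per_predicate[c["predicate"]].append({
            "question": c["question"],
            "context": text,
            "answer_span": (start, start + len(c["answer"])),  # half-open
        })
    return {p: ex for p, ex in per_predicate.items() if len(ex) > min_count}


amsterdam = ("Amsterdam is the capital city and most populous municipality "
             "of the Netherlands.")
question = "located in the administrative territorial entity of Netherlands"
candidates = (
    [{"predicate": "wdt:P131", "question": question,
      "text": amsterdam, "answer": "Amsterdam"}] * 31
    + [{"predicate": "ex:rare", "question": question,  # too few examples
        "text": amsterdam, "answer": "Amsterdam"}] * 5
    + [{"predicate": "wdt:P131", "question": question,
        "text": None, "answer": "Amsterdam"},            # no Wikipedia page
       {"predicate": "wdt:P131", "question": question,
        "text": amsterdam, "answer": "Rotterdam"}]       # answer not in text
)
dataset = build_dataset(candidates)
```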
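Slide 18 mentions a 2/3 to 1/3 train/test split per predicate and a windowing scheme over article text, but gives no window parameters. A sketch under hypothetical settings: character windows of 400 with a stride of 200 (the real window size, stride, and unit are not stated), keeping only windows that contain the whole answer span and re-basing the span to each window.

```python
import random


def split_examples(examples, train_fraction=2 / 3, seed=0):
    """Shuffle one predicate's examples and split them into train and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def windows(text_length, size=400, stride=200):
    """Yield overlapping (start, end) character windows covering the whole text."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    start = 0
    while True:
        end = min(start + size, text_length)
        yield start, end
        if end >= text_length:
            return
        start += stride


def windowed_examples(context, span, size=400, stride=200):
    """Keep the windows that contain the full answer span, with the span re-based."""
    a, b = span
    return [(context[s:e], (a - s, b - s))
            for s, e in windows(len(context), size, stride)
            if s <= a and b <= e]
```

Overlapping windows keep long articles within a reader's input limit while making it likely that at least one window holds the whole answer; the cost is more training examples per article, which is one of the factors behind the training times reported on slide 18.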