AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Process Methodology Jochen Leidner (Coburg University, Germany)

Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.

  1. Accommodating the Deep Learning Revolution by a Development Process Methodology
     Jochen L. Leidner
     Coburg University of Applied Sciences and Arts, Coburg, Germany
     KnowledgeSpaces® UG (haftungsbeschränkt), Coburg, Germany
     University of Sheffield, Sheffield, UK
     2022-10-11
  2. Overview
     ● Introduction: Motivation, Pre-Trained Language Model Revolution
     ● Quick Recap: Some Machine Learning Methodologies (CRISP-DM, KDD, SEMMA, Data-to-Value)
     ● Before and After Pre-Trained LMs
     ● Comparison: Where Project Work is Spent: Pre-BERT and Post-BERT
     ● A Comment about Energy
     ● Summary & Conclusion
  3. A Step Change in NLP: Deep Learning and Pre-Trained Language Models
     ● In recent years, Pre-Trained Language Models (PTLMs) like Google’s BERT have emerged (Devlin et al., 2018/2019).
     ● This has led to enormous improvements in accuracy on most NLP tasks.
     ● PTLMs show that transfer learning is possible by splitting up training into two phases.
  4. BERT: An Example Pre-Trained Neural Language Model – Pre-Training versus Fine-Tuning
     Two training phases:
     – Pre-training: train a deep neural network with masked sentence pairs on generic language (billions of words from books, Wikipedia)
     – Fine-tuning: adapt the generic LM to a specific task (e.g. question answering) using supervised learning (extra training rounds on top of the pre-trained LM); a minimal sketch follows below
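
A minimal sketch of these two phases, not taken from the talk: it assumes the Hugging Face transformers library and PyTorch, and the model name bert-base-uncased, the two toy sentences and the two-class "relevance" task are purely illustrative assumptions.

    # Minimal sketch of the two BERT phases (illustrative only; not part of the slides).
    # Assumes the Hugging Face `transformers` library and PyTorch are installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

    # Phase 1 artefact: the generic, pre-trained masked language model.
    # The fill-mask pipeline shows what pre-training alone has learned about language.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    print(fill("BERT was pre-trained on billions of words from books and [MASK].")[0]["token_str"])

    # Phase 2: fine-tune the same pre-trained weights on a small labelled task.
    # The sentences and labels below are hypothetical toy data.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    texts = ["novel compound shows strong efficacy", "annual shareholder meeting announced"]
    labels = torch.tensor([1, 0])                      # toy labels: 1 = relevant, 0 = not relevant
    batch = tokenizer(texts, padding=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):                                 # a few extra training rounds on top of the PTLM
        loss = model(**batch, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

In a real project the same pattern is used, only with a task-specific labelled dataset and a proper training/evaluation split in place of the toy batch above.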
  5. Practical Questions
     ● RQ 1. How do PTLMs change the way NLP projects are done?
     ● RQ 2. In particular, how do PTLMs interact with existing methodologies?
  6. Some Methodologies
     ● KDD
     ● CRISP-DM (Azevedo and Santos, 2008)
     ● SEMMA
     ● Data-to-Value (Leidner, 2013; Leidner 2022a,b)
  7. The Data-to-Value Methodology (Leidner, 2013; Leidner 2022a,b) (1 of 2)
  8. The Data-to-Value Methodology (Leidner, 2013; Leidner 2022a,b) (2 of 2)
     [Figure annotation: Minor fine-tuning sufficient]
  9. Before and After PTLMs
     Before:
     ● Any classifier/regressor was a bespoke activity (100% custom development from scratch)
     ● Relatively slow and expensive to build
     ● Knobs: more labelled training data, more features
     After:
     ● Classifiers can be derived from PTLMs (80% re-use and 20% custom development → fine-tuning)
     ● Rapid/agile prototyping, cheap to get started
     ● Knobs: more unlabelled training data, more labelled training data, and three training regimes of increasing effort:
       – Zero-shot (apply the PTLM as-is; see the sketch below)
       – Fine-tuning only (take the pre-trained LM and add a few hundred training rounds using annotated data)
       – Pre-training (huge unlabelled data) plus fine-tuning (small labelled data)
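
The cheapest of the three regimes, zero-shot use, can be sketched as follows. This example is not from the slides: it assumes the Hugging Face transformers library and the publicly available facebook/bart-large-mnli checkpoint, and the input sentence and candidate labels are made up for illustration.

    # Sketch of the zero-shot regime: apply a pre-trained model as-is,
    # with no labelled training data and no fine-tuning.
    # (Assumed toolkit: Hugging Face `transformers`; assumed public checkpoint.)
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier(
        "The company recalled 40,000 vehicles because of a braking defect.",
        candidate_labels=["product recall", "earnings report", "merger"],  # hypothetical label set
    )
    print(result["labels"][0], round(result["scores"][0], 3))  # best label, no task-specific training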
  10. Comparison: How Time May Be Spent – Before and After PTLMs
      Before:
      ● Data Collection & Pre-Processing: 70%
      ● Annotation: 10%
      ● Feature Engineering: 10%
      ● Model Training: 7%
      ● Evaluation: 3%
      After:
      ● Data Collection & Pre-Processing: 50%–70%
      ● Annotation: 2%–10%
      ● Feature Engineering: 0%
      ● Model Training: 0%–12%
      ● Evaluation: 3%
      Percentages are estimates (an empirical study is needed but hard to obtain); ranges reflect the training regimes. (In the original slide, the size of a graphic symbolizes the size of the project.)
  11. Deep Learning & Energy Consumption
      ● Pre-training neural models is resource-intensive (Strubell, Ganesh and McCallum, 2019).
      ● Individual estimates vary, but cloud cost and environmental footprint are issues (a back-of-the-envelope example follows below).
      ● While experiments show that “bigger is better” (in terms of F1), there is a research drive to “distill” smaller models.
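
As a rough illustration of why pre-training cost matters, the back-of-the-envelope calculation below estimates energy use and emissions for a hypothetical training run. All numbers (GPU count, power draw, duration, PUE, grid carbon intensity) are illustrative assumptions, not figures from Strubell et al. (2019).

    # Back-of-the-envelope energy estimate for a hypothetical pre-training run.
    # All constants below are illustrative assumptions, not measured values.
    num_gpus = 64                 # assumed accelerator count
    gpu_power_kw = 0.3            # assumed average draw per GPU, in kW
    hours = 24 * 7                # assumed one week of training
    pue = 1.5                     # assumed data-centre power usage effectiveness
    grid_kg_co2_per_kwh = 0.4     # assumed grid carbon intensity

    energy_kwh = num_gpus * gpu_power_kw * hours * pue
    co2_kg = energy_kwh * grid_kg_co2_per_kwh
    print(f"~{energy_kwh:,.0f} kWh, ~{co2_kg:,.0f} kg CO2e")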
  12. Summary & Conclusions
      ● PTLMs have made NLP projects more agile.
        – While more unlabelled data may be needed, less labelled data may be required (sufficient labelled data is sometimes unavailable in industrial practice).
        – Most importantly, the feature engineering cycle is removed from projects.
        – PTLMs offer three training regimes: zero-shot, fine-tuning, and pre-training plus fine-tuning, with increasing cost/effort.
      ● As artifacts, PTLMs are also more clunky and energy-inefficient.
      ● Implications:
        – Research: increasingly bigger models mean that some academic teams are excluded from research (expensive GPU clusters are required) → research moves to industry (similar to the semiconductor space).
        – Business: the public availability of PTLMs creates a more level playing field, makes competitive differentiation harder, and reduces barriers to entry.
  13. References
      ● Devlin, Jacob, Ming-Wei Chang, Kenton Lee and Kristina Toutanova (2018) "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", Technical Report/Unpublished ArXiv Pre-print, https://arxiv.org/abs/1810.04805.
      ● Devlin, Jacob, Ming-Wei Chang, Kenton Lee and Kristina Toutanova (2019) "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", Proc. NAACL-HLT, Minneapolis.
      ● Azevedo, A. and Santos, M. F. (2008) "KDD, SEMMA and CRISP-DM: a parallel overview", Proc. IADIS European Conference on Data Mining, Amsterdam, 24-26 July 2008, 182–185.
      ● Leidner, Jochen L. (2013) "Data-to-Value", unpublished lecture notes, Big Data and Language Technology, University of Zurich, Zurich, Switzerland.
      ● Leidner, Jochen L. (2022a) "Data-to-Value: An Evaluation-First Methodology for Natural Language Projects", Technical Report/Unpublished ArXiv Pre-print, https://arxiv.org/abs/2201.07725.
      ● Leidner, Jochen L. (2022b) "Data-to-Value: An Evaluation-First Methodology for Natural Language Projects", Proceedings of the 27th International Conference on Natural Language & Information Systems (NLDB 2022), Valencia, Spain, 15-17 June 2022, LNCS 13286, 517–523.
      ● Strubell, Emma, Ananya Ganesh and Andrew McCallum (2019) "Energy and Policy Considerations for Deep Learning in NLP", ArXiv Pre-print, https://arxiv.org/pdf/1906.02243.pdf.
  14. Accommodating the Deep Learning Revolution by a Development Process Methodology
      Abstract
      Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
  15. About the Presenter
      Prof. Dr. Jochen L. Leidner, M.A. M.Phil. Ph.D. FRGS is Professor for Explainable and Responsible Artificial Intelligence in Insurance at Coburg University of Applied Sciences and a Visiting Professor in the Department of Computer Science, University of Sheffield. He is also founder and CEO of KnowledgeSpaces. His experience includes positions as Director of Research at Thomson Reuters and Refinitiv in London, where he headed the R&D team that he had founded. He was also the Royal Academy of Engineering Visiting Professor of Data Analytics at the Department of Computer Science, University of Sheffield (2017-2020). His background includes a Master's in computational linguistics, English and computer science (University of Erlangen-Nuremberg), a Master's in Computer Speech, Text and Internet Technology (University of Cambridge) and a PhD in Informatics (University of Edinburgh), which won the first ACM SIGIR Doctoral Consortium Award. His scientific contributions include leading the teams that developed the QED and ALYSSA open-domain question answering systems (evaluated at US NIST/DARPA TREC), proposing a new algorithm and comparing existing algorithms for spatial resolution of named entities, and information extraction of usual and unusual things (e.g. event extraction, company risk mining, sentiment analysis). At Thomson Reuters he led projects in the vertical domains of finance, regulatory/law enforcement, legal, pharmacology, and news. His code and machine learning models have been transitioned into products deployed at institutions ranging from international banks to the U.S. Supreme Court. Prior to Thomson Reuters, he worked for SAP and founded or co-founded a number of start-ups. He has lived and worked in Germany, Scotland, the USA, Switzerland and the UK, has taught at various universities (Erlangen, Saarbrücken, Frankfurt, Zurich and now Coburg), and is a scientific expert for the European Commission (FP7, H2020, Horizon Europe) and other funding bodies. He is an author or co-author of several dozen peer-reviewed publications (incl. one best paper award), has authored/co-edited two books and holds several patents in the areas of information retrieval, natural language processing, and mobile computing. He has twice won the Thomson Reuters inventor of the year award for the best patent application.
  16. About KnowledgeSpaces®
      ● Contact for consulting:
        E-Mail: info@knowledgespaces.de
        Phone: +49 (172) 904 8908
