The concurrent rise of big data, modern hardware, and deep learning promises to transform analytics within healthcare and life science organizations. But big data is expensive to annotate, and not all data is created equal. In practice, most problems require inference from small datasets and the incorporation of external knowledge. This is particularly true for tasks that involve natural language processing. John will discuss several methods for introducing prior knowledge so that models can learn from small data, including deep contextual representations, inductive transfer learning, and adversarial augmentation.
Presentation at the Joint Meeting: Nashville Data Science and Greater Nashville Healthcare Analytics on June 18, 2019.
Machine Learning with Small Data
1. Machine Learning with Small Data
John C. Liu, Ph.D. CFA
June 18, 2019
Twitter: @drjohncliu
2. Disclaimer
THE INFORMATION SET FORTH HEREIN HAS BEEN OBTAINED OR DERIVED FROM SOURCES GENERALLY
AVAILABLE TO THE PUBLIC AND BELIEVED BY THE AUTHOR TO BE RELIABLE, BUT THE AUTHOR DOES NOT MAKE
ANY REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, AS TO ITS ACCURACY OR COMPLETENESS. THE
INFORMATION IS FOR EDUCATIONAL PURPOSES ONLY AND IS NOT INTENDED TO BE USED AS THE BASIS OF ANY
BUSINESS OR INVESTMENT DECISIONS BY ANY PERSON OR ENTITY. ALL OF THE INFORMATION CONTAINED IN
THE PRESENTATION IS SUBJECT TO FURTHER MODIFICATION AND ANY AND ALL FORECASTS, PROJECTIONS OR
FORWARD-LOOKING STATEMENTS CONTAINED HEREIN SHALL NOT BE RELIED UPON AS FACTS NOR RELIED
UPON AS ANY REPRESENTATION OF FUTURE RESULTS WHICH MAY MATERIALLY VARY FROM SUCH
PROJECTIONS AND FORECASTS.
3. Roadmap
• Introduction
• Big Data Revolution
• What about Small Data?
• Dealing with Reality
– Semantic/Contextualized Representations
– Experimental Design
– Adversarial Data Generation
• Conclusion
12. • 14 million images
• 20,000 categories
• 25 Human Years to annotate!
Source: Li Fei-Fei. (2010). ImageNet: Crowdsourcing, benchmarking & other cool things
14. Ways to Deal with Small Data
• Amazon Mechanical Turk (e.g., ImageNet)
• CrowdFlower/Figure8/Appen
• Hire SMEs
• Data Augmentation/Synthetic Generation (SMOTE)
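The synthetic generation bullet can be illustrated with a minimal, standard-library sketch of the idea behind SMOTE: each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbors. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; this is not the imbalanced-learn implementation and not code from the talk.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration: a simplified take on SMOTE that
    works on a list of equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of the base point among the other minority points
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class with four 2-D samples (made-up values)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies; they rebalance class counts but add no new information.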
17. Not All Data is Created Equal
https://pypi.org/project/imbalanced-learn/
Source: Rishabh Misra
18. Training a Cat/Dog Classifier
• Which training samples are more useful?
Photograph: American Kennel Club
Photograph: Atchoumfan
Photograph: Sujoy Roychowdhury
19. Oncology Text Classifier
Which training samples are more useful?
1. Left medial foot and ankle pain and swelling. Plantar
metatarsal pain for 5 weeks. No known trauma.
2. Dorsal right medial upper back pain for 10 weeks. Right
parotid mass.
3. History pancreatic cancer. Status post aortic
chemotherapy and Whipple procedure
31. Did We Solve the Tiger Problem?
• Generalize with only a single label? (One-Shot Learning)
• If I described a lion, would you recognize one if you never
ever saw one? (Zero-Shot Learning)
• Did the chicken come before the egg, or vice versa?
(Causality)
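As a rough sketch of the one-shot setting mentioned above, a classifier can assign a query to whichever class has the nearest single labeled example in some feature space. The feature vectors below are made-up values standing in for the output of a pretrained encoder, and `one_shot_classify` is a hypothetical helper, not a method from the talk.

```python
import math


def one_shot_classify(query, support):
    """Return the label whose single labeled example is nearest to query."""
    return min(support, key=lambda label: math.dist(query, support[label]))


# One labeled example per class (assumed, illustrative feature vectors)
support = {"tiger": (0.9, 0.8), "house cat": (0.2, 0.3)}
prediction = one_shot_classify((0.8, 0.7), support)
```

The quality of such a classifier depends almost entirely on the feature space: prior knowledge enters through the representation, not through the single label.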
33. Not Random
• Each CIFAR-10 image = 32x32 pixels x 3 color channels x 256 intensity levels
• Number of possible images = 256^(32x32x3) = 256^3072, roughly 10^7398
Source: Krizhevsky, Alex. (2009). Learning Multiple Layers of Features from Tiny Images.
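The size of the CIFAR-10 image space can be checked with a few lines of arithmetic. The sketch below assumes the usual encoding of 32x32 pixels, 3 color channels, and 256 intensity levels per channel; note that 32 x 32 x 3 x 256 = 786432 is the product of these dimensions, while the number of distinct images is 256 raised to the number of channel values.

```python
import math

height, width, channels, levels = 32, 32, 3, 256

# Each image is a vector of 32 * 32 * 3 = 3072 channel values
values_per_image = height * width * channels

# Each value independently takes one of 256 levels
distinct_images = levels ** values_per_image

# Order of magnitude, computed with logarithms to avoid printing the integer
log10_images = values_per_image * math.log10(levels)
digits = math.floor(log10_images) + 1
```

Natural images occupy a vanishingly small part of this space, which is why a uniformly random draw of pixel values essentially never looks like a photograph.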
34. Not a Possible Permutation
Source: Goodfellow, Ian. (2016). Generative Adversarial Networks.
35. How many Laws of Physics are
sufficient to describe motion?
Photograph: Richard Jognston
39. My New Book
A comprehensive resource that builds up from elementary deep learning, text, and speech principles to advanced state-of-the-art neural architectures.
On Amazon, BN, Springer
https://www.amazon.com/Deep-Learning-NLP-Speech-Recognition/dp/3030145956