Invited Talk at the ACM JCDL 2018 WORKSHOP ON CYBERINFRASTRUCTURE AND MACHINE LEARNING FOR DIGITAL LIBRARIES AND ARCHIVES. https://www.tacc.utexas.edu/conference/jcdl18
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for Scalable Data Processing
1. What Can Machine Learning & Crowdsourcing
Do for You?
Exploring New Tools for Scalable Data Processing
Matt Lease
School of Information @mattlease
University of Texas at Austin ml@utexas.edu
Slides:
slideshare.net/mattlease
2. What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at 65 universities around the world
www.ischools.org
3. Roadmap
• Machine Learning (AI) lets us automate many
useful tasks, e.g., natural language processing (NLP)
• Crowdsourcing enables new levels of efficiency &
scalability in data collection & processing
• Human Computation lets us build next-generation
applications today, with capabilities beyond AI
7. Dating Biographies without Time Mentions
• Kumar et al., CIKM 2011
Plato (428-348 B.C.) Lincoln (1809-1865)
8. Transcription & Copy-Editing
• Spontaneous speech is often disfluent, with repetitions,
corrections, and vocalized space-fillers
• Lease, Charniak, and Johnson, 2005
• Zhou, Baskov, and Lease, 2013 (& Zhou’s Thesis)
S1: Uh first um i need to know uh how do you feel about uh about
sending uh an elderly uh family member to a nursing home
S2: Well of course it's you know it's one of the last few things in the
world you'd ever want to do you know unless it's just you know really
you know uh for their uh you know for their own good
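A minimal sketch of what copy-editing such a transcript involves, using a naive rule-based pass. This is not the method of the cited papers; the filler list and the function name `strip_fillers` are assumptions made for illustration. It removes listed fillers and collapses immediate word repetitions.

```python
import re

# Hypothetical filler inventory, chosen for this illustration only.
FILLERS = ("uh", "um", "you know")

def strip_fillers(utterance: str) -> str:
    """Remove listed fillers, then collapse immediate word repetitions."""
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b"
    text = re.sub(pattern, " ", utterance, flags=re.IGNORECASE)
    # Collapse exact immediate repetitions such as "about about".
    text = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Normalize the whitespace left behind by the removals.
    return " ".join(text.split())

s1 = ("Uh first um i need to know uh how do you feel about uh about "
      "sending uh an elderly uh family member to a nursing home")
cleaned = strip_fillers(s1)
print(cleaned)
```

Rules like these over-delete: "you know" is sometimes meaningful, and corrections rarely repeat a word exactly, so a fixed word list handles them poorly.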
15. Crowdsourcing
• Jeff Howe. Wired, June 2006.
• Take a job traditionally
performed by a known agent
(often an employee)
• Outsource it to an undefined,
generally large group of
people via an open call
18. Amazon Mechanical Turk (MTurk)
• Marketplace for paid crowd work (“micro-tasks”)
– Created in 2005 (remains in “beta” today)
• On-demand, scalable, 24/7 global workforce
• API lets human labor be integrated into software
– “You’ve heard of software-as-a-service. Now this is human-as-a-service.”
19. Collecting Data from Crowds
2008: MTurk sparks “gold rush” for ML training data
• Information Retrieval: Alonso et al., SIGIR Forum
• Human-Computer Interaction: Kittur et al., CHI
• Computer Vision: Sorokin & Forsyth, CVPR
• NLP: Snow et al., EMNLP
– Annotating human language
– 22,000 labels for only US $26
– Crowd’s consensus labels can
replace traditional expert labels
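One simple way to form a consensus label is a majority vote over redundant worker judgments. The sketch below assumes that baseline; it is not necessarily the aggregation used in the cited paper. The worker labels are made-up data and `majority_vote` is a hypothetical helper. The last line works out the cost per label implied by the figures above.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical worker judgments for three items (made-up data).
crowd_labels = {
    "item-1": ["positive", "positive", "negative"],
    "item-2": ["negative", "negative", "negative"],
    "item-3": ["positive", "negative", "positive"],
}

# One consensus label per item.
consensus = {item: majority_vote(votes) for item, votes in crowd_labels.items()}

# Cost per label implied by 22,000 labels for US $26.
cost_per_label = 26 / 22000

print(consensus)
print(f"US ${cost_per_label:.4f} per label")
```

At roughly a tenth of a cent per label, collecting several judgments per item and voting stays inexpensive.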
22. ACM Queue, May 2006
“Software developers with innovative ideas for
businesses and technologies are constrained by the
limits of artificial intelligence… If software developers
could programmatically access and incorporate human
intelligence into their applications, a whole new class
of innovative businesses and applications would be
possible. This is the goal of Amazon Mechanical Turk…
people are freer to innovate because they can now
imbue software with real human intelligence.”
26. But Who Protects the Moderators?
Dang et al., HCOMP’18 & CI’18
27. What about ethics?
• Silberman, Irani, and Ross (2010)
– “How should we… conceptualize the role of these people
who we ask to power our computing?”
• Irani and Silberman (2013)
– “…by hiding workers behind web forms and APIs…
employers see themselves as builders of innovative
technologies, rather than… unconcerned with working
conditions… redirecting focus to the innovation of human
computation as a field of technological achievement.”
• Fort, Adda, and Cohen (2011)
– “…opportunities for our community to deliberately
value ethics above cost savings.”
28. Summary
• Machine Learning (AI) lets us automate many
useful tasks, e.g., natural language processing (NLP)
• Crowdsourcing enables new levels of efficiency &
scalability in data collection & processing
• Human Computation lets us build next-generation
applications today, with capabilities beyond AI
29. The Future of Crowd Work
Paper @ CSCW 2013 by
Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton