This document discusses the challenges and risks of applying AI in the real world, and how to handle them. It covers:
- AI can perform tasks such as driving a car faster and more cheaply than humans, even though we cannot fully explain how such tasks are done.
- Deploying and managing AI models at scale is complex, as is integrating models with user experiences. Bias and lack of transparency are also risks.
- When applying AI in high-risk domains such as medicine, it is important to audit models, introduce them gradually with trials, monitor outcomes, and find ways to identify and address errors or unfair impacts. With care and oversight, AI can be developed to help more people than it harms.
AI in the Real World: Challenges and Risks, and How to Handle Them
1. AI in the Real World
Challenges and Risks, and How to Handle Them
Srinath Perera, Ph.D.
VP Research, WSO2; Apache Member;
Advisor, Entracer
(srinath@wso2.com)
@srinath_perera
2. What is AI?
• Some tasks (e.g., making a coffee) we can explain precisely; we can write code to do them.
• Other tasks, e.g., driving a car, we can do but cannot explain how. We cannot teach our children to drive by writing it down. For such “intelligent tasks,” we have to give examples and feedback and train them.
• AI can learn from examples to do such “intelligent tasks.”
• Andrew Ng’s rule: “AI can do anything a human can do in 10 seconds or less.”
3. So What?
• Compared to humans, computers doing these tasks (with AI) are:
• Faster (1K–1M times), e.g., 1 second vs. 11 days (so they can do more detailed analysis)
• Cheaper to replicate
• Reliable
• Able to learn from every mistake
5. Laplace's Demon
• “if someone (the demon) knows the precise location and momentum of every atom in the universe, their past and future values for any given time are entailed; they can be calculated from the laws of classical mechanics.” [1]
• This messes up many things, including free will.
• Understanding this led to advances in physics (including thermodynamics) and later to chaos theory.
• But we often act as if AI were a Laplace’s demon.
1. https://en.wikipedia.org/wiki/Laplace%27s_demon
7. Fine Print: AI Works Only If
• If there is a pattern to be found
• If our algorithms can find it
• If the future is (mostly) like the past
• Behavior does not change significantly
• Not a wicked problem (where behavior changes based on participants’ responses), e.g., stocks
• If the data is representative
• Not biased
• We have enough data
8. Three Ways of Looking at This
• “Your model is wrong, and you will never know whether it is right.”
• “Behold the magic of AI, how dare you question it?”
• Yet all models are wrong, but some are useful, and most need to be verified empirically.
9. “The first step in solving a problem is to recognize that it does exist.” --Zig Ziglar
10. Weather as an Example
• We have tried to solve weather prediction for a long time; it is a very hard problem.
• However, simulations have become pretty good.
• We check for the butterfly effect by running many initial conditions; if they all converge, we are good (see the sketch below).
• The further out the prediction time, the more likely we run into chaos.
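To make the ensemble idea concrete, here is a minimal sketch (my illustration, not from the deck) using the Lorenz system as a stand-in for a weather model: perturb the initial condition slightly, run the same simulation many times, and trust the forecast only while the runs stay together. All parameters here are illustrative assumptions.

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz system, a classic chaotic toy model."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def forecast_spread(x0, horizon_steps, n_members=20, noise=1e-6, seed=0):
    """Run an ensemble from slightly perturbed initial conditions and return
    how far apart the members end up (larger spread = less trustworthy)."""
    rng = np.random.default_rng(seed)
    members = x0 + noise * rng.standard_normal((n_members, 3))
    for _ in range(horizon_steps):
        members = np.array([lorenz_step(m) for m in members])
    return members.std(axis=0).max()

x0 = np.array([1.0, 1.0, 1.0])
for steps in (100, 500, 2000):  # short, medium, long forecast horizons
    print(steps, forecast_spread(x0, steps))
# The spread grows with the horizon: the further out we predict,
# the more likely we hit chaos, exactly as the slide says.
```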
11. We Need to Embrace the Complexity and Handle It
• Building models is hard
• Deployment is complex
• User integration is hard
• AI has its own risks, and they are much more critical than human mistakes
12. What Can We Do? Trust but Verify!
• Audit your models
• Incorporate feedback
• Measure accuracies against estimates (e.g., using Bayesian updates; see the sketch below)
• Test the business impact of your models using randomized controlled trials [ref] and introduce them gradually
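One way to read “measure accuracies against estimates using Bayesian updates” is a Beta-Binomial update over the model’s true accuracy. This is a minimal sketch of my own, not the speaker’s code; the prior and the weekly counts are made-up numbers for illustration.

```python
def update_accuracy(alpha, beta, correct, wrong):
    """Beta-Binomial update: add observed correct/wrong predictions
    to the prior pseudo-counts."""
    return alpha + correct, beta + wrong

# Offline evaluation suggested ~90% accuracy; encode that as a weak prior.
alpha, beta = 9.0, 1.0

# Hypothetical production outcomes this week: 70 correct, 30 wrong.
alpha, beta = update_accuracy(alpha, beta, correct=70, wrong=30)

mean = alpha / (alpha + beta)
print(f"posterior mean accuracy: {mean:.2%}")  # ~71.8%: far below the estimate, so investigate
```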
13. I am going to go through a depressing list of reasons why AI is hard. But don’t panic; I will tell you how to handle them (well, at least some of them).
14. Challenge: Lack of Skilled Professionals
• Data scientists, programmers, and architects are in short supply and expensive.
• It is hard for medium and smaller organizations to attract, hire, and keep enough skilled people.
• Bias and interpretability concerns make this harder.
• Solutions are limited (wizards, mapping AI to databases (e.g., BayesDB) or spreadsheets, automatic statisticians).
• The work needs different thinking and a different skill set:
• Deeper knowledge of math and statistics
• Empathy to see how solutions are used
15. Data Scarcity and Quality
• Lack of large enough data sets: at 1,000 data points per day, it takes almost three years to collect one million data points.
• Labeled data sets are hard to find.
• Solutions (limited):
• Transfer learning (see the sketch below)
• Semi-supervised and unsupervised learning
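As a concrete illustration of transfer learning, here is a minimal sketch (assuming PyTorch/torchvision; not from the deck): start from a model pretrained on ImageNet and retrain only the final layer, so far less labeled data is needed. The 5-class output size is an assumption.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet pretrained on ImageNet (~1.2M images we did not have to collect).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the classifier head for our small problem
# (5 classes is a hypothetical choice for illustration).
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new head is trained, so a few thousand labeled examples
# can be enough instead of millions.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```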
16. Deployments Are Complex
• Setting up a continuous pipeline of data to models is far from simple [ref].
• Implementing trust-but-verify needs complicated handling.
• Tracking which data was used for a given version of the model (a sketch of one approach follows this list).
• Python models vs. other language runtimes.
• Most models barely do 50 TPS, so you will likely have to autoscale as well.
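One lightweight way to track which data built which model (a sketch of my own; the model name and data are hypothetical): record a content hash of the training data alongside each model version, so any prediction can be traced back to the exact training set.

```python
import hashlib

registry = {}  # model version -> metadata

def register_model(version, training_data: bytes):
    """Record which training data produced this model version."""
    registry[version] = {"data_sha256": hashlib.sha256(training_data).hexdigest()}

# Hypothetical usage: hash the exact bytes the model was trained on.
register_model("fraud-model-v7", b"...the exact training data bytes...")
print(registry)
```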
17. AI Delivered via Cloud APIs
• Concentrate expertise and data
• Enable the companies offering the API to focus on a problem or domain
• Remove the complexity of deploying and managing models
• Let client organizations start small and expand
Cloud APIs can address many of these concerns. Examples include disease diagnosis, marketing insights, spatiotemporal models, fleet management, and device management.
18. AI as Cloud APIs (Contd.)
• Only possible when data formats are well known (e.g., banking or healthcare data) and key performance indicators (KPIs) are well defined; then reusable models can be built for those use cases.
• Vendors have incentives to do this, as it lets them collect data.
• There is tension between privacy and competitive advantage.
• Personalisation is needed.
19. User Experience: Precision vs. Recall
Consider a population in which 0.05% have a disease, and assume we have a model that diagnoses the disease with 99% accuracy. Out of 1,000,000 people:

                 Actual (+)   Actual (-)
Predicted (+)            5        9,995
Predicted (-)          495      989,505
• Precision (given the model said yes, how likely it is to be correct) is 0.05%, which does not work (recomputed in the sketch below).
• Precision vs. recall (what percentage of actual cases are detected) are valued differently in different settings.
• If a prediction stops a customer transaction, precision is valued.
• A bank running fraud detection on the side said analysts are cheap; they need 100% recall.
• If a new model has low precision, users will give up on it.
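The numbers in the table can be recomputed directly (a quick check, not from the deck):

```python
# Out of 1,000,000 people, 0.05% (500) actually have the disease.
tp, fp = 5, 9_995        # predicted positive: actually sick / actually healthy
fn, tn = 495, 989_505    # predicted negative: actually sick / actually healthy

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # when the model says "sick", how often is it right?
recall = tp / (tp + fn)      # what fraction of sick people does it catch?

print(f"accuracy:  {accuracy:.2%}")   # 98.95% -- sounds great
print(f"precision: {precision:.2%}")  # 0.05%  -- almost every alarm is false
print(f"recall:    {recall:.2%}")     # 1.00%  -- and it misses 99% of cases
```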
20. User Experience: Handling False Positives
• Although 99% accuracy sounds impressive, it is not.
• In Sri Lanka, if a model is applied nationally, that means roughly 200K people will be assessed wrongly.
• Can they appeal? How would they know?
• Are we going to fix it when we know?
• This is OK when errors are apparent and can be fixed (e.g., a wrong recommendation).
• If used again and again, errors add up. For example, if we use an AI model with 99% accuracy to estimate 50 properties, about 40% of users will have at least one property assessed wrongly (see the arithmetic below).
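The 40% figure in the last bullet follows from assuming the 50 estimates fail independently:

```python
# Chance that all 50 independent 99%-accurate estimates are correct,
# and therefore the chance of at least one error per user:
p_all_correct = 0.99 ** 50
print(f"{1 - p_all_correct:.1%}")  # 39.5%, which the slide rounds to 40%
```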
21. User Experience: Incorporating User Input
• If a prediction can be verified by other means (e.g., after time has passed), user feedback is not critical.
• If not, user feedback should be collected (e.g., via a single click on a link in an email) and incorporated into the model pipeline.
• Models can be deployed gradually using canary testing (show the model to a small percentage of users; see the sketch below), with business outcomes verified.
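A minimal sketch of canary routing (my illustration; the function names are assumptions): bucket users deterministically so each user always sees the same model, and send a small fraction to the new one while comparing business outcomes.

```python
import hashlib

CANARY_FRACTION = 0.05  # 5% of users see the new model

def in_canary(user_id, fraction=CANARY_FRACTION):
    """Deterministically bucket a user so they always get the same model."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < fraction

def predict(user_id, features, old_model, new_model):
    """Route the canary slice to the new model, everyone else to the old one."""
    model = new_model if in_canary(user_id) else old_model
    return model(features)
```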
22. Risks: Bias
• AI can learn and repeat the inherent bias in data caused by human behavior (see, e.g., the book “Weapons of Math Destruction” by Cathy O’Neil).
• Removing bias is hard. For example, an address in a certain neighborhood might act as a proxy for race, a name might act as a proxy for gender, or the name of a degree might act as a proxy for age.
23. AI Weaknesses Can Be Worse Than Human Weaknesses
• AI, once figured out, is cheap to repeat, so we will do a lot of it.
• AI puts decisions in the background where we do not see them (even when they are not working) unless we watch the KPIs carefully.
• Human errors and biases are diverse, and there is strength in that diversity; e.g., if one interviewer does not like you, someone else might. AI is likely to lead to a few winner-take-all situations, in which case, if the AI is biased, you are screwed.
• A biased AI gatekeeper keeps some people out, and we do not see them at all.
24. Example: An AI Model to Screen Students for University Interviews
• If the model decides a certain demographic is not suitable, we will no longer have training data on that demographic, thus self-biasing our models (and freezing them in time).
• Human errors and biases are diverse: interviewer Sarath might not like you, but Nimal might. If AI is the sole screener, you are in trouble.
• We have lost independent failures, which provide stability.
• We might decide to expand this to high schools as well (as the additional cost is small), expanding the potential impact.
25. What Can We Do? (In This Case)
Accept 1% of the CVs rejected by the AI for interviews anyway, monitor their outcomes against the model’s error rate, and check the results for bias (see the sketch below).
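A sketch of the 1% audit idea (illustrative; the names and the toy model are hypothetical): let a small random sample of AI-rejected CVs through to human interviews, so we keep an unbiased measurement of the model’s false rejections.

```python
import random

AUDIT_RATE = 0.01  # let ~1% of rejected CVs through for auditing

def screen(cv, model):
    """Apply the model, but route a small random sample of its rejections
    to a human interview anyway, so we can measure false rejections."""
    decision = model(cv)  # "accept" or "reject"
    if decision == "reject" and random.random() < AUDIT_RATE:
        return "accept", True   # audited override: interview anyway
    return decision, False      # normal path

# Toy usage with a stand-in model (illustrative only):
toy_model = lambda cv: "reject" if len(cv) < 100 else "accept"
decision, audited = screen("short cv...", toy_model)
# Later, compare audited candidates' interview outcomes with the model's
# rejections; a gap signals error or bias.
```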
26. What Can We Do?
• Audit AI decisions so we can study them, debug them, and fix them
• Be creative in testing AI models
• Expose them gradually; run randomized trials
• Actively monitor KPIs
27. Conclusion: Embrace the Complexity and Manage It
• Building models is hard
• Deployment is complex
• User integration is hard
• AI has its own risks, and they are much more critical than human mistakes
• Create cloud APIs, and start with cloud APIs
• Monitor and manage models
• Invest deeply in understanding the use cases
Always remember: although useful, it is a beast, and we must be vigilant.
28. Should We Do AI for Risky Cases at All (e.g., Medicine)?
• Yes!!
• Every day, many die due to the lack of quality medical advice.
• An AI model can serve everyone much more cheaply, likely more accurately, and it will learn from every mistake.
• So we kill by omission as well.
• We need to take risks, but understand and manage them.