3. BACKGROUND
• A conversational tour through some things I’ve learned in
helping scale-up stage client companies improve their AI
development practices, especially where deep neural nets
(DNNs) are in use
• YMMV
5. INITIAL TRAINING
DATA
• Where initial training data creates broad parameters for a
model (a.k.a. pre-training), and further training data will
create greater detail or precision:
• +ve: compresses amount of pre-training data
• -ve: errors or hidden variables can hamper performance of
the ultimate model
• Initial training data needs to be selected with care for
representativeness of the larger data set
• If generative changes in the process being modelled will
take place over time, the model and pre-training will need
to evolve
6. REALITY GAPS AND
OUTLIER EVENTS
OUTLIER EVENTS
• A way is needed to respond gracefully to events that were not well
captured in the training data
• A simple common approach is a voting system of multiple models, to
determine if the outlier event is likely correct or an error/exception
REALITY GAPS
• Watch out for the potential for system failure in real-world use if the
training environment or data are simplified representations of reality
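The voting-system idea for outlier handling can be sketched minimally as below; the labels, the agreement threshold, and the routing decision are all illustrative assumptions, not details from the deck:

```python
from collections import Counter

def vote(predictions, min_agreement=2):
    """Majority vote across several models' predictions for one input.
    If no label reaches the agreement threshold, flag the input as a
    likely outlier/exception for separate handling."""
    label, count = Counter(predictions).most_common(1)[0]
    if count >= min_agreement:
        return label, False   # confident consensus
    return label, True        # likely outlier: route for review or fallback

# Three hypothetical models classify the same input
label, is_outlier = vote(["cat", "cat", "dog"])
```

In practice the "route for review" branch is the important part: the ensemble is not there to be more accurate, but to detect when the models disagree enough that the input probably fell outside the training data.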
7. GENERATIVE
MECHANISM CHANGE
• Periodically revisit if generative mechanisms or standards
of practice have changed, and the model needs to be
rebuilt
• The conditions under which training data come to be are
often different from those in wider, forward-looking usage
• Classic dilemma:
• On any given day, it is usually easier to keep coaxing
along the current model, rather than doing a larger
redesign and rebuild
• It can help objectivity in the heat of battle to have
established prior threshold criteria that will signal that
sufficient change has likely occurred to trigger a model
rebuild
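A prior threshold criterion can be as simple as a pre-agreed limit on accuracy degradation; this sketch assumes an accuracy metric and a 5-point threshold purely for illustration:

```python
def rebuild_signal(baseline_accuracy, recent_accuracies, max_drop=0.05):
    """Return True when mean recent accuracy has degraded past a
    threshold agreed on *before* the degradation started, signalling
    that a model rebuild review is warranted."""
    recent = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent) > max_drop

# Agreed in advance: more than 5 points of degradation triggers review
needs_rebuild = rebuild_signal(0.92, [0.85, 0.84, 0.86])
```

The value is objectivity: because the threshold was fixed in advance, the decision to stop coaxing the current model along is no longer argued case-by-case in the heat of battle.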
9. ROBUSTNESS
• Akin to satisficing vs. optimizing; accuracy vs. precision
• Design dividend: simpler representations lend themselves more
easily to heuristics that foster design intuition for developers and
maintainers of models
• With somewhat simpler models, the impact of change can be better
anticipated – in data, in a single model, or in a system of multiple
models
• Further benefit of simpler models: better explainability, and
understanding of the major influences in the model, as well as the
combinations of rare or decisive events which can make the model
tip from one output state to another
• To boot, overly optimized models are often at greater risk of failure,
or of slowing the larger systems in which they operate, if usage
conditions change unexpectedly
• Often, slightly lower performing models perform better over time as
conditions change
10. THIRD PARTY
PERSPECTIVE
• 3rd Party View Diagnostic Tools
• It is often good practice to build a visualization of what the
system “sees” looking out at its constituent parts, its
interfaces (both internal and external), and external
environment
• A 3rd party view tends to reveal a lot about the
perspectives which shaped how the system was built, what
influences it, what it ignores or downplays, and which
combinations of circumstances cause sharp changes in
performance
11. THIRD PARTY
PERSPECTIVE
• 3rd Party View Diagnostic Tools (cont’d)
• Especially in systems AI (multiple constituent models), the
real issue often comes down to understanding how
bottom-up performance of component models affects other
parts of the system and overall system performance (ex:
common issue in self-driving vehicle development)
• Stated differently, transparency and predictability are about
understanding the alternatives that the overall system
sees, and what it is capable of seeing
12. OVERFITTING
• Main Risk:
• Some AI models are no more compressed than the training
data, especially with DNNs
• A warning flag should usually go up if deep learners are
being used with training data points that number only in the
thousands
• Pragmatic criteria are needed to verify that a more general
model is being built, to build assurance of robust
predictive value in forward use of the model
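One crude pragmatic criterion is a parameter-to-example ratio check. The 10:1 ratio below is a rule of thumb assumed for illustration (modern deep learners can generalize well outside it), not a rule from the deck:

```python
def overfit_warning(n_params, n_train, min_ratio=10):
    """Raise a warning flag when there are too few training examples
    per model parameter to plausibly support generalization.
    A heuristic screen, not a proof of overfitting either way."""
    return n_train < min_ratio * n_params

# A deep learner with ~1M parameters trained on a few thousand points
flag = overfit_warning(n_params=1_000_000, n_train=5_000)
```

The point of such a check is only to force the conversation: if the flag trips, demand held-out evidence that a genuinely more general model is being built.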
13. THEORY OF MINIMUM
INFORMATION
• Theory of Minimum Information Thought Experiment
• Keep an eye on what subset of information would yield
most of the benefit of the larger model
• Similarly, always look at what the model could or should be
if one or a few of the data inputs were to become
erroneous or spurious in performance
14. BREADTH OF DATA AND
RELATIONSHIPS REFLECTED
IN THE DATA
• The training data and model need to reflect temporal,
spatial and causal relationships
• Otherwise, the model will only find correlations, and be at
greater risk of generating nonsensical errors
15. BREADTH OF DATA AND
RELATIONSHIPS REFLECTED
IN THE DATA
• The power zone of AI is when the data is rich enough to
detect valid patterns that exploit volumes of information
far beyond what humans can process
• The Gold Standard: When rare but decisive patterns can
be detected and acted upon
• The delicate balance: AI tends toward generalizations,
when some systems need to allow for idiosyncratic
behavior
• In some cases, especially when a lot of training data is
available, it can be useful to not filter data too early
based on preconceived notions of expected model
function; the rawest data may contain useful information
which has previously escaped notice
16. SOCIAL AND CULTURAL
ISSUES
• Models for social process are subject to individual and
group norms which vary greatly across the world
• Beware the classic trap of assuming that others have the
same values or conventions as we do
• Social and cultural processes tend to be underspecified,
i.e. there is a larger unstated context than what appears in
the data
• Often then, there needs to be a top-down inference
engine to estimate the larger context to get to a high
level of overall performance; purely bottom-up efforts to
construct larger predictions will often not perform as well
17. ABRIDGED REPERTOIRE OF
MODELLING AND MODEL
EXPOSITION TECHNIQUES
• Having AI developers think out loud (talking or writing)
• Transforming words to images
• Makes explicit things that were often implicit or hidden
• Aids further inferences to enhance model value
• Metaphors, analogies, benchmarks, adjacencies, and thought experiments
• Increasing the availability of candidate representations can help a lot, expanding variety in
reference frames
• Comparative study helps raise the level of model abstraction
• Analysing the well-documented experiences of AI development in self-driving vehicles,
facial recognition, gaming, and medical diagnostics can provide a lot of instructive value
for development practices in other fields
• Induction from observing the AI closely, including brute force parameter variation
• Differential or difference equations, and systems thereof
• Statistical & clustering analyses and probabilistic inference
• Simplified simulation
• Longitudinal analysis
• Articulating not just core assumptions and dependencies, but boundary conditions as
well
18. REPRESENTATIONS,
COMBINATIONS AND MODEL
EVOLUTION STEPS
• Isomorphic representations can provide clarity and useful
simplicity, revealing unwarranted complexity
• Because representations are used in design and
development as much for discovery as verification, creating
multiple representations also tends to help model creators go
beyond disambiguation to extend knowledge
• Combinations of techniques are often powerful to reveal
explanation and greater predictive value from models
• With the insight benefit from additional representations,
sometimes it becomes necessary to go back several
evolutionary steps of development before going forward
again on a new evolutionary branch to attain ultimately
higher performance
19. TRANSPARENCY,
PREDICTABILITY AND
DIVERSITY
• Taking advantage of these techniques for improving
transparency and predictability of AIs requires a respect
for and openness to the potential contribution of
intellectual diversity in model formation, testing and
optimization
• Useful advantages come from learning or knowing relevant
things that others don’t know or appreciate
• The cumulative effect of such gains can become
formidable over time
• Often though, some additional leadership effort is required
to gain full benefit, since interdisciplinary perspectives as a
source of intellectual diversity can become more
challenging to incorporate over time as intra-disciplinary
norms and dialects increasingly specialize
20. THE POWER OF ASKING
WHY
• There is frequently a temptation in the interest of time to
rely on correlations, without causal understanding of why
the model performs the way it does
• Correlation without causality has initial benefits for allowing
hypothesis-free forms of exploration
• Useful findings can be obtained from such systems-level
views of what is possible and actionable, rather than an
initial preoccupation on reductionism
• Over time however, reliance on models which perform
without understanding why is risky, or even dangerous
• Doing so largely sets aside the scientific method, all that it
has accomplished and will continue to achieve
21. CAUSAL UNDERSTANDING
AND BREAKTHROUGH
INSIGHT POTENTIAL
• Correlations and a functioning, if opaque, AI model should be the
beginning of the causal discovery process, not the end
• The evidence of unexpected but statistically significant correlations
from erstwhile hidden signals and relationships should be used to
drive causal inquiry
• Causal investigation is the only way to reliably separate out
confounding factors and better specify conditions of robust
prospective model use
• The added benefit of the drive toward causal understanding:
• Such insight usually paves the way to gain much further
differentiating, defensible IP and competitively important technology
of lasting significance
• The field of radiomics in diagnostic medical imaging provides a
useful case study for some forms of causal discovery practices
22. CAUTION ABOUT MAKING DEEP
LEARNERS MORE TRANSPARENT
AND PREDICTABLE
• Be cautious about accepting proposals to try to fix the
explainability difficulties of one opaque model through
the use of another
• Domain experts can be tempted to overreach in trying to
solve the problems of their technological field through
further application of the same technology
• While powerful AI technology can be built this way (such
as GANs), explainability and causal understanding often
require a different approach and greater technological
breadth
23. BACKSTOP TECHNIQUE
• If you’re really stuck to define a path for working forward
toward greater transparency and predictability, like just
about every other field of technology, thorough testing at
component, unit, integration and system levels can reveal
a lot about a model
• The known edges and boundary regions of performance,
as well as vulnerabilities, sensitivities, and potential for
dysfunction are always illuminating to build toward
functional understanding
• Knowing not just the region of safe operation, but the locus
of performance beyond which a technology fails, and how
it behaves in the zone of operation approaching the failure
realm is usually instructive to build toward transparency
and predictability
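Mapping the safe region and the failure boundary can start with a plain input sweep. The sketch below assumes a single scalar input and a model that raises on failure; both are simplifications for illustration:

```python
import math

def failure_boundary(predict, lo, hi, steps=100):
    """Sweep an input range to locate where the model stops working,
    returning (last safe input, first failing input). A coarse grid
    search, meant to map the zone approaching the failure realm."""
    last_ok = None
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        try:
            predict(x)
            last_ok = x
        except Exception:
            return last_ok, x
    return last_ok, None  # no failure found in this range

# Hypothetical model that breaks once its input exceeds 0.8
model = lambda x: math.sqrt(0.8 - x)
safe, failing = failure_boundary(model, 0.0, 1.0)
```

Real systems fail by degrading as often as by raising, so in practice the same sweep is usually run against an accuracy or error metric rather than exceptions; the structure of the search is the same.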
25. AI LEARNS WHAT IS
IN THE DATA
• Make sure the data conveys what the AI is to learn
• This is different from the software-centric view of tuning the
algorithm until the desired performance is achieved
• Data quality becomes extremely important with small
training sets
• Noise in small training sets can greatly hamper model
performance
• Raw training data based on human judgement may reflect
satisficing or historical values which will not be
acceptable in forward-looking or machine-delegated
decision making
26. OBJECTIVE DATA
QUALITY MEASURES
• Develop objective measures of data quality, and have a
process for continuous improvement of data quality
• Useful forcing exercise:
• Hold the algorithm constant for a while, and see how much
improvement can be achieved purely through data quality
enhancement
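The forcing exercise has a simple experimental shape: freeze the algorithm, vary only the data, and attribute any gain to data quality. The scorer and dataset versions below are hypothetical placeholders for a real train-and-evaluate step:

```python
def data_quality_ablation(train_and_score, dataset_versions):
    """Hold the algorithm fixed and score successive cleaned versions
    of the training data, isolating the gain from data quality alone."""
    return [(name, train_and_score(data)) for name, data in dataset_versions]

# Hypothetical stand-in scorer: fraction of correctly labelled rows
score = lambda rows: sum(1 for r in rows if r["label_ok"]) / len(rows)

versions = [
    ("raw",     [{"label_ok": True}, {"label_ok": False}]),
    ("cleaned", [{"label_ok": True}, {"label_ok": True}]),
]
results = data_quality_ablation(score, versions)
```

Because the algorithm never changes between rows of the result, any improvement from "raw" to "cleaned" is unambiguously a data effect.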
27. DATA LABELING AND
VERSION/CONFIGURATION
CONTROL
• Labeling is a common source of data quality issues,
compromising the representation of ground truth
• Spot checking, statistical sampling, and increased
checking where errors have recently been detected can
help improve labeling
• Improvement efforts should look toward error proofing the
process for generating labels in the first place, not fixing
errors retroactively on a sustained basis
• Be especially wary if data, labels and models were built
when people were harried or in a rush
• Data benefits from similar version and configuration
control concepts as code
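The spot-checking bullet can be made concrete as a seeded audit sample that over-samples around recently detected errors; the 5% rate and the error-ID mechanism are illustrative assumptions:

```python
import random

def spot_check_sample(labels, rate=0.05, recent_error_ids=(), seed=0):
    """Draw a reproducible random audit sample of label indices,
    always including items where errors were recently detected."""
    rng = random.Random(seed)          # fixed seed: auditable, repeatable
    n = max(1, int(len(labels) * rate))
    sample = rng.sample(range(len(labels)), n)
    return sorted(set(sample) | set(recent_error_ids))

# 100 labelled items, one known-problematic index forced into the audit
ids = spot_check_sample(["label"] * 100, rate=0.05, recent_error_ids=[42])
```

Fixing the seed matters for the version-control point on this slide: the audit sample itself becomes a reproducible artifact that can be tracked alongside the data version it was drawn from.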
28. THICK VS. THIN TAILED
DISTRIBUTIONS
• The most constructive approaches to data cleaning and
error correction are greatly influenced by whether the data
distribution has thick or thin tails
• Thick tails are often a sign of multiple generative
mechanisms aggregated in the data
• Thin tails are typically associated with a single generative
mechanism producing the data
• The most efficacious data quality improvements vary
considerably depending on which of the tail phenomena is
the case
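A quick screen for thick vs. thin tails is sample excess kurtosis; the cutoffs one acts on are a judgment call, and the datasets below are toy illustrations:

```python
def excess_kurtosis(xs):
    """Sample excess kurtosis: near or below 0 suggests thin
    (Gaussian-like or lighter) tails; large positive values suggest
    thick tails, often a sign of multiple generative mechanisms
    aggregated in the data."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0

thin = [1, 2, 3, 4, 5]        # spread evenly: negative excess kurtosis
thick = [1] * 50 + [100]      # one extreme value: large positive
```

Why it matters for cleaning: with thin tails, extreme values are plausibly errors to correct; with thick tails, they may be a second generative mechanism, and "correcting" them destroys exactly the signal the slide warns about.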
29. HIGH IMPACT CASES
AND DATA DEPTH
• Identify important cases and make sure the data includes
them with sufficient statistical depth
• Don’t forget about reserved test cases, not just training
data
• Map out happy paths, acceptable paths, exception cases
and error conditions
• Fault tree and failure propagation analysis can help
incorporate systems thinking in the data and the resulting
model
• More generally, understand the depth of the data over the
entire range of inputs, not just sub-ranges
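Understanding depth over the entire input range can start with a simple binned coverage report; the bin count and minimum depth below are arbitrary illustrative choices:

```python
from collections import Counter

def coverage_by_bin(values, lo, hi, bins=5, min_count=2):
    """Bin inputs over their full range and return the bins that lack
    sufficient statistical depth, i.e. under-covered sub-ranges."""
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    return [b for b in range(bins) if counts[b] < min_count]

# Hypothetical training inputs clustered at the low end of the range
thin_bins = coverage_by_bin([0.1, 0.15, 0.2, 0.9], 0.0, 1.0)
```

The same report run on the reserved test cases, not just the training data, answers the slide's earlier point: both sets need depth where the high-impact cases live.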
30. FURTHER DATA QUALITY
IMPROVEMENT CONSIDERATIONS
• Arguably, data practices tend to be much more vertically
(application) specific, whereas algorithms tend to be more
horizontal in nature
• Ask what attributes of the data need to be monitored in
production to see if model/concept drift is taking place
• Instrumenting the distribution of production data along with
cohort and trend analysis can often be used to help signal
if drift may be occurring
• Do sensitivity analyses to know what kinds of data the
model performs poorly on, and build mechanisms to go
acquire or generate more of that data
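Instrumenting the production distribution for drift is often done with the population stability index over binned data; the bin counts below are made up, and the common 0.2 rule of thumb is a convention, not a law:

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between binned training ("expected") and production
    ("actual") count distributions. A common rule of thumb treats
    PSI > 0.2 as a signal that drift may be occurring."""
    e_tot, a_tot = sum(expected), sum(actual)
    psi = 0.0
    for e, a in zip(expected, actual):
        e_p = max(e / e_tot, eps)   # clamp to avoid log(0)
        a_p = max(a / a_tot, eps)
        psi += (a_p - e_p) * math.log(a_p / e_p)
    return psi

# Bin counts from training vs. a recent production window
psi = population_stability_index([50, 30, 20], [20, 30, 50])
```

Run per feature and per cohort, trended over time, this gives exactly the kind of cohort and trend signal the slide describes.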
33. CHANGE TRACKING
AND USER SEGMENTS
• If the AI will be used in industries where there are
regulatory and legal compliance issues for the model:
• Change tracking and performance change monitoring are
typically required
• Performance against baseline and historical performance
• Assessment of occurrences and rates of false positives
and false negatives
• Performance of sub-types of users or patients
• Averages can conceal a lot of things, especially with thick
or long tails to the distributions where multiple generative
mechanisms may be at work
• In general, the better people can provably explain why a
model performs and changes the way it does, the better
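The segment-level monitoring described above amounts to computing false positive and false negative rates per sub-type rather than in aggregate; the segments and records below are hypothetical:

```python
from collections import defaultdict

def rates_by_segment(records):
    """False positive and false negative rates per user segment.
    Each record is (segment, ground_truth, prediction); returns
    {segment: (fp_rate, fn_rate)} over that segment's records."""
    by_seg = defaultdict(lambda: {"fp": 0, "fn": 0, "n": 0})
    for seg, truth, pred in records:
        s = by_seg[seg]
        s["n"] += 1
        if pred and not truth:
            s["fp"] += 1
        if truth and not pred:
            s["fn"] += 1
    return {seg: (s["fp"] / s["n"], s["fn"] / s["n"])
            for seg, s in by_seg.items()}

# (segment, ground truth, model prediction)
rates = rates_by_segment([
    ("adult", True, True), ("adult", False, False),
    ("child", True, False), ("child", False, True),
])
```

Here the aggregate error rate is 25%, which looks tolerable, while one segment is wrong half the time: exactly the concealment by averages the slide warns about.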