Innovation and economic growth depend on companies' ability to gain insight into data. Data is growing exponentially, but our ability to make use of it is not. Untapped economic value resides in this unutilized data, called "dark data." This presentation looks at some of the causes of the explosion of data, some of the impediments to exploring and creating business value from dark data, and some ideas for ways around those impediments.
Dark Data: Where the Future Lies
1. Dark Data: Where the Future Lies
Vince Kellen, Ph.D.
Senior Vice Provost
Analytics and Technologies
University of Kentucky
Vince.Kellen@uky.edu
March 5, 2014
This is a living document subject to substantial revision.
2. The economic case
The global economy is now [permanently] fueled by information
Innovation is becoming the merging of human creativity and
increasingly automated information extraction
Data is growing exponentially; human creativity ‘cycles’ are not
We are going to need [novel, surprising, freaky] ways of increasing
the speed of information extraction from vast and growing data
reserves
Finally, we are going to have to develop [novel, surprising, freaky]
economic ‘infrastructure’ to foster emergent designs for turning
extracted information into wealth creation faster
3. [Population, wealth, technology, knowledge]
Hunting and foraging
Agricultural revolution
Rise of the ‘world system’
Industrial revolution
Post-information revolution
Sources: Wikipedia; various; UN Report World Population to 2300 (2004)
Diffusion accelerates technology adoption
Communications technology accelerates diffusion
4. World’s technological installed capacity to store information
Hilbert M, Lopez P. 2011. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science. Vol. 332 no. 6025 pp. 60-65.
5. The world's total information,
which is 1.8 zettabytes, could
be stored in about four grams
of DNA.
Harvard stores 70 billion books using DNA.
Research team stores 5.5 petabits, or about 5.5 million gigabits, per cubic
millimeter in DNA storage medium
http://www.computerworld.com/s/article/9230401/Harvard_stores_70_billion_books_using_DNA
Photo: Kelvin Ma for the Wall
Street Journal
Dr. Church keeps a vial of
DNA encoded with copies of
his latest book.
10. As data grows exponentially, so does dark data
Dark data
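The claim in this slide can be illustrated with a small numeric sketch. It assumes, purely for illustration, that total data compounds at a fixed annual rate while the amount that can be analyzed grows by a fixed step each year. The function name `dark_data_series` and every number in it are hypothetical and do not come from the presentation.

```python
def dark_data_series(years, initial_data=100.0, data_growth=0.40,
                     initial_capacity=50.0, capacity_step=10.0):
    """Return (year, total, analyzed, dark) tuples for a toy growth model.

    Assumptions (hypothetical): total data compounds at `data_growth`
    per year, while analysis capacity rises linearly by `capacity_step`.
    """
    rows = []
    for year in range(years + 1):
        total = initial_data * (1 + data_growth) ** year
        capacity = initial_capacity + capacity_step * year
        analyzed = min(total, capacity)   # cannot analyze more than exists
        dark = total - analyzed           # whatever is left unexamined
        rows.append((year, total, analyzed, dark))
    return rows


if __name__ == "__main__":
    for year, total, analyzed, dark in dark_data_series(10):
        print(f"year {year:2d}: total={total:9.1f} "
              f"analyzed={analyzed:6.1f} dark share={dark / total:.0%}")
```

Under these assumptions the dark share rises every year, because a linear increase in capacity cannot keep up with compounding growth in data.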
11. Rate of innovation, pace of urban life
In order to sustain exponential economic growth rates, the rate of
innovation must increase. Otherwise we will not have exponential
growth
Information flow through human culture (cities) is akin to blood
flowing through a circulatory system.
• Both cities and animals conserve physical energy (molecules). As both
get bigger, they conserve more energy
However, the two systems have fundamentally different
behaviors when it comes to ‘output’
• As cities get bigger, their ‘pace of life’ and economic output increase.
The rate of information flow quickens
• As animals get bigger, their ‘pace of life’ and metabolic output
decrease. The rate of metabolic flow decreases
Bettencourt, et al. (2007). Growth, innovation, scaling, and the pace of life in cities. www.pnas.org/cgi/doi/10.1073/pnas.0610172104
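The contrast between cities and animals can be sketched with a power-law scaling relation, Y = Y0 * N^beta, where N is the size of the system. The exponents below are illustrative values chosen for this sketch (above 1 for cities, below 1 for animals), not figures taken from the slides, and the function names are hypothetical.

```python
# Illustrative exponents (assumptions for this sketch, not from the slides).
CITY_EXPONENT = 1.15     # superlinear: output grows faster than size
ANIMAL_EXPONENT = 0.75   # sublinear: output grows slower than size


def scaled_output(size, exponent, baseline=1.0):
    """Total output of a system of `size` under Y = baseline * size**exponent."""
    return baseline * size ** exponent


def per_unit_change(exponent, factor=2.0):
    """Ratio of per-unit output after the system grows by `factor`.

    Per-unit output is size**(exponent - 1), so growing by `factor`
    multiplies it by factor**(exponent - 1).
    """
    return factor ** (exponent - 1.0)


if __name__ == "__main__":
    print("city, per-capita output after doubling:",
          round(per_unit_change(CITY_EXPONENT), 3))
    print("animal, per-unit metabolic output after doubling:",
          round(per_unit_change(ANIMAL_EXPONENT), 3))
```

With an exponent above 1, doubling the size raises per-capita output (the pace quickens); with an exponent below 1, doubling the size lowers per-unit output (the pace slows).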
12. Information rules
Information quickening drives economic growth, encouraging consumption
(and conservation) of molecules
• While fears of a Malthusian collapse have haunted economists forever,
innovation and technology have enabled growth so far
• Analytics can lead to productivity increases
Information’s dominance in the economy appears to be causing slowing or
reversing population growth rates
• Rising populations drive rates of innovation and economic growth. No
population growth might be worrisome
• Is rising information unexpectedly going to cool down the economy?
While innovation allows both growth and efficient use of resources, to
sustain growth we are going to need more innovation, not less!
• Increasing stores of information and means of action will be needed
• DARK DATA WILL NEED TO BE MINED
13. Bits versus Atoms
Physical material exhibits limits to scale. Data does not. Computing cost-
effectiveness growth enables exponential data growth
14. Two overlapping, interacting systems
The two systems now interact. Fewer molecules create more data. Information fuels
economic growth, reduces population growth rates, and improves utilization of molecules. The
rate at which dark data is applied will affect all these rates
Molecules
Dark data
Information
15. Pause. Where are we?
Data and information are very important at this point in human history. How
do we take advantage of these megatrends?
16. Production and consumption of information
In order to unleash dark data, we have to worry about two
problems: better production of information from dark data reserves
and better means of applying mined information to economic
activity (consumption)
Production
• We will need new purely human, purely technical and human-technical
ways of extracting information from growing reserves of data
Consumption
• We are desperately going to need [old, new] human beings with a very
different orientation to data and decision making
17. Production ideas
Crowd-sourced and community-sourced analytics.
• Skills will be scarce. We have to do a better job of matching analytics to global
skill sets
Dark data exchanges
• Can we sell our dark data to others for their exploitation? Can we buy others'
dark data?
Dark data reserves exploration
• More use of automated means of discovering data reserves and cataloging
their location. Idea generation on possible value from mining
Data refineries
• We need to improve the rate at which data can be refined. Automated
metadata extraction, automated data quality detection, semi-automated model
construction, elimination of ‘one-off’ models and better reuse of partial or
complete models
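As one possible reading of "automated metadata extraction" and "automated data quality detection," the following minimal sketch profiles the columns of a CSV extract using only the Python standard library. The function names, the three type labels and the choice of quality measures are assumptions made for illustration; a real refinery would need far richer checks.

```python
import csv
import io


def _infer_type(values):
    """Guess a column type from its non-empty string values."""
    if not values:
        return "empty"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def profile_columns(csv_text):
    """Return per-column metadata: inferred type, missing ratio, distinct count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profile = {}
    for name in reader.fieldnames or []:
        values = [(row.get(name) or "").strip() for row in rows]
        present = [v for v in values if v]
        profile[name] = {
            "inferred_type": _infer_type(present),
            "missing_ratio": 1 - len(present) / len(values) if values else 0.0,
            "distinct_count": len(set(present)),
        }
    return profile


if __name__ == "__main__":
    sample = "id,name,score\n1,Ann,3.5\n2,,4\n3,Bob,\n"
    for column, stats in profile_columns(sample).items():
        print(column, stats)
```

A catalog of such profiles, refreshed automatically, is one way to make it cheaper to judge whether a dormant data set is worth mining.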
18. Production ideas
Make widespread use of rapid data discovery tools. The ability to go from
the first question to the final answer quickly matters greatly
Combine purely automated technical methods of extraction and
refinement with human, collaborative processes to further refine the data
Develop and use refined, automated data movement tools
Increase data’s ‘surface area’
through careful model design aimed
at facilitating regular analysts’ use
Increase data transparency, make
available to many more analysts
Utilize new ways of obscuring data to
improve privacy and security without
sacrificing pattern discovery
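One familiar way of obscuring data while keeping patterns discoverable is keyed pseudonymization: each identifier is replaced by a stable token, so counts and joins still work but the raw value is hidden. The sketch below is an assumption about what such a technique could look like, not a method named in the presentation; `pseudonymize`, `mask_column` and the sample records are hypothetical, and the token is truncated only for readability.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(value, secret_key):
    """Map a value to a stable token using HMAC-SHA256 with a secret key."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


def mask_column(records, field, secret_key):
    """Return copies of the records with `field` replaced by its token."""
    return [{**record, field: pseudonymize(record[field], secret_key)}
            for record in records]


if __name__ == "__main__":
    key = b"example-key-kept-outside-the-data-set"  # hypothetical secret
    visits = [
        {"student": "alice@example.edu", "course": "BIO101"},
        {"student": "bob@example.edu", "course": "BIO101"},
        {"student": "alice@example.edu", "course": "CHE201"},
    ]
    masked = mask_column(visits, "student", key)
    # Frequency patterns survive masking even though identities do not.
    print(Counter(row["student"] for row in masked).most_common())
```

Because the same input always yields the same token under a given key, analysts can still count, group and join on the masked column. Pseudonymization on its own does not guarantee anonymity, so it would normally be one layer among several.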
19. Information consumption dysfunction
The No. 1 impediment for improved use of dark data is human
psychology. The dominant regime for managing information and power
must end
This regime has the following attributes:
• Define goals and try to achieve them
• Maximize winning, minimize losing
• Unilateral control and accountability
This regime causes the following dysfunctions
• Information is power, thus data is hoarded, metadata formation is guarded,
‘framing of problems’ becomes a competitive battlefield
• Gamers rely on data obfuscation to make untestable claims
• Reliance on personal anecdote and sample sizes of 1
• Threat-induced reactions to difficult data, causing data suppression
• Cover-ups, manipulation of others, assaults on autonomy and agency
See Chris Argyris and Double Loop Learning. http://en.wikipedia.org/wiki/Chris_Argyris
20. The problems with the dominant regime
It’s in our nature: all humans are
highly skilled at this behavior. It is part
of being a child and a parent
It is toxic to creative, high IQ talent
It inhibits team performance
It creates internal political theater
It terribly limits the application of
insights from dark data
It causes awfully bad, if not tragic,
public spectacles
21. Needed: A new culture of information
A new cultural model needs to develop, based on the following
attributes
• Transparency. Provide equal access to all sides of a debate
• Rapid validation. Find and use tools that let all sides of a debate
analyze, validate or refute insights into data
• Instead of maximizing winning and minimizing losing, encourage
small, fast failure. Instead of ‘punishing’ individuals, put the focus on
team rewards and multi-lateral control
• Instead of empowering leaders so that accountability can be overly
simple, establish more intricate performance measurement systems
that stabilize the enterprise, provide better feedback to many
The future of exploitation of dark data will be owned by teams that
can collaborate well, challenge members productively and stay
together long enough to turn the data into economic wealth
22. How can you spot the person who can’t succeed?
Shine light on their data and data management processes. Ask
them to document and share details about their model. See if they
will allow others to independently verify their results. Engage in a
conversation about their model assumptions
Gamers playing under the old rules will typically do the following
• Defer, delay and avoid the meeting or producing the evidence
• Refer to concepts like ‘we’re the experts’ or ‘we can’t explain it to non-
experts’
• Change the subject
• Cite powers outside of their control that limit their ability to respond
• Go undercover and hide for a while
You can’t succeed with a house full of gamers
23. Building expert teams takes skill and time
Expert teams share a clear and common purpose and a strong mission
Expert teams share mental models
• Their members anticipate each other. They can communicate without the need for overt communication
They are adaptive
• They are self correcting. Their members compensate for each other. They reallocate functions. They engage in
a cycle of prebrief-performance-debrief, giving feedback to each other. They establish and revise team goals.
They differentiate between high and low priorities. They have mechanisms for anticipating and reviewing issues
and problems of members. They periodically review and diagnose team effectiveness and team vitality
They have clear (but not overly clear or rigid) roles and responsibilities
• Members understand their roles and how they fit together
They have strong team leadership
• Led by someone with good leadership, not just technical skills. They have team members who believe the
leader cares about them. They provide situation updates. They foster teamwork, coordination and cooperation.
They self-correct first
They develop a strong sense of "collective"
• Trust, teamness and confidence are important. They manage conflict well. Members confront each other
effectively. They trust each other's intentions
They optimize performance outcomes
• They make fewer errors. They communicate often enough, ensuring members have the information to be able
to contribute. They make better decisions
They cooperate and coordinate
• They identify team task work requirements. They ensure, through staffing and development, that the team
possesses the right mix of competencies. They consciously integrate new members. They distribute and assign
work thoughtfully. They examine and adjust the physical workspace to optimize communication and
coordination
24. Other consumption ideas
Examine decision making within the enterprise. Find bottlenecks
to faster decisions. Draw a new line separating central from local
agency. Let projects proceed with light/fast approval with follow-up
and audit later
More rapid or time-boxed decision making. Use agile approaches.
Minimally viable products. Incremental releases
Reward spontaneous collaboration. Design committees, teams,
units based on collaboration IQ rather than representativeness
Automate more decisions, starting with the mundane or risk-free
Define new roles with complementary analysis and application
skills. Hire more generalists with excellent critical thinking
25. CEO imperative
Designing an organization that can take advantage of dark data is
very difficult. It is a CEO problem
The challenge has many layers
• Understanding where to strategically apply dark data findings, how to
compete on analytics
• Ascertaining organizational and infrastructure readiness
• Establishing executive and employee incentive models that help
• Managing and monitoring progress at the technical, individual, team
and enterprise level
• Enforcing evidence-based decision making and changing the culture
• Designing the models to be used throughout the enterprise
CIOs can play a strong role, but the CEO, IMHO, has to own this
26. CEO Advisory Engagement
1. Strategic possibilities
• Examine the firm’s business model, value-creating activities
• Identify areas where analytics and data may help, through ideation sessions
2. Dark data inventory
• Document the data assets across the enterprise
• Categorize and rank by quality and availability
3. Value network assessment
• Evaluate the value for upstream and downstream players
• Identify potential sources, uses for dark data
4. Economic estimates
• Identify use cases, evaluate potential benefit and risks
• Prioritize opportunities
5. Organizational development and change management
• Identify culture issues, skill gaps, org structure changes, incentives, additional
resources needed, communications approach, timelines and sequencing
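The inventory step above (document the data assets, then categorize and rank them by quality and availability) could be prototyped along the following lines. This is a minimal sketch under stated assumptions: the `DataAsset` class, the 0-to-1 scores and the rule of ranking by the product of the two scores are all hypothetical choices, not part of the engagement as described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """One entry in a hypothetical dark data inventory."""
    name: str
    quality: float       # assumed score from 0.0 (unusable) to 1.0 (clean)
    availability: float  # assumed score from 0.0 (locked away) to 1.0 (open)

    @property
    def readiness(self):
        # Multiplying penalizes assets that are weak on either dimension.
        return self.quality * self.availability


def rank_assets(assets):
    """Order assets from most to least ready for mining."""
    return sorted(assets, key=lambda asset: asset.readiness, reverse=True)


if __name__ == "__main__":
    inventory = [
        DataAsset("call transcripts", quality=0.8, availability=0.3),
        DataAsset("web logs", quality=0.6, availability=0.9),
        DataAsset("sensor archive", quality=0.4, availability=0.5),
    ]
    for asset in rank_assets(inventory):
        print(f"{asset.name:18s} readiness={asset.readiness:.2f}")
```

A ranking like this is only a starting point for the economic estimates in step 4, where benefit and risk per use case would also have to be weighed.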
27. Summary
Information is redefining humanity in ways we still don’t understand. The
future is not certain. It will be written by winners
Economic growth depends on rates of innovation. Innovation depends on
new insights which come chiefly from data
Data is growing exponentially. Human ability to process it is not. Thus,
dark data is growing exponentially too
Firms differ widely in their [in]ability to mine data for information
(production) and apply information in decisions (consumption)
A [largely, partially] semi-automated analytic discovery and refining
capability is imminent
Winners will find new ways of organizing themselves and their
ecosystems to gain advantage, speeding up timeframes