Dark data refers to the large amounts of data organizations collect during regular business activities but never use. While organizations invest heavily in collecting data, much of it remains unused. There are three main types of dark data: existing unstructured internal data, non-traditional unstructured external data, and data available on the deep web. Analyzing dark data can provide valuable insights, but it also carries risks such as privacy issues. Some companies are already leveraging dark data for applications like fraud detection and personalization in retail. Approaching dark data requires getting the right data, augmenting it with external sources, building data talent, and using advanced visualization tools.
2. What is dark data?
The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.
- IT Glossary by Gartner
3. In simple terms, dark data is all the useful data an organization possesses but does not meaningfully use or analyze to improve the business.
4. The enormous digital universe
Year | Total size of digital universe | Data useful if analyzed | Data from mobile devices | Data from embedded systems
2013 | 4.4 ZB | 22% | 17% | 2%
2020 | 44 ZB | 37% | 27% | 10%
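To put these percentages in absolute terms, the short Python sketch below multiplies each share by the total size for that year. The figures are the ones shown on this slide; the dictionary keys and the function name are illustrative, and the results are only as precise as the rounded inputs.

```python
# Figures from the slide: total size in zettabytes (ZB) and shares of that total.
digital_universe = {
    2013: {"total_zb": 4.4, "useful_if_analyzed": 0.22, "mobile": 0.17, "embedded": 0.02},
    2020: {"total_zb": 44.0, "useful_if_analyzed": 0.37, "mobile": 0.27, "embedded": 0.10},
}


def absolute_volumes(year):
    """Convert each percentage share for a year into zettabytes, rounded to 2 places."""
    row = digital_universe[year]
    total = row["total_zb"]
    return {
        name: round(total * share, 2)
        for name, share in row.items()
        if name != "total_zb"
    }


for year in sorted(digital_universe):
    print(year, absolute_volumes(year))
```

On these figures, the data that would be useful if analyzed grows from roughly 1 ZB in 2013 to roughly 16 ZB in 2020.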
5. According to IDC (a research firm), up to 90 percent of the
digital universe is unstructured data.
6. Traditional sources of dark data
Server log files
Networking machine data
Point-of-sale feeds
Customer queries recorded in calls, emails, forms
Underused employee data
Meeting notes
Unstructured information arising out of business mails and presentations
Unused data resulting from business research and surveys
7. Why is it important?
Businesses invest heavily in collecting data; however, tangible value can be derived only once companies start to understand their dark data and how it can be applied.
8. It is also a sensible step for any company that is getting started with big data and building a data warehouse.
In this case, dark data can be a reliable source of historical data.
9. 3 facets of dark data
01 Existing unstructured data
02 Nontraditional unstructured data
03 Data in the deep web
11. Unstructured data such as emails, notes, messages, documents, logs, and notifications (including those from IoT devices) is confined to the organization and remains largely unused, either because the tools and techniques to analyze it are lacking or because it is absent from the database.
These data assets could hold valuable insights about competitors, pricing, and consumer behavior.
12. Nontraditional unstructured data
Data in web pages, audio and video files, and still images is largely untapped. It can be mined via data extraction solutions, computer vision, advanced pattern recognition, and video and sound analytics.
13. This can help businesses perform advanced analytics on data
present in nontraditional formats to better understand their
customers, employees, operations, and markets.
14. Data present in the deep web
The deep web presents the largest pool of unused information: data curated by academics, consortia, government agencies, communities, and other third-party domains.
15. Companies can potentially curate competitive intelligence using an emerging class of search tools developed to help users target scientific research, activist data, or even hobbyist threads found in the deep web.
16. One example of such a tool is Stanford University’s search engine, Hidden Web Exposer, which scrapes the deep web for information using a task-specific, human-assisted approach.
18. Legal and regulatory issues
If the stored data is covered by legal regulations, as credit card data is, its exposure could subject companies to financial and legal liabilities.
19. Intelligence risk
Companies could intentionally or
unintentionally disclose proprietary
or sensitive data on business
operations, products, financial status
and business plans.
20. PR disaster
Companies are regarded as the protectors of the data they collect, so any loss of data, especially sensitive and confidential data, can lead to a loss of reputation.
21. Opportunity costs
If a company does not analyze and process its dark data but its competitors do, those competitors will be better positioned to capture market share by leveraging the insights from dark data.
23. Personalization in retail
Stitch Fix, an online subscription shopping service, uses images from social media and other sources to track emerging fashion trends and evolving customer preferences.
Flow: Questionnaire filled by clients -> Customer’s Pinterest board and social media scanned -> Data augmentation -> Deeper insight into customer’s style preference -> Appropriate clothing shipped to the customer
24. Fraud detection
A financial services firm wanted to gain insight from its trading terminal data to find correlations between trading patterns and abuses like money laundering and other fraudulent activities.
Most of the data was dark owing to its volume and geographically scattered storage.
Once the firm was able to use this previously underutilized data and had completed the data preparation and analysis needed to find suspect patterns in transactional records, it used the analyzed data to build predictive models that identify activities indicating potential fraud, so that measures can be taken before the fraud occurs.
26. Getting the right data
Instead of attempting to discover and collect all of the dark data hidden within and outside your organization, work with the business team to find answers for specific business problems.
27. Being open to third party data
Source data from the web to augment your own data with publicly available demographic, location, and statistical information.
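As a rough illustration of this kind of augmentation, the Python sketch below joins internal records to an external demographic table on a shared key. Everything in it is hypothetical: the records, the field names (`postal_code`, `median_income`, and so on), and the `augment` helper are assumptions made for the example, not data or tooling from any real source.

```python
# Hypothetical internal records, e.g. derived from a point-of-sale feed.
internal_customers = [
    {"customer_id": 1, "postal_code": "10001", "total_spend": 250.0},
    {"customer_id": 2, "postal_code": "94105", "total_spend": 90.0},
    {"customer_id": 3, "postal_code": "00000", "total_spend": 40.0},
]

# Hypothetical publicly available demographic data, keyed by postal code.
public_demographics = {
    "10001": {"median_income": 90000, "population": 25000},
    "94105": {"median_income": 150000, "population": 12000},
}


def augment(records, external, key="postal_code"):
    """Left join: keep every internal record and add external fields when the key matches."""
    augmented = []
    for record in records:
        extra = external.get(record[key], {})
        # "matched" records whether external data was found for this record.
        augmented.append({**record, **extra, "matched": record[key] in external})
    return augmented


augmented_customers = augment(internal_customers, public_demographics)
for row in augmented_customers:
    print(row)
```

Keeping unmatched records and flagging them, rather than dropping them, makes gaps in the external source visible instead of silently shrinking the dataset.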
28. Building data talent
Data scientists are valuable resources, especially those who have the skills to combine deep modeling and statistical techniques with industry or function-specific insights.
29. Utilizing advanced visualization tools
Advanced visualization software can boost business intelligence by repackaging big data into smaller, more meaningful chunks, delivering value to users much faster.
This is crucial since information can be more easily consumed when presented as an infographic, a dashboard, or another type of visual representation.
30. Future of dark data
Most companies will learn to tap into their dark data more effectively, because that is the direction in which a connected and measurable world is heading.
The real value will go to those businesses that open up their data sources internally, in a secure and responsible manner, so that their workforce is empowered to become problem solvers in their own right.
31. Looking to augment data assets with web data?
Reach out to PromptCloud, a pioneer in custom, managed and cloud-based web extraction services.
https://www.promptcloud.com | sales@promptcloud.com