DCA: Current Themes and
Trends*
Alan Morrison
Data-Centric Architecture Forum
May 2023
1
Image: Alain Audet at https://pixabay.com/photos/lake-foggy-lake-nature-landscape-6839357/
*Separate talk to cover NLP/LLMs
Business goals enabled by a connected, shared data
ecosystem
2
Buying, helping, making, selling, sharing
Inhibitors to ecosystem-level sharing
● Data feudalism
● Poorly defined regulatory challenges
● Weak public sector
● Public apathy
● Technology + investor inertia and lack of clear vision
● Magic bullet syndrome
● Media groupthink
● Idol worship
● Pervasive myopia
● Organizations that fail to empower foxes over hedgehogs
3
Unclaimed data market territory
[Figure: Present vs Future Shared Data Market Map. The 12 steps to FAIR* data power, as listed: FAIR, Actionability, Immediacy, Divining purpose, Divining intent, Synthesis, Reasoning, Abstraction, Contextualization, Connection, Classification, Identification. Legend: unclaimed market territory vs. staked claims. The reach of current ML efforts is marked on the map.]
*Findable, accessible, interoperable, reusable data
Challenge: Seamless, at-scale, FAIR data collaboration
5
James Kobielus, 2016
Association of European Libraries, 2017
6
Opportunity: Unitary data + description logic = knowledge
7
“Data management” (structured data, mostly)
Knowledge management (internally shared)
Content management (externally shared)
Learning management (internal coursework)
FAIR data and associated description logic
FAIR data is data users can have confidence in for many purposes.
Data becomes FAIR when it disambiguates concepts, individuals and roles and how they interact and relate to one another.
In a knowledge graph context, documented knowledge = FAIR data.
Under the FAIR data umbrella are all heterogeneous types of data/content.
To create a knowledge graph, users can start with a single triple
8
Linked Open Data Cloud, 2022
Starter triple for a knowledge graph
A standard knowledge graph consists of triplified, relationship-rich
data. The data model, or ontology, is also described in triples and
lives with the rest of the data. Ontologies can also be managed as
data. Linking triples merely requires a verb (or predicate, or
described edge) to link them.
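The starter-triple idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real triple store: the `ex:` identifiers, the `Triple` class and the `neighbors` helper are all hypothetical names chosen for the example. It shows one starter triple, a few ontology statements stored as triples alongside it, and a new fact linked in through a predicate.

```python
# A toy in-memory triple set; every "ex:" name is a hypothetical placeholder.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The starter triple: one fact about the business.
graph = {Triple("ex:AcmeCorp", "ex:sells", "ex:Widget")}

# Ontology statements are triples too, and live in the same graph.
graph |= {
    Triple("ex:AcmeCorp", "rdf:type", "ex:Organization"),
    Triple("ex:Widget", "rdf:type", "ex:Product"),
    Triple("ex:sells", "rdfs:domain", "ex:Organization"),
    Triple("ex:sells", "rdfs:range", "ex:Product"),
}

# Linking new data only needs a predicate that points at an existing node.
graph.add(Triple("ex:Sale42", "ex:includesProduct", "ex:Widget"))


def neighbors(node: str) -> set[Triple]:
    """Return every triple in which the node is the subject or the object."""
    return {t for t in graph if node in (t.subject, t.obj)}


widget_links = neighbors("ex:Widget")
```

Because the model and the instance data share one structure, the same `neighbors` lookup works on a product, a sale or a class such as `ex:Product`.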
Simple way to start a business knowledge graph (besides using gist)
● “Use JSON-LD to atomise your enterprise data down into three-part statements and voila!
You get a connected graph!
● ✨ Decentralize the process by having each team publish their own JSON-LD, for example,
let the sales team publish the sales data and ask them to link each sale to the correct product
and client.
● 🤖 Connect GPT to the JSON-LD that your teams have published. Then, harness the power
of GPT to assist new teams in publishing their JSON-LD and integrating it back into your
enterprise-wide Knowledge Graph.”
Key to scaling external/internal integration: use the schema.org modeled JSON-LD from websites
GPT is trained on and connect it with internal data also modeled with schema.org
–#HT Tony Seale, UBS
https://www.linkedin.com/posts/tonyseale_mlops-dataintegration-ai-activity-7052551060237819904-bAZc
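As a rough sketch of what "atomising" data into three-part statements could look like, the Python below flattens one small JSON-LD document into (subject, predicate, object) tuples. The document, its example.com identifiers and the `to_statements` helper are assumptions made for illustration. The helper handles only flat nodes and is not a substitute for a real JSON-LD processor.

```python
import json

# Hypothetical JSON-LD a sales team might publish (all IDs are made up).
sale_doc = json.loads("""
{
  "@context": "https://schema.org",
  "@id": "https://example.com/sales/1001",
  "@type": "Order",
  "orderedItem": {"@id": "https://example.com/products/widget"},
  "customer": {"@id": "https://example.com/clients/acme"}
}
""")


def to_statements(node: dict) -> list[tuple[str, str, str]]:
    """Flatten one flat JSON-LD node into (subject, predicate, object) tuples.

    Handles only string values and {"@id": ...} references; a real
    JSON-LD processor does far more (context expansion, nesting, lists).
    """
    subject = node["@id"]
    statements = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        obj = value["@id"] if isinstance(value, dict) else value
        statements.append((subject, predicate, obj))
    return statements


statements = to_statements(sale_doc)
```

The sale links to its product and client by identifier, so statements published by different teams join up wherever they share an `@id`.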
9
Yes, data warehousing focused on the integration problem
10
● Pro: Identified the critical problem to solve
● Con: Advocated a method that doesn’t delve deep enough to solve today’s
problem
● Still face the unified data model challenge
No, data warehousing model conformance doesn’t scale
“I spent a good 15 years working in financial services at some
pretty big banks. Half of the IT change budget is spent on
integration and the by-products of integration….I saw as the
technology was advancing that the percentage wasn’t going
down – in fact, it was going up. At some point, is the integration
tax going to be 100 percent?”
– Dan DeMers, CEO of Cinchy
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video,
https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
11
How data warehousing stopped scaling
“They recognized that these themes ended up in all these legacy apps. Sales rolled up against a
geographic and a product hierarchy, and an organizational hierarchy…. They said, Let’s have
those conformed dimensions and a small number of facts. Let’s bring the facts from all the
different systems and snap them together according to these conformed dimensions….
Brilliant idea, but I think what actually happened over time is the workload just got greater and
greater. The ability of people to actually conform those dimensions kept eroding….”
–Dave McComb, President, Semantic Arts
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
12
Data warehousing can’t solve today’s integration challenge
13
● Thousands of databases per enterprise (siloing)
● Thousands of applications (code sprawl)
● Data models buried in the app code
● Every app a special snowflake with its own data model
How did we get here? By selling the old as new
14
Why large-scale integration?
15
Large scale integration is essential to
avoiding observational bias. The drunk
looking for his money under the lamppost
analogy describes the nature of this bias.
The drunk is looking for his money where
the light is, even though he knows the
money is in the shadows.
To manage today’s business at scale,
enterprises need light and visibility
across departments, organizations and
supply networks
Semantic standards allow a desiloed data landscape for
interactive, interoperable digital twins and agents
16
Promise of digital twins and agents–way beyond APIs
17
[Figure: autonomous agents, digital twins/small KGs and sensor nets. Locale: Portsmouth, UK. Source: Iotics, 2019 and 2023]
How shared graph semantics helps
● Boosts meaningful results and relevancy (poor results stem from a lack of data and logic transparency and cohesiveness)
● Contextualizes data for management and reuse with relationship logic
● Scales meaningful connections between contexts (relevant relationships living with entities)
● Enables Metcalfe’s network-of-networks effect (network_effect^N)
● Enables model-driven development via knowledge graphs (code once, reuse anywhere)
● Provides access via KGs to logic programs as well as heterogeneous, smart data
● Scales efficiencies and economies so that energy consumption is reduced
18
KG centricity makes reliable, automated data webs possible
19
“Data teams report spending 25-30% of their time cleaning, labelling, and
gathering data sets.... [Some can spend 80% plus]
What we know for sure is that data teams and knowledge workers
generally spend a noteworthy amount of their time procuring data
points that are available on the public web…”
It took Google knowledge panels one month and twenty days to update
following the inception of a new CEO at Citi, a F100 company. In Diffbot’s
Knowledge Graph, a new fact was logged within the week, with zero
human intervention and sourced from the public web.
– Merrill Cook, Diffbot Blog, 2021-2022
Example capabilities in Diffbot’s AI automated KG
20
Mike Tung, “VLDB2020: The Diffbot Knowledge Graph,” 2020
“Decentralization”: Why you should care
● Further desiloing
● More systems federation
● More interorganizational use potential
● Data Centric approach to architecture
● “Decentralized/Web3 stack”
● More storage options and tiering
● Options at different temperatures (hot vs. cold storage) for new use cases
● More captive and independent storage
21
The Five Commingled Phases of Compute, Networking and Storage
22
1st: Mainframe and Green Screens. Centralized storage and compute, with minimal networking
2nd: Client-Server and Desktops. Application distribution via proprietary and IP networking
3rd: Early Web (on Client-Server). Simple web hosting + legacy client-server storage
4th: Distributed Cloud and Mobile Devices. Commodity servers + storage + some virtualization
5th: “Decoupled” and “Decentralized” Cloud. Compute and storage more loosely coupled, virtualized, controlled and data-centric
[Figure axes: time; less centralized to more centralized; application-centric to data-centric. All phases are still active and evolving.]
Degree of control assumes a continuum–not a binary split
23
See Thomas W. Malone, Inventing the Organizations of the 21st Century, MIT Press, 2003, 45FF.
SOLID: Federated storage and decentralized apps
24
Ruben Verborgh, “Decentralizing personal data management with Solid: a hands-on workshop,” SEMIC Workshop, October 2020
SOLID shared, federated XaaS: Construction industry
25
“TrinPod™: World's first conceptually indexed space-time
digital twin using Solid,” Graphmetrix, 2022,
https://graphmetrix.com/trinpod
Company-specific SOLID storage pods and access
control can be managed by each supply chain partner.
Graphmetrix as digital twin provider manages the
system and system-level apps.
Peergos makes personal file storage management possible via IPFS and a
browser
26
Peergos technology logical architecture, https://peergos.org/technology, 2019
Peergos is a personal decentralized cloud (dcloud) data storage environment that also uses a blockchain-based decentralized public key infrastructure (DPKI). Consider it as an alternative to Google or Amazon Photos, for example.
Enterprise decentralized app environment: OriginTrail.io
27
https://origintrail.io/
OriginTrail + BSI’s supply chain tracking and tracing
28
OriginTrail and the British Standards Institution (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020
The Monasteriven whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach.
OT uses a decentralized knowledge graph that connects to one of several different blockchains.
This method enables shared data reuse and other synergies across the supply chain.
Seven obstacles to adoption of decentralized,
interorganizational environments
29
To succeed, organizations will have to become
more bona fide data-centric organizations first
30
Seven obstacles to adoption of FAIR data development at scale
31
Thoughts and Reactions?
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com
32
From NLP, to stochastic parrots,
to neurosymbolic AI
Alan Morrison
Data-Centric Architecture Forum
May 2023
33
What’s a “stochastic parrot” and one who worships the same?
“A Language Model is a system for haphazardly stitching together sequences of linguistic
forms it has observed in its vast training data, according to probabilistic information about
how they combine, but without any reference to meaning: a stochastic parrot.”
–Emily Bender, et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,”
ACM paper presented at FAccT ’21, March 3–10, 2021, virtual event, Canada
Stochastic parrot worshippers: Those who mindlessly praise LLMs without realizing they’ve
mistaken the parrot part—probabilistic language methods alone–for the whole. These
worshippers seem to assume those methods alone will deliver artificial general intelligence.
Related term: Documentation debt (also per Bender, et al.)
“When we rely on ever larger datasets we risk incurring documentation debt,” they say, “i.e.,
putting ourselves in the situation where the datasets are both undocumented and too large to
document post hoc…. The solution, we propose, is to budget for documentation as part of the
planned costs of dataset creation.”
34
Deep learning guru Yann LeCun on LLMs
35
What’s Natural Language Processing (NLP)?
36
“The root of Natural Language
Processing dates back to the 1950s
when Alan Turing first devised the
Turing Test.
“The objective of the Turing Test was
to determine whether a computer
was truly intelligent based on its
ability to interpret and generate
natural language as a criterion of
intelligence.”
– Tithy Sreemani, Analytics Vidhya
blog, 2022
What’s natural language understanding (NLU)?
1. A form of overpromising and underdelivering, or
2. A serious, ongoing linguistics + cognition endeavor to model how human
understanding works.
37
A sentence-level
model based on Role
and Reference
Grammar by PAT
Inc., 2022.
What’s a large language model (LLM)?
1. A neural network with many layers (“deep learning”).
2. A transformer model that “learns” context a token at a time, in sequence.
3. A tokenizer that converts words to numbers and numbers to words.
4. A token-to-embedding (vectorization) transformer.
5. An ML model that is trained on very large data sets, with millions or billions of
parameters (akin to multi-dimensional topographic features)
6. The NLP (natural language processing) system currently in vogue.
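Items 3 and 4 in the list above can be illustrated with a toy example. The sketch below is a simplification under stated assumptions: it tokenizes whole words (production LLMs typically use subword tokenizers), and its embedding values are random placeholders, where a real model learns them during training. The names `encode`, `decode` and `EMBED_DIM` are illustrative only.

```python
import random

corpus = "the graph knows the fact"

# Tokenizer: map each distinct word to an integer ID and back.
vocab = sorted(set(corpus.split()))
word_to_id = {word: i for i, word in enumerate(vocab)}
id_to_word = {i: word for word, i in word_to_id.items()}


def encode(text: str) -> list[int]:
    """Convert words to numbers."""
    return [word_to_id[w] for w in text.split()]


def decode(ids: list[int]) -> str:
    """Convert numbers back to words."""
    return " ".join(id_to_word[i] for i in ids)


# Embedding table: one small vector per token ID. Real models learn these
# values during training; here they are random placeholders.
rng = random.Random(0)
EMBED_DIM = 4
embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]

token_ids = encode(corpus)
vectors = [embeddings[i] for i in token_ids]
```

Both occurrences of "the" map to the same ID and the same vector. Distinguishing them by context is the job of the transformer layers in item 2.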
38
LLM Leaderboard (partial)
39
Dan Saatrup Nielsen, Alexandra Institute, LinkedIn post, 2023
Solving arithmetic or chasing “facts” with LLMs wastes time and energy
“Suppose that I wanted to find out the square root of five. If I asked an LLM (say ChatGPT), getting this answer involves the
following steps:
● Me: Send a prompt saying “What is the square root of 5?”
● ChatGPT: Do I understand the concept of square root? Yes, I do … it’s a math function.
● ChatGPT: There is a Python function that can be used to invoke that function, in the Python Math Library. Retrieve
that library.
● ChatGPT: Evaluate the number 5 with the function call to get the value 2.236.
● ChatGPT: Construct a response and send that response back to the client.
This assumes that everything goes right.”
– Kurt Cagle, The Cagle Report
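For comparison, the direct route in Python is a single standard-library call, with no prompt, model inference or response construction involved:

```python
import math

root = math.sqrt(5)        # one library call, no prompt round trip
rounded = round(root, 3)   # 2.236
```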
40
Knowledge graphs know; LLMs need prompts and figure it out, sort of.
“LLMs have to figure things out. They follow an iterative feedback loop called a
langchain, with either a human, itself, or a combination of the two. This
langchain model should be emulatable with SPARQL.
“Update. I’m playing around with this idea on Jena/Fuseki, and the early results
are … intriguing. The key is to recognize that you are doing mutations to the
database, which makes many DBAs cringe. However, I don’t think there is any
way you can get to conversational AI on a knowledge graph without constantly
building (and, when necessary, destroying) contextual graphs.”
Kurt Cagle. “Figuring Out vs. Knowing,” The Cagle Report
41
Idea: Connect the LLM directly to a KG such as Wikidata
“We can just use the SPARQL query generation ability directly and ask queries
against Wikidata. Not only can we connect the LLM to a knowledge graph, but
also to a repository of functions such as Wikifunctions. LLMs can learn to use KGs
and functions as tools.”
–Denny Vrandečić, Wikimedia Foundation, 2023
42
Each machine learning answer creates some uncertainty
“You can use machine learning to retrieve Obama’s birthplace every time you
need it, but it costs a lot, and you’re never sure it’s correct.”
–Jamie Taylor of Google
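The birthplace lookup can instead be expressed as a SPARQL query against Wikidata, in line with the idea on the previous slide. The sketch below only builds the query text and a request URL; it makes no network call. Q76 (Barack Obama) and P19 (place of birth) are Wikidata identifiers, while the function name `birthplace_query` is an assumption made for the example.

```python
from urllib.parse import urlencode

# Wikidata identifiers: Q76 is Barack Obama, P19 is "place of birth".
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def birthplace_query(entity_id: str) -> str:
    """Build a SPARQL query for an entity's place of birth."""
    return (
        "SELECT ?place ?placeLabel WHERE {\n"
        f"  wd:{entity_id} wdt:P19 ?place .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


query = birthplace_query("Q76")
request_url = WIKIDATA_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

The query is a lookup of a stored statement, which addresses both the cost and the uncertainty that Taylor describes.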
43
Efficiency argument for knowledge graphs
“Why would you ever use a 96-layer, 156 billion parameter large language model
to do multiplication, when that’s something you can do in a single operation on
your CPU?”
“Why internalize knowledge in an LLM, when you can externalize it in a graph
store and look it up when you need it?”
“Use LLMs where they are efficient.”
– Denny Vrandečić of the Wikimedia Foundation
44
To scale FAIR data, use an assisted, hybrid AI approach
45
Amit Sheth, “From NLP to NLU: Why we need varied, comprehensive, and stratified knowledge (Neuro-symbolic AI),” USC Information Sciences Institute on
YouTube, March 2023, https://www.youtube.com/watch?v=xyxQXka6dRY&t=2377s
46
How hybrid AI helps in research
“LLMs have amazing abilities in
manipulating natural language text,
but generating timely and factually
verified recommendations is one
thing LLMs are not naturally great
at.”
–Mike Tung, CEO of Diffbot
Diffbot Blog, April 2023,
https://blog.diffbot.com/generating-company-recommendations-using-large-language-models-and-knowledge-graphs/
LLMs alone aren’t a reliable research tool because they hallucinate. You can’t trust the answers unless you know the answer already.
Mike Tung recommends more precise prompting on the query side and answer verification via a knowledge graph such as Diffbot. Both of these capabilities harness the precise logical description missing in current LLM Q&As.
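The answer-verification step can be sketched as a check of claimed triples against known facts. Everything in this example is hypothetical: the `ex:` facts stand in for a knowledge graph, the claims stand in for statements extracted from an LLM answer, and `verify` is an illustrative helper, not Diffbot's API. It also assumes each predicate holds a single value, which is not true in general.

```python
# Hypothetical facts standing in for a knowledge graph such as Diffbot's.
KNOWN_FACTS = {
    ("ex:AcmeCorp", "ex:headquarteredIn", "ex:Springfield"),
    ("ex:AcmeCorp", "ex:industry", "ex:Manufacturing"),
}


def verify(claim: tuple[str, str, str]) -> str:
    """Classify a claimed triple as supported, contradicted or unknown."""
    subject, predicate, _ = claim
    if claim in KNOWN_FACTS:
        return "supported"
    # Same subject and predicate but a different object: treat as a conflict.
    # This assumes the predicate is single-valued, which is not true in general.
    if any(s == subject and p == predicate for s, p, _ in KNOWN_FACTS):
        return "contradicted"
    return "unknown"


# Pretend these triples were extracted from an LLM answer.
llm_claims = [
    ("ex:AcmeCorp", "ex:headquarteredIn", "ex:Springfield"),
    ("ex:AcmeCorp", "ex:industry", "ex:Retail"),
    ("ex:AcmeCorp", "ex:foundedBy", "ex:JaneDoe"),
]
results = [verify(c) for c in llm_claims]
```

Only the first claim would pass through unflagged. The other two would be corrected or sent for human review.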
NLP’s compost grinder data mentality
47
https://pixabay.com/photos/compost-grinder-compost-chipper-3389088/
Versus KGs growing naturally in companion plant mode
48
Rich data ecosystems evolve naturally by
comparison with underdescribed, fragmented
data assets
Zero-copy integration becomes possible,
reducing complexity, labor and energy waste by
up to 90 percent
Second-order cybernetics (humans in the loop)
and precise facts and contextualization
complement probabilistic methods
https://www.fruitsaladtrees.com/blogs/news/ediblegarden
AI’s Wave III: Less wasteful, more explicit smart data
management via a knowledge graph foundation
49
NLP versus NLU: Most true understanding is unclaimed territory
51
Unclaimed data market territory
[Figure: Present vs Future Data Market Map. The 12 steps to FAIR* data power, as listed: FAIR, Actionability, Immediacy, Divining purpose, Divining intent, Synthesis, Reasoning, Abstraction, Contextualization, Connection, Classification, Identification. Legend: unclaimed market territory vs. staked claims. The reach of current ML efforts is marked on the map.]
*Findable, accessible, interoperable, reusable data
History of LLMs
52
Feature growth
53
Energy consumption
54
Stochastic parrots and hallucination
55
Neurosymbolic AI
56
Teaching LLMs to query knowledge graphs
57
Datalanguage hackathon results
58
Semantic community LLM use results
59
Goal: Develop FAIR data efficiently
60

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 

DCAF 2023 1 and 2.pdf

  • 1. DCA: Current Themes and Trends* Alan Morrison Data-Centric Architecture Forum May 2023 1 Alain Audet at https://pixabay.com/photos/lake-foggy-lake-nature-landscape-6839357/ *Separate talk to cover NLP/LLMs
  • 2. Business goals enabled by a connected, shared data ecosystem 2 Buying Helping Making Selling Sharing
  • 3. Inhibitors to ecosystem-level sharing ● Data feudalism ● Poorly defined regulatory challenges ● Weak public sector ● Public apathy ● Technology + investor inertia and lack of clear vision ● Magic bullet syndrome ● Media groupthink ● Idol worship ● Pervasive myopia ● Lack of organization fox empowerment over hedgehogs 3
  • 4. Unclaimed data market territory FAIR* Actionability Immediacy Divining purpose Divining intent Synthesis Reasoning Abstraction Contextualization Connection Classification Identification Unclaimed market territory Staked claims Present vs Future Shared Data Market Map 12 steps to FAIR data power *Findable, accessible, interoperable, reusable data Reach of current ML efforts
  • 5. Challenge: Seamless, at-scale, FAIR data collaboration. Sources: James Kobielus, 2016; Association of European Libraries, 2017
  • 6. 6
  • 7. Opportunity: Unitary data + description logic = knowledge 7 “Data management” (structured data, mostly) Knowledge management (internally shared) Content management (externally shared) Learning management (internal coursework) FAIR data and associated description logic FAIR data is data users can have confidence in for many purposes. Data becomes FAIR when it disambiguates concepts, individuals and roles and how they interact and relate to one another. In a knowledge graph context, documented knowledge = FAIR data. Under the FAIR data umbrella are all heterogeneous types of data/content.
  • 8. To create a knowledge graph, users can start with a single triple. (Figures: Linked Open Data Cloud, 2022; starter triple for a knowledge graph.) A standard knowledge graph consists of triplified, relationship-rich data. The data model, or ontology, is also described in triples and lives with the rest of the data. Ontologies can also be managed as data. Linking triples merely requires a verb (that is, a predicate or described edge).
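To make the "single starter triple" idea concrete, here is a minimal sketch in plain Python. It assumes nothing beyond the standard library; the `ex:` names, the `objects` helper and the choice of a `set` as the graph are illustrative assumptions, not taken from the slides or from any particular triple store.

```python
# A triple is (subject, predicate, object). All "ex:" names are made up
# for illustration; a real graph would use full IRIs.
Triple = tuple[str, str, str]

# Start the graph with one triple.
graph: set[Triple] = {
    ("ex:AcmeCorp", "rdf:type", "ex:Organization"),
}

# Grow the graph by adding triples that reuse existing nodes.
# The predicate is the "verb" that links two nodes.
graph.add(("ex:AcmeCorp", "ex:sells", "ex:Widget"))
graph.add(("ex:Widget", "rdf:type", "ex:Product"))

# The data model (ontology) is also expressed as triples and lives
# in the same graph as the instance data.
graph.add(("ex:sells", "rdfs:domain", "ex:Organization"))
graph.add(("ex:sells", "rdfs:range", "ex:Product"))


def objects(g: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object linked from `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in g if s == subject and p == predicate)


print(objects(graph, "ex:AcmeCorp", "ex:sells"))
print(objects(graph, "ex:sells", "rdfs:range"))
```

Because the ontology statements sit in the same set as the data, the same `objects` lookup answers both "what does AcmeCorp sell?" and "what kind of thing can be sold?".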
  • 9. Simple way to start a business knowledge graph (besides using gist) ● “Use JSON-LD to atomise your enterprise data down into three-part statements and voila! You get a connected graph! ● ✨ Decentralize the process by having each team publish their own JSON-LD, for example, let the sales team publish the sales data and ask them to link each sale to the correct product and client. ● 🤖 Connect GPT to the JSON-LD that your teams have published. Then, harness the power of GPT to assist new teams in publishing their JSON-LD and integrating it back into your enterprise-wide Knowledge Graph.” Key to scaling external/internal integration: use the schema.org modeled JSON-LD from websites GPT is trained on and connect it with internal data also modeled with schema.org. Hat tip: Tony Seale, UBS, https://www.linkedin.com/posts/tonyseale_mlops-dataintegration-ai-activity-7052551060237819904-bAZc
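The "atomise into three-part statements" step can be sketched as follows. This is a deliberately naive flattener written for illustration, not a conforming JSON-LD processor: it ignores `@context` expansion, lists and blank nodes. The sale document, the `example.com` identifiers and the `to_triples` helper are assumptions; `Order`, `customer`, `orderedItem` and `name` are schema.org terms.

```python
import json

# A hypothetical sale published by a sales team, linked to a client
# and a product by their identifiers.
sale_doc = json.loads("""
{
  "@context": "https://schema.org",
  "@id": "https://example.com/sales/1001",
  "@type": "Order",
  "customer": {
    "@id": "https://example.com/clients/42",
    "@type": "Organization",
    "name": "Example Client"
  },
  "orderedItem": {
    "@id": "https://example.com/products/7",
    "@type": "Product",
    "name": "Widget"
  }
}
""")


def to_triples(node: dict) -> list[tuple[str, str, str]]:
    """Flatten a JSON-LD-style node into (subject, predicate, object) triples.

    Simplification: every nested object must carry its own "@id".
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            triples.append((subject, "rdf:type", value))
        elif isinstance(value, dict):
            # Link to the nested node, then emit its own statements.
            triples.append((subject, key, value["@id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, key, value))
    return triples


triples = to_triples(sale_doc)
for triple in triples:
    print(triple)
```

Because the client and product are referenced by `@id`, triples published separately by other teams about the same identifiers would join this graph without any extra mapping step.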
  • 10. Yes, data warehousing focused on the integration problem 10 ● Pro: Identified the critical problem to solve ● Con: Advocated a method that doesn’t delve deep enough to solve today’s problem ● Still face the unified data model challenge
  • 11. No, data warehousing model conformance doesn’t scale “I spent a good 15 years working in financial services at some pretty big banks. Half of the IT change budget is spent on integration and the by-products of integration….I saw as the technology was advancing that the percentage wasn’t going down – in fact, it was going up. At some point, is the integration tax going to be 100 percent?” – Dan DeMers, CEO of Cinchy “Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021 11
  • 12. How data warehousing stopped scaling “They recognized that these themes ended up in all these legacy apps. Sales rolled up against a geographic and a product hierarchy, and an organizational hierarchy…. They said, Let’s have those conformed dimensions and a small number of facts. Let’s bring the facts from all the different systems and snap them together according to these conformed dimensions…. Brilliant idea, but I think what actually happened over time is the workload just got greater and greater. The ability of people to actually conform those dimensions kept eroding….” –Dave McComb, President, Semantic Arts “Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021 12
  • 13. Data warehousing can’t solve today’s integration challenge 13 ● Thousands of databases per enterprise (siloing) ● Thousands of applications (code sprawl) ● Data models buried in the app code ● Every app a special snowflake with its own data model
  • 14. How did we get here? By selling the old as new 14
  • 15. Why large-scale integration? Large-scale integration is essential to avoiding observational bias. The analogy of the drunk looking for his money under the lamppost describes the nature of this bias: he searches where the light is, even though he knows the money is in the shadows. To manage today’s business at scale, enterprises need light and visibility across departments, organizations and supply networks.
  • 16. Semantic standards allow a desiloed data landscape for interactive, interoperable digital twins and agents 16
  • 17. Promise of digital twins and agents–way beyond APIs 17 Autonomous agents Digital twins/ Small KGs Locale: Portsmouth, UK Sensor nets Iotics, 2019 and 2023
  • 18. How shared graph semantics helps ● Boosts meaningful results and relevancy (poor results stem from a lack of data and logic transparency and cohesiveness) ● Contextualizes data for management and reuse with relationship logic ● Scales meaningful connections between contexts (relevant relationships living with entities) ● Enables Metcalfe’s network of networks effect (network_effect^N) ● Enables model-driven development via knowledge graphs (code once, reuse anywhere) ● Provides access via KGs to logic programs as well as heterogeneous, smart data ● Scales efficiencies and economies so that energy consumption is reduced
  • 19. KG centricity makes reliable, automated data webs possible “Data teams report spending 25-30% of their time cleaning, labelling, and gathering data sets.... [Some can spend 80% plus] What we know for sure is that data teams and knowledge workers generally spend a noteworthy amount of their time procuring data points that are available on the public web…” It took Google knowledge panels one month and twenty days to update following the inception of a new CEO at Citi, a F100 company. In Diffbot’s Knowledge Graph, a new fact was logged within the week, with zero human intervention and sourced from the public web. – Merrill Cook, Diffbot Blog, 2021-2022
  • 20. Example capabilities in Diffbot’s AI automated KG. Source: Mike Tung, “VLDB2020: The Diffbot Knowledge Graph,” 2020
  • 21. “Decentralization”: Why you should care ● Further desiloing ● More systems federation ● More interorganizational use potential ● Data Centric approach to architecture ● “Decentralized/Web3 stack” ● More storage options and tiering ● Options at different temperatures (hot vs. cold storage) for new use cases ● More captive and independent storage 21
  • 22. Simple web hosting + legacy Client-Server storage Early Web (on Client-Server) Compute and storage more loosely coupled, virtualized, controlled and data-centric “Decoupled” and “Decentralized” Cloud Application Distribution via Proprietary and IP Networking Client-Server and Desktops Commodity servers + storage + some virtualization Distributed Cloud and Mobile Devices 1st 2nd 3rd 4th 5th Centralized storage and compute, with minimal networking Mainframe and Green Screens The Five Commingled Phases of Compute, Networking and Storage 22 Less centralized Time More centralized Application Centric Data Centric All phases are still active and evolving
  • 23. Degree of control assumes a continuum–not a binary split 23 See Thomas W. Malone, Inventing the Organizations of the 21st Century, MIT Press, 2003, 45FF.
  • 24. SOLID: Federated storage and decentralized apps 24 Ruben Verborgh, “Decentralizing personal data management with Solid: a hands-on workshop,” SEMIC Workshop, October 2020
  • 25. SOLID shared, federated XaaS: Construction industry 25 “TrinPod™: World's first conceptually indexed space-time digital twin using Solid,” Graphmetrix, 2022, https://graphmetrix.com/trinpod Company-specific SOLID storage pods and access control can be managed by each supply chain partner. Graphmetrix as digital twin provider manages the system and system-level apps.
  • 26. Peergos makes personal file storage management possible via IPFS and a browser 26 Peergos technology logical architecture, https://peergos.org/technology, 2019 Peergos is a personal data dcloud storage environment that also uses blockchain based decentralized public-key-infrastructure (dpki). Consider as an alternative to Google or Amazon Photos, for example.
  • 27. Enterprise decentralized app environment: OriginTrail.io 27 https://origintrail.io/
  • 28. OriginTrail + BSI’s supply chain tracking and tracing. Source: OriginTrail and the British Standards Institution (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020 The Monasteriven whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach. OT uses a decentralized knowledge graph that connects to one of several different blockchains. This method enables shared data reuse and other synergies across the supply chain.
  • 29. Seven obstacles to adoption of decentralized, interorganizational environments 29
  • 30. To succeed, organizations will have to become more bona fide data-centric organizations first 30
  • 31. Seven obstacles to adoption of FAIR data development at scale 31
  • 32. Thoughts and Reactions? Feel free to ping me anytime with questions, etc. Alan Morrison Data Science Central LinkedIn | Twitter | Quora | Slideshare +1 408 205 5109 a.s.morrison@gmail.com 32
  • 33. From NLP, to stochastic parrots, to neurosymbolic AI Alan Morrison Data-Centric Architecture Forum May 2023 33
  • 34. What’s a “stochastic parrot” and one who worships the same? “A Language Model is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.” –Emily Bender, et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” ACM paper presented at FAccT ’21, March 3–10, 2021, virtual event, Canada Stochastic parrot worshippers: Those who mindlessly praise LLMs without realizing they’ve mistaken the parrot part—probabilistic language methods alone–for the whole. These worshippers seem to assume those methods alone will deliver artificial general intelligence. Related term: Documentation debt (also per Bender, et al.) “When we rely on ever larger datasets we risk incurring documentation debt,” they say, “i.e., putting ourselves in the situation where the datasets are both undocumented and too large to document post hoc…. The solution, we propose, is to budget for documentation as part of the planned costs of dataset creation.” 34
  • 35. Deep learning guru Yann LeCun on LLMs 35
  • 36. What’s Natural Language Processing (NLP)? 36 “The root of Natural Language Processing dates back to the 1950s when Alan Turing first devised the Turing Test. “The objective of the Turing Test was to determine whether a computer was truly intelligent based on its ability to interpret and generate natural language as a criterion of intelligence.” – Tithy Sreemani, Analytics Vidhya blog, 2022
  • 37. What’s natural language understanding (NLU)? 1. A form of overpromising and underdelivering, or 2. A serious, ongoing linguistics + cognition endeavor to model how human understanding works. 37 A sentence-level model based on Role and Reference Grammar by PAT Inc., 2022.
  • 38. What’s a large language model (LLM)? 1. A neural network with many layers (“deep learning”). 2. A transformer model that “learns” context a token at a time, in sequence. 3. A tokenizer that converts words to numbers and numbers to words. 4. A token-to-embedding (vectorization) transformer. 5. An ML model that is trained on very large data sets with millions or billions of parameters (akin to multi-dimensional topographic features). 6. The NLP (natural language processing) system currently in vogue.
  • 39. LLM Leaderboard (partial) 39 Dan Saatrup Nielsen, Alexandra Institute, LinkedIn post, 2023
  • 40. Solving arithmetic or chasing “facts” with LLMs wastes time and energy “Suppose that I wanted to find out the square root of five. If I asked an LLM (say ChatGPT), getting this answer involves the following steps: ● Me: Send a prompt saying “What is the square root of 5?” ● ChatGPT: Do I understand the concept of square root? Yes, I do … it’s a math function. ● ChatGPT: There is a Python function that can be used to invoke that function, in the Python Math Library. Retrieve that library. ● ChatGPT: Evaluate the number 5 with the function call to get the value [2.236]. ● ChatGPT: Construct a response and send that response back to the client. This assumes that everything goes right.” – Kurt Cagle, The Cagle Report
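For comparison with the multi-step LLM path described in this slide, the direct computation is a single standard-library call. This is only an illustration of the contrast; the variable name is arbitrary.

```python
import math

# One function call on the CPU, with no prompt, model or network round trip.
root = math.sqrt(5)

# Rounded to three decimal places this is 2.236.
print(round(root, 3))
```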
  • 41. Knowledge graphs know; LLMs need prompts and figure it out, sort of. “LLMs have to figure things out. They follow an iterative feedback loop called a langchain, with either a human, itself, or a combination of the two. This langchain model should be emulatable with SPARQL. “Update. I’m playing around with this idea on Jena/Fuseki, and the early results are … intriguing. The key is to recognize that you are doing mutations to the database, which makes many DBAs cringe. However, I don’t think there is any way you can get to conversational AI on a knowledge graph without constantly building (and, when necessary, destroying) contextual graphs.” – Kurt Cagle, “Figuring Out vs. Knowing,” The Cagle Report
  • 42. Idea: Connect the LLM directly to a KG such as Wikidata “We can just use the SPARQL query generation ability directly and ask queries against Wikidata. Not only can we connect the LLM to a knowledge graph, but also to a repository of functions such as Wikifunctions.” “LLM can learn to use KGs and functions as tools.” –Denny Vrandečić, Wikimedia Foundation, 2023
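As a rough illustration of what "asking queries against Wikidata" involves, the sketch below builds the kind of SPARQL query an LLM would need to generate and the request URL for the public Wikidata query endpoint. It does not call an LLM or send the request. The function names are assumptions for this example; `Q76` (Barack Obama) and `P19` (place of birth) are Wikidata identifiers.

```python
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def birthplace_query(entity_id: str) -> str:
    """Build a SPARQL query for the place of birth (P19) of a Wikidata entity."""
    return f"""
SELECT ?placeLabel WHERE {{
  wd:{entity_id} wdt:P19 ?place .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def request_url(query: str) -> str:
    """Build the GET URL for the query; sending it is left to the caller."""
    return WIKIDATA_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = birthplace_query("Q76")  # Q76 is Barack Obama on Wikidata
url = request_url(query)
print(query)
print(url)
```

In the arrangement the slide describes, the LLM's job would be limited to producing the query text; the answer itself comes from the graph.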
  • 43. Each machine learning answer creates some uncertainty “You can use machine learning to retrieve Obama’s birthplace every time you need it, but it costs a lot, and you’re never sure it’s correct.” –Jamie Taylor of Google 43
  • 44. Efficiency argument for knowledge graphs “Why would you ever use a 96-layer, 156 billion parameter large language model to do multiplication, when that’s something you can do in a single operation on your CPU?” “Why internalize knowledge in an LLM, when you can externalize it in a graph store and look it up when you need it?” “Use LLMs where they are efficient.” – Denny Vrandečić of the Wikimedia Foundation 44
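One way to read "use LLMs where they are efficient" is as a routing decision: do arithmetic on the CPU, look up externalized facts in a store, and fall back to a model only for what remains. The sketch below is a toy under stated assumptions: `FACTS` is a hand-entered stand-in for a graph store, `call_llm` is a hypothetical placeholder that calls no real model, and the arithmetic pattern only handles two-operand expressions.

```python
import operator
import re

# Stand-in for an external graph store, keyed by (subject, predicate).
FACTS = {("Barack Obama", "place of birth"): "Honolulu"}

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

# Matches simple expressions such as "12 * 12" or "3.5 + 2".
ARITHMETIC = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)\s*$"
)


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call; no model is invoked."""
    return f"[LLM would handle: {prompt}]"


def answer(question, subject=None, predicate=None):
    """Route a question to the cheapest component that can answer it."""
    match = ARITHMETIC.match(question)
    if match:
        left, op, right = match.groups()
        return OPS[op](float(left), float(right))  # a single CPU operation
    if (subject, predicate) in FACTS:
        return FACTS[(subject, predicate)]  # lookup instead of generation
    return call_llm(question)  # language tasks go to the model


print(answer("12 * 12"))
print(answer("Where was Obama born?", "Barack Obama", "place of birth"))
print(answer("Summarize this memo in two sentences."))
```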
  • 45. To scale FAIR data, use an assisted, hybrid AI approach. Source: Amit Sheth, “From NLP to NLU: Why we need varied, comprehensive, and stratified knowledge (Neuro-symbolic AI),” USC Information Sciences Institute on YouTube, March 2023, https://www.youtube.com/watch?v=xyxQXka6dRY&t=2377s
  • 46. How hybrid AI helps in research “LLMs have amazing abilities in manipulating natural language text, but generating timely and factually verified recommendations is one thing LLMs are not naturally great at.” –Mike Tung, CEO of Diffbot, Diffbot Blog, April 2023, https://blog.diffbot.com/generating-company-recommendations-using-large-language-models-and-knowledge-graphs/ LLMs aren’t a reliable research tool alone because they hallucinate. You can’t trust the answers unless you know the answer already. Mike Tung recommends more precise prompting on the query side and answer verification via a knowledge graph such as Diffbot. Both of these capabilities harness precise logical description missing in current LLM Q&As.
  • 47. NLP’s compost grinder data mentality 47 https://pixabay.com/photos/compost-grinder-compost-chipper-3389088/
  • 48. Versus KGs growing naturally in companion plant mode 48 Rich data ecosystems evolve naturally by comparison with underdescribed, fragmented data assets Zero-copy integration becomes possible, reducing complexity, labor and energy waste by up to 90 percent Second-order cybernetics (humans in the loop) and precise facts and contextualization complement probabilistic methods https://www.fruitsaladtrees.com/blogs/news/ediblegarden
  • 49. AI’s Wave III: Less wasteful, more explicit smart data management via a knowledge graph foundation 49
  • 50. Thoughts and Reactions? Feel free to ping me anytime with questions, etc. Alan Morrison Data Science Central LinkedIn | Twitter | Quora | Slideshare +1 408 205 5109 a.s.morrison@gmail.com 50
  • 51. NLP versus NLU: Most true understanding is unclaimed territory 51 Unclaimed data market territory FAIR* Actionability Immediacy Divining purpose Divining intent Synthesis Reasoning Abstraction Contextualization Connection Classification Identification Unclaimed market territory Staked claims Present vs Future Data Market Map 12 steps to FAIR data power *Findable, accessible, interoperable, reusable data Reach of current ML efforts
  • 55. Stochastic parrots and hallucination 55
  • 57. Teaching LLMs to query knowledge graphs 57
  • 59. Semantic community LLM use results 59
  • 60. Goal: Develop FAIR data efficiently 60
  • 61. Thoughts and Reactions? Feel free to ping me anytime with questions, etc. Alan Morrison Data Science Central LinkedIn | Twitter | Quora | Slideshare +1 408 205 5109 a.s.morrison@gmail.com 61