Framing Few-Shot Knowledge Graph Completion with Large Language Models

MODUL Technology GmbH
FRAMING FEW-SHOT KNOWLEDGE GRAPH COMPLETION WITH LARGE LANGUAGE MODELS
Adrian M.P. Brașoveanu
Lyndon J.B. Nixon
Albert Weichselbraun
Arno Scharl
NLP4KGC@SEMANTICS 2023
LLMs 2020-2023
LARGE LANGUAGE MODELS
Generative AI
ChatGPT 3.5/4.0
Claude 2
Cohere Chat
Falcon
LLaMa2
Flan-T5
Core Innovation:
Ecosystems
Agents
LangChain
KGs
Tools
Problem Solving
Mixture of Experts (MoE)?
Image Copyright © Language Models are Few-Shot Learners (2020) by Tom B. Brown et al. NeurIPS 2020.
LLM Reasoning Strategies (1): CoT
Relation Extraction with Chain of Thought (CoT)
Explanation is All You Need!
Step-by-step reasoning
Augmented Text leads to better results!
Image Copyright © Revisiting Relation Extraction in the era of Large Language Models by Wadhwa et al. ACL (1) 2023.
LLM Reasoning Strategies (2): ToT
CoT contains explanations
Tree of Thoughts (ToT) extends CoT
Multiple paths towards an answer
CoT-SC (CoT with Self-Consistency) – Majority voting mechanism
ToT – more similar to the human selection process
ToT allows for parallel exploration of ideas as opposed to linear exploration (CoT).
Image Copyright © Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2023) by Yao et al.
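The majority-voting idea behind CoT-SC can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes several reasoning paths have already been sampled from a model and that each path ends in a relation label. The function name `majority_vote` and the sample labels are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer and its share of the votes.

    `answers` holds the final answer of each independently sampled
    reasoning path (hypothetical input; no model is called here).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalise case and whitespace so trivially different strings count as one vote.
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical final answers from five sampled reasoning paths.
sampled = ["located in", "Located in", "part of", "located in", "part of"]
label, agreement = majority_vote(sampled)
print(label, agreement)
```

The agreement share gives a rough signal of how stable the answer is across paths. ToT goes further by scoring and pruning intermediate steps, which this sketch does not attempt.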
Knowledge Graphs (KG)
Sustainability KG
Built with Wikidata.
Missing relations:
- country-specific
- region-specific
KG Completion (KGC)
Can we fill the missing relations using LLMs?
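Before asking an LLM to fill gaps, it helps to list which relations are actually missing. The sketch below is a hedged illustration: the slides do not describe the KG schema, so the triple layout, the entity names, and the `country` / `region` predicates are assumptions chosen to mirror the two gap types named above.

```python
def missing_relations(triples, entities, expected_predicates):
    """List (entity, predicate) pairs that have no triple in the graph."""
    present = {(subject, predicate) for subject, predicate, _ in triples}
    return [
        (entity, predicate)
        for entity in entities
        for predicate in expected_predicates
        if (entity, predicate) not in present
    ]


# Hypothetical toy graph as (subject, predicate, object) triples.
triples = [
    ("solar_project_1", "country", "Austria"),
    ("solar_project_1", "region", "Lower Austria"),
    ("solar_project_2", "country", "Germany"),
]
entities = ["solar_project_1", "solar_project_2", "solar_project_3"]

gaps = missing_relations(triples, entities, ["country", "region"])
for entity, predicate in gaps:
    print(f"missing: {entity} -> {predicate}")
```

Each reported pair is a candidate question for the model, for example "which region does this entity belong to?".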
Evaluating Large Language Models
Single interface: nat.dev/chat
Includes
- ChatGPT 3.5/4 (with 32k context window)
- Claude 1/2 (with 100k context window)
- Cohere Chat
- MPT-30B
- Falcon-40B
- LLaMa2
Functionality
- Playground
- Compare
- Chat
- Metrics
Evaluating Large Language Models
Relations: Only Relations
Explanations: CoT
Completions: Restricted CoT
Self-Scoring: Truthfulness Proxy
Evaluating Large Language Models
Tools
- GPT-3.5
- GPT-4.0
- Claude2
- MPT-30B
Few-Shot
Input: 12-14 annotated texts
Output: 50 annotated texts
We want all the texts annotated in a large batch if possible
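The few-shot setup above (a dozen or so annotated examples followed by a batch of texts to annotate) could be assembled roughly as follows. This is a sketch under stated assumptions: the prompt wording, the `subject | predicate | object` output format, and the helper names `build_few_shot_prompt` and `split_into_batches` are hypothetical, and all example texts are placeholders rather than data from the study.

```python
def build_few_shot_prompt(examples, texts, instruction):
    """Assemble one prompt: instruction, annotated examples, then texts to annotate."""
    lines = [instruction, ""]
    for i, (text, relations) in enumerate(examples, start=1):
        lines.append(f"Example {i}: {text}")
        lines.append("Relations: " + "; ".join(relations))
        lines.append("")
    lines.append("Annotate each text below in the same format.")
    for j, text in enumerate(texts, start=1):
        lines.append(f"Text {j}: {text}")
    return "\n".join(lines)


def split_into_batches(items, size):
    """Fallback for when a single large batch does not fit the context window."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder data: 12 annotated examples and 50 texts to annotate.
examples = [("placeholder sentence", ["subject | predicate | object"])] * 12
texts = [f"placeholder text {n}" for n in range(1, 51)]

prompt = build_few_shot_prompt(
    examples, texts, "Extract relations as subject | predicate | object."
)
print(len(prompt.splitlines()), "prompt lines")
print([len(batch) for batch in split_into_batches(texts, 20)])
```

Sending all 50 texts in one prompt matches the "large batch if possible" goal. The batching helper covers models whose context window is too small for that.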
Evaluating Large Language Models
Taxonomy of Errors
These are only the most frequent errors!
And the Winner Is?
ChatGPT and Claude2 have similar performance
Conclusion?
Self-Scoring: consecutive runs show huge differences
And the Winner Is?
ChatGPT and Claude2 have similar performance
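One simple way to quantify how much self-assigned scores differ between consecutive runs is to look at their spread. The sketch below is illustrative only: the 1-10 score scale, the threshold, and the numbers are made-up placeholders, not results from the study, and `run_spread` is a hypothetical helper.

```python
from statistics import mean, pstdev


def run_spread(scores):
    """Summarise self-assigned scores collected from consecutive runs."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "stdev": pstdev(scores),
    }


# Placeholder self-scores (assumed 1-10 scale) for the same input over three runs.
scores = [9, 4, 7]
spread = run_spread(scores)

# Hypothetical rule of thumb: a wide range means the self-score is an unreliable proxy.
UNSTABLE_RANGE = 3
print(spread, "unstable" if spread["range"] > UNSTABLE_RANGE else "stable")
```

A large range across otherwise identical runs suggests that a single self-score should not be read as a truthfulness measure on its own.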
Acknowledgments
PROJECTS
DWBI Vienna - Vienna Science and Technology Fund (WWTF) [10.47379/ICT20096]
SDG-HUB – FFG (GA No. 892212)
CONTACT
adrian.brasoveanu@modul.ac.at
THANK YOU!