Machine Learning
Techniques for the
  Semantic Web
         Paul Dix
     http://pauldix.net
     paul@pauldix.net
Machine Learning
Semantic Web
What is the Semantic Web?
Ontology
RDF
Machine Learning is
   about Data
actually...
Making Predictions
 Based on Data
FOAF
Simple Example
Marco Neumann
<http://www.marconeumann.org/foaf.rdf>
  <http://xmlns.com/foaf/0.1/knows>
  <http://community.linkeddata.org/dataspace/person/kidehen2/about.rdf> .
<http://www.marconeumann.org/foaf.rdf>
  <http://xmlns.com/foaf/0.1/knows>
  <http://www.johnbreslin.com/foaf/foaf.rdf> .
<http://www.marconeumann.org/foaf.rdf>
  <http://xmlns.com/foaf/0.1/knows>
  <http://swordfish.rdfweb.org/people/libby/rdfweb/webwho.xrdf> .
<http://www.marconeumann.org/foaf.rdf>
  <http://xmlns.com/foaf/0.1/knows>
  <http://danbri.org/foaf.rdf> .
Marco only knows 4
     people?
Two Degrees Out
4   -   <http://www.w3.org/People/Connolly/home-smart.rdf>
4   -   <http://jibbering.com/foaf.rdf>
2   -   <http://sw.deri.org/~haller/foaf.rdf>
2   -   <http://sw.deri.org/~knud/knudfoaf.rdf>
2   -   <http://www-cdr.stanford.edu/~petrie/foaf.rdf>
Three Degrees
9   -   <http://sw.deri.org/~knud/knudfoaf.rdf>
8   -   <http://www.w3.org/People/Connolly/home-smart.rdf>
7   -   <http://jibbering.com/foaf.rdf>
6   -   <http://www.aaronsw.com/about.xrdf>
5   -   <http://sw.deri.org/~aharth/foaf.rdf>
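
The slides do not say how the counts beside each URL were produced. One plausible reading is that each count is the number of shortest "knows" paths from Marco to that person. The sketch below works under that assumption; the graph, the names in it, and the function `count_at_degree` are hypothetical and are not taken from the FOAF data above.

```python
from collections import Counter

def count_at_degree(knows, start, degree):
    """Count shortest 'knows' paths from start to each person exactly `degree` hops away."""
    frontier = Counter({start: 1})
    seen = {start}
    for _ in range(degree):
        next_frontier = Counter()
        for person, paths in frontier.items():
            for friend in knows.get(person, ()):
                # Skip anyone already reached in fewer hops.
                if friend not in seen:
                    next_frontier[friend] += paths
        seen.update(next_frontier)
        frontier = next_frontier
    return frontier

# Hypothetical toy graph: subject -> list of people the subject knows.
knows = {
    "marco": ["a", "b", "c"],
    "a": ["d", "e"],
    "b": ["d"],
    "c": ["d", "e", "marco"],
    "d": ["f"],
    "e": ["f"],
}

two_out = count_at_degree(knows, "marco", 2)
for person, count in two_out.most_common():
    print(count, "-", person)
```

Ranking by count surfaces the people Marco is most likely to be connected to indirectly, which is counting rather than learning, as the next slide points out.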
but that’s not really
 machine learning
Short
Machine Learning is


• How you formulate the problem
• How you represent the data
• Graphical Models
• Vector Space Models
Back to FOAF
Convert RDF triples to vector space
We Want to Find
Groups of People
To make predictions on
   their interests...
(subject) (predicate) (object)
Paul        knows      Jeff
Paul        knows      Joe
Paul        knows      Marco
Jeff        knows      Joe
Vector Space
        Representation
        Jeff   Joe   Marco   Paul
Jeff     0      1      0      1
Joe      1      0      0      1
Marco    0      0      0      1
Paul     1      1      1      0
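
A minimal sketch of the conversion, assuming "knows" is treated as symmetric (the matrix on the slide fills in both directions). The function name `adjacency_matrix` and its parameters are illustrative, not from the talk.

```python
def adjacency_matrix(triples, predicate="knows", symmetric=True):
    """Turn (subject, predicate, object) triples into a 0/1 matrix.

    Rows and columns are the people, in alphabetical order.
    """
    people = sorted(
        {s for s, p, _ in triples if p == predicate}
        | {o for _, p, o in triples if p == predicate}
    )
    index = {name: i for i, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for subject, pred, obj in triples:
        if pred != predicate:
            continue
        matrix[index[subject]][index[obj]] = 1
        if symmetric:
            # Assumption: if Paul knows Jeff, Jeff also knows Paul.
            matrix[index[obj]][index[subject]] = 1
    return people, matrix

triples = [
    ("Paul", "knows", "Jeff"),
    ("Paul", "knows", "Joe"),
    ("Paul", "knows", "Marco"),
    ("Jeff", "knows", "Joe"),
]

people, matrix = adjacency_matrix(triples)
for name, row in zip(people, matrix):
    print(f"{name:<6}", row)
```

Each row is one person's vector: a 1 in a column means the row's person knows the column's person.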
Latent Factors Analysis

• Used in Latent Semantic Indexing (LSI)
• Good for finding synonyms
• Good for finding “genres”
Latent Factors Methods

• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)
• Restricted Boltzmann Machines (RBM)
Considerations for
  Semantic Web Data

• Large Data Sets
• Sparse Data Sets
Netflix Prize Research

• Movie Review Data set has similar
  problems
• Generalized Hebbian Algorithm for
  Dimensionality Reduction in NLP
  (Gorrell ’06)
Reduce Dimensions


• 1m x 1m matrix with 1m people
• Reduce to 1m x 100
100 Latent Factors
Represent different groups of people based on who
                    they know.
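
To make the reduction concrete, here is a rough sketch that factors a small "knows" matrix into two latent factors per person using plain stochastic gradient descent. This is a stand-in for illustration only: it is not PCA, SVD, an RBM, or the Generalized Hebbian Algorithm named on the slides, and the function names, learning rate, and epoch count are assumptions.

```python
import random

def factorize(matrix, n_factors=2, epochs=500, learning_rate=0.05, seed=0):
    """Approximate matrix[i][j] by the dot product of rows[i] and cols[j]."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_rows)]
    cols = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                predicted = sum(r * c for r, c in zip(rows[i], cols[j]))
                error = matrix[i][j] - predicted
                for f in range(n_factors):
                    r, c = rows[i][f], cols[j][f]
                    rows[i][f] += learning_rate * error * c
                    cols[j][f] += learning_rate * error * r
    return rows, cols

def squared_error(matrix, rows, cols):
    """Total squared reconstruction error over every cell."""
    return sum(
        (matrix[i][j] - sum(r * c for r, c in zip(rows[i], cols[j]))) ** 2
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )

# Symmetric "knows" matrix for Jeff, Joe, Marco, Paul (in that order).
knows = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

person_factors, _ = factorize(knows)
for name, factors in zip(["Jeff", "Joe", "Marco", "Paul"], person_factors):
    print(name, [round(v, 3) for v in factors])
```

On a 1m x 1m matrix the same idea yields a 1m x 100 table of person factors, but a real data set of that size would need a sparse representation and a library implementation rather than nested Python lists.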
What the Data Might
    Look Like
         Factor 1   Factor 2
  Paul    0.678      0.311
  Joe     0.455      0.432
  Jeff    0.476      0.398
  Marco   0.203      0.789
Find Similar People
    k Nearest Neighbors
Pick a Similarity Metric

• Euclidean Distance
• Jaccard index
• Cosine Similarity
Joe’s Similarity to Paul
((Paul(f1) - Joe(f1))^2 + (Paul(f2) - Joe(f2))^2)^(1/2)
(Euclidean distance: a smaller value means more similar)
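
The three metrics and the nearest-neighbor lookup can be sketched as follows, using the factor values from the table above. The function names are illustrative. Jaccard is applied to sets of acquaintances rather than to factor vectors, which is an assumption about how it would be used here.

```python
import math

factors = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def jaccard_index(a, b):
    """Overlap between two sets, from 0 (disjoint) to 1 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def nearest_neighbors(name, vectors, k):
    """Return the k people closest to `name` by Euclidean distance."""
    others = [n for n in vectors if n != name]
    others.sort(key=lambda n: euclidean_distance(vectors[name], vectors[n]))
    return others[:k]

print(round(euclidean_distance(factors["Paul"], factors["Joe"]), 3))  # 0.254
print(nearest_neighbors("Paul", factors, k=2))  # ['Jeff', 'Joe']
```

Which metric fits best depends on the representation: Euclidean and cosine suit dense factor vectors, while Jaccard suits the raw sets of who knows whom.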
Once We’ve Calculated
     Similarities
• Fill In Missing Interests
• Target Ads, Content, Products
• ???
• Profit!
Generalizing RDF
Triples to Vector Space
• Subjects are Rows
• Objects are Columns
• Predicates are values
            Object 1    Object 2
Subject 1   Predicate
Subject 2
Predicates Should be
  Mutually Exclusive

• Paul likes Ruby
• Paul hates PHP
• Paul loves PHP
Assign Values to
        Predicates
• 1 = Hates
• 2 = Dislikes
• 3 = Neutral
• 4 = Likes
• 5 = Loves
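
A small sketch of why the predicates need to be mutually exclusive: each (subject, object) cell can hold only one value, so "Paul hates PHP" and "Paul loves PHP" cannot both be stored. The lowercase predicate names and the function `rating_matrix` are assumptions made for illustration.

```python
PREDICATE_VALUES = {
    "hates": 1,
    "dislikes": 2,
    "neutral": 3,
    "likes": 4,
    "loves": 5,
}

def rating_matrix(triples):
    """Build {subject: {object: value}} and reject contradictory triples."""
    ratings = {}
    for subject, predicate, obj in triples:
        value = PREDICATE_VALUES[predicate]
        row = ratings.setdefault(subject, {})
        if obj in row and row[obj] != value:
            raise ValueError(
                f"{subject} has conflicting predicates for {obj}: "
                f"{row[obj]} and {value}"
            )
        row[obj] = value
    return ratings

print(rating_matrix([("Paul", "likes", "Ruby"), ("Paul", "hates", "PHP")]))
# {'Paul': {'Ruby': 4, 'PHP': 1}}
```

Storing the result as nested dictionaries keeps the matrix sparse: only the cells that actually have a triple take up space.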
More Applications
Supervised Learning

• Classifiers
• Ontology Mapping
• Assigning Instances to Concepts
Ontology Mapping


• Examples from Ontology A
• Examples from Ontology B
Train Classifiers


• One Classifier for each Concept in A
• One Classifier for each Concept in B
Classify Instances

• Use A Classifiers to predict which concepts
  B instances map to
• Use B Classifiers to predict which concepts
  A instances map to
Use Classified Instances


• Predict Concept Mappings
 • Which in A match ones in B
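
The mapping steps above can be sketched end to end. Everything here is hypothetical: the two tiny ontologies, the example texts, and the per-concept "classifier", which is only a bag-of-words scorer standing in for whatever real classifier would be trained.

```python
import math
from collections import Counter

def train(examples):
    """One model per concept: token counts over that concept's examples."""
    return {
        concept: Counter(tok for text in texts for tok in text.lower().split())
        for concept, texts in examples.items()
    }

def score(model, text):
    """Cosine similarity between a concept's token counts and one instance."""
    tokens = Counter(text.lower().split())
    dot = sum(model[tok] * n for tok, n in tokens.items())
    norm = math.sqrt(sum(v * v for v in model.values())) * math.sqrt(
        sum(v * v for v in tokens.values())
    )
    return dot / norm if norm else 0.0

def classify(models, text):
    """Pick the concept whose model scores the instance highest."""
    return max(models, key=lambda concept: score(models[concept], text))

def map_concepts(source_examples, target_models):
    """Map each source concept to the target concept most of its instances hit."""
    mapping = {}
    for concept, texts in source_examples.items():
        votes = Counter(classify(target_models, text) for text in texts)
        mapping[concept] = votes.most_common(1)[0][0]
    return mapping

# Hypothetical example instances for two small ontologies.
ontology_a = {
    "Person": ["paul dix author speaker", "marco neumann organizer speaker"],
    "Document": ["slides about machine learning", "paper about rdf ontology"],
}
ontology_b = {
    "Agent": ["jeff speaker engineer", "joe author writer"],
    "Publication": ["paper about clustering", "slides about semantic web"],
}

b_to_a = map_concepts(ontology_b, train(ontology_a))
a_to_b = map_concepts(ontology_a, train(ontology_b))
print(b_to_a)
print(a_to_b)
```

A mapping is most trustworthy when both directions agree, for example when B's instances of one concept land in an A concept whose own instances land back in that B concept.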
Limitations

• One Classifier per Concept
 • Large Ontologies Could be a Problem
• Ontologies should be a little similar
Unsupervised Learning

• Clustering
 • Hierarchical Clustering
• Learning Ontologies from Text
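
As an illustration of hierarchical clustering, the sketch below repeatedly merges the two closest clusters (single linkage) until a target number remains. The factor values are the ones from the earlier slide; the choice of single linkage, the function name `agglomerate`, and the stopping rule are assumptions.

```python
import math

factors = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering over named points."""
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                distance = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

print(agglomerate(factors, n_clusters=2))
```

Recording the order of the merges instead of stopping at a fixed count would give the full hierarchy.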
Machine Learning as
        Triage

• Automatically tag or recommend Examples
  the algorithm is Certain About
• Send uncertain examples to human for
  review
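
A minimal sketch of the triage idea, assuming each prediction comes with a confidence between 0 and 1. The 0.9 threshold, the tuple layout, and the example data are hypothetical.

```python
def triage(predictions, threshold=0.9):
    """Split (example, label, confidence) tuples into automatic and review queues."""
    automatic, review = [], []
    for example, label, confidence in predictions:
        if confidence >= threshold:
            automatic.append((example, label))
        else:
            review.append((example, label))
    return automatic, review

# Hypothetical classifier output.
predictions = [
    ("doc-1", "Person", 0.97),
    ("doc-2", "Document", 0.55),
    ("doc-3", "Person", 0.91),
]

automatic, review = triage(predictions)
print("tag automatically:", automatic)
print("send to a human:", review)
```

Raising the threshold sends more examples to human review and lowers the risk of a wrong automatic tag.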
Thank You
     Paul Dix
 paul@pauldix.net
 http://pauldix.net
