Inducing Predictive Clustering Trees for Datatype properties Values
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Semantic Machine Learning, 10th July 2016
G.Rizzo et al. (Univ. of Bari) 10th July 2016 1 / 18
Outline
1 The Context and Motivations
2 Basics
3 The approach
4 Empirical Evaluation
5 Conclusion & Further Extensions
The Context and Motivations
• Goal: approximating the (numerical) datatype property values
through regression models in the Web of Data
• Web of data: a large number of knowledge bases, datasets and
vocabularies exposed in a standard format (RDF, OWL)
• Numerical property values can hardly be derived by using
reasoning services
  • Open World Assumption
  • a large amount of missing information
• The information gap can be filled by using regression models
The Context and Motivations
• Solving a regression problem
  • two or more property values may be related (e.g. crime rate and
    population of a place)
  • correlations should improve predictive accuracy
• Predicting several numerical values at once (multi-target
regression) through Predictive Clustering approaches
  • Predictive Clustering Trees (PCTs) as a generalization of decision
    trees
  • PCTs compliant with the representation languages for the Web of
    Data (e.g. Description Logics)
  • target values: the numeric role fillers for the properties
Description Logics
Syntax & Semantics
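• Atomic concepts (classes) $N_C$ and roles (relations) $N_R$ to model
domains
• Operators to build complex concept descriptions
• Concrete domains: string, boolean, numeric values
• Semantics defined through interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
  • $\Delta^{\mathcal{I}}$: domain of the interpretation
  • $\cdot^{\mathcal{I}}$: interpretation function
    • for each concept $C \in N_C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
    • for each role $R \in N_R$, $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$
ALC operators
Name                     Syntax          Semantics
Top concept              $\top$          $\Delta^{\mathcal{I}}$
Bottom concept           $\bot$          $\emptyset$
Concept                  $C$             $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
Full complement          $\neg C$        $\Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$
Intersection             $C \sqcap D$    $C^{\mathcal{I}} \cap D^{\mathcal{I}}$
Disjunction              $C \sqcup D$    $C^{\mathcal{I}} \cup D^{\mathcal{I}}$
Universal restriction    $\forall R.D$   $\{x \in \Delta^{\mathcal{I}} \mid \forall y \in \Delta^{\mathcal{I}}: (x, y) \in R^{\mathcal{I}} \rightarrow y \in D^{\mathcal{I}}\}$
Existential restriction  $\exists R.D$   $\{x \in \Delta^{\mathcal{I}} \mid \exists y \in \Delta^{\mathcal{I}}: (x, y) \in R^{\mathcal{I}} \wedge y \in D^{\mathcal{I}}\}$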
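The set-based semantics in the operator table can be checked by hand on a small finite interpretation. The following is a minimal sketch, not part of the original slides: the tuple encoding of concepts, the function name `extension` and the toy data are all assumptions made for illustration. It evaluates a single, fully known interpretation, so it does not model reasoning under the Open World Assumption.

```python
def extension(concept, domain, concept_ext, role_ext):
    """Return the extension of an ALC concept in a finite interpretation.

    Concepts are encoded (hypothetically) as "TOP", "BOTTOM", or tuples:
    ("atom", name), ("not", C), ("and", C, D), ("or", C, D),
    ("all", role, D), ("some", role, D).
    """
    def ext(c):
        return extension(c, domain, concept_ext, role_ext)

    if concept == "TOP":
        return set(domain)
    if concept == "BOTTOM":
        return set()
    op = concept[0]
    if op == "atom":
        return set(concept_ext.get(concept[1], set()))
    if op == "not":
        # Full complement: domain minus the extension of C.
        return set(domain) - ext(concept[1])
    if op == "and":
        return ext(concept[1]) & ext(concept[2])
    if op == "or":
        return ext(concept[1]) | ext(concept[2])
    if op in ("all", "some"):
        filler = ext(concept[2])
        pairs = role_ext.get(concept[1], set())
        result = set()
        for x in domain:
            successors = {y for (s, y) in pairs if s == x}
            # Universal: every role successor of x lies in the filler.
            if op == "all" and successors <= filler:
                result.add(x)
            # Existential: at least one role successor lies in the filler.
            if op == "some" and successors & filler:
                result.add(x)
        return result
    raise ValueError(f"unknown constructor: {op!r}")


# Toy interpretation (illustrative data only).
domain = {"film1", "film2", "actor1"}
concept_ext = {"Comedy": {"film1"}, "Actor": {"actor1"}}
role_ext = {"starring": {("film1", "actor1")}}

some_actor = extension(("some", "starring", ("atom", "Actor")),
                       domain, concept_ext, role_ext)
all_actor = extension(("all", "starring", ("atom", "Actor")),
                      domain, concept_ext, role_ext)
not_comedy = extension(("not", ("atom", "Comedy")),
                       domain, concept_ext, role_ext)
```

Note that the universal restriction holds vacuously for elements with no `starring` successor, which is why `all_actor` covers the whole toy domain.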
Description Logics
Knowledge bases
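• Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where
  • $\mathcal{T}$ (TBox): axioms concerning concepts/roles
    • Subsumption axioms $C \sqsubseteq D$: hold iff, for every interpretation $\mathcal{I}$,
      $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
    • Equivalence axioms $C \equiv D$: hold iff, for every interpretation $\mathcal{I}$,
      $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$
  • $\mathcal{A}$ (ABox): class assertions $C(a)$ and role assertions $R(a, b)$ about
    a set of individuals, denoted by $\mathrm{Ind}(\mathcal{A})$
• Reasoning services:
  • subsumption: checking whether a concept is more general than a given one
  • satisfiability: given a concept description $C$ and an interpretation
    $\mathcal{I}$, $C^{\mathcal{I}} \neq \emptyset$
  • instance checking: for every interpretation $\mathcal{I}$, $\mathcal{I} \models C(a)$ holds ($a$ is an
    instance of $C$)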
The problem
Given:
• a knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$;
• the target functional roles $R_i$, $1 \le i \le t$, ranging on the domains
$D_i$, whose analytic forms are unknown;
• a training set $Tr \subseteq \mathrm{Ind}(\mathcal{A})$ for which the numeric fillers are
known,
$$Tr = \{a \in \mathrm{Ind}(\mathcal{A}) \mid R_i(a, v_i) \in \mathcal{A},\ v_i \in D_i,\ 1 \le i \le t\}$$
Build a regression model for $\{R_i\}_{i=1}^{t}$, i.e. a function
$h : \mathrm{Ind}(\mathcal{A}) \to D_1 \times \cdots \times D_t$ that minimizes a loss function over
$Tr$. A possible loss function may be based on the mean square error.
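As an illustration of the problem statement, a mean-square-error loss over the training set could be sketched as follows. This is only one possible reading: the slides do not fix the normalization, so averaging over all (individual, target) pairs is an assumption, and the names `mean_squared_error`, `training_set` and `constant_model` are hypothetical.

```python
def mean_squared_error(h, training_set):
    """Average squared error of model h over all targets of all individuals.

    training_set maps each individual a in Tr to the tuple (v_1, ..., v_t)
    of its known numeric fillers; h maps an individual to a tuple of t
    predicted values.
    """
    total = 0.0
    count = 0
    for individual, targets in training_set.items():
        predicted = h(individual)
        for p, v in zip(predicted, targets):
            total += (p - v) ** 2
            count += 1
    return total / count


# Illustrative data: two individuals, two target properties.
training_set = {"a": (1.0, 2.0), "b": (3.0, 4.0)}


def constant_model(individual):
    # A trivial model that predicts the same vector for every individual.
    return (2.0, 3.0)


loss = mean_squared_error(constant_model, training_set)
```

Because the targets may have very different scales (the experiments later standardize them), a practical loss would likely be computed on standardized values.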
The proposed solution
• Predictive Clustering
• objects are clustered according to a homogeneity criterion
• for each cluster a predictive model is determined (e.g. vector
containing predictions)
[Figure: (a) clustering, (b) predictive models, (c) predictive clustering]
The model for multi-target regression
• Given a knowledge base K, a PCT for multi-target regression
is a binary tree where
  • intermediate nodes: DL concept descriptions
  • leaf nodes: vectors containing the predictions w.r.t.
    the target properties
[Figure: example PCT. Inner node labels: "Comedy", "Comedy starring.Actor", "¬Comedy ¬Horror". Leaf vectors: p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000), p = (8.6, 4930000)]
Learning PCTs
• Divide-and-conquer strategy
• For the current node:
  • the refinement operator generates the candidate concepts
  • the most promising concept $E^*$ is selected by maximizing the
    homogeneity w.r.t. the target variables simultaneously
  • best concept: the one minimizing the RMSE of the standardized
    target property values
• Stop conditions:
  • maximum number of levels
  • size of the training (sub)set
• Leaf: the i-th component contains the average value of the i-th
target property over the instances sorted to the node
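The selection step and the leaf model described above can be sketched as follows. The slides do not spell out how the RMSE is computed, so this sketch assumes one plausible reading: targets are z-score standardized over the instances at the current node, and a candidate is scored by the RMSE of each instance around the mean vector of its side of the split. All function names are hypothetical, and concept membership is passed in as precomputed booleans instead of being decided by a reasoner.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each target column; constant columns are mapped to 0."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [
        tuple((v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats))
        for row in rows
    ]


def split_rmse(rows, in_concept):
    """RMSE of standardized targets around the mean of each split side."""
    z = standardize(rows)
    squared, n = 0.0, 0
    for side in (True, False):
        part = [r for r, flag in zip(z, in_concept) if flag == side]
        if not part:
            continue
        centroid = [mean(col) for col in zip(*part)]
        for r in part:
            squared += sum((v - c) ** 2 for v, c in zip(r, centroid))
            n += len(r)
    return math.sqrt(squared / n)


def best_concept(rows, candidates):
    """candidates maps a concept name to per-row membership booleans."""
    return min(candidates, key=lambda name: split_rmse(rows, candidates[name]))


def leaf_prediction(rows):
    """Leaf model: the i-th component is the average of the i-th target."""
    return tuple(mean(col) for col in zip(*rows))


# Illustrative target vectors for four individuals and two candidates.
rows = [(8.0, 100.0), (9.0, 110.0), (4.0, 10.0), (5.0, 12.0)]
candidates = {
    "Comedy": [True, True, False, False],
    "Horror": [True, False, True, False],
}
chosen = best_concept(rows, candidates)
leaf = leaf_prediction(rows[:2])
```

Scoring all targets in one standardized RMSE is what lets a single split account for several target properties at once, instead of growing one tree per property.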
Installing new DL concepts as inner nodes
• The candidate concept descriptions are generated by using a
refinement operator
• A quasi-ordering relation over the space of the concept
descriptions
  • the subsumption between concepts in Description Logics
• Downward refinement operator $\rho(\cdot)$ to obtain specializations $E$ of
a concept description $D$ ($E \sqsubseteq D$)
• Each concept can be obtained:
  • by introducing a new concept name (or its complement) as a
    conjunct
  • by replacing a sub-description in the scope of an existential
    restriction
  • by replacing a sub-description in the scope of a universal
    restriction
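The three refinement rules listed above can be illustrated with a simplified generator. This is a sketch under strong assumptions, not the operator used by the authors: concepts use a hypothetical tuple encoding, only a restriction at the top level of the concept is refined, and no redundancy or satisfiability check is performed.

```python
def refine(concept, concept_names):
    """Yield specializations of `concept` (a simplified downward refinement).

    Concepts are encoded (hypothetically) as "TOP" or tuples:
    ("atom", name), ("not", C), ("and", C, D),
    ("some", role, D), ("all", role, D).
    """
    # Rule 1: add a concept name or its complement as a conjunct.
    for name in concept_names:
        atom = ("atom", name)
        yield ("and", concept, atom)
        yield ("and", concept, ("not", atom))
    # Rules 2 and 3: replace the sub-description in the scope of an
    # existential or universal restriction with one of its refinements.
    if concept[0] in ("some", "all"):
        op, role, filler = concept
        for refined_filler in refine(filler, concept_names):
            yield (op, role, refined_filler)


# Refinements of the (illustrative) concept "some starring TOP".
refinements = list(refine(("some", "starring", "TOP"), ["Actor"]))
```

Each generated concept is subsumed by the input concept, since adding a conjunct or specializing the filler of a restriction can only shrink the extension.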
Prediction
• Given an unseen individual a, the property values are
determined by traversing the tree structure
• Given a test concept D:
• if K |= D(a) the left branch is followed
• if K |= ¬D(a) the right branch is followed
• otherwise, a default model is returned
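The three-way test above can be sketched as a simple traversal. The `Node` class, the `entails` callback and the example values are assumptions for illustration: `entails(concept, individual)` stands in for a DL reasoner and is expected to return `True` when K |= D(a), `False` when K |= ¬D(a), and `None` when neither can be proved, which can happen under the Open World Assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    concept: Optional[str] = None                   # test concept at inner nodes
    left: Optional["Node"] = None                   # followed when K |= D(a)
    right: Optional["Node"] = None                  # followed when K |= not D(a)
    prediction: Optional[Tuple[float, ...]] = None  # set only at leaves


def predict(node, individual, entails, default):
    """Traverse the tree; return the default model if membership is unknown."""
    while node.prediction is None:
        answer = entails(node.concept, individual)
        if answer is True:
            node = node.left
        elif answer is False:
            node = node.right
        else:
            return default
    return node.prediction


# Illustrative tree and a dictionary standing in for the reasoner.
tree = Node(
    concept="Comedy",
    left=Node(prediction=(8.0, 100.0)),
    right=Node(prediction=(4.0, 10.0)),
)
known = {("Comedy", "film1"): True, ("Comedy", "film2"): False}


def entails(concept, individual):
    # Missing entries mean neither D(a) nor its negation is entailed.
    return known.get((concept, individual))


default_model = (6.0, 55.0)
p1 = predict(tree, "film1", entails, default_model)
p2 = predict(tree, "film2", entails, default_model)
p3 = predict(tree, "film3", entails, default_model)
```

How the default model is built is not stated in the slides; a vector of global training averages would be one option.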
Experiments
Settings
• Ontologies extracted from DBPedia via crawling
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. terminological regression trees (TRT),
multi-target k-NN regressor (with $k = \sqrt{|Tr|}$) and multi-target
linear regression model
  • atomic concepts as feature set for the k-NN regressor and the
    multi-target linear regression model
• 10-fold cross validation
• performance in terms of RRMSE
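The slides do not define RRMSE. A common definition, assumed here, is the relative root mean squared error: the squared error of the model divided by the squared error of always predicting the mean of the actual values, under a square root. The sketch below also shows the choice of k as the rounded square root of the training set size; both function names are hypothetical.

```python
import math


def rrmse(actual, predicted):
    """Relative RMSE for one target (assumed definition, see lead-in)."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    denominator = sum((a - mean_actual) ** 2 for a in actual)
    return math.sqrt(numerator / denominator)


def knn_k(training_set_size):
    """k for the k-NN baseline: square root of |Tr|, rounded, at least 1."""
    return max(1, round(math.sqrt(training_set_size)))


perfect = rrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = rrmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
k = knn_k(10000)
```

Under this definition a value below 1 means the model beats the mean predictor, which makes scores comparable across targets with different ranges.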
Table: Datasets extracted from DBPedia
Dataset     Expressivity  #axioms  #classes  #properties  #individuals
Fragm. #1   ALCO          17222    990       255          12053
Fragm. #2   ALCO          20456    425       255          14400
Fragm. #3   ALCO          9070     370       106          4499
Table: Target properties ranges, number of individuals employed in the
learning problem
Dataset     Property         Range              |Tr|
Fragm. #1   elevation        [-654.14, 19.00]   10000
            populationTotal  [0.0, 2255]
Fragm. #2   areaTotal        [0, 16980.1]       10000
            areaUrban        [0.0, 6740.74]
            areaMetro        [0, 652874]
Fragm. #3   height           [0, 251.6]         2256
            weight           [-63.12, 304.25]
Outcomes
Table: RRMSE averaged on the number of runs
Dataset     PCT            TRT           k-NN          LR
Fragm. #1   0.42 ± 0.05    0.63 ± 0.05   0.65 ± 0.02   0.73 ± 0.02
Fragm. #2   0.25 ± 0.001   0.43 ± 0.02   0.53 ± 0.00   0.43 ± 0.02
Fragm. #3   0.24 ± 0.05    0.36 ± 0.2    0.67 ± 0.10   0.73 ± 0.05
Table: Comparison in terms of elapsed times (secs)
Dataset     Property         PCT     TRT      k-NN    LR
Fragm. #1   elevation                2454.3
            populationTotal          2353.0
            total            2432    4807.3   547.6   234.5
Fragm. #2   areaTotal                2256.0
            areaUrban                2345.0
            areaMetro                2345.2
            total            2456    6946.2   546.2   235.7
Fragm. #3   height                   743.5
            weight                   743.4
            total            743.3   1486.9   372.3   123.5
Discussion
• PCTs outperform TRTs
  • the different heuristic allows more promising concepts to be chosen
  • standardization mitigated the effect of abnormal values that increase the error
• PCTs outperform k-NN
  • curse of dimensionality
• k-NN outperforms LR
  • spurious individuals were excluded when determining the local model
• PCTs are more efficient than TRTs
Conclusion and Further Outlooks
• We proposed an extension of predictive clustering trees compliant
with DL representation languages for solving the problem of
predicting datatype property values
• Further extensions
• New refinement operators
• Further heuristics
• linear models at leaf nodes
Questions?

Contenu connexe

Tendances

Assignment 3 push down automata final
Assignment 3 push down automata finalAssignment 3 push down automata final
Assignment 3 push down automata final
Pawan Goel
 
ForecastCombinations package
ForecastCombinations packageForecastCombinations package
ForecastCombinations package
eraviv
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain Shift
Sebastian Ruder
 
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
Asai Masataro
 
Probabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDFProbabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDF
DKALab
 
Solving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) SearchSolving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) Search
matele41
 
Variational inference using implicit distributions
Variational inference using implicit distributionsVariational inference using implicit distributions
Variational inference using implicit distributions
Tomasz Kusmierczyk
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain Adaptation
Yoshitaka Ushiku
 
DL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning RevisitedDL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning Revisited
Giuseppe Rizzo
 
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
QUT_SEF
 
Lecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithmLecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithm
Hema Kashyap
 
Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN
RishirajChakraborty4
 

Tendances (12)

Assignment 3 push down automata final
Assignment 3 push down automata finalAssignment 3 push down automata final
Assignment 3 push down automata final
 
ForecastCombinations package
ForecastCombinations packageForecastCombinations package
ForecastCombinations package
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain Shift
 
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
 
Probabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDFProbabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDF
 
Solving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) SearchSolving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) Search
 
Variational inference using implicit distributions
Variational inference using implicit distributionsVariational inference using implicit distributions
Variational inference using implicit distributions
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain Adaptation
 
DL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning RevisitedDL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning Revisited
 
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
 
Lecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithmLecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithm
 
Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN
 

Similaire à Inducing Predictive Clustering Trees for Datatype properties Values

Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Giuseppe Rizzo
 
On the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision TreesOn the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision Trees
Giuseppe Rizzo
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision Tree
Giuseppe Rizzo
 
LDA on social bookmarking systems
LDA on social bookmarking systemsLDA on social bookmarking systems
LDA on social bookmarking systems
Denis Parra Santander
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
Rebecca Bilbro
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
PyData
 
Cluster
ClusterCluster
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Sean Golliher
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
Enrico Daga
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
REVEAL - Social Media Verification
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
Symeon Papadopoulos
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Arithmer Inc.
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
dgarijo
 
clustering.ppt
clustering.pptclustering.ppt
clustering.ppt
VivekKumar898803
 
Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...
National Institute of Informatics
 
Finding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of DerivativesFinding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of Derivatives
ijtsrd
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
csandit
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
cscpconf
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
Ahmed Gad
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016
Roy Clariana
 

Similaire à Inducing Predictive Clustering Trees for Datatype properties Values (20)

Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
 
On the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision TreesOn the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision Trees
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision Tree
 
LDA on social bookmarking systems
LDA on social bookmarking systemsLDA on social bookmarking systems
LDA on social bookmarking systems
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
Cluster
ClusterCluster
Cluster
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
clustering.ppt
clustering.pptclustering.ppt
clustering.ppt
 
Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...
 
Finding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of DerivativesFinding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of Derivatives
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016
 

Dernier

一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 

Dernier (20)

一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 

Inducing Predictive Clustering Trees for Datatype properties Values

  • 1. Inducing Predictive Clustering Trees for Datatype properties Values Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito Semantic Machine Learning, 10th July 2016 G.Rizzo et al. (Univ. of Bari) 10th July 2016 1 / 18
  • 2. Outline 1 The Context and Motivations 2 Basics 3 The approach 4 Empirical Evaluation 5 Conclusion & Further Extensions G.Rizzo et al. (Univ. of Bari) 10th July 2016 2 / 18
  • 3. The Context and Motivations • Goal: approximating the (numerical) datatype property values through regression models in the Web of Data • Web of data: a large number of knowledge bases, datasets and vocabularies exposed in a standard format (RDF, OWL) • (numerical) property values can hardly be derived by using reasoning services • Open World Assumption • a large number of missing information • The informative gap can be filled by using regression models G.Rizzo et al. (Univ. of Bari) 10th July 2016 3 / 18
  • 4. The context and Motivations • Solving a regression problem • two or more property values may be related (e.g. crime rate and population of a place) • correlations should improve the predictiveness • Predicting more numerical values at once (multi-target regression) through Predictive Clustering approaches • Predictive Clustering Trees (PCTs) as a generalization of decision trees • PCTs compliant to the representation languages for the Web of Data (e.g. Description Logics) • target values: the numeric role fillers for the properties G.Rizzo et al. (Univ. of Bari) 10th July 2016 4 / 18
  • 5. Description Logics Syntax & Semantics • Atomic concepts (classes), NC and roles (relations), NR to model domains • Operators to build complex concept descriptions • Concrete domains: string, boolean, numeric values • Semantics defined through interpretations I = (∆I, ·I) • ∆I : domain of the interpretation • ·I : intepretation function • for each concept C ∈ NC , CI ⊆ ∆I • for each role R ∈ NR , RI ⊆ ∆I × ∆I ALC operators Top concept: ∆I Bottom concept: ⊥ ∅ Concept: C CI ⊆ ∆I Full Complement: ¬C ∆ CI Intersection: C D CI ∩ DI Disjunction: C D CI ∪ DI Universal restriction ∀R.D {x ∈ ∆I | ∀y ∈ ∆I (x, y) ∈ RI → y ∈ DI } Existential restriction ∃R.D {x ∈ ∆I | ∃y ∈ ∆I (x, y) ∈ RI ∧ y ∈ DI } G.Rizzo et al. (Univ. of Bari) 10th July 2016 5 / 18
  • 6. Description Logics: Knowledge bases • Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where • $\mathcal{T}$ (TBox): axioms concerning concepts/roles • Subsumption axioms $C \sqsubseteq D$: hold iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ for every interpretation $\mathcal{I}$ • Equivalence axioms $C \equiv D$: hold iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$ for every interpretation $\mathcal{I}$ • $\mathcal{A}$ (ABox): class assertions $C(a)$ and role assertions $R(a, b)$ about a set of individuals, denoted by $\mathrm{Ind}(\mathcal{A})$ • Reasoning services: • subsumption: checking whether a concept is more general than a given one • satisfiability: checking whether a concept description $C$ admits an interpretation $\mathcal{I}$ with $C^{\mathcal{I}} \neq \emptyset$ • instance checking: checking whether $\mathcal{I} \models C(a)$ holds for every interpretation $\mathcal{I}$ that is a model of $\mathcal{K}$ ($a$ is an instance of $C$)
  • 7. The problem. Given: • a knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$; • the target functional roles $R_i$, $1 \le i \le t$, ranging on the domains $D_i$, whose analytic forms are unknown; • a training set $Tr \subseteq \mathrm{Ind}(\mathcal{A})$ for which the numeric fillers are known, $Tr = \{a \in \mathrm{Ind}(\mathcal{A}) \mid R_i(a, v_i) \in \mathcal{A},\ v_i \in D_i,\ 1 \le i \le t\}$. Build a regression model for $\{R_i\}_{i=1}^{t}$, i.e. a function $h : \mathrm{Ind}(\mathcal{A}) \to D_1 \times \cdots \times D_t$ that minimizes a loss function over $Tr$. A possible loss function may be based on the mean squared error.
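A minimal sketch of a mean-squared-error loss for a multi-target model h, evaluated over a training set Tr. The slide only says the loss "may be based on" the mean squared error, so the averaging over both individuals and targets, the name `mse_loss`, and the toy data are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Known fillers (v_1, ..., v_t) for one individual.
Targets = Tuple[float, ...]


def mse_loss(h: Callable[[str], Targets], training_set: Dict[str, Targets]) -> float:
    """Squared error of h over Tr, averaged over individuals and targets (assumed form)."""
    total, count = 0.0, 0
    for individual, true_values in training_set.items():
        predicted = h(individual)
        for p, v in zip(predicted, true_values):
            total += (p - v) ** 2
            count += 1
    return total / count


# Toy training set, invented for illustration: individual name -> (v_1, v_2).
toy_tr = {"a1": (1.0, 10.0), "a2": (3.0, 30.0)}


def constant_model(individual: str) -> Targets:
    """A trivial h that ignores its input and always predicts the same vector."""
    return (2.0, 20.0)


loss = mse_loss(constant_model, toy_tr)
```

Because the targets can live on very different scales, an unstandardized loss of this kind is dominated by the target with the largest range.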
  • 8. The proposed solution • Predictive Clustering • objects are clustered according to a homogeneity criterion • for each cluster a predictive model is determined (e.g. a vector containing predictions) [Figure: (a) clustering, (b) predictive models, (c) predictive clustering]
  • 9. The model for multi-target regression • Given a knowledge base $\mathcal{K}$, a PCT for multi-target regression is a binary tree where • intermediate nodes: DL concept descriptions • leaf nodes: vectors containing the predictions w.r.t. the target properties [Figure: example PCT whose inner nodes are built from Comedy, starring.Actor, ¬Comedy and ¬Horror, with leaves p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000), p = (8.6, 4930000)]
  • 10. Learning PCTs • Divide-and-conquer strategy • For the current node: • the refinement operator generates the candidate concepts • the most promising concept $E^*$ is selected by maximizing the homogeneity w.r.t. all target variables simultaneously • Best concept: the one minimizing the RMSE of the standardized target property values • Stop conditions: • maximum number of levels • size of the training (sub)set • Leaf: the i-th component contains the average value for the i-th target property over the instances routed to the node
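The selection step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each candidate concept is represented by a hypothetical boolean membership mask over the training individuals, the RMSE of each partition is measured around its own mean vector, and the two partitions are combined by a size-weighted average (the slides do not specify how they are combined).

```python
import math
from typing import Dict, List

Row = List[float]  # target values (v_1, ..., v_t) of one training individual


def standardize(rows: List[Row]) -> List[Row]:
    """Rescale every target column to zero mean and unit variance."""
    n, t = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(t)]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(t)]
    return [[(r[j] - means[j]) / stds[j] for j in range(t)] for r in rows]


def rmse_around_mean(rows: List[Row]) -> float:
    """RMSE of a group w.r.t. its own mean vector (0.0 for an empty group)."""
    if not rows:
        return 0.0
    n, t = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(t)]
    squared = sum((r[j] - means[j]) ** 2 for r in rows for j in range(t))
    return math.sqrt(squared / (n * t))


def split_score(rows: List[Row], mask: List[bool]) -> float:
    """Size-weighted RMSE of the two partitions induced by a candidate concept."""
    left = [r for r, m in zip(rows, mask) if m]
    right = [r for r, m in zip(rows, mask) if not m]
    return (len(left) * rmse_around_mean(left)
            + len(right) * rmse_around_mean(right)) / len(rows)


def best_concept(rows: List[Row], candidates: Dict[str, List[bool]]) -> str:
    """Return the candidate whose split is most homogeneous on standardized targets."""
    z = standardize(rows)
    return min(candidates, key=lambda name: split_score(z, candidates[name]))


# Toy data, invented for illustration: two targets on very different scales.
toy_rows = [[1.0, 100.0], [1.2, 110.0], [5.0, 900.0], [5.1, 950.0]]
toy_candidates = {
    "Comedy": [True, True, False, False],
    "Horror": [True, False, True, False],
}
chosen = best_concept(toy_rows, toy_candidates)
```

In this toy run the mask for "Comedy" separates the two tight groups, so it gets the lower score. In a real setting membership would come from a reasoner and may be unknown for some individuals, which this sketch ignores.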
  • 11. Installing new DL concepts as inner nodes • The candidate concept descriptions are generated by using a refinement operator • A quasi-ordering relation over the space of concept descriptions • the subsumption between concepts in Description Logics • Downward refinement operator $\rho(\cdot)$ to obtain specializations $E$ of a concept description $D$ ($E \sqsubseteq D$) • Each concept can be obtained: • by introducing a new concept name (or its complement) as a conjunct • by replacing a sub-description in the scope of an existential restriction • by replacing a sub-description in the scope of a universal restriction
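A rough sketch of the first refinement rule only (adding a concept name or its complement as a conjunct). The representation of a concept as a set of string literals and the names `refine` and `negate` are hypothetical; the rules for existential and universal restrictions are not covered here.

```python
from typing import FrozenSet, List

# A conjunction of (possibly negated) concept names, e.g. {"Comedy", "¬Horror"}.
Conjunction = FrozenSet[str]


def negate(literal: str) -> str:
    """Return the complement of a literal written with a leading '¬'."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def refine(concept: Conjunction, concept_names: List[str]) -> List[Conjunction]:
    """Specialize a conjunction by adding one new concept name or its complement."""
    refinements = []
    for name in concept_names:
        for literal in (name, "¬" + name):
            # Skip literals already present and those that would contradict the concept.
            if literal in concept or negate(literal) in concept:
                continue
            refinements.append(concept | {literal})
    return refinements


candidates = refine(frozenset({"Comedy"}), ["Comedy", "Horror"])
```

Every result carries one more conjunct than its input, so it is subsumed by the input concept, which is what a downward refinement operator requires.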
  • 12. Prediction • Given an unseen individual $a$, the property values are determined by traversing the tree structure • Given a test concept $D$: • if $\mathcal{K} \models D(a)$ the left branch is followed • if $\mathcal{K} \models \neg D(a)$ the right branch is followed • otherwise, a default model is returned
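The three-way traversal above can be sketched as follows. The tree classes, the `entails` oracle (which stands in for a DL reasoner and returns True, False, or None when membership cannot be decided) and all toy values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Union

Prediction = Tuple[float, ...]


@dataclass
class Leaf:
    prediction: Prediction  # one component per target property


@dataclass
class Inner:
    concept: str            # test concept D installed at this node
    left: "Tree"            # followed when K |= D(a)
    right: "Tree"           # followed when K |= ¬D(a)


Tree = Union[Leaf, Inner]

# Hypothetical oracle: True if K |= D(a), False if K |= ¬D(a), None if neither holds.
Oracle = Callable[[str, str], Optional[bool]]


def predict(tree: Tree, individual: str, entails: Oracle, default: Prediction) -> Prediction:
    """Traverse the tree; return the default model when membership is unknown."""
    node = tree
    while isinstance(node, Inner):
        answer = entails(node.concept, individual)
        if answer is None:
            return default
        node = node.left if answer else node.right
    return node.prediction


# Toy knowledge and tree, invented for illustration.
known = {("Comedy", "film1"): True, ("Comedy", "film2"): False}


def toy_oracle(concept: str, individual: str) -> Optional[bool]:
    return known.get((concept, individual))


toy_tree = Inner("Comedy", Leaf((8.0, 100.0)), Leaf((4.0, 50.0)))
toy_default = (6.0, 75.0)
```

The `None` branch is what distinguishes this from an ordinary decision tree: under the Open World Assumption, failing to prove D(a) does not prove ¬D(a).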
  • 13. Experimental Settings • Ontologies extracted from DBPedia via crawling • Maximum depth for PCTs: 10, 15, 20 • Comparison w.r.t. terminological regression trees (TRT), a multi-target k-NN regressor (with $k = \sqrt{|Tr|}$) and a multi-target linear regression (LR) model • atomic concepts as the feature set for the k-NN regressor and the multi-target linear regression model • 10-fold cross validation • performance measured in terms of relative root mean squared error (RRMSE)
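A small sketch of the two quantities mentioned above. The slides do not spell out the RRMSE formula, so the definition used here (model error relative to the error of always predicting the training mean) is a common convention that is assumed rather than confirmed; rounding k to the nearest integer is likewise an assumption.

```python
import math
from typing import Sequence


def rrmse(y_true: Sequence[float], y_pred: Sequence[float], train_mean: float) -> float:
    """Relative RMSE for one target: error of the model over error of the mean predictor."""
    numerator = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    denominator = sum((t - train_mean) ** 2 for t in y_true)
    return math.sqrt(numerator / denominator)


def neighbourhood_size(n_train: int) -> int:
    """k = sqrt(|Tr|), rounded to the nearest integer and kept at least 1."""
    return max(1, round(math.sqrt(n_train)))


perfect = rrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], train_mean=2.0)
baseline = rrmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], train_mean=2.0)
k = neighbourhood_size(10000)
```

Under this definition a value below 1 means the model beats the mean predictor, and for a training set of 10000 individuals the rule gives k = 100.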
  • 14. Table: Datasets extracted from DBPedia
      Datasets   Expressiveness  #axioms  #classes  #properties  #individuals
      Fragm. #1  ALCO            17222    990       255          12053
      Fragm. #2  ALCO            20456    425       255          14400
      Fragm. #3  ALCO            9070     370       106          4499
    Table: Target properties, their ranges, and number of individuals employed in the learning problem
      Datasets   Property         Range              |Tr|
      Fragm. #1  elevation        [-654.14, 19.00]   10000
                 populationTotal  [0.0, 2255]
      Fragm. #2  areaTotal        [0, 16980.1]       10000
                 areaUrban        [0.0, 6740.74]
                 areaMetro        [0, 652874]
      Fragm. #3  height           [0, 251.6]         2256
                 weight           [-63.12, 304.25]
  • 15. Outcomes
    Table: RRMSE averaged over the runs
      Datasets   PCT           TRT          k-NN         LR
      Fragm. #1  0.42 ± 0.05   0.63 ± 0.05  0.65 ± 0.02  0.73 ± 0.02
      Fragm. #2  0.25 ± 0.001  0.43 ± 0.02  0.53 ± 0.00  0.43 ± 0.02
      Fragm. #3  0.24 ± 0.05   0.36 ± 0.2   0.67 ± 0.10  0.73 ± 0.05
    Table: Comparison in terms of elapsed times (secs)
      Datasets   Property         PCT    TRT     k-NN   LR
      Fragm. #1  elevation        -      2454.3  -      -
                 populationTotal  -      2353.0  -      -
                 total            2432   4807.3  547.6  234.5
      Fragm. #2  areaTotal        -      2256.0  -      -
                 areaUrban        -      2345.0  -      -
                 areaMetro        -      2345.2  -      -
                 total            2456   6946.2  546.2  235.7
      Fragm. #3  height           -      743.5   -      -
                 weight           -      743.4   -      -
                 total            743.3  1486.9  372.3  123.5
  • 16. Discussion • PCTs outperform TRTs • the different heuristic allows more promising concepts to be chosen • standardization mitigated the effect of abnormal values that increase the error • PCTs outperform k-NN • curse of dimensionality • k-NN outperforms LR • spurious individuals were excluded when determining the local model • PCTs are more efficient than TRTs
  • 17. Conclusion and Further Outlook • We proposed an extension of predictive clustering trees compliant with DL representation languages for solving the problem of predicting datatype property values • Further extensions • New refinement operators • Further heuristics • Linear models at leaf nodes
  • 18. Questions?