                RELIN: Relatedness and Informativeness-based
                    Centrality for Entity Summarization




                     Gong Cheng (1), Thanh Tran (2), Yuzhong Qu (1)
          (1) State Key Laboratory for Novel Software Technology, Nanjing University, China
          (2) Institute AIFB, Karlsruhe Institute of Technology, Germany

                               gcheng@nju.edu.cn



                           Presented at ISWC2011
Motivation
                                                                                            ws .nju.edu.cn

        DBpedia describes 3.64M entities with 1B RDF triples.
            1B/3.64M ≈ 281 RDF triples per entity on average
        A lengthy entity description is unacceptable in tasks that require quick
        identification of the underlying entity.




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           2 of 30
Entity search --- find entities that match an information need
Pay-as-you-go data integration --- judge whether two entities denote the same

        [Figure: two entities compared, labeled "sameAs?"]

Motivation
        Problem: to summarize lengthy entity descriptions




Outline

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Data graph




Feature set




Entity summarization

        Entity summarization = feature ranking
        Entity summary = k top-ranked features
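
As a minimal sketch of this formulation, assuming every feature already has a numeric score, a summary is just the k highest-scoring features. The function name `summarize` and the feature strings and scores below are made up for illustration.

```python
from typing import Dict, List

def summarize(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k top-ranked features, highest score first."""
    ranked = sorted(scores, key=lambda f: scores[f], reverse=True)
    return ranked[:k]

# Hypothetical feature scores for one entity (illustrative values only).
example_scores = {
    "type: City": 0.40,
    "country: Germany": 0.30,
    "population: 300000": 0.20,
    "wikiPageID: 12345": 0.10,
}
top2 = summarize(example_scores, 2)
```

How the scores are obtained is the actual ranking problem; this sketch only shows the final selection step.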




Centrality-based ranking: concepts

        Widely applied to text summarization and ontology summarization
        By constructing a graph
            Nodes: data elements to be ranked
            Edges: connecting related nodes
        and then, measuring node centrality
            e.g. degree, PageRank, …
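
The graph construction and the simplest centrality measure (degree) can be sketched as follows. This is an illustration under assumptions: the function name `degree_centrality`, the relatedness values, and the threshold are hypothetical, and relatedness is taken to be symmetric with one entry per unordered pair.

```python
from typing import Dict, List, Tuple

def degree_centrality(
    nodes: List[str],
    relatedness: Dict[Tuple[str, str], float],
    threshold: float,
) -> Dict[str, int]:
    """Connect two nodes when their relatedness reaches the threshold,
    then score every node by its number of edges (degree)."""
    degree = {n: 0 for n in nodes}
    for (a, b), r in relatedness.items():
        if a != b and r >= threshold:
            degree[a] += 1
            degree[b] += 1
    return degree

# Hypothetical relatedness values between five features.
features = ["f1", "f2", "f3", "f4", "f5"]
rel = {
    ("f1", "f2"): 0.8, ("f1", "f3"): 0.7, ("f1", "f4"): 0.9,
    ("f4", "f5"): 0.6, ("f2", "f3"): 0.1, ("f3", "f5"): 0.2,
}
deg = degree_centrality(features, rel, threshold=0.5)
```

Pairs below the threshold contribute nothing, however close to it they are.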



        [Figure: example graph over features f1 to f5; an edge links two features when their relatedness ≥ threshold, and no edge is drawn when relatedness < threshold]



PageRank

        Simulates the behavior of a random surfer who navigates from node to node
        Two types of action
            Following a random edge (with a uniform probability distribution)
            Jumping at random (with a uniform probability distribution)
        Ranking based on the stationary distribution of such a Markov chain
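
A small power-iteration sketch of this random-surfer model, assuming an undirected graph given as adjacency lists. The function name `pagerank`, the jump probability of 0.15, and the example graph are assumptions chosen for illustration, not values taken from the slides.

```python
from typing import Dict, List

def pagerank(
    neighbors: Dict[str, List[str]],
    jump_prob: float = 0.15,
    iterations: int = 100,
) -> Dict[str, float]:
    """Approximate the stationary distribution of the random surfer."""
    nodes = list(neighbors)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        for q in nodes:
            out = neighbors[q]
            for p in nodes:
                # Follow a uniformly chosen edge of q; a node without
                # edges is treated as jumping uniformly instead.
                follow = (1.0 / len(out) if p in out else 0.0) if out else 1.0 / n
                jump = 1.0 / n  # uniform random jump
                nxt[p] += x[q] * ((1 - jump_prob) * follow + jump_prob * jump)
        x = nxt
    return x

# Hypothetical graph over five features.
graph = {
    "f1": ["f2", "f3", "f4"],
    "f2": ["f1"],
    "f3": ["f1"],
    "f4": ["f1", "f5"],
    "f5": ["f4"],
}
ranks = pagerank(graph)
```

Both actions use uniform distributions here, which is the point RELIN later changes.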




        [Figure: example graph over features f1 to f5]



Centrality-based ranking for entity summarization: problems

        How to define a good feature
            Not only capturing the main themes of the entity description
            But also distinguishing the entity from others
        Loss of information
            Float-valued function → boolean-valued function (thresholding discards the degree of relatedness)




        [Figure: example graph over features f1 to f5; an edge links two features when their relatedness ≥ threshold, and no edge is drawn when relatedness < threshold]



RELIN: concepts

        An extension of PageRank
            Following a random edge
                within a complete graph, with a probability proportional to the relatedness between the
                two associated nodes, i.e. no threshold needed
            Jumping at random
                with a probability proportional to the amount of information carried by the target that
                helps to identify the entity




RELIN: RELatedness and INformativeness-based centrality

        Two kinds of action
            Relational move --- more likely to go to a feature that carries information related to the
            theme currently under investigation
            Informational jump --- more likely to go to a feature that provides a large amount of new
            information for clarifying the identity of the underlying entity
        Two non-uniform probability distributions




Formalization

        Actions (given the current feature fq)
            P(M|fq): the probability of performing a relational move from fq
            P(J|fq): the probability of performing an informational jump from fq
            subject to P(M|fq) + P(J|fq) = 1
        Targets for actions (given FS the feature set)
            P(fp|fq,M): the probability of performing a relational move from fq to fp
            P(fp|fq,J): the probability of performing an informational jump from fq to fp
            subject to $\sum_{f_p \in FS} P(f_p \mid f_q, M) = 1$ and $\sum_{f_p \in FS} P(f_p \mid f_q, J) = 1$
        Result
            x(t): |FS|-dimensional vector
            xp(t): the probability that the surfer visits fp at step t
            Finally,
                 $$x_p(t+1) = \sum_{f_q \in FS} x_q(t) \left[ P(M \mid f_q)\, P(f_p \mid f_q, M) + P(J \mid f_q)\, P(f_p \mid f_q, J) \right]$$
            and
                 $$\lim_{t \to \infty} x(t) = x$$
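
The update rule above can be iterated directly. The sketch below is an illustration under assumptions, not the authors' implementation: the names `normalize` and `relin_rank`, the single `jump_prob` shared by all features, the fixed iteration count, and all example weights are hypothetical.

```python
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]

def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def relin_rank(
    move: Matrix,       # move[fq][fp] = P(fp | fq, M)
    jump: Matrix,       # jump[fq][fp] = P(fp | fq, J)
    jump_prob: float,   # P(J | fq), assumed identical for every fq
    iterations: int = 200,
) -> Dict[str, float]:
    """Iterate x_p(t+1) = sum_q x_q(t) [P(M|fq) P(fp|fq,M) + P(J|fq) P(fp|fq,J)]."""
    features: List[str] = list(move)
    x = {f: 1.0 / len(features) for f in features}
    for _ in range(iterations):
        x = {
            fp: sum(
                x[fq] * ((1 - jump_prob) * move[fq][fp]
                         + jump_prob * jump[fq][fp])
                for fq in features
            )
            for fp in features
        }
    return x

# Hypothetical raw relatedness and informativeness weights.
relatedness = {
    "f1": {"f1": 0.0, "f2": 3.0, "f3": 1.0},
    "f2": {"f1": 3.0, "f2": 0.0, "f3": 1.0},
    "f3": {"f1": 1.0, "f2": 1.0, "f3": 0.0},
}
informativeness = {"f1": 1.0, "f2": 2.0, "f3": 5.0}
move = {fq: normalize(row) for fq, row in relatedness.items()}
jump = {fq: normalize(informativeness) for fq in relatedness}
scores = relin_rank(move, jump, jump_prob=0.5)
```

Normalizing each row enforces the two sum-to-one constraints, so the vector x stays a probability distribution at every step.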




Actions

        P(M|fq) = 1 – λ
        P(J|fq) = λ
        λ: to be tuned in experiments




Relatedness --- P(fp|fq,M)

        Relatedness between features (i.e. property-value pairs) combines
            Relatedness between properties (i.e. resources)
            Relatedness between values (i.e. resources)
        Relatedness between resources = relatedness between resource names
            URI: label or local name
            Literal: lexical form
        Distributional relatedness between resource names
            More related = more often co-occur in certain contexts (e.g. documents)
        Estimated via “pointwise mutual information + Google”

                 $$P(s_i, s_j) = \frac{\mathrm{Hits}(s_i, s_j)}{N}$$

                 $$P(s_j) = \frac{\mathrm{Hits}(s_j)}{N}$$
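
These hit-count estimates plug into the standard definition of pointwise mutual information, log of P(si, sj) / (P(si) P(sj)). The sketch below assumes a hypothetical lookup table `hits` standing in for search-engine counts; the counts and N are made up. How PMI is turned into the transition probability P(fp|fq,M) is not shown on the slide and is not covered here.

```python
import math
from typing import Dict, FrozenSet

def pmi(s_i: str, s_j: str, hits: Dict[FrozenSet[str], int], n: int) -> float:
    """Pointwise mutual information of two names from hit counts.

    hits maps a set of names to the number of documents containing all
    of them; n is the total number of documents.
    """
    p_i = hits[frozenset([s_i])] / n
    p_j = hits[frozenset([s_j])] / n
    p_ij = hits[frozenset([s_i, s_j])] / n
    if p_ij == 0:
        return float("-inf")  # the names never co-occur
    return math.log(p_ij / (p_i * p_j))

# Hypothetical hit counts over one million documents.
example_hits = {
    frozenset(["Nanjing"]): 1000,
    frozenset(["China"]): 10000,
    frozenset(["Nanjing", "China"]): 800,
}
value = pmi("Nanjing", "China", example_hits, n=1_000_000)
```

A positive value means the two names co-occur more often than independence would predict.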




Informativeness --- P(fp|fq,J)

        Self-information

        o: informational jump from fq to fp
        P(fp|fq): the probability that fp belongs to a feature set given fq also does so
        Estimated via a statistical analysis of the data set




        Approximation: P(fp|fq) = P(fp)
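
Under the approximation P(fp|fq) = P(fp), informativeness reduces to the self-information of a feature, which by definition is -log P(fp). The sketch below assumes P(fp) is estimated as the share of entities whose feature set contains fp; that estimator, the function name `self_information`, and the toy data set are assumptions for illustration.

```python
import math
from typing import Dict, List, Set

def self_information(feature_sets: List[Set[str]]) -> Dict[str, float]:
    """Estimate P(f) as the share of entities whose feature set contains f,
    and return the self-information -log P(f) of every observed feature."""
    n = len(feature_sets)
    counts: Dict[str, int] = {}
    for fs in feature_sets:
        for f in fs:
            counts[f] = counts.get(f, 0) + 1
    return {f: -math.log(c / n) for f, c in counts.items()}

# Hypothetical data set of four entities.
data = [
    {"type: City", "name: Nanjing"},
    {"type: City", "name: Karlsruhe"},
    {"type: City", "country: China"},
    {"type: City", "country: China"},
]
info = self_information(data)
```

A feature shared by every entity carries no information for telling entities apart, while a feature held by a single entity carries the most.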




Experiments

        Intrinsic evaluation
        Extrinsic evaluation




Intrinsic evaluation --- design

        Task
            To manually construct ideal entity summaries as the gold standard
        Participants
            24 students majoring in computer science
        Test cases
            149 entity descriptions randomly selected from DBpedia 3.4
        Assignment
            4.43 participants per entity description on average
        Output
            Top-5 features
            Top-10 features




Intrinsic evaluation --- results

        Metric: overlap between summaries

        Agreement between participants about ideal summaries
            2.91 when k=5
            7.86 when k=10
        Quality of summaries computed under different approach settings




            [Results figure not recoverable; its groups were labeled "Baselines" and "Ours"]
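
The slide's formula for the metric did not survive extraction. A plausible reading, stated here as an assumption, is that overlap counts the features two summaries share and that agreement is the mean overlap over all pairs of summaries for the same entity. The function names and example summaries are hypothetical.

```python
from itertools import combinations
from typing import List, Set

def overlap(a: Set[str], b: Set[str]) -> int:
    """Number of features shared by two summaries."""
    return len(a & b)

def average_agreement(summaries: List[Set[str]]) -> float:
    """Mean pairwise overlap among summaries of the same entity."""
    pairs = list(combinations(summaries, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical summaries (k=3) written for one entity.
ideal = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}]
agreement = average_agreement(ideal)
```

Under this reading the value ranges from 0 to k, which matches the scale of the figures reported for k=5 and k=10.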




Extrinsic evaluation --- design

        Task
            To manually confirm entity mappings by using summaries
        Participants
            19 students majoring in computer science
        Test cases
            47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009)
            Gold-standard judgments based on owl:sameAs links
                 24 correct and 23 incorrect
        Assignment
            3.62 participants per pair, per approach setting, on average
        Output
            Judgment: correct or incorrect




Extrinsic evaluation --- results

        Metrics
            Accuracy of the judgments
                  1.0 = consistent with the gold standard
                  0.0 = inconsistent
            Time spent
                  Normalized by the average time per judgment spent by the participant
                  1.0 = average efficiency for that participant
                  Smaller value = higher efficiency
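
Both metrics can be sketched in a few lines, assuming judgments and gold-standard labels are booleans (True = correct mapping) and times are in seconds. The function names and example values are hypothetical.

```python
from typing import List

def accuracy(judgments: List[bool], gold: List[bool]) -> float:
    """Share of judgments that agree with the gold standard."""
    hits = sum(1 for j, g in zip(judgments, gold) if j == g)
    return hits / len(gold)

def normalized_times(seconds: List[float]) -> List[float]:
    """Divide each time by the participant's own mean time per judgment."""
    mean = sum(seconds) / len(seconds)
    return [s / mean for s in seconds]

# One hypothetical participant: four judgments, three timed tasks.
acc = accuracy([True, False, True, True], [True, False, False, True])
rel_times = normalized_times([10.0, 20.0, 30.0])
```

Normalizing by each participant's own mean removes differences in personal speed, so values from different participants can be compared.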


        Results




Discussion

        Automatically computed summaries are still not as good as handcrafted ones.

                                                                          k=5    k=10
         Agreement between ideal summaries                                2.91   7.86
         Agreement between computed summaries and ideal summaries         2.40   4.88

        User-specific notion of informativeness
            Longitude and latitude are highly informative, but …
        Information redundancy
            Longitude + latitude = point
            What if multiple sources …
        Summarization = what + how (to present)




Conclusions

        Problem of entity summarization
            Extractive
            About identifying the entity that underlies a lengthy description
        The RELIN model
            Variant of the random surfer model
            Non-uniform probability distributions
            Informativeness + relatedness
        Implementation
            Based on linguistic and information theory concepts
            Using information captured by the labels of nodes and edges in the data graph
        Experiments
            Closer to handcrafted ideal summaries
            Assisting users in confirming entity mappings more accurately




Future work --- application-specific entity summarization




        [Figure: labeled "sameAs?"]




Related work --- summarization



   Paradigm             Approach            Measure            Model
   Extractive           Centrality-based    PageRank-like      RELIN
   - Text                                                      - Relatedness
   - Ontology                                                  - Informativeness
                                                               - Non-uniform probability distribution

   Non-extractive       Centroid-based      Others             PageRank
   - Database                               - Degree           - Relatedness
   - Graph                                  - Betweenness      - Uniform probability distribution
                                            - …




Related work --- ranking

        Different goals --- to best identify the underlying entity
            B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE Internet Comput. 2005.
            R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010.
            T. Franz et al., TripleRank: Ranking Semantic Web Data By Tensor Decomposition. ISWC 2009.
            …
        Exploitation of data semantics at different levels --- use labels of nodes and edges
            T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009.
            X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007.
            …




Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization

  • 1. RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization. Gong Cheng (1), Thanh Tran (2), Yuzhong Qu (1). (1) State Key Laboratory for Novel Software Technology, Nanjing University, China; (2) Institute AIFB, Karlsruhe Institute of Technology, Germany. gcheng@nju.edu.cn. Presented at ISWC2011.
  • 2. Motivation. DBpedia describes 3.64M entities with 1B RDF triples (1B/3.64M = 281 RDF triples per entity). A lengthy entity description is unacceptable in tasks that require quick identification of the underlying entity. (Slide footer: ws .nju.edu.cn, Gong Cheng (程龚), gcheng@nju.edu.cn.)
  • 3. Entity search: find entities that match an information need.
  • 4. Pay-as-you-go data integration: judge whether two entities denote the same. Figure label: sameAs?
  • 5. Motivation (continued). Problem: to summarize lengthy entity descriptions.
  • 6. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions.
  • 7. Data graph.
  • 8. Feature set.
  • 9. Entity summarization. Entity summarization = feature ranking. Entity summary = k top-ranked features.
  • 10. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions.
  • 11. Centrality-based ranking: concepts. Widely applied to text summarization and ontology summarization. First construct a graph (nodes: data elements to be ranked; edges: connecting related nodes), then measure node centrality, e.g. degree, PageRank, … Figure: graph over features f1 to f5, with a legend distinguishing relatedness ≥ threshold from relatedness < threshold.
  • 12. PageRank. Simulates the behavior of a random surfer who navigates from node to node. Two types of action: following a random edge (with a uniform probability distribution) and jumping at random (with a uniform probability distribution). Ranking is based on the stationary distribution of the resulting Markov chain. Figure: graph over features f1 to f5.
  • 13. Centrality-based ranking for entity summarization: problems. How to define a good feature: it should not only capture the main themes of the entity description but also distinguish the entity from others. Loss of information: thresholding reduces a float-valued function to a boolean-valued function (float-valued function → boolean-valued function). Figure: graph over features f1 to f5, with a legend distinguishing relatedness ≥ threshold from relatedness < threshold.
  • 14. RELIN: concepts. An extension of PageRank. Following a random edge within a complete graph, with a probability proportional to the relatedness between the two associated nodes, i.e. no threshold needed. Jumping at random with a probability proportional to the amount of information carried by the target that helps to identify the entity.
  • 15. RELIN: RELatedness and INformativeness-based centrality. Two kinds of action. Relational move: more likely to a feature that carries related information about the theme currently under investigation. Informational jump: more likely to a feature that provides a large amount of new information for clarifying the identity of the underlying entity. Two non-uniform probability distributions.
  • 16. Formalization
      Actions, given the current feature $f_q$:
        $P(M \mid f_q)$: the probability of performing a relational move from $f_q$
        $P(J \mid f_q)$: the probability of performing an informational jump from $f_q$
        subject to $P(M \mid f_q) + P(J \mid f_q) = 1$
      Targets for actions, given the feature set $FS$:
        $P(f_p \mid f_q, M)$: the probability of performing a relational move from $f_q$ to $f_p$
        $P(f_p \mid f_q, J)$: the probability of performing an informational jump from $f_q$ to $f_p$
        subject to $\sum_{f_p \in FS} P(f_p \mid f_q, M) = 1$ and $\sum_{f_p \in FS} P(f_p \mid f_q, J) = 1$
      Result:
        $x(t)$: $|FS|$-dimensional vector
        $x_p(t)$: the probability that the surfer visits $f_p$ at step $t$
        Finally, $x_p(t+1) = \sum_{f_q \in FS} x_q(t) \left[ P(M \mid f_q)\, P(f_p \mid f_q, M) + P(J \mid f_q)\, P(f_p \mid f_q, J) \right]$
        and $x = \lim_{t \to \infty} x(t)$
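
The update rule on the Formalization slide can be iterated directly. The following is a minimal sketch, not the authors' implementation: it assumes a constant jump probability `lam`, a jump distribution that does not depend on the current feature, and nonnegative input scores. The function name `relin_rank` and the inputs `rel` and `info` are hypothetical.

```python
def relin_rank(rel, info, lam=0.5, iters=200, tol=1e-12):
    """Iterate x(t+1) from x(t) until it stops changing.

    rel[q][p]: nonnegative relatedness between features q and p (assumed input).
    info[p]:   nonnegative informativeness of feature p (assumed input).
    lam:       assumed constant probability of an informational jump.
    """
    n = len(info)
    # P(f_p | f_q, M): each row of `rel` normalized to sum to 1.
    move = []
    for q in range(n):
        row_sum = sum(rel[q])
        move.append([r / row_sum for r in rel[q]] if row_sum > 0 else [1.0 / n] * n)
    # P(f_p | f_q, J): informativeness normalized over the feature set.
    total = sum(info)
    jump = [v / total for v in info] if total > 0 else [1.0 / n] * n

    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [0.0] * n
        for q in range(n):
            for p in range(n):
                nxt[p] += x[q] * ((1 - lam) * move[q][p] + lam * jump[p])
        delta = max(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if delta < tol:
            break
    return x


# Hypothetical three-feature example; the scores are made up for illustration.
rel = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
info = [1.0, 1.0, 2.0]
scores = relin_rank(rel, info, lam=0.5)
top_k = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)[:2]
```

Sorting the features by their score and keeping the first k gives a summary in the sense of slide 9 (entity summary = k top-ranked features).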
  • 17. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions.
  • 18. Actions. $P(M \mid f_q) = 1 - \lambda$ and $P(J \mid f_q) = \lambda$, where $\lambda$ is to be tuned in experiments.
  • 19. Relatedness: $P(f_p \mid f_q, M)$. Relatedness between features (i.e. property-value pairs) combines relatedness between properties (i.e. resources) and relatedness between values (i.e. resources). Relatedness between resources = relatedness between resource names (URI: label or local name; literal: lexical form). Distributional relatedness between resource names: more related = co-occur more often in certain contexts (e.g. documents). Estimated via “pointwise mutual information + Google”, with $P(s_i, s_j) = \mathrm{Hits}(s_i, s_j) / N$ and $P(s_j) = \mathrm{Hits}(s_j) / N$.
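
As an illustration of the hit-count estimate on the Relatedness slide, here is a small sketch that computes the standard pointwise mutual information, log(P(s_i, s_j) / (P(s_i) P(s_j))), from counts. It assumes `n` is the total number of contexts (for example, indexed documents); the transcript does not define N. The function name `pmi_from_hits` is hypothetical, and the way PMI values are turned into a relatedness score and then into P(f_p | f_q, M) is not covered.

```python
import math


def pmi_from_hits(hits_i, hits_j, hits_ij, n):
    """Pointwise mutual information of two names estimated from hit counts.

    hits_i, hits_j: number of contexts containing each name alone.
    hits_ij:        number of contexts containing both names.
    n:              assumed total number of contexts.
    """
    if hits_ij == 0:
        return float("-inf")  # the names never co-occur
    p_i = hits_i / n
    p_j = hits_j / n
    p_ij = hits_ij / n
    return math.log(p_ij / (p_i * p_j))
```

Names that co-occur exactly as often as independence predicts score about 0, and names that co-occur more often than that score above 0.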
  • 20. Informativeness: $P(f_p \mid f_q, J)$. Based on the self-information $-\log P(o)$ of an outcome $o$, where $o$ is an informational jump from $f_q$ to $f_p$. $P(f_p \mid f_q)$: the probability that $f_p$ belongs to a feature set given that $f_q$ also does so; estimated via a statistical analysis of the data set. Approximation: $P(f_p \mid f_q) = P(f_p)$.
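
A minimal sketch of the informativeness estimate under the approximation P(f_p | f_q) = P(f_p). It assumes P(f_p) is the fraction of entities in the data set whose feature set contains f_p, which is one possible reading of "a statistical analysis of the data set". The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def self_information(feature_sets):
    """Map each feature to -log P(f), with P(f) the share of entities having f."""
    n = len(feature_sets)
    counts = Counter(f for fs in feature_sets for f in set(fs))
    return {f: -math.log(c / n) for f, c in counts.items()}


def jump_distribution(entity_features, info):
    """Normalize self-information over one entity's features to get jump targets."""
    weights = [info[f] for f in entity_features]
    total = sum(weights)
    if total == 0:
        # Every feature is shared by all entities; fall back to uniform.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Toy data set of two entities; each feature is a (property, value) pair.
data = [
    {("type", "City"), ("name", "A")},
    {("type", "City"), ("name", "B")},
]
info = self_information(data)
dist = jump_distribution([("type", "City"), ("name", "A")], info)
```

A feature shared by every entity carries no self-information, so all of the jump probability goes to the features that set the entity apart.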
  • 21. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions.
  • 22. Experiments: intrinsic evaluation; extrinsic evaluation.
  • 23. Intrinsic evaluation: design. Task: to manually construct ideal entity summaries as the gold standard. Participants: 24 students majoring in computer science. Test cases: 149 entity descriptions randomly selected from DBpedia 3.4. Assignment: 4.43 participants per entity description on average. Output: top-5 features and top-10 features.
  • 24. Intrinsic evaluation: results. Metric: overlap between summaries. Agreement between participants about ideal summaries: 2.91 when k=5 and 7.86 when k=10. Figure: quality of summaries computed under different approach settings, comparing the baselines with ours.
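
The overlap metric can be sketched as follows, assuming that overlap means the number of features two summaries share and that agreement between participants is the mean overlap over all pairs of their summaries for the same entity. Neither definition is spelled out in the transcript, and the function names are hypothetical.

```python
from itertools import combinations


def overlap(summary_a, summary_b):
    """Number of features that two summaries have in common."""
    return len(set(summary_a) & set(summary_b))


def mean_pairwise_overlap(summaries):
    """Average overlap over all unordered pairs of summaries."""
    pairs = list(combinations(summaries, 2))
    if not pairs:
        raise ValueError("need at least two summaries")
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical top-3 summaries of the same entity, features given as ids.
agreement = mean_pairwise_overlap([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}])
```

Under these assumptions, the quality of a computed summary would be its mean overlap with the participants' ideal summaries of the same size k.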
  • 25. Extrinsic evaluation: design. Task: to manually confirm entity mappings by using summaries. Participants: 19 students majoring in computer science. Test cases: 47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009), with gold-standard judgments based on owl:sameAs links (24 correct and 23 incorrect). Assignment: 3.62 participants per pair, per approach setting, on average. Output: judgment (correct or incorrect).
  • 26. Extrinsic evaluation: results. Metrics. Accuracy of the judgments: 1.0 = consistent with the gold standard, 0.0 = inconsistent. Time spent: normalized by the average time per judgment spent by the participant; 1.0 = medium efficiency, smaller value = higher efficiency. Results: shown as a figure.
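
The two extrinsic metrics can be sketched as below. This assumes each judgment is recorded as a boolean and each participant's times as a list of seconds; the data layout and the function names are hypothetical.

```python
def accuracy(judgments, gold):
    """Share of judgments that are consistent with the gold standard."""
    scores = [1.0 if j == g else 0.0 for j, g in zip(judgments, gold)]
    return sum(scores) / len(scores)


def normalized_times(times_by_participant):
    """Divide each time by that participant's own average time per judgment."""
    result = {}
    for person, times in times_by_participant.items():
        mean = sum(times) / len(times)
        result[person] = [t / mean for t in times]
    return result


# Hypothetical timings in seconds for one participant.
norm = normalized_times({"p1": [10.0, 30.0]})
```

Normalizing per participant removes differences in individual working speed, so values below 1.0 mean a judgment was faster than that participant's own average.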
  • 27. Discussion
      Automatically computed summaries are still not as good as handcrafted ones.
                                                                    k=5     k=10
        Agreement between ideal summaries                           2.91    7.86
        Agreement between computed summaries and ideal summaries    2.40    4.88
      User-specific notion of informativeness: longitude and latitude are highly informative, but …
      Information redundancy: longitude + latitude = point. What if multiple sources …
      Summarization = what + how (to present)
  • 28. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions.
  • 29. Conclusions. Problem of entity summarization: extractive; about identifying the entity that underlies a lengthy description. The RELIN model: a variant of the random surfer model; non-uniform probability distributions; informativeness + relatedness. Implementation: based on linguistic and information theory concepts; uses information captured by the labels of nodes and edges in the data graph. Experiments: closer to handcrafted ideal summaries; assists users in confirming entity mappings more accurately.
  • 30. Future work: application-specific entity summarization. Figure label: sameAs?
  • 31. Related work: summarization
      Compared along paradigm, approach, measure and model.
        RELIN:  paradigm: extractive; approach: centrality-based; measure: PageRank-like; model: relatedness, informativeness, non-uniform probability distribution
        Others: paradigm: non-extractive; approach: centroid-based; measure: PageRank, degree, betweenness, …; model: relatedness, uniform probability distribution
      Also listed on the slide: text, ontology, database, graph.
  • 32. Related work: ranking
      Different goals: to best identify the underlying entity
        B. Aleman-Meza et al. Ranking Complex Relationships on the Semantic Web. IEEE Internet Comput., 2005.
        R. Delbru et al. Hierarchical Link Analysis for Ranking Web Data. ESWC 2010.
        T. Franz. TripleRank: Ranking Semantic Web Data By Tensor Decomposition. ISWC 2009.
        …
      Exploitation of data semantics at different levels: use labels of nodes and edges
        T. Penin et al. Snippet Generation for Semantic Web Search Engines. ASWC 2009.
        X. Zhang et al. Ontology Summarization Based on RDF Sentence Graph. WWW 2007.
        …