STL: A Similarity Measure Based on Semantic, Terminological and Linguistic Information
Nitish Aggarwal, joint work with Tobias Wunner and Mihael Arcan
DERI, NUI Galway, firstname.lastname@deri.org
Friday, 19th Aug, 2011, DERI Friday Meeting

Overview
  • Motivation & Applications
  • Why STL?
  • Semantic
  • Terminology
  • Linguistic
  • Evaluation
  • Conclusion and future work

Motivation & Applications: Semantic Annotation
Similarity between corpus data and ontology concepts.
Example: "SAP AG held €1615 million in short-term liquid assets (2009)" is annotated with “dbpedia:SAP_AG” “xEBR:LiquidAssets” at “dbpedia:year:2009”.

Motivation & Applications: Semantic Search
Similarity between a query and an index object.
Index object: “dbpedia:SAP_AG” “xEBR:liquid asset” at “dbpedia:year:2010”
Example queries:
  • SAP liquid asset in 2010
  • Current asset of SAP last year
  • Net cash of SAP in 2010
  • SAP total amount received in 2010

Motivation & Applications: Ontology Matching & Alignment
Similarity between ontology concepts (diagram: IFRS concepts on one side, xEBR concepts on the other, with "Similarity = ?" between candidate pairs).
  • IFRS: ifrs:StatementOfFinancialPosition, ifrs:Assets, ifrs:BiologicalAssets, ifrs:CurrentAssets, ifrs:NonCurrentAssets, ifrs:PropertyPlantAndEquipment, ifrs:CashAndCashEquivalents, ifrs:TradeAndOtherCurrentReceivables, ifrs:Inventories
  • xEBR: xebr:KeyBalanceSheet, Assets, xebr:SubscribedCapitalUnpaid, xebr:FixedAssets, xebr:CurrentAssets, xebr:TangibleFixedAssets, xebr:IntangibleFixedAssets, xebr:Amount Receivable, xebr:Liquid Assets

Classical Approaches
  • String similarity: Levenshtein distance, Dice coefficient
  • Corpus-based: LSA, ESA, Google distance, vector-space model
  • Ontology-based: path distance, information content
  • Syntax similarity: word order, part of speech
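As a point of reference for the string-similarity family, the two measures named on this slide can be sketched as below. The slide does not say how Dice is tokenised; character bigrams compared as sets are an assumption made here, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dice_bigrams(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (assumed tokenisation)."""
    ba = {a[i:i + 2] for i in range(len(a) - 1)}
    bb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ba or not bb:
        return 0.0                          # no bigrams to compare
    return 2 * len(ba & bb) / (len(ba) + len(bb))


if __name__ == "__main__":
    print(levenshtein("Amounts receivable", "Received amounts"))
    print(dice_bigrams("amounts receivable", "received amounts"))
```

Both functions look only at characters, which is why terms that name the same concept with different wording can score poorly.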

Why STL?
  • Semantic: semantic structure and relations
  • Terminology: complex terms expressing the same concept
  • Linguistic: phrase and dependency structure

STL Definition
A linear combination of the semantic, terminological and linguistic similarity scores, with weights obtained by linear regression.
Formula used: STL = w1*S + w2*T + w3*L + Constant
w1, w2 and w3 represent the contribution of each component.
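How the weights could be obtained by linear regression is sketched below as ordinary least squares over (S, T, L) triples and reference similarity ratings. The slides do not say which regression tool was used; the function name, the pure-Python solver and the toy data are all assumptions for illustration.

```python
def fit_stl_weights(rows, targets):
    """Least-squares fit of STL = w1*S + w2*T + w3*L + const.

    rows    : list of (S, T, L) component scores, one per term pair
    targets : reference similarity rating for each term pair
    Returns [w1, w2, w3, const].
    """
    X = [[s, t, l, 1.0] for s, t, l in rows]   # last column models the constant
    n = 4
    # Normal equations (X^T X) w = X^T y
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


if __name__ == "__main__":
    # Hypothetical component scores and ratings, not taken from the slides.
    rows = [(0.1, 0.2, 0.3), (0.9, 0.1, 0.5), (0.4, 0.8, 0.2),
            (0.7, 0.6, 0.9), (0.3, 0.5, 0.7), (0.2, 0.9, 0.4)]
    ratings = [0.2 * s + 0.5 * t + 0.1 * l + 0.15 for s, t, l in rows]
    print(fit_stl_weights(rows, ratings))
```

The sketch assumes the component scores are not collinear; with real data a library routine would usually be preferable to a hand-written solver.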

Semantic
  • Wu-Palmer: 2*depth(MSCA) / (depth(c1) + depth(c2))
  • Resnik's information content: IC(c) = -log p(c)
  • Intrinsic information content (Pirro09): avoids the analysis of large corpora
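The Wu-Palmer formula can be tried on a small concept tree. The hierarchy below is a toy assumption loosely based on xEBR concept names that appear in the deck; MSCA is read here as the deepest ancestor shared by both concepts, and the root is given depth 1 (conventions differ).

```python
# Hypothetical child -> parent map; the real xEBR hierarchy may differ.
PARENT = {
    "Assets": None,
    "Fixed Assets": "Assets",
    "Current Assets": "Assets",
    "Tangible Fixed Assets": "Fixed Assets",
    "Amount Receivable": "Current Assets",
    "Stocks": "Current Assets",
}


def ancestors(concept):
    """Path from the concept up to the root, the concept itself included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Number of nodes on the path to the root (root has depth 1)."""
    return len(ancestors(concept))


def msca(c1, c2):
    """Deepest ancestor shared by c1 and c2."""
    seen = set(ancestors(c1))
    for candidate in ancestors(c2):     # walks upward, so first hit is deepest
        if candidate in seen:
            return candidate
    raise ValueError("concepts do not share a root")


def wu_palmer(c1, c2):
    return 2 * depth(msca(c1, c2)) / (depth(c1) + depth(c2))


if __name__ == "__main__":
    print(wu_palmer("Stocks", "Amount Receivable"))
    print(wu_palmer("Tangible Fixed Assets", "Amount Receivable"))
```

Siblings under "Current Assets" score higher than concepts whose only shared ancestor is the root, which is the behaviour the depth ratio is meant to capture.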

Semantic (cont.)
Intrinsic information content (iIC), where sub(c) is the number of sub-concepts of a given concept c.
Pirro similarity (Pirro_Sim)
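The iIC formula itself is not preserved in this transcript. One widely cited intrinsic formulation (Seco et al.) is iIC(c) = 1 - log(sub(c) + 1) / log(N), with N the total number of concepts in the ontology; whether the slide used exactly this form is an assumption, so the sketch below should not be expected to reproduce the figures quoted in the deck.

```python
import math


def intrinsic_ic(sub_count: int, total_concepts: int) -> float:
    """Intrinsic information content from ontology structure alone.

    sub_count      : number of sub-concepts of the concept, sub(c)
    total_concepts : number of concepts in the ontology (N), must be > 1
    Assumed form: 1 - log(sub(c) + 1) / log(N).
    """
    if total_concepts <= 1:
        raise ValueError("need at least two concepts")
    return 1.0 - math.log(sub_count + 1) / math.log(total_concepts)


if __name__ == "__main__":
    n = 269   # size of the xEBR vocabulary reported in the evaluation
    for subs in (0, 6, 9, 48):
        print(subs, round(intrinsic_ic(subs, n), 3))
```

Under this form a leaf concept gets the maximum value 1, and a root that subsumes every other concept gets 0, so no corpus statistics are needed.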

Semantic (cont.): example
Diagram of the xEBR Assets hierarchy with annotations:
  • MSCA: subconcepts = 48, IC (TFA) = 0.32
  • Amount Receivable: subconcepts = 6, IC (AR) = 0.69
  • Tangible Fixed Assets: subconcepts = 9, IC (TFA) = 0.60
  • Pirro_Sim = 0.33; Pirro_Sim = ?
Concepts shown: Assets; Subscribed Capital Unpaid; Fixed Assets; Current Assets; Stocks; Tangible Fixed Assets; Amount Receivable; Amount Receivable [total]; Amount Receivable within one year; Amount Receivable after more than one year; Other Tangible Fixed Assets; Property, Plant and Equipment; Payments on account and asset in construction; Furniture Fixture and Equipment; Trade Debtors; Other Fixture; Land and Building; Other Debtors; Plant and Machinery; Other Property, Plant and Equipment; Property, Plant and Equipment [Total]

Limitation
Does semantic structure reflect a good similarity? Not necessarily.
E.g. in xEBR, the parent-child relation is used for describing the layout of concepts: “Work in progress” is not a type of asset, although both are linked via the parent-child relationship.

Terminology
  • Definition: common naming convention
  • N-gram vs. subterms: in the financial domain, the bigram “Intangible Fixed” is a substring of “Other Intangible Fixed Assets” but not a subterm.
  • Terminological similarity: maximal subterm overlap

Terminology (cont.)
Trade Debts Payable After More Than One Year: [[Trade][Debts]][Payable][After More Than One Year]
Subterm sources: [SAP:Payable] [Ifrs:After More Than One Year] [Investoword:Debt] [FinanceDict:Trade Debts] [Investopedia:Trade]
Financial Debts Payable After More Than One Year: Financial[Debts][Payable][After More Than One Year]
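The subterm decomposition above can be sketched as a lookup of every token span against a term lexicon, followed by an overlap score. The lexicon here is a hypothetical stand-in for the terminological resources named on the slide, and scoring overlap as a Jaccard ratio is an assumption, since the slides only say "maximal subterm overlap".

```python
# Hypothetical lexicon of known financial terms (lower-cased).
LEXICON = {
    "trade debts",
    "trade",
    "debts",
    "payable",
    "after more than one year",
}


def subterms(term, lexicon=LEXICON):
    """All contiguous token spans of `term` that are known terms in the lexicon."""
    tokens = term.lower().split()
    found = set()
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            candidate = " ".join(tokens[start:end])
            if candidate in lexicon:
                found.add(candidate)
    return found


def terminological_similarity(term_a, term_b, lexicon=LEXICON):
    """Share of subterms the two terms have in common (Jaccard, an assumed choice)."""
    sub_a, sub_b = subterms(term_a, lexicon), subterms(term_b, lexicon)
    if not sub_a and not sub_b:
        return 0.0
    return len(sub_a & sub_b) / len(sub_a | sub_b)


if __name__ == "__main__":
    a = "Trade Debts Payable After More Than One Year"
    b = "Financial Debts Payable After More Than One Year"
    print(sorted(subterms(a)))
    print(sorted(subterms(b)))
    print(terminological_similarity(a, b))
```

Because only lexicon entries count, an arbitrary n-gram such as "Intangible Fixed" contributes nothing unless it is itself a recognised term, which is the distinction between n-grams and subterms drawn above.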

Multilingual Subterms
  • Translated subterms: available in other languages
  • Advantage: reflect terminological similarities that may be available in one language but not in others
  • Example: “Property Plant and Equipment”@en, “Sachanlagen”@de, “Tangible Fixed Asset”@en

Linguistic
Syntactic information beyond simple word order: phrase structure and dependency structure.
  • Phrase structure:
      Intangible fixed : adj adj > ??
      Intangible fixed assets : adj adj n > NP
  • Dependency structure:
      Amounts receivable : N Adv : receive:mod, amounts:head
      Received amounts : V N : receive:mod, amounts:head

Evaluation
  • Data set: xEBR finance vocabulary, 269 terms (concept labels), 72,361 (269*269) term pairs
  • Benchmarks:
      SimSem59: sample of 59 term pairs
      SimSem200: sample of 200 term pairs (under construction)

Experiment
An overview of similarity measures

Experiment Results (SimSem59)
STL formula used: STL = 0.1531 * S + 0.5218 * T + 0.1041 * L + 0.1791
Correlation between similarity scores and SimSem59 (chart labels: semantic contribution, terminology contribution, linguistic contribution)
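The reported formula and the correlation-based evaluation can be sketched together as follows. The coefficients are the ones given on the slide; the component scores and reference ratings in the usage part are invented for illustration, and Pearson correlation is an assumption, since the slides do not say which correlation coefficient was used.

```python
import math

# Coefficients reported on the slide for SimSem59.
W_S, W_T, W_L, CONST = 0.1531, 0.5218, 0.1041, 0.1791


def stl_score(s: float, t: float, l: float) -> float:
    """STL = 0.1531*S + 0.5218*T + 0.1041*L + 0.1791."""
    return W_S * s + W_T * t + W_L * l + CONST


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical (S, T, L) scores and reference ratings for four term pairs.
    components = [(0.3, 0.6, 0.5), (0.8, 0.9, 0.7), (0.1, 0.0, 0.2), (0.5, 0.4, 0.9)]
    reference = [0.55, 0.90, 0.10, 0.50]
    predicted = [stl_score(s, t, l) for s, t, l in components]
    print(pearson(predicted, reference))
```

If each component lies in [0, 1], these coefficients keep the combined score between 0.1791 and 0.9581, and the terminological term moves the score most per unit change.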

Conclusion
  • STL outperforms more traditional similarity measures
  • Largest contribution by T (terminological analysis)
  • Multilingual subterms perform better than monolingual ones

Future work
  • Evaluation on larger data sets and vocabularies (IFRS): 3000+ terms, 9M term pairs
  • Richer set of linguistic operations: “recognise” => “recognition” by derivation rule verb_lemma+"ion"
  • Similarity between subterms: “Staff Costs” and “Wages And Salaries”
