Computing Semantic Similarity of Concepts in Knowledge Graphs

Ganggao Zhu & Carlos A. Iglesias. (2017). Computing Semantic Similarity of Concepts in Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering, 29 (1), 72-85.

Abstract:

This paper presents a method for measuring the semantic similarity between concepts in Knowledge Graphs (KGs) such as WordNet and DBpedia. Previous work on semantic similarity methods has focused either on the structure of the semantic network between concepts (e.g., path length and depth) or only on the Information Content (IC) of concepts. Because in many KGs the structure of the concepts can be represented by the TBox and the IC of concepts can be derived from the ABox, we propose a semantic similarity method, named wpath, that combines these two approaches by using IC to weight the shortest path length between concepts. Corpus-based IC is computed from the distribution of concepts over a textual corpus, which requires preparing a domain corpus containing annotated concepts and has a high computational cost. As the instances of the ABox are already extracted from textual corpora and annotated with concepts from the TBox, graph-based IC is proposed to compute IC from the distribution of concepts over instances. With this graph-based IC, the wpath semantic similarity method can compute the semantic similarity between concepts of a KG based only on the structural knowledge of concepts and the statistical knowledge of instances. Through experiments performed on well-known word similarity datasets, we show that the wpath semantic similarity method produces a statistically significant improvement over other semantic similarity methods. Moreover, in a real category classification evaluation, the wpath method shows the best performance in terms of accuracy and F score.
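
The idea of weighting the shortest path length by IC can be illustrated with a minimal sketch. The abstract does not state the formula, so the form used here, 1 / (1 + length * k^IC(lcs)) with k in (0, 1] and lcs the least common subsumer, is an assumption, as is the choice to count an instance toward every ancestor of its concept. The taxonomy, the instance counts, and the function names ancestors, graph_ic, and wpath are hypothetical and are not taken from WordNet, DBpedia, or the authors' code.

```python
import math

# Hypothetical toy TBox: child concept -> parent concept.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}

# Hypothetical toy ABox statistics: number of instances typed directly
# with each concept (100 instances in total).
DIRECT_INSTANCES = {
    "entity": 0, "animal": 2, "plant": 1,
    "dog": 40, "cat": 30, "tree": 27,
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def graph_ic(concept):
    """Graph-based IC: -log of the share of instances under the concept.

    Assumption: an instance of a concept also counts as an instance of
    every ancestor of that concept. Every concept in the toy data has at
    least one instance under it, so the logarithm is always defined.
    """
    total = sum(DIRECT_INSTANCES.values())
    freq = sum(n for c, n in DIRECT_INSTANCES.items()
               if concept in ancestors(c))
    return -math.log(freq / total)


def wpath(c1, c2, k=0.8):
    """Assumed wpath form: shortest path length weighted by k ** IC(lcs)."""
    up1, up2 = ancestors(c1), ancestors(c2)
    lcs = next(a for a in up1 if a in up2)        # least common subsumer
    length = up1.index(lcs) + up2.index(lcs)      # edges on the tree path
    return 1.0 / (1.0 + length * k ** graph_ic(lcs))


if __name__ == "__main__":
    print(wpath("dog", "cat"))          # siblings under a specific concept
    print(wpath("dog", "tree"))         # related only through the root
    print(wpath("dog", "cat", k=1.0))   # k = 1 ignores IC entirely
```

In this sketch, k = 1 reduces the measure to a plain path-based similarity, while smaller values of k shrink the effective path length between concepts whose least common subsumer is more informative.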

BibTeX:

@article{computing-gsi-article-2017,
author = "Zhu, Ganggao and Iglesias, Carlos A.",
abstract = "This paper presents a method for measuring the semantic similarity between concepts in Knowledge Graphs (KGs) such as
WordNet and DBpedia. Previous work on semantic similarity methods has focused either on the structure of the semantic network
between concepts (e.g., path length and depth) or only on the Information Content (IC) of concepts. Because in many KGs the structure
of the concepts can be represented by the TBox and the IC of concepts can be derived from the ABox, we propose a semantic similarity
method, named wpath, that combines these two approaches by using IC to weight the shortest path length between concepts. Corpus-based
IC is computed from the distribution of concepts over a textual corpus, which requires preparing a domain corpus containing annotated
concepts and has a high computational cost. As the instances of the ABox are already extracted from textual corpora and annotated with
concepts from the TBox, graph-based IC is proposed to compute IC from the distribution of concepts over instances. With this graph-based
IC, the wpath semantic similarity method can compute the semantic similarity between concepts of a KG based only on the
structural knowledge of concepts and the statistical knowledge of instances. Through experiments performed on well-known word
similarity datasets, we show that the wpath semantic similarity method produces a statistically significant improvement over other
semantic similarity methods. Moreover, in a real category classification evaluation, the wpath method shows the best performance
in terms of accuracy and F score.",
comments = "JCR 2017 Q1 2.775, SJR 2017 Q1 1.133, Scopus 2017 Q1 9.4",
doi = "10.1109/TKDE.2016.2610428",
issn = "1041-4347",
journal = "IEEE Transactions on Knowledge and Data Engineering",
keywords = "semantic similarity;knowledge graph;semantic relatedness;WordNet;DBpedia;information content",
month = "January",
number = "1",
pages = "72--85",
title = "{C}omputing {S}emantic {S}imilarity of {C}oncepts in {K}nowledge {G}raphs",
url = "http://ieeexplore.ieee.org/document/7572993/",
volume = "29",
year = "2017",
}