Motivation

For several decades, semantic systems were predominantly developed around knowledge graphs with varying degrees of expressivity. Through the explicit representation of knowledge in well-formed, logically sound ways, knowledge graphs provide knowledge-based text analytics with rich, expressive and actionable descriptions of the domain of interest, and support logical explanations of reasoning outcomes. On the downside, knowledge graphs can be costly to produce, since they require considerable human effort to manually encode knowledge in the required formats. Additionally, such knowledge representations can be excessively rigid and brittle when confronted with different natural language processing applications, such as classification, named entity recognition, sentiment analysis and question answering.

In parallel, the last decade has witnessed a shift towards neural methods for text understanding, driven by the increasing availability of raw data and cheaper computing power. Such methods have proved powerful and convenient in many linguistic tasks. In particular, results in the field of distributional semantics have shown promising ways to capture the meaning of each word in a text corpus as a vector in a dense, low-dimensional space. Approaches like Word2Vec, GloVe, fastText or Swivel have been tremendously successful. More recently, the field has been led by contextual word embeddings based on neural language models like ELMo, BERT and GPT-2, to name a few. Among their applications, word embeddings have shown their utility in term similarity, analogy and relatedness, as well as in many downstream natural language processing tasks like semantic role labeling, entailment, question answering and sentiment analysis.
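As a minimal illustration of the similarity and analogy uses mentioned above, the sketch below computes cosine similarity and the classic king - man + woman analogy over a handful of hand-picked toy vectors; the vectors and dimensionality are purely illustrative assumptions, whereas in practice one would load pretrained Word2Vec or GloVe vectors with a library such as gensim.

```python
import numpy as np

# Toy 4-dimensional embeddings, hand-picked for illustration only.
# Real Word2Vec/GloVe vectors have 100-300 dimensions learned from corpora.
emb = {
    "king":  np.array([0.8, 0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.1, 0.7, 0.9]),
    "man":   np.array([0.2, 0.8, 0.1, 0.3]),
    "woman": np.array([0.2, 0.1, 0.8, 0.3]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Term similarity: related words end up close in the vector space.
print(cosine(emb["king"], emb["queen"]))   # high similarity (~0.82 here)

# Analogy via vector arithmetic: king - man + woman should land near queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # "queen" with these toy vectors
```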

Aimed at both researchers and practitioners, this tutorial shows how it is possible to bridge the gap between knowledge-based and neural approaches to bring an additional boost to natural language processing and to the processing of other data modalities. Following a practical, hands-on approach, the tutorial addresses a number of fundamental questions towards this goal, including:

  • How can neural methods extend previously captured knowledge explicitly represented as knowledge graphs in cost-efficient and practical ways?
  • What are the main building blocks and techniques enabling such a hybrid approach to NLP?
  • How can structured and neural representations be seamlessly integrated?
  • How can the quality of the resulting hybrid representations be inspected and evaluated?
  • How can this result in higher-quality structured and neural representations?
  • How does this impact the performance of NLP tasks, the processing of other data modalities, such as visual data, and their interplay?