Overview

Many different artificial intelligence techniques can be used to explore and exploit large document corpora that are available inside organizations and on the Web. While natural language is symbolic in nature and the first approaches in the field were based on symbolic and rule-based methods, many of the most widely used methods are currently based on neural approaches. Each of these two main schools of thought in natural language processing has its strengths and limitations, and there is an increasing trend that seeks to combine them in complementary ways to get the best of both worlds.

This tutorial covers the foundations and modern practical applications of knowledge-based and neural methods, techniques and models, as well as their combination, for exploiting large document corpora. The tutorial first focuses on the building blocks that can be used to this purpose, including knowledge graphs, word embeddings, and language models. Then it shows how these techniques can be effectively combined in NLP tasks and applied to other data modalities in addition to text, with examples drawn from research and innovation projects.

Motivation

For several decades, semantic systems were predominantly developed around knowledge graphs with different degrees of expressivity. Through the explicit representation of knowledge in well-formed, logically sound ways, knowledge graphs provide knowledge-based text analytics with rich, expressive and actionable descriptions of the domain of interest and support logical explanations of reasoning outcomes. On the downside, knowledge graphs can be costly to produce since they require a considerable amount of human effort to manually encode knowledge in the required formats. Additionally, such knowledge representations can sometimes be excessively rigid and brittle in the face of natural language processing applications such as classification, named entity recognition, sentiment analysis and question answering.
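
To make the idea of explicit knowledge representation concrete, below is a minimal sketch of a knowledge graph encoded as subject-predicate-object triples and queried with SPARQL, using the open-source rdflib Python library. The namespace and facts are invented for illustration and are not part of the tutorial materials.

    # A tiny knowledge graph as subject-predicate-object triples, using rdflib.
    # The example.org namespace and the facts below are invented for illustration.
    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()

    # Explicitly encode a small fragment of domain knowledge.
    g.add((EX.Madrid, RDF.type, EX.City))
    g.add((EX.Madrid, EX.capitalOf, EX.Spain))
    g.add((EX.City, RDFS.subClassOf, EX.Place))

    # A SPARQL query retrieves knowledge in a logically well-defined way.
    results = g.query("""
        SELECT ?city WHERE {
            ?city a <http://example.org/City> ;
                  <http://example.org/capitalOf> ?country .
        }
    """)
    for row in results:
        print(row.city)  # -> http://example.org/Madrid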

In parallel, the last decade has witnessed a shift towards neural methods for text understanding due to the increasing availability of raw data and cheaper computing power. Such methods have proved to be powerful and convenient in many linguistic tasks. In particular, results in the field of distributional semantics have shown promising ways to capture the meaning of each word in a text corpus as a vector in dense, low-dimensional spaces. Approaches like Word2Vec, GloVe, fastText or Swivel have been tremendously successful. More recently, the area is led by contextual word embeddings based on neural language models like ELMo, BERT and GPT-2, to name a few. Among their applications, word embeddings have shown their utility in term similarity, analogy and relatedness, as well as many downstream tasks in natural language processing like semantic role labeling, entailment, question answering and sentiment analysis.
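
As a flavour of how such embeddings are used in practice, here is a minimal sketch using the gensim library (assuming gensim 4.x). The toy corpus is invented; real applications train on large corpora or load pre-trained vectors.

    # A minimal sketch of training static word embeddings and querying them,
    # assuming gensim 4.x. The toy corpus is invented for illustration; real
    # applications use large corpora or pre-trained vectors.
    from gensim.models import Word2Vec

    corpus = [
        ["the", "king", "rules", "the", "kingdom"],
        ["the", "queen", "rules", "the", "kingdom"],
        ["a", "man", "and", "a", "woman", "walk"],
    ]

    # Each word is mapped to a dense, low-dimensional vector.
    model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

    # Term similarity: cosine similarity between the two word vectors.
    print(model.wv.similarity("king", "queen"))

    # Analogy: king - man + woman ~ queen (meaningful only with enough data).
    print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

Contextual models like ELMo or BERT go a step further and produce a different vector for each occurrence of a word depending on its sentence, rather than a single vector per word.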

Aimed at both researchers and practitioners, this tutorial shows how it is possible to bridge the gap between knowledge-based and neural approaches to bring an additional boost to natural language processing and the processing of other data modalities. Following a practical and hands-on approach, the tutorial tries to address a number of fundamental questions to achieve this goal, including:

  • How can neural methods extend previously captured knowledge explicitly represented as knowledge graphs in cost-efficient and practical ways?
  • What are the main building blocks and techniques enabling such a hybrid approach to NLP?
  • How can structured and neural representations be seamlessly integrated? (See the sketch after this list.)
  • How can the quality of the resulting hybrid representations be inspected and evaluated?
  • How can this result in higher-quality structured and neural representations?
  • How does this impact the performance of NLP tasks, the processing of other data modalities, like visual data, and their interplay?
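
To make the integration question above more concrete, the following is a minimal, hypothetical sketch of one naive way to combine a corpus-based word vector with the vector of the knowledge graph concept it denotes. The random vectors and the plain concatenation strategy are illustrative assumptions, not the specific method taught in the tutorial.

    # A minimal, hypothetical sketch of combining a (neural) word vector with
    # the vector of the knowledge graph concept it denotes. The random vectors
    # and the plain concatenation are illustrative assumptions only.
    import numpy as np

    rng = np.random.default_rng(42)

    # Stand-ins for a word embedding and a knowledge graph embedding of the
    # concept the word refers to (e.g. "madrid" -> ex:Madrid).
    word_vec = rng.normal(size=100)    # from a corpus-based model
    concept_vec = rng.normal(size=50)  # from a knowledge graph embedding model

    # One naive integration: concatenate both views into a single hybrid
    # vector that downstream NLP models can consume.
    hybrid_vec = np.concatenate([word_vec, concept_vec])
    print(hybrid_vec.shape)  # (150,)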

Description

This tutorial can be followed individually or at one of the venues where we regularly teach it. It offers plenty of materials, including practical content and examples in the form of Jupyter notebooks that can be run in the cloud.

When given in person, we like to have an interactive session where both instructors and participants can engage in rich discussions on the topic. Some familiarity with the subject is expected, but this should not prevent you from attending if you are interested in the topic.

We divide the tutorial into two main blocks: fundamentals and applications.

Fundamentals

  • Capturing meaning from text as word embeddings.
  • Neural language models and contextual embeddings.
  • Knowledge graph embeddings (see the sketch after this list).
  • Vecsigrafo – generating hybrid knowledge representations from text corpora and knowledge graphs.
  • Evaluating Vecsigrafo – beyond visual inspection and intrinsic methods.
  • Vecsigrafo for knowledge graph curation and interlinking.
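
As an illustration of the knowledge graph embedding bullet above, here is a minimal sketch of the scoring idea behind translational models such as TransE, where a triple (head, relation, tail) is considered plausible when head + relation is close to tail in vector space. The entities, relation and random vectors are invented for illustration, and the training loop that would actually learn the embeddings is omitted.

    # A minimal sketch of the TransE scoring idea: a triple (head, relation,
    # tail) is plausible when head + relation ~ tail. The vectors here are
    # random stand-ins; a real model learns them from the graph.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 50

    entities = {e: rng.normal(size=dim) for e in ["Madrid", "Spain", "Paris"]}
    relations = {"capitalOf": rng.normal(size=dim)}

    def transe_score(head, relation, tail):
        """Lower scores mean more plausible triples (L2 distance)."""
        return np.linalg.norm(entities[head] + relations[relation] - entities[tail])

    # With trained embeddings, the true triple scores lower than a corrupted one.
    print(transe_score("Madrid", "capitalOf", "Spain"))
    print(transe_score("Paris", "capitalOf", "Spain"))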

Applications

  • Applications in multi-lingual natural language processing.
  • Beyond text understanding: multi-modal machine comprehension.

Proposed application domains

  • Misinformation analysis
  • Scientific information management
  • Classic literature

Audience

We welcome researchers and practitioners both from industry and academia with an interest in neural and knowledge-based approaches to NLP and knowledge graphs.

Instructors

The tutorial is offered by the following members of the Cogito Research Lab at Expert System.

Jose Manuel Gomez-Perez (jmgomez@expertsystem.com) works at the intersection of several areas of Artificial Intelligence, including Natural Language Processing, Knowledge Graphs and Machine Learning. His vision is to enable machines to understand text and other data modalities like diagrams in a way similar to how humans read. At Expert System, Jose Manuel leads the Cogito Research Lab in Madrid, where he focuses on the combination of structured knowledge graphs and neural representations to extend Cogito's capabilities. Before Expert System, he worked at iSOCO, one of the first European companies to deploy AI solutions on the Web. He consults for organizations like the European Space Agency and is the co-founder of ROHub.org, the platform for scientific information management based on research objects. A former Marie Curie fellow, Jose Manuel holds a Ph.D. in Computer Science and Artificial Intelligence from Universidad Politécnica de Madrid. He regularly publishes in top scientific conferences and journals and his views have appeared in magazines like Nature and Scientific American, as well as newspapers like El País.

Ronald Denaux (rdenaux@expertsystem.com) is a senior researcher at Expert System. Ronald obtained his MSc in Computer Science from the Technical University Eindhoven, The Netherlands. After a couple of years working in industry as a software developer for a large IT company in The Netherlands, Ronald decided to go back to academia. He obtained a PhD, again in Computer Science, from the University of Leeds, UK. Ronald’s research interests have revolved around making semantic web technologies more usable for end users, which has required research into (and resulted in various research publications in) the areas of Ontology Authoring and Reasoning, Natural Language Interfaces, Dialogue Systems, Intelligent User Interfaces and User Modelling. Besides research, Ronald also participates in knowledge transfer and product development.

Raul Ortega (rortega@expertsystem.com) is a research engineer at Expert System. Raul obtained his degree in Computer Engineering from Universidad Politécnica de Madrid. After joining Expert System, he participated in the development of different applications for the recommendation of scientific content and the management of scholarly communications in the context of the EVER-EST project, funded by the European Union's Horizon 2020 research programme. Since obtaining his degree, his main focus has been on research in multi-modal knowledge extraction and transfer learning.