Michael Günther

Context Management in Database Systems with Word Embeddings

Data integration plays an important role in complex adaptive systems. Large organizations usually store their data in many different databases, each with its own large schema. Because such organizations are divided into many loosely connected departments, storing all data in one common schema is often not possible. However, there are situations in which data from different schemas has to be integrated to solve certain tasks. Since the schemas can change independently, it is highly desirable to integrate data from different sources automatically. In practice, such data integration tasks typically require a lot of manual effort. As the volume of data to be managed grows and software systems evolve more frequently, the demand for more automated data integration solutions increases. Furthermore, data integration has to be supported by tools for data discovery and data exploration that allow the user to observe relationships in the data. Word embeddings can be trained on texts of a specific domain. In this way, they capture the semantics of that domain's vocabulary. They can also be used to gather domain information that is useful for integrating different information sources. Moreover, operations on word embeddings enable semantic comparison, which can be utilized for information retrieval and data exploration. Beyond this, word embeddings have been shown to be useful for a variety of machine learning tasks, especially for data integration purposes (e.g. entity resolution, schema matching, …). The aim of this thesis is to investigate how word embeddings can be integrated into relational database systems. For this purpose, new operations on unstructured text values are to be provided. Furthermore, techniques are to be provided that combine the knowledge in the relational database with the information encoded in the word embeddings, enabling inference based on a combination of logical and inductive reasoning.
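To make the idea of semantic comparison concrete, the following minimal sketch matches column names from two schemas by the cosine similarity of their word vectors. It is an illustration only and not the method developed in the thesis: the three-dimensional vectors, the column names, and the helper names `cosine_similarity` and `best_match` are all hypothetical, and real embeddings would be trained on a domain corpus and have far more dimensions.

```python
import math

# Toy vectors invented for this illustration (assumption); real word
# embeddings are learned from text and typically have hundreds of dimensions.
EMBEDDINGS = {
    "customer": [0.9, 0.1, 0.0],
    "client":   [0.8, 0.2, 0.1],
    "invoice":  [0.1, 0.9, 0.2],
    "bill":     [0.2, 0.8, 0.3],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(term, candidates):
    """Return the candidate whose vector is most similar to the term's vector."""
    return max(
        candidates,
        key=lambda c: cosine_similarity(EMBEDDINGS[term], EMBEDDINGS[c]),
    )


# Hypothetical column names from two schemas that describe the same entities.
schema_a = ["customer", "invoice"]
schema_b = ["client", "bill"]

matches = {column: best_match(column, schema_b) for column in schema_a}
print(matches)
```

With these toy vectors, "customer" is paired with "client" and "invoice" with "bill", although the strings themselves share no characters. A purely syntactic comparison would miss both pairs, which is the kind of gap embedding-based operations are meant to close.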