The still-young discipline of managing and governing knowledge graphs (KGs) is gradually beginning to consolidate on the basis of concrete project experience. It has been clearly recognized that the underlying methodology is multidisciplinary and cannot simply be covered by existing, often classical roles and skills in data and information management. Rather, new roles are needed, among which the 'Knowledge Scientist' takes a central position because he or she is able to bring together the two archetypal, sometimes rivalling roles of the 'Data Engineer' and the 'Knowledge Engineer'.
The current discourse offers (at least) two different answers to the question of what an enterprise knowledge graph (EKG) is and how it is created. These two points of view are often treated as if they were mutually exclusive and incompatible; in fact, they are two approaches to semantic data modeling that should be combined in the concrete development of a knowledge graph. For practitioners and potential users, this supposed opposition naturally causes confusion, especially when the approaches are presented in simplified form. Here are the two views in simple words:
Approach 1—Principle ‘Knowledge’: A knowledge graph is a model of a knowledge domain that is curated by corresponding subject-matter experts (SMEs) with the support of knowledge engineers, e.g., taxonomists or ontologists, whereby partially automatable methods can be used. Knowledge domains can overlap and represent in most cases only a subdomain of the entire enterprise. Knowledge modelers tend to create specific, expressive and semantically rich knowledge models, but only for a limited scope of an enterprise. This approach is mainly focused on the expert loop within the entire knowledge graph lifecycle.
Approach 2—Principle ‘Data’: A knowledge graph is a graph-based representation of already existing data sources, which is created by data engineers with the help of automatable transformation, enrichment and validation steps. Ontologies and rules play an essential role in this process, and data lineage is one of the most complex problems involved. In this approach, data engineers focus on the automation loop of the KG lifecycle and aim to reuse and integrate as many data sources as possible to create a data graph. The ontologies and taxonomies involved in this approach provide only the level of expressiveness needed to automate data transformation and integration.
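The transformation and validation steps described under the 'Data' principle can be illustrated with a minimal sketch. It assumes two hypothetical source systems (`crm` and `erp`), a hypothetical mini-ontology that maps column names to predicates, and an invented `ex:` namespace; none of these names come from a specific product or standard mapping. The sketch only shows the general idea of an ontology-driven, automatable mapping that also records where each fact came from.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # simple lineage marker: the system the fact came from


# Hypothetical mini-ontology: source column -> (predicate, expected value type).
# It is only as expressive as the automated mapping requires.
ONTOLOGY = {
    "name": ("schema:name", str),
    "city": ("schema:addressLocality", str),
    "revenue": ("ex:annualRevenue", float),
}


def row_to_triples(row: dict, id_column: str, source: str) -> list:
    """Transform one source record into triples and validate value types."""
    subject = f"ex:customer/{row[id_column]}"
    triples = []
    for column, value in row.items():
        if column == id_column or column not in ONTOLOGY:
            continue  # columns not covered by the ontology are skipped
        predicate, expected_type = ONTOLOGY[column]
        if not isinstance(value, expected_type):
            raise ValueError(
                f"{source}.{column}: expected {expected_type.__name__}, got {value!r}"
            )
        triples.append(Triple(subject, predicate, str(value), source))
    return triples


# Illustrative records from two assumed source systems.
crm_rows = [{"id": 1, "name": "Acme", "city": "Vienna", "unmapped": "x"}]
erp_rows = [{"id": 1, "revenue": 1200000.0}]

graph = []
for row in crm_rows:
    graph.extend(row_to_triples(row, "id", "crm"))
for row in erp_rows:
    graph.extend(row_to_triples(row, "id", "erp"))
```

Because both systems share the same customer identifier, their facts end up attached to the same node in the resulting data graph, while the `source` field keeps a rudimentary trace of their origin.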
Under the 'Data' principle, the graph-based representation of often heterogeneous data landscapes moves into the center, enabling agile methods of data integration (e.g., 'Customer 360'), data quality management, and extended possibilities for data analysis.
The 'Knowledge' principle, on the other hand, places greater emphasis on linking and enriching existing data with additional knowledge in order to support, for example, knowledge discovery, automated reasoning, and in-depth analyses in large and complex databases.
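As a rough illustration of enriching existing data with additional knowledge, the following sketch assumes a small, hand-curated taxonomy of "broader" relations and a handful of tagged documents. All concept names, document identifiers, and function names are hypothetical. It shows a very simple form of reasoning: a search for a general concept also finds documents tagged only with narrower concepts.

```python
# Hypothetical taxonomy curated by subject-matter experts:
# each concept points to its broader concept.
BROADER = {
    "Photovoltaics": "Solar power",
    "Solar power": "Renewable energy",
    "Wind power": "Renewable energy",
}

# Existing data: documents with the tags they already carry.
DOCUMENT_TAGS = {
    "report-17": {"Photovoltaics"},
    "memo-03": {"Wind power"},
    "memo-09": {"Payroll"},
}


def ancestors(concept: str) -> list:
    """Follow the 'broader' relation upwards, guarding against cycles."""
    result = []
    seen = {concept}
    while concept in BROADER and BROADER[concept] not in seen:
        concept = BROADER[concept]
        seen.add(concept)
        result.append(concept)
    return result


def enrich(tags: set) -> set:
    """Add all broader concepts implied by the taxonomy to a tag set."""
    enriched = set(tags)
    for tag in tags:
        enriched.update(ancestors(tag))
    return enriched


def find_documents(concept: str) -> list:
    """Return documents whose enriched tags contain the given concept."""
    return sorted(
        doc for doc, tags in DOCUMENT_TAGS.items() if concept in enrich(tags)
    )
```

In this sketch, `find_documents("Renewable energy")` returns both `memo-03` and `report-17`, although neither document carries that tag explicitly; the link comes solely from the curated knowledge model.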
So, are these two approaches mutually exclusive? The protagonists and proponents of both scenarios look at the same corporate knowledge from two different perspectives. As a result, it can seem as if they are pursuing different goals, especially since the participants’ mindsets can vary significantly.
The view of ‘Knowledge engineers’: Approach 1 involves knowledge modelers/engineers, computational linguists and, in part, data scientists who take a holistic view of data, i.e., they want to link data and bring it into new contexts in order to provide extended possibilities for data analysis, knowledge retrieval, or recommender systems. This is done without 'container thinking': no matter whether information or facts are locked up in relational databases or proprietary document structures, they should be extracted and made (re-)usable.