Abstracts Track 2023


Nr: 60
Title:

Enhancing Knowledge Graph Embedding with Type-Constraint Features

Authors:

Wenjie Chen and Zhengyin Hu

Abstract: Knowledge graph (KG) embedding represents entities and relations with latent vectors and has been widely adopted in relation extraction and KG completion. Among existing works, translation-based models, which treat each relation as a translation from the head entity to the tail entity, have attracted much attention. However, these models utilize only fact triples and ignore prior knowledge about relational type constraints. This paper presents a generic framework to enhance knowledge graph embedding with type-constraint features (ETF). In ETF, the embedding of each entity comprises two parts: an entity-specific embedding and a constraint-specific embedding. The former expresses the translation features of the entity, and the latter represents the semantic constraints influenced by its linked relations. In addition, an adaptive margin-based loss is designed to learn the embeddings, which effectively separates negative and positive triples. Finally, results on four public datasets demonstrate that ETF achieves significant improvements over the baselines.
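The two-part entity embedding and the margin-based loss described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two parts are combined or how the margin adapts, so the sketch makes assumptions: the parts are summed, the score is a TransE-style distance, and the margin is simply passed in per triple. All function names and vectors are hypothetical, not the authors' implementation.

```python
import math

def compose(entity_specific, constraint_specific):
    # Assumption: the two parts are combined by element-wise addition.
    return [e + c for e, c in zip(entity_specific, constraint_specific)]

def translation_score(head, relation, tail):
    # TransE-style distance ||head + relation - tail||_2; lower means more plausible.
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_loss(pos_score, neg_score, margin):
    # Hinge loss: zero once the negative triple scores at least `margin` worse
    # than the positive one. An adaptive scheme would choose `margin` per triple;
    # how ETF does so is not stated in the abstract.
    return max(0.0, margin + pos_score - neg_score)

# Toy vectors, invented for illustration only.
head = compose([0.5, 0.25], [0.0, 0.25])
relation = [0.5, 0.0]
true_tail = [1.0, 0.5]
corrupt_tail = [-2.0, 4.5]

pos = translation_score(head, relation, true_tail)
neg = translation_score(head, relation, corrupt_tail)
loss = margin_loss(pos, neg, margin=1.0)
```

Here the positive triple fits the translation exactly and the corrupted tail lies far away, so the hinge loss is zero and training would leave this pair unchanged.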

Nr: 195
Title:

Ontology Development of IPAMS Readiness Model to Enhance Digital Transformation

Authors:

Ildikó Szabó and Katalin Ternai

Abstract: The European Union's current digital strategy, the Digital Agenda 2030, aims to improve the digitalisation level of 75% of European businesses through the adoption of cloud computing, AI and Big Data technologies. Companies need to be aware of how these technologies fit their operations and affect their business processes, and what modifications are required before implementing them. This requirement prompted experts to create digital maturity and readiness models. These models and their evaluation criteria systems (pillars, layers, levels, etc.) differ based on the creators’ perspectives, such as a data-centric development or a technological point of view. Although these studies discovered interconnections between technologies and company attributes (processes, human or financial assets, and so on), these interconnections have rarely been formalized. Ontology development methods help to perform this formalization in an explicit, machine-readable way. So far, ontology-based approaches have been used in only a few studies, to support research rigor, specify the Industry 4.0 domain, and formalize the common aspects and shared conceptualization of these models. The IPAMS project (2020-1.1.2-PIACI-KFI-2021-00213) aims to develop production monitoring and analytics systems based on I4.0 solutions to support domestic industrial digitalisation. This paper aims to demonstrate how ontology development and SPARQL scripts can help discover hidden relationships within the structure of the IPAMS readiness model, what rules can be detected, and how they can be applied to measure the readiness of a company for a given Industry 4.0 technology. The applicability of this ontology within the Protégé ontology development environment is presented based on the cases of 31 firms.
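The kind of relationship discovery that a SPARQL basic graph pattern performs over an ontology can be sketched in plain Python. This is a stand-in for illustration, not the authors' SPARQL scripts. The class names, property names and triples below are hypothetical and are not taken from the IPAMS readiness model.

```python
def match(triples, pattern):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within the pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

def query(triples, patterns):
    """Join several triple patterns, as a SPARQL basic graph pattern does."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions

# Hypothetical toy ontology facts (subject, predicate, object).
TRIPLES = [
    ("CompanyA", "hasCapability", "CloudSkills"),
    ("CompanyB", "hasCapability", "DataGovernance"),
    ("PredictiveMaintenance", "requiresCapability", "CloudSkills"),
]

# "Which company has a capability that a given technology requires?"
PATTERNS = [
    ("?tech", "requiresCapability", "?cap"),
    ("?company", "hasCapability", "?cap"),
]

ready = query(TRIPLES, PATTERNS)
```

The shared variable `?cap` links a technology to a company even though no single triple states that link, which is the sense in which such a query surfaces an implicit relationship.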

Nr: 99
Title:

Utilising Bert-Based Topic Modeling and Auto-Generated Knowledge Graphs to Investigate Industry 4.0 Readiness

Authors:

Szabina Fodor, Andrea Gelei and Katalin Ternai

Abstract: 1. INTRODUCTION The term "Industry 4.0" (I4.0) has gained popularity in recent years, and experts view it as a beneficial instrument for boosting competitiveness, particularly for small and medium-sized enterprises (SMEs). Despite the existence of several I4.0 readiness models, researchers have highlighted the need for an SME-focused one that captures the key challenges SMEs confront while implementing I4.0. We aim to empirically examine the challenges associated with SMEs’ adoption of I4.0. The analysis is based on semi-structured expert interviews, explored using topic modeling and an automatically generated knowledge graph (AutoKG). 2. METHODOLOGY In our research, seven interviews with domestic experts on topics relevant to the I4.0 readiness of SMEs were conducted, and their transcripts were analysed using BERTopic [1] and AutoKG [2]. First, we translated the interviews into English. Then the text was encoded using a pre-trained sentence transformer language model named "all-MiniLM-L12-v2". We clustered the sentences using the HDBSCAN algorithm. To characterise the resulting 13 clusters, we merged the sentences belonging to each cluster and created topic vectors describing the clusters using the class-based TF-IDF approach (c-TF-IDF). The resulting words, treated as named entities, and the transcripts of the interviews were fed into the AutoKG model, where a cooperative agent framework called role-playing [3] was used. This framework utilises multiple agents to facilitate the construction of knowledge graphs. In our research, the AI assistant is designated as a Consultant and the AI user as a KG domain expert. They collaborate on the specified task until the AI user confirms completion. Finally, we used the resulting knowledge graphs to identify the critical elements of the I4.0 readiness framework for SMEs. 3.
RESULTS AND DISCUSSION Our results highlight the importance of two significant, closely interlinked, intrinsically complex organisational capabilities, without which I4.0 applications are unlikely to be successful for SMEs: (i) the advanced process management capability of the enterprise, and (ii) the supporting enterprise IT management capability. Furthermore, we compared our results with two models (CMMI [4], L&K [5]) used to evaluate the organisational capabilities of SMEs for I4.0 readiness. Our results confirmed that the evaluation criteria of these earlier models are still valid today. However, in the case of IT management capability, our research identified a vital assessment aspect (IT, including data and cyber security) that needs to be added to the existing IT management capability model. REFERENCES 1. Grootendorst, M., BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794, 2022. 2. Zhu, Y., et al., LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities. arXiv preprint arXiv:2305.13168, 2023. 3. Li, G., et al., CAMEL: Communicative agents for "mind" exploration of large scale language model society. arXiv preprint arXiv:2303.17760, 2023. 4. CMMI Product Team, CMMI® for Development, Version 1.3. Preface, SEI, CMU, 2010. 5. Leem, C.S., et al., Information technology maturity stages and enterprise benchmarking: an empirical study. Industrial Management & Data Systems, 2008. 108(9): p. 1200-1218.
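The c-TF-IDF step named in the methodology of this abstract can be sketched as follows. It uses the weighting from the BERTopic paper [1], W(t, c) = tf(t, c) * log(1 + A / f(t)), where tf(t, c) is the count of term t in cluster c, f(t) is its count across all clusters, and A is the average number of words per cluster. The sketch makes assumptions: whitespace tokenisation, raw counts rather than normalised frequencies, and invented toy sentences that are not interview data.

```python
import math
from collections import Counter

def c_tf_idf(cluster_sentences):
    """Map each cluster id to {term: c-TF-IDF weight}.

    cluster_sentences: dict of cluster id -> list of sentences in that cluster.
    """
    # Merge the sentences of each cluster into one bag of words.
    term_counts = {
        cluster: Counter(" ".join(sentences).lower().split())
        for cluster, sentences in cluster_sentences.items()
    }
    # Frequency of each term across all clusters.
    total = Counter()
    for counts in term_counts.values():
        total.update(counts)
    # A: average number of words per cluster.
    avg_words = sum(total.values()) / len(term_counts)
    return {
        cluster: {
            term: tf * math.log(1 + avg_words / total[term])
            for term, tf in counts.items()
        }
        for cluster, counts in term_counts.items()
    }

def top_terms(weights, n=3):
    """Return the n highest-weighted terms of one cluster."""
    return [term for term, _ in sorted(weights.items(), key=lambda kv: -kv[1])[:n]]

# Toy clusters, invented for illustration only.
clusters = {
    0: ["process management process"],
    1: ["data security management"],
}
weights = c_tf_idf(clusters)
```

Terms that occur in every cluster, such as "management" in the toy input, receive a lower weight than terms concentrated in one cluster, which is what makes the top terms usable as cluster descriptors.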

Nr: 193
Title:

A Hematopoietic Stem Cell Knowledge Graph for Scientific Knowledge Discovery

Authors:

Zhengyin Hu and Wenjie Chen

Abstract: 1. INTRODUCTION Scientific Knowledge Discovery (SKD), which derives from literature-based discovery (LBD), aims to alleviate these issues by combining natural language processing, text mining, semantic techniques and scientometrics methods, and has become an important research area in biomedical informatics. Hematopoietic stem cells (HSCs) are among the most effective stem cells for clinical treatment, so it is of great significance to discover important knowledge entities, knowledge relations and knowledge paths through literature mining for HSC knowledge discovery. A knowledge graph (KG) represents knowledge entities and their relations in more detail and in a simple manner, and is widely used in SKD. The objective of this research is to automatically generate an HSC KG from S&T literature to support HSC knowledge discovery. 2. METHOD AND DATA This paper proposes a framework for generating a KG using Subject-Predicate-Object (SPO) triples from literature, which includes the following processes: literature retrieval, SPO extraction, SPO cleanup, SPO ranking, discovery pattern integration, and graph building. 21,098 papers from PubMed and 4,786 patents from Derwent Innovation were ultimately used to build the HSC KG. At present, there are 14,617 knowledge entities and 224,742 relationships in the HSC KG. The knowledge entities were given 98 labels according to the UMLS semantic types. The relationships among them belong to 28 types in three categories: “knowledge entity-knowledge entity”, “knowledge entity-literature”, and “SPO triple-literature”. 3. SKD BASED ON HSC KG Three kinds of SKD scenarios using the HSC KG are introduced. In an open discovery case, a researcher starts from a problem to be solved, namely, what the indirect effects of HSC are. The computational complexity and the cognitive complexity of open discovery are both very high.
The HSC KG helps to efficiently and accurately discover those implicit knowledge entities by looking for a series of concepts that may be indirectly affected by HSC, and to interpret why this might happen. In a closed discovery case, a researcher has formed a hypothesis (or has preliminary experimental findings) that there should be associations between HSC and HIV. However, it is not clear what the specific associations are and how they occur. The HSC KG helps to discover heuristic knowledge paths between HSC and HIV by means of path finding and knowledge inference techniques. In a topic discovery case, a researcher hopes to find valuable research topics about the effects of HSC. The HSC KG helps to generate these topics by means of subgraph generation and community detection techniques. 4. CONCLUSIONS The results show that the HSC KG has the advantages of “using graph data structure”, “integrating discovery patterns”, “fusing native graph mining algorithms”, and “ease of use”, which can effectively support deep open discovery, closed discovery, and topic discovery in HSC. APPENDIX This study has been published as the paper “Generating a Hematopoietic Stem Cell Knowledge Graph for Scientific Knowledge Discovery”; more details can be found in the supplementary material.
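The path finding used in the HSC-HIV discovery scenario of this abstract can be illustrated with a breadth-first search over SPO triples. This is a minimal sketch, assuming an in-memory list of triples and undirected traversal. The abstract does not name the algorithm or graph database actually used, and the intermediate entities and predicates below are placeholders, not content of the HSC KG.

```python
from collections import deque

def find_path(triples, start, goal):
    """Return a shortest alternating entity/predicate path from start to goal, or None."""
    # Build an adjacency list; each triple can be traversed in both directions.
    graph = {}
    for subj, pred, obj in triples:
        graph.setdefault(subj, []).append((pred, obj))
        graph.setdefault(obj, []).append((f"inverse of {pred}", subj))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [pred, neighbour]))
    return None

# Placeholder triples, invented for illustration only.
triples = [
    ("HSC", "AFFECTS", "entity_A"),
    ("HSC", "AFFECTS", "entity_C"),
    ("entity_A", "INTERACTS_WITH", "entity_B"),
    ("entity_B", "ASSOCIATED_WITH", "HIV"),
]

path = find_path(triples, "HSC", "HIV")
```

Each returned path alternates entities and predicates, so it can be read as a chain of SPO statements that a researcher could then check against the supporting literature.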