Abstracts Track 2022


Nr: 10
Title:

Is Hyper-personalization of Recommendation Always Good? Users’ Active Optimization Behavior versus Passive Personalization Behavior on Dataveillance and Privacy Concerns

Authors:

Jooyoung Kim and Hangjung Zo

Abstract: How to improve recommendation algorithm accuracy and provide more personalized recommendations to users is currently a mainstream research topic in the information systems field. Most of these studies propose more customized recommendation algorithms to improve overall recommendation accuracy, fitting users' tastes and increasing recommender systems' profits. However, it has not yet been questioned whether ever-growing recommendation accuracy is always desirable. Building on protection motivation theory (PMT), this study tries to illuminate the impact of users' dataveillance, and of their active self-optimization behavior rather than passive personalization behavior, on their recommendation results. It also investigates the mediating effects of privacy concerns and perceived value. To test the research hypotheses, the authors plan to conduct a scenario-based questionnaire survey of CloudResearch members who have experience with video recommender systems. The expected outcomes and stage of the research are presented based on the previous discussions.

Nr: 8
Title:

Using Entropy for Community Detection in Complex Networks

Authors:

Krista R. Žalik

Abstract: Community detection is a key to understanding the structure of complex networks. Communities are sets of nodes of a complex network that are densely connected internally. Many community detection approaches have been proposed based on modularity optimization. Algorithms that optimize one initial solution often get stuck in local optima, while algorithms that simultaneously optimize a population of solutions have high computational complexity. To address these problems, genetic algorithms (GAs) improved by a local learning procedure, named memetic algorithms (MAs), have been proposed. In each generation of a new population produced by crossover, mutation, and selection, a local search procedure is applied to individuals in the population to obtain better individuals. A node can join any community of one of its neighbors. To find the most adequate community for each node, only the node's local surroundings (its direct neighbors) need to be evaluated using node entropy. Node entropy can be derived from Shannon entropy as a measure of the node's similarity to the current community; it measures the uncertainty about the node and its natural community. Each node that is not allocated to the community contributing the greatest part of its node entropy is moved to that community. Node entropy is easy to use to speed up the convergence of an evolutionary algorithm and to increase the quality of partitions, since it uses only the node's neighborhood and does not require any threshold value. We introduce and use partition entropy instead of the modularity function in the memetic algorithm to avoid the resolution limit and to identify all significant well-separated communities regardless of community size. Partition entropy measures the disorder of each community: the more similar the elements of a community, the more ordered the community and the lower its entropy.
A greater difference between the internal and external edges of a community per number of its nodes indicates that the community consists of more similar nodes and has lower entropy. The partition entropy measure tries to find a partition with low community entropy while keeping the modularity of the partition in mind. Experiments on real-world and synthetic networks illustrate that using node and partition entropy in a memetic algorithm can find natural partitions effectively. Centrality-based methods are also widespread for community detection. First, they identify the most important nodes (seeds), which are centers of real communities, and then they expand the seeds to strongly connected neighbor nodes over several iterations until all nodes belong to communities. Choosing the right vertices as seeds of communities is crucial for determining the real communities. One of the most important problems in complex networks is locating the essential nodes that play a main role within the network and are seeds of communities. We propose a new density-based entropy centrality for seed selection. It measures the entropy of the sum of the sizes of the maximal cliques to which each node and its neighbor nodes belong. It is a local measure explaining the local influence of each node, which makes it efficient for community detection, since communities are local structures. It can be computed independently for individual vertices, for large networks, and for not-well-specified networks. The use of the proposed entropy centrality for seed selection outperforms other centrality measures.
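The node-entropy reassignment step described above can be sketched as follows. This is a minimal illustration of one plausible reading of the abstract, not the paper's implementation: the toy graph, community labels, and function names are hypothetical, and node entropy is taken here as the Shannon entropy of a node's neighbor-community distribution, so the "greatest part" of the entropy sum corresponds to the community holding the most neighbors.

```python
import math
from collections import Counter

def node_entropy(neighbors, community_of):
    """Shannon entropy of a node's neighbor-community distribution:
    H(v) = -sum_c p_c * log2(p_c), where p_c is the fraction of v's
    neighbors currently assigned to community c."""
    counts = Counter(community_of[u] for u in neighbors)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def best_community(neighbors, community_of):
    """Community holding the largest share of the node's neighbors,
    i.e. the dominant term of the node-entropy sum; the local search
    moves a node here if it is not already a member."""
    counts = Counter(community_of[u] for u in neighbors)
    return counts.most_common(1)[0][0]

# toy graph: node -> list of direct neighbors (hypothetical example)
adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1, 5, 6], 5: [4, 6], 6: [4, 5]}
community = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

# node 1 has 2 of 3 neighbors in community "A", so it stays in "A"
print(best_community(adj[1], community))              # -> A
print(round(node_entropy(adj[1], community), 3))      # -> 0.918
```

Because only direct neighbors are inspected, each reassignment check is local and needs no global threshold, which matches the abstract's argument for fast convergence.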

Nr: 9
Title:

A Unified Approach to Multi-field Dataset Search by using Contextual Link Prediction

Authors:

Hung-Nghiep Tran and Atsuhiro Takasu

Abstract: BACKGROUND: Dataset search has become an important task due to the increasing value of data. Each "dataset", for example a government statistics dataset or a scientific dataset, usually includes a title, a description, and a data table, and is thus characterized by multi-field, multi-modal, and structured content. The dataset search task is similar to traditional document search, but it needs to handle such complex data. Traditional search methods can be categorized into full-text search, such as BM25, and neural search approaches. Recently, neural search has achieved better results and become more popular with cross-encoder and dual-encoder architectures. Because of its better scalability, the current standard architecture is the dual-encoder, which uses large transformer models to encode the "query" and the "search item" as embedding vectors, then computes their dot product to measure relevance. However, the dual-encoder architecture has several limitations in the case of dataset search. First, it cannot directly handle complex data such as multi-field data. Second, it usually requires heavy training and fine-tuning, but training data is usually scarce in dataset search. In addition, the accuracy of dual-encoders needs to be improved in general. PROPOSED METHOD: To address the limitations of the dual-encoder, we propose the contextual link prediction (CLP) architecture, which puts a relational mapping module on top of the encoding module. In this architecture, the encoding module can be reused between datasets to minimize training and fine-tuning requirements. The relational mapping module can be trained and fine-tuned more efficiently and can be enhanced with richer mappings to improve accuracy. Most importantly, the relational mapping module enables the handling of complex data such as multi-field data. KEY TECHNIQUES: The relational mapping module is based on the light-weight mapping operation from knowledge graph embedding methods.
It is used to map the "query embedding" to the relevant "dataset embedding". It can also be used to map the "field embeddings" to the "dataset embedding" and thus enable the handling of multi-field data. In particular, we propose the multi-field relational-fusion method to compose the "dataset embedding" from the "field embeddings". In this method, each field embedding is mapped by a field-specific relational mapping, and the mapped embeddings are then summed to obtain the dataset embedding. This is simple but expressive, in the sense that it preserves the information of all field embeddings, because it is equivalent to concatenating all field embeddings and applying one large relational mapping. TRAINING: To train this model, we treat the search task as a link prediction task and use the knowledge graph embedding training objective. The encoding module and the relational mapping module are trained together end-to-end. Our key insight is that the information retrieval problem can be treated as a graph modeling problem. PRELIMINARY RESULTS: An early version of this model was evaluated in the NTCIR-15 dataset search competition. The data contains US and Japanese government statistics datasets with over one million "dataset" items. The test set contains 96 queries for the US datasets and 96 queries for the Japanese datasets. The model achieved promising results with the best performance on the average metric, outperforming several popular and strong baselines such as BM25 full-text search and a BERT cross-encoder model.
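The multi-field relational-fusion step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the field-specific relational mappings are taken to be plain linear matrices, the field names and all embeddings are random toy values, and the transformer encoders are omitted. The check at the end demonstrates the equivalence claimed in the abstract between summing per-field mappings and applying one large mapping to the concatenated fields.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                     # embedding dimension (toy value)
fields = ["title", "description", "data"] # hypothetical dataset fields

# one field-specific relational mapping per field (assumed linear here)
W = {f: rng.normal(size=(d, d)) for f in fields}

def dataset_embedding(field_embs):
    """Fuse per-field embeddings: map each field embedding with its
    field-specific relational mapping, then sum the mapped vectors."""
    return sum(W[f] @ e for f, e in field_embs.items())

def score(query_emb, field_embs):
    """Dot-product relevance between the query embedding and the fused
    dataset embedding, as in a dual-encoder scorer."""
    return float(query_emb @ dataset_embedding(field_embs))

# toy embeddings standing in for encoder outputs
field_embs = {f: rng.normal(size=d) for f in fields}
q = rng.normal(size=d)
print(score(q, field_embs))
```

Equivalently, stacking the per-field matrices horizontally and multiplying by the concatenated field embeddings yields the same fused vector, which is why the sum-of-mappings form loses no field information.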