| Abstract: |
In the information age, tabular data often lacks explicit semantic metadata, which makes it difficult to infer the underlying schema, particularly when no prior information is available. Existing methodologies often assume perfect data or require supervised training, which limits their applicability in real-world scenarios. The relational database model uses functional dependencies (FDs) to support normalization tasks. However, applying strict FDs directly to real-world data is problematic because of inconsistencies, errors, and missing values. Previous proposals, such as fuzzy functional dependencies (FFDs), have shown weaknesses, including a lack of clear semantics and ambiguous benefits for database design. This article proposes functional probability (FP), a novel approach for quantifying the probability that a functional dependency holds in incomplete and uncertain data, in support of semantic schema inference. FP measures the probability that a randomly selected tuple satisfies the functional dependency with respect to the most frequent association observed. Building on Codd’s relational model and Armstrong’s axioms, the methodology infers a minimal, non-redundant set of FDs and filters out weak candidates using probability thresholds. The method has been evaluated on two tabular datasets, yielding the expected results and demonstrating its applicability. The approach overcomes the limitations of strict dependencies, which are binary, and of FFDs, which lack clear semantics, and it supports robust analysis of data quality and the inference of more realistic and fault-tolerant database structures. |
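
The description of FP as the probability that a randomly selected tuple agrees with the most frequent association can be illustrated with a minimal Python sketch. It is one plausible reading of that sentence, not the article's implementation: the function name `functional_probability`, the sample rows, the 0.7 threshold, and the choice to skip tuples with missing values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def functional_probability(rows, lhs, rhs):
    """Estimate FP for the candidate dependency lhs -> rhs.

    For each distinct lhs value, count the tuples that carry the most
    frequent rhs value, then divide the sum by the number of tuples used.
    """
    groups = defaultdict(Counter)
    total = 0
    for row in rows:
        x = tuple(row.get(a) for a in lhs)
        y = tuple(row.get(a) for a in rhs)
        if None in x or None in y:
            # Assumption: tuples with missing values are left out.
            continue
        groups[x][y] += 1
        total += 1
    if total == 0:
        return 0.0
    agreeing = sum(max(counts.values()) for counts in groups.values())
    return agreeing / total


# Hypothetical sample: one misspelled city and one missing value.
rows = [
    {"zip": "28001", "city": "Madrid"},
    {"zip": "28001", "city": "Madrid"},
    {"zip": "28001", "city": "Madird"},
    {"zip": "08001", "city": "Barcelona"},
    {"zip": "08001", "city": None},
]

THRESHOLD = 0.7  # hypothetical cut-off for discarding weak candidates

fp_zip_city = functional_probability(rows, ["zip"], ["city"])
keep_zip_city = fp_zip_city >= THRESHOLD
print(fp_zip_city, keep_zip_city)
```

In this sample, three of the four complete tuples agree with the dominant city for their zip code, so a strict FD check would reject `zip -> city` outright, while the probabilistic score keeps it as a candidate under the assumed threshold.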