Abstract: |
The main aim of a credit scoring model is to classify loan customers into two classes, reliable and
unreliable, on the basis of their potential capability to keep up with their repayments.
Credit scoring models are increasingly in demand owing to the growth of consumer credit. Such models are usually
designed on the basis of past loan applications and used to evaluate new ones. Defining them is
challenging for several reasons, the most important of which is the imbalanced class distribution
of the data (i.e., the number of default cases is much smaller than that of non-default cases), which reduces
the effectiveness of the most widely used approaches (e.g., neural networks and random forests). The
Linear Dependence Based (LDB) approach proposed in this paper offers a twofold advantage: it evaluates a
new loan application on the basis of the linear dependence of its vector representation with respect to a matrix
composed of the vector representations of past non-default applications, thus using only one class
of data and overcoming the imbalanced class distribution issue; furthermore, because it does not rely on defaulting
loans, it allows us to operate proactively and also addresses the cold-start problem. We validate our
approach on two real-world datasets characterized by a strongly unbalanced distribution of data, comparing
its performance with that of one of the best state-of-the-art approaches: random forests. |