A Proposal of Smooth Interpolation to Optimal Transport for Restoring Biased Data for Algorithmic Fairness
Elena M. De Diego et al.
Abstract
Algorithmic bias is a central concern in decision-making based on Artificial Intelligence, especially when demographic attributes such as gender, age, or ethnic origin come into play. Frequently, the problem lies not only in the algorithm itself but also in the biased data fed to it, which merely reflect societal bias. The input to the algorithm therefore has to be repaired in order to produce unbiased results. As a simple but frequent case, two subgroups are considered: the privileged and the unprivileged group. Assuming that results should not depend on group membership, the remaining attributes in each group have to be moved (transported) so that their underlying distributions can be considered similar in both groups. To this end, optimal transport (OT) theory is used to transport the values of the features, excluding the sensitive variable, to the so-called Wasserstein barycenter of the two distributions conditional on each group. An efficient procedure based on the auction algorithm is adapted for this purpose. The transport is computed only for the data at hand: if new data arrive, the OT problem has to be solved again for the combined set of previous and incoming data, which is rather inefficient. As an alternative, a smooth interpolation procedure called Extended Total Repair (ExTR) is proposed and implemented; it is one of the main contributions of the article. The methodology is applied successfully to both simulated biased data and a real-world case involving a German credit dataset used for risk assessment prediction.
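The repair idea described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the article's auction-based procedure and not ExTR itself. It assumes a single feature, two groups of equal size, and equal barycenter weights (`t = 0.5`); under those assumptions the optimal transport plan simply pairs values of equal rank. The function names `repair_1d` and `extend_repair` are hypothetical, and the piecewise-linear `np.interp` call is only a stand-in for the smooth interpolation the authors propose for incoming data.

```python
import numpy as np


def repair_1d(x_priv, x_unpriv, t=0.5):
    """Move two equal-size 1D samples onto a common barycenter sample.

    Assumption: in 1D with equal sample sizes, optimal transport pairs
    the k-th smallest value of one group with the k-th smallest of the other.
    """
    x_priv = np.asarray(x_priv, dtype=float)
    x_unpriv = np.asarray(x_unpriv, dtype=float)
    if x_priv.shape != x_unpriv.shape:
        raise ValueError("this sketch assumes groups of equal size")

    order_p = np.argsort(x_priv)
    order_u = np.argsort(x_unpriv)

    # Barycenter points: weighted average of rank-matched values.
    bary = (1.0 - t) * x_priv[order_p] + t * x_unpriv[order_u]

    # Write each barycenter point back to the position of its original value.
    rep_p = np.empty_like(x_priv)
    rep_u = np.empty_like(x_unpriv)
    rep_p[order_p] = bary
    rep_u[order_u] = bary
    return rep_p, rep_u


def extend_repair(x_new, x_seen, x_seen_repaired):
    """Repair new values by interpolating the map learned on seen data.

    Piecewise-linear stand-in (assumption) for a smooth interpolation;
    it avoids re-solving the transport problem when new data arrive.
    """
    x_seen = np.asarray(x_seen, dtype=float)
    x_seen_repaired = np.asarray(x_seen_repaired, dtype=float)
    order = np.argsort(x_seen)
    return np.interp(x_new, x_seen[order], x_seen_repaired[order])


# Toy data (made up for illustration only).
x_priv = np.array([3.0, 1.0, 2.0])
x_unpriv = np.array([10.0, 30.0, 20.0])
rep_p, rep_u = repair_1d(x_priv, x_unpriv)
new_repaired = extend_repair(np.array([1.5]), x_priv, rep_p)
```

After the repair, both groups carry the same set of values, so the feature no longer reveals group membership in this toy setting. The multivariate case treated in the article requires solving a genuine assignment problem rather than sorting.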