Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. In this paper, we address the problem of identifying a densest subgraph, while ensuring that none of one binary protected attribute is disparately impacted. Unfortunately, the underlying algorithmic problem is NP-hard, even in its approximation version: approximating the densest fair subgraph with a polynomial-time algorithm is at least as hard as the densest subgraph problem of at most k vertices, for which no constant approximation algorithms are known. Despite such negative premises, we are able to provide approximation results in two important cases. In particular, we are able to prove that a suitable spectral embedding allows recovery of an almost optimal, fair, dense subgraph hidden in the input data, whenever one is present, a result that is further supported by experimental evidence. We also show a polynomial-time, $2$-approximation algorithm, whenever the underlying graph is itself fair. We finally prove that, under the small set expansion hypothesis, this result is tight for fair graphs. The above theoretical findings drive the design of heuristics, which we experimentally evaluate on a scenario based on real data, in which our aim is to strike a good balance between diversity and highly correlated items from Amazon co-purchasing graphs.
2020, International Conference on Information and Knowledge Management, Proceedings, Pages 35-44
Spectral Relaxations and Fair Densest Subgraphs (04b Atto di convegno in volume)
Anagnostopoulos A., Becchetti L., Fazzone A., Menghini C., Schwiegelshohn C.
Gruppo di ricerca: Algorithms and Data Science