We study fair clustering problems as proposed by Chierichetti et al. [CKLV17]. Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which can be used to significantly reduce the input data size. We prove that the coresets are composable [IMMM14] and show how to compute them in a streaming setting. This yields a streaming PTAS for fair k-means in the case of two colors (and exact balances). Furthermore, we extend techniques due to Chierichetti et al. [CKLV17] to obtain an approximation algorithm for k-means, which leads to a constant factor algorithm in the streaming model when combined with the coreset.
2020, Approximation and Online Algorithms, Pages 232-251 (volume: 11926)
Fair Coresets and Streaming Algorithms for Fair k-means (04b Atto di convegno in volume)
Schmidt M., Schwiegelshohn C., Sohler C.
ISBN: 978-3-030-39478-3; 978-3-030-39479-0
Gruppo di ricerca: Algorithms and Data Science