Home » Publication » 19886

Dettaglio pubblicazione

2018, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5-9, 2018, Pages 72-80

Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams (04b Atto di convegno in volume)

Bury Marc, Schwiegelshohn CHRIS RENE, Sorella Mara

Recommender systems are an integral part of many web applica- tions. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality-sensitive hashing (LSH). However, a significant drawback is that these meth- ods are only able to handle insertions to user profiles and tend to perform poorly when items may be removed. We initiate the study of scalable locality-sensitive hashing for dynamic input. Specifi- cally, using the Jaccard index as similarity measure, we design (1) a sketching algorithm for similarity estimation via a black box re- duction to ℓ0 norm estimation and (2) a locality sensitive hashing scheme maintainable in fully dynamic data streams that quickly filters out low-similarity pairs. Our algorithms have little to no overhead in terms of running time compared to previous LSH ap- proaches for the insertion only case, and drastically outperform previous algorithms in case of deletions
ISBN: 9781450355810
Gruppo di ricerca: Algorithms and Data Science
keywords
© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma