Adaptive Algorithms for Data Summarization

Robert Webber, UCSD
10/29, 2025 at 11:10AM-12:00PM in 939 Evans (for in-person talks) and https://berkeley.zoom.us/j/98667278310

Clustering, low-rank approximation, and nonnegative matrix factorization are all summarization problems based on selecting a small number of prototypes from a large data set. We provide a unified formulation of these problems and many others. Natural algorithms for solving these problems include adaptive search, which is a greedy deterministic selection rule, and adaptive sampling, which is a randomized selection rule based on distances to existing prototypes. We evaluate the utility of the two approaches theoretically and empirically, pointing out when each approach is preferable.