Seminar | Mathematics and Computer Science

Getting the Most Out of Your Representations

LANS Seminar

Abstract: The goal of source compression is to map any outcome of a discrete random variable $x \sim p_d(x)$ over a finite symbol space $x \in S$ to its shortest possible binary representation. Given a tractable model probability mass function (PMF) $p(x)$ that approximates $p_d(x)$, entropy coders provide such an optimal mapping. As a result, the task of source compression reduces to identifying a good model PMF for the data at hand. Although this setup is the most commonly used one, it has restrictions: entropy coders can only process one-dimensional variables, and they process them sequentially. Hence the structure of the entropy coder imposes a sequential structure on the data, which is a problem when compressing sets rather than sequences.
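As a rough illustration of why the task reduces to choosing a good model PMF (a minimal sketch for orientation, not material from the talk; the symbol space and the two PMFs are made up): an entropy coder spends roughly $-\log_2 p(x)$ bits on outcome $x$, so the expected code length under a model $p$ is the cross-entropy between $p_d$ and $p$, which is minimized when the model matches the data distribution.

```python
import math

p_data = {"a": 0.5, "b": 0.25, "c": 0.25}   # hypothetical true PMF p_d(x)
p_model = {"a": 0.4, "b": 0.4, "c": 0.2}    # hypothetical model PMF p(x)

def expected_bits(p_d, p):
    """Expected bits per symbol when coding p_d-distributed data with model p."""
    return sum(p_d[x] * -math.log2(p[x]) for x in p_d)

print(expected_bits(p_data, p_data))   # entropy H(p_d): the optimal achievable rate
print(expected_bits(p_data, p_model))  # cross-entropy: extra bits paid for model mismatch
```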

In the first part of the talk, I present an optimal codec for sets. The problem we encounter for sets generalizes to many other structural priors in data, so in the second part of the talk I investigate this broader problem: we generalize rate-distortion theory to structured data priors and develop a strategy for learning codecs for such data.
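For orientation, the classical rate-distortion function (standard textbook material, stated here only as the starting point that the talk generalizes) gives the minimal achievable rate at distortion level $D$ as

$$R(D) = \min_{p(\hat{x} \mid x):\; \mathbb{E}[d(x,\hat{x})] \le D} I(X; \hat{X}),$$

where $d(\cdot,\cdot)$ is a distortion measure and $I(X;\hat{X})$ is the mutual information between the source and its reconstruction.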

Bio: Karen Ullrich is a research scientist at FAIR NY and actively collaborates with researchers from the Vector Institute and the University of Amsterdam. She completed her PhD under the supervision of Prof. Max Welling. Prior to that, she worked at the Austrian Research Institute for AI in the Intelligent Music Processing and Machine Learning Group led by Prof. Gerhard Widmer. She studied Physics and Numerical Simulations in Leipzig and Amsterdam.