Abstract: Anomaly detection relies on designing a score to determine whether a particular event is uncharacteristic of a given background distribution. One way to define a score is to use autoencoders, which rely on the ability to reconstruct certain types of data (background) but not others (signals).
In this talk, I discuss the challenges associated with variational autoencoders, such as their dependence on hyperparameters and on the metric used, in the context of anomalous signal (top and W) jets in a QCD background. I show that the latent space carries physical information: it encodes the optimal transport distance between different events. This suggests using optimal transport distances to representative background events directly to identify anomalous events, an approach found to be as efficient at anomaly detection as an autoencoder. For anomaly detection with either autoencoders or optimal transport, the choices that best represent the background are not necessarily the best for signal identification. These challenges with unsupervised anomaly detection bolster the case for further exploration of semi-supervised or alternative approaches.
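As a rough illustration of scoring an event by its optimal transport distance to representative background events, the sketch below uses a one-dimensional toy setting. Everything in it is an assumption made for illustration: the abstract does not specify the ground metric, the event representation, or how representative events are chosen, so the function names `wasserstein_1d` and `anomaly_score`, the Gaussian toy data, and the use of the minimum distance as the score are all hypothetical.

```python
import random


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size, uniformly weighted 1D samples.

    For this special case the optimal transport plan pairs the sorted values,
    so the distance is the mean absolute difference of the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def anomaly_score(event, references):
    """Smallest transport distance from the event to any reference background event."""
    return min(wasserstein_1d(event, ref) for ref in references)


rng = random.Random(0)

# Hypothetical toy data: each "event" is 20 numbers standing in for
# constituent-level observables of a jet.
references = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(50)]
background_like = [rng.gauss(0.0, 1.0) for _ in range(20)]
signal_like = [rng.gauss(3.0, 1.0) for _ in range(20)]

print("background-like score:", anomaly_score(background_like, references))
print("signal-like score:", anomaly_score(signal_like, references))
```

In this toy setup the shifted event receives the larger score, since it lies far from every reference event. A realistic jet analysis would need a transport distance defined on the full event representation and a considered choice of reference events, both of which this sketch leaves out.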