Supervised learning with distributional inputs is a classic area of machine learning, and the two-stage sampling setup has recently received considerable attention. In this setting, the inputs, which are probability distributions, are not directly accessible in the learning phase; only samples drawn from them are. This problem is particularly amenable to kernel-based learning methods: the distributions, or the samples, are first embedded into a Hilbert space, often via kernel mean embeddings (KMEs), and a standard kernel method such as Support Vector Machines (SVMs) is then applied, using a kernel defined on the embedding Hilbert space. Distributional regression has received particular attention, and a substantial body of theory, including learning rates, is now available. In contrast, distributional classification is considerably less investigated, despite being relevant for applications such as learning-based medical screening and causal learning. Motivated by this, we provide a thorough analysis of classification with distributional inputs in the two-stage sampling setup using SVMs, in particular establishing consistency and learning rate results. Furthermore, for SVMs with the hinge loss and Gaussian kernels, we formulate a novel variant of an established noise assumption from the binary classification literature, under which we establish learning rates. Our results are stated in considerable generality, and many of them apply to learning problems beyond classification. Moreover, some of our technical tools, such as a new feature space for Gaussian kernels on Hilbert spaces, are of independent interest.
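
To make the two-stage pipeline concrete, here is a minimal sketch (an illustration under stated assumptions, not the paper's implementation). Each input is a bag of N samples from an unobserved distribution P; the empirical KME is mu_hat_P = (1/N) sum_j k(., x_j) for a base kernel k, and the outer Gaussian kernel on the embedding Hilbert space H is K(mu, nu) = exp(-gamma * ||mu - nu||_H^2), where ||mu_hat_A - mu_hat_B||_H^2 expands into averages of base-kernel evaluations over sample pairs. The helper names, bandwidths gamma_base and gamma_outer, and the toy data below are illustrative assumptions; scikit-learn's SVC is the standard hinge-loss SVM.

```python
import numpy as np
from sklearn.svm import SVC

def gauss_kernel(X, Y, gamma):
    """Gaussian base-kernel matrix between sample sets X (n, d) and Y (m, d)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def kme_gram(bags_a, bags_b, gamma_base, gamma_outer):
    """Outer Gaussian kernel evaluated on empirical KMEs of sample bags.

    Uses ||mu_A - mu_B||_H^2 = mean k(A, A) + mean k(B, B) - 2 * mean k(A, B),
    where mean k(A, B) averages the base kernel over all sample pairs.
    """
    def mean_k(A, B):
        return gauss_kernel(A, B, gamma_base).mean()
    G = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            d2 = mean_k(A, A) + mean_k(B, B) - 2.0 * mean_k(A, B)
            G[i, j] = np.exp(-gamma_outer * d2)
    return G

# Toy two-stage data: each input is a bag of N samples from an unseen distribution
# (class 0: standard normal; class 1: mean-shifted normal).
rng = np.random.default_rng(0)
n_train, n_test, N, d = 60, 20, 50, 2
def draw_bag(label):
    return rng.normal(loc=1.5 * label, scale=1.0, size=(N, d))

y_train = rng.integers(0, 2, n_train)
y_test = rng.integers(0, 2, n_test)
bags_train = [draw_bag(y) for y in y_train]
bags_test = [draw_bag(y) for y in y_test]

K_train = kme_gram(bags_train, bags_train, gamma_base=0.5, gamma_outer=1.0)
K_test = kme_gram(bags_test, bags_train, gamma_base=0.5, gamma_outer=1.0)

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)  # hinge-loss SVM
print("test accuracy:", clf.score(K_test, y_test))
```

Note that the classifier never sees the distributions themselves, only the finite bags, which is exactly the two-stage sampling structure the analysis addresses; in practice the two bandwidths and the regularization constant C would be chosen by cross-validation.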
