By Hayata Yamasaki, Sathyawageeswar Subramanian, Sho Sonoda, Masato Koashi
Abstract
Kernel methods augmented with random features give scalable algorithms for learning from big data. But it has been computationally hard to sample random features according to a probability distribution that is optimized for the data, so as to minimize the number of features required to achieve the learning to a desired accuracy. Here, we develop a quantum algorithm for sampling from this optimized distribution over features, in runtime O(D) that is linear in the dimension D of the input data. Our algorithm achieves an exponential speedup in D compared to any known classical algorithm for this sampling task. In contrast to existing quantum machine learning algorithms, our algorithm circumvents sparsity and low-rank assumptions and thus has wide applicability. We also show that the sampled features can be combined with regression by stochastic gradient descent to achieve the learning without canceling out our exponential speedup. Our algorithm, based on sampling optimized random features, leads to an accelerated framework for machine learning that takes advantage of quantum computers.
Random features provide a powerful technique for scaling up kernel methods applicable to various machine learning tasks, such as ridge regression, kernel learning, and principal component analysis. Recently, Bach derived an optimized probability distribution of random features, and sampling features from this optimized distribution can drastically improve the runtime of learning algorithms based on random features. However, this sampling task has been computationally "hard in practice" because it requires inverting a high-dimensional operator. In contrast, the power of quantum computers to process data in quantum superposition has attracted growing attention toward accelerating learning tasks, opening the new field of quantum machine learning (QML). This paper uses QML to accelerate a supervised learning task by constructing an efficient quantum algorithm for sampling from this optimized distribution.
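For concreteness, here is a minimal classical sketch of the conventional random-feature approach described above: frequencies are drawn from a data-independent distribution dτ(v) (here the Gaussian distribution corresponding to an RBF kernel, chosen purely for illustration; function and variable names are not from the paper).

```python
import numpy as np

def random_fourier_features(X, M, gamma=1.0, rng=None):
    """Map data X (N x D) to M random Fourier features approximating the
    RBF kernel k(x, y) = exp(-gamma * ||x - y||**2 / 2).

    Frequencies v are drawn from the data-independent distribution
    dτ(v) = N(0, gamma * I), i.e. the Fourier transform of the kernel.
    """
    rng = np.random.default_rng(rng)
    N, D = X.shape
    V = rng.normal(scale=np.sqrt(gamma), size=(M, D))  # frequencies ~ dτ(v)
    b = rng.uniform(0.0, 2 * np.pi, size=M)            # random phases
    Phi = np.sqrt(2.0 / M) * np.cos(X @ V.T + b)       # N x M feature matrix
    return Phi, V, b

# Usage: Phi @ Phi.T approximates the N x N kernel matrix.
X = np.random.default_rng(0).normal(size=(100, 5))
Phi, V, b = random_fourier_features(X, M=500)
```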
Problem
These conventional methods for obtaining random features from the data-independent distribution dτ(v) require a large number M of features to learn the function f, which slows down both the determination of all M features and the regression over the M coefficients. Aiming to minimize the M required for learning, rather than sampling from dτ(v), we sample features from a probability distribution that puts greater weight on important features optimized for the data, i.e., from q(v)dτ(v) for a probability density function q(v) with respect to dτ(v).
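As a rough classical illustration of what this optimized density looks like (and why it is expensive), the sketch below computes an empirical, finite-candidate surrogate of Bach's leverage-score density q(v) ∝ ⟨φ(v,·), (Σ + λI)^{-1} φ(v,·)⟩ over J candidate features. The finite candidate set, the empirical kernel matrix, and all names here are assumptions for illustration, not the paper's formulation; the point is that the operator inversion is the classical bottleneck.

```python
import numpy as np

def optimized_feature_weights(X, V, b, lam=1e-3):
    """Empirical surrogate of the optimized density over J candidate
    features v_1..v_J (rows of V), each with phase b_j:
        q(v_j) ∝ z_jᵀ (K + N·lam·I)^{-1} z_j,
    where z_j = (phi(v_j, x_1), ..., phi(v_j, x_N)) and K is the (approximate)
    N x N kernel matrix.  This stands in for Bach's
    <phi_v, (Sigma + lam I)^{-1} phi_v>; the N x N inversion below is the step
    that makes classical sampling from q(v)dτ(v) expensive.
    """
    N = X.shape[0]
    Z = np.sqrt(2.0 / len(b)) * np.cos(X @ V.T + b)      # N x J feature values
    K = Z @ Z.T                                          # approximate kernel matrix
    Kinv = np.linalg.inv(K + N * lam * np.eye(N))        # the costly inversion
    scores = np.einsum('nj,nm,mj->j', Z, Kinv, Z)        # z_jᵀ Kinv z_j for each j
    return scores / scores.sum()                         # normalized weights q(v_j)
```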
Solution
- (Theorem 1) We construct a quantum algorithm for sampling an optimized random feature from q*(v)dτ(v) in runtime O(D), linear in the data dimension D. The best existing classical algorithm, by Bach, for sampling a single feature from this data-optimized distribution q*(v)dτ(v) requires exponential runtime O(exp(D)); conventional algorithms that run in time O(D) perform an easier task, namely sampling from the data-independent distribution dτ(v). In contrast, our quantum algorithm samples each feature from q*(v)dτ(v) in runtime O(D), as fast as those conventional algorithms, and, advantageously over sampling from dτ(v), the sampled features achieve learning with a significantly smaller number M of features, which is proven to be minimal up to a logarithmic gap. Remarkably, we achieve this without assuming sparsity or low rank of the relevant operators.
- To construct this quantum algorithm, we circumvent the difficulty of infinite dimension by formulating a discrete approximation of the problem of sampling a real-valued feature from q*(v)dτ(v). This approximation is equivalent to using a fixed-point number representation with rescaling (see the discretization sketch after this list).
- (Theorem 2) We combine the M features sampled by our algorithm with regression by stochastic gradient descent to achieve supervised learning in time O(MD), i.e., without canceling out our exponential speedup (a regression sketch follows this list). This M is minimal up to a logarithmic gap, since we use optimized random features. Thus, by removing the computational bottleneck faced by classical algorithms for sampling optimized random features, we provide a promising framework of quantum machine learning that leverages our O(D) sampling algorithm to achieve the minimal M among all algorithms using random features.
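The discrete approximation mentioned above amounts, roughly, to restricting each coordinate of a feature to a fixed-point grid. The following tiny sketch (the range v_max and bit width are illustrative assumptions, not values from the paper) shows the kind of fixed-point representation with rescaling that is meant.

```python
import numpy as np

def discretize_feature(v, v_max, bits=16):
    """Fixed-point discretization with rescaling: map each real-valued
    coordinate of a feature v in [-v_max, v_max] onto a grid of 2**bits
    points, so that sampling a real-valued feature is replaced by sampling
    from a finite (though exponentially large in D) discrete set."""
    G = 2 ** bits
    idx = np.clip(np.round((np.asarray(v) + v_max) / (2 * v_max) * (G - 1)), 0, G - 1)
    idx = idx.astype(np.int64)                   # fixed-point grid indices
    v_grid = 2 * v_max * idx / (G - 1) - v_max   # rescale back to the original range
    return idx, v_grid
```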
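Once the M optimized features have been sampled, the remaining regression (Theorem 2) is classical. A minimal sketch of stochastic gradient descent over the M coefficients, with illustrative names and hyperparameters and assuming cosine features with phases b as above, shows where the O(MD) cost per update comes from.

```python
import numpy as np

def sgd_regression(X, y, V, b, epochs=10, lr=0.1):
    """Learn coefficients alpha for f(x) ≈ sum_m alpha_m * sqrt(2/M) * cos(v_m·x + b_m)
    by stochastic gradient descent on the squared loss.  Each update costs O(MD):
    O(MD) to evaluate the M features at one example, O(M) to update alpha."""
    M = V.shape[0]
    alpha = np.zeros(M)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for n in rng.permutation(len(y)):
            phi = np.sqrt(2.0 / M) * np.cos(V @ X[n] + b)   # features of one example, O(MD)
            alpha -= lr * (phi @ alpha - y[n]) * phi        # gradient step, O(M)
    return alpha
```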
Comments