Machine Learning Times

Gradient Descent Models Are Kernel Machines (Deep Learning)

 
Originally published in infoproc.blogspot.com, Feb 7, 2021.

This paper shows that models which result from gradient descent training (e.g., deep neural nets) can be expressed as a weighted sum of similarity functions (kernels) which measure the similarity of a given instance to the examples used in training. The kernels are defined by the inner product of model gradients in the parameter space, integrated over the descent (learning) path.
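
For reference, the paper's central identity can be paraphrased as follows (notation mine; the precise statement is taken in the gradient-flow limit and defines the coefficients via a path-weighted average of loss derivatives):

    y = f_w(x) \approx \sum_{i=1}^{m} a_i \, K^p(x, x_i) + b,
    K^p(x, x') = \int_{c(t)} \nabla_w f_{w(t)}(x) \cdot \nabla_w f_{w(t)}(x') \, dt

Here c(t) is the path traced by the parameters w during training, b is the output of the initial model, and each a_i is (minus) an average of the loss derivative \partial L / \partial f(x_i) along that path.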

Roughly speaking, two data points x and x’ are similar, i.e., have a large kernel value K(x,x’), if they have similar effects on the model parameters during gradient descent. With respect to the learning algorithm, x and x’ carry similar information. The learned model y = f(x) matches x against similar training points x_i: the resulting value y is simply a weighted (linear) sum of the kernel values K(x,x_i).
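
As a concrete illustration (mine, not from the paper or the post), here is a minimal NumPy sketch: a tiny one-hidden-layer net is trained by plain gradient descent on a squared loss, and at each step the first-order change in the prediction at a query point is re-expressed as a weighted sum of gradient inner products K(x, x_i) over the training examples. Accumulating those per-example contributions along the descent path approximately reconstructs the trained model's prediction. All names and hyperparameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny one-hidden-layer net: f(x) = v . tanh(W x)
    d, h, m = 3, 8, 10                      # input dim, hidden units, training examples
    W = rng.normal(scale=0.5, size=(h, d))
    v = rng.normal(scale=0.5, size=h)

    X = rng.normal(size=(m, d))             # training inputs
    y = np.sin(X.sum(axis=1))               # training targets
    x_test = rng.normal(size=d)             # query point

    def predict(W, v, x):
        return v @ np.tanh(W @ x)

    def grad_f(W, v, x):
        """Gradient of the scalar output f(x) w.r.t. all parameters, flattened."""
        a = np.tanh(W @ x)
        dW = np.outer(v * (1.0 - a**2), x)  # df/dW
        dv = a                              # df/dv
        return np.concatenate([dW.ravel(), dv])

    eta, steps = 1e-3, 2000
    f0_test = predict(W, v, x_test)         # initial model's output at the query point
    contrib = np.zeros(m)                   # accumulated per-example kernel contributions

    for _ in range(steps):
        g_test = grad_f(W, v, x_test)
        g_train = np.array([grad_f(W, v, xi) for xi in X])
        resid = np.array([predict(W, v, xi) for xi in X]) - y  # dL/df at each example
        # First-order change of f(x_test) in this step, written as kernel terms:
        #   delta f(x_test) ~= -eta * sum_i resid_i * <grad f(x_test), grad f(x_i)>
        contrib += -eta * resid * (g_train @ g_test)
        # Plain gradient descent step on the squared loss
        total_grad = (resid[:, None] * g_train).sum(axis=0)
        W -= eta * total_grad[: h * d].reshape(h, d)
        v -= eta * total_grad[h * d:]

    print("trained model prediction :", predict(W, v, x_test))
    print("kernel-sum reconstruction:", f0_test + contrib.sum())

With a small learning rate the two printed numbers agree closely; the remaining gap is the higher-order error that disappears in the paper's gradient-flow limit.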

This result makes it very clear that, without regularity imposed by the ground-truth mechanism that generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example that deviates strongly (as measured by the kernel) from all training examples. See the note added at the bottom for more on this point, re: AGI, etc. Given the complexity (e.g., dimensionality) of the ground-truth model, one can place bounds on the amount of data required for successful training.

This formulation locates the nonlinearity of deep learning models in the kernel function. The superposition of kernels is entirely linear as long as the loss function is additive over training data.
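
To spell out where that linearity comes from (a standard gradient-flow calculation, not quoted from the post): with an additive loss \sum_i L(y_i^*, f_w(x_i)) and parameter dynamics dw/dt = -\nabla_w \sum_i L_i, the model output at any fixed point x evolves as

    \frac{d}{dt} f_{w(t)}(x) = \nabla_w f_{w(t)}(x) \cdot \frac{dw}{dt} = - \sum_i L'\big(y_i^*, f_{w(t)}(x_i)\big) \, \nabla_w f_{w(t)}(x) \cdot \nabla_w f_{w(t)}(x_i)

Integrating over the training trajectory gives the sum of per-example kernel terms quoted above: the superposition over i is linear, and all of the nonlinear dependence on x sits inside the gradient inner products, i.e., inside the kernel.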

To continue reading this article, click here.
