Sohl-Dickstein used the principles of diffusion to develop an algorithm for generative modeling. The idea is simple: The algorithm first converts complex images in the training dataset into simple noise—like going from an ink blob to diffuse light blue water—and then teaches the system how to reverse the process, turning noise back into images.
Here’s how it works: First, the algorithm takes an image from the training set. As before, let’s say that each of the million pixels has some value, and we can plot the image as a dot in a million-dimensional space. The algorithm adds some noise to each pixel at each time step, equivalent to the spread of ink after one small time step. As this process continues, the pixel values become less closely related to their values in the original image, and the pixels look more like a simple noise distribution. (The algorithm also nudges each pixel value toward the origin, the zero value on all those axes, at each time step. This nudge prevents pixel values from becoming too large for computers to work with easily.)
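The forward step above can be sketched in a few lines of numpy. This is a toy stand-in: the "image" is just an array of random pixel values, and `beta` is an arbitrary small step size; the shrink-toward-the-origin factor and the noise amount are chosen so the values settle into a standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x, beta=0.05):
    """One forward diffusion step: nudge pixel values toward the origin
    and add a small amount of Gaussian noise (variance-preserving form)."""
    return np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

# An "image" of 100,000 pixel values, i.e. one point in a high-dimensional space.
x = rng.uniform(-1.0, 1.0, size=100_000)

for _ in range(400):  # many small time steps
    x = forward_step(x)

# After enough steps the original image is washed out and the pixel
# values are close to a simple standard normal distribution.
print(round(float(x.mean()), 2), round(float(x.std()), 2))
```

Repeating the step many times erases the original pixel values entirely; only the simple noise distribution remains.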
Do this for all the images in the dataset, and the initial complex distribution of points in million-dimensional space (which cannot be easily described or sampled) turns into a simple normal distribution of points around the origin.
“The sequence of transformations very slowly turns your data distribution into a big ball of noise,” Sohl-Dickstein said. This “forward process” leaves you with a distribution that you can easily sample from.
Then there’s the machine learning part: Feed your neural network the noisy images obtained from a step forward and train it to predict the less noisy images that came one step earlier. It will make mistakes at first, so you tweak the network parameters to make it perform better. Eventually, the neural network can reliably turn a noisy image, representative of a sample from the simple distribution, all the way into an image representing a sample from the complex distribution.
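The training loop described above can be sketched with a toy "network" — here a single linear layer standing in for a real neural network, and random Gaussian vectors standing in for training images. The learning rate, batch size, and `beta` are illustrative choices, not values from Sohl-Dickstein's paper.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8          # tiny "images" of 8 pixels, for illustration
beta = 0.1     # noise added per forward step
lr = 0.05      # learning rate

# A toy linear "network": given a noisy image, predict the less noisy
# image that came one forward step earlier.
W = np.zeros((D, D))

def noisier(x):
    """Apply one forward diffusion step to a batch of images."""
    return np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

for _ in range(2000):
    x_prev = rng.standard_normal((64, D))   # stand-in training images
    x_next = noisier(x_prev)                # one step noisier
    pred = x_next @ W                       # network's guess at x_prev
    err = pred - x_prev                     # its mistakes...
    W -= lr * (x_next.T @ err) / 64         # ...tweak parameters to shrink them
```

For this toy setup the best denoiser is known analytically (scale the noisy input by `sqrt(1 - beta)`), so you can check that the trained `W` converges to it; a real diffusion model replaces the linear map with a deep network and real images, but follows the same predict-and-tweak loop.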
The trained network is a full-blown generative model. Now you don’t even need an original image to start from: You have a complete mathematical description of the simple distribution, so you can sample it directly. The neural network can convert this sample, essentially just static, into a final image that resembles the images in the training data set.
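Sampling then looks like this sketch. The `denoise_step` function here is a hypothetical stand-in for a trained network (it simply nudges the sample toward a fixed made-up "training image"); a real model would apply its learned reverse step instead. The point is the shape of the procedure: sample pure static directly from the simple distribution, then run the reverse process step by step.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16
target = np.linspace(-1.0, 1.0, D)  # a made-up "training image" (hypothetical)

def denoise_step(x, strength=0.1):
    """Stand-in for the trained network: move the sample a small step
    toward something image-like. A real model uses learned weights here."""
    return x + strength * (target - x)

# Sample the simple distribution directly -- no original image needed.
x = rng.standard_normal(D)   # essentially just static

# Run the reverse process step by step.
for _ in range(200):
    x = denoise_step(x)

# The static has been shaped into something resembling training data.
print(float(np.abs(x - target).max()))
```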
Sohl-Dickstein recalls the first outputs of his diffusion model. “You would turn around and be like, ‘I think the colored blob looks like a truck,'” he said. “I spent so many months of my life staring at different pixel patterns and trying to see structure that I was like, ‘This is way more structured than I’ve ever had.’ I had a lot of fun.”
Imagining the Future
Sohl-Dickstein published his diffusion model algorithm in 2015, but it was still far behind what GANs could do. Although diffusion models could sample over the entire distribution and not get stuck throwing out only a subset of images, the images looked worse, and the process was much too slow. “I don’t think this was exciting at the time,” Sohl-Dickstein said.
Two students, neither of whom knew Sohl-Dickstein or each other, would connect the dots from this early work to today’s diffusion models such as DALL·E 2. Song, a doctoral student at Stanford at the time, was the first. In 2019 he and his advisor published a new method for building generative models that did not estimate the probability distribution of the data (the high-dimensional surface). Instead, it estimated the gradient of the distribution (think of it as the slope of the high-dimensional surface).
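The distinction can be illustrated on a one-dimensional toy distribution, where the gradient of the log-density (the "score") has a closed form. The normal distribution and the numbers below are illustrative choices, not taken from Song's paper.

```python
import numpy as np

# For a 1-D normal distribution N(mu, sigma^2), the score -- the gradient
# of the log-density -- is -(x - mu) / sigma^2: it points "uphill"
# toward regions of higher probability.
mu, sigma = 2.0, 0.5

def log_density(x):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def score(x):
    return -(x - mu) / sigma**2

# Check the closed form against a numerical derivative of the log-density.
x, h = 1.3, 1e-6
numeric = (log_density(x + h) - log_density(x - h)) / (2 * h)
print(numeric, score(x))
```

A score-based model learns this gradient field directly from data instead of the density itself, then follows it to move noise samples toward high-probability regions.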