DDIM (Denoising Diffusion Implicit Models)
Overview
DDIM generalizes DDPMs from Markovian to non-Markovian forward processes while keeping the same training objective. This improves:
- Sample quality - high-quality samples in far fewer sampling steps.
- Consistency property - generating from the same initial latent with different numbers of steps yields samples with similar high-level features.
- Semantically meaningful image interpolation via latent variable interpolation (see the sketch after this list).
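A minimal sketch of the interpolation idea, assuming a deterministic DDIM decoder `ddim_decode(model, x_T)` (a hypothetical helper that runs the full sampler with \(\eta_t = 0\)): interpolate spherically between two Gaussian latents \(x_T\) and decode each one.

```python
import torch

def slerp(z0, z1, lam):
    """Spherical interpolation between two Gaussian latents (lam in [0, 1])."""
    theta = torch.arccos(torch.sum(z0 * z1) / (z0.norm() * z1.norm()))
    return (torch.sin((1 - lam) * theta) * z0
            + torch.sin(lam * theta) * z1) / torch.sin(theta)

# z0, z1 = torch.randn(3, 64, 64), torch.randn(3, 64, 64)
# frames = [ddim_decode(model, slerp(z0, z1, l)) for l in torch.linspace(0, 1, 8)]
```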
Derivation
Note that the DDPM objective depends only on the marginals \(q(\mathbf{x}_t \mid \mathbf{x}_0)\), not on the full joint \(q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)\). Any forward process with the same marginals therefore shares the same training objective, so we are free to choose a non-Markovian one.
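Concretely, one family of non-Markovian processes matching these marginals (a sketch in this note's notation, assuming \(x_t = \alpha_t x_0 + \sigma_t \epsilon\), with \(\eta_t\) a free per-step parameter) has reverse conditionals
\[q_{\eta}(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(\alpha_{t-1} x_0 + \sqrt{\sigma^2_{t-1} - \eta^2_t}\cdot\frac{x_t - \alpha_t x_0}{\sigma_t},\; \eta^2_t I\right)\]
Substituting the model's estimates \(\hat{x}_0\) for \(x_0\) and \(\hat{\epsilon}\) for \((x_t - \alpha_t x_0)/\sigma_t\) gives the sampling step below.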
This suggests the following reformulation of the sampling step:
\[x_{t-1} = \alpha_{t-1}\left(\frac{x_t - \sigma_t \hat{\epsilon}}{\alpha_t} \right) + \sqrt{\sigma^2_{t-1} - \eta^2_t}\,\hat{\epsilon} + \eta_t \epsilon_t\]Here we first predict the clean data $\hat{x}_0$ (in parens), then jump back to noise level $t-1$ by scaling by $\alpha_{t-1}$ and adding two noise terms.
The first noise term re-uses the (estimated) noise $\hat{\epsilon}$ that already existed in $x_t$. The second noise term is fresh noise, $\epsilon_t \sim \mathcal{N}(0, I)$.
The total scale of the noise we add is still $\sigma_{t-1}$, since the two variances sum to $(\sigma^2_{t-1} - \eta^2_t) + \eta^2_t = \sigma^2_{t-1}$. Setting $\eta_t = 0$ gives the deterministic DDIM sampler, which is what enables the consistency and interpolation properties above.
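A minimal sketch of one such sampling step, assuming a noise-prediction network `eps_model(x_t, t)` (hypothetical name) and schedules `alpha`, `sigma` precomputed as 1-D tensors:

```python
import torch

def ddim_step(eps_model, x_t, t, t_prev, alpha, sigma, eta_t):
    """One reverse step x_t -> x_{t_prev}, with x_t = alpha[t] * x_0 + sigma[t] * eps.

    eta_t = 0 gives the deterministic DDIM update; larger eta_t injects
    more fresh noise (DDPM-like behavior).
    """
    eps_hat = eps_model(x_t, t)                     # predicted noise in x_t
    x0_hat = (x_t - sigma[t] * eps_hat) / alpha[t]  # predicted clean data
    # Noise carried over from x_t, scaled so the total variance is sigma[t_prev]^2.
    reused = torch.sqrt(sigma[t_prev] ** 2 - eta_t ** 2) * eps_hat
    fresh = eta_t * torch.randn_like(x_t)           # fresh Gaussian noise
    return alpha[t_prev] * x0_hat + reused + fresh
```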
Also see: https://www.overleaf.com/read/fgrhhpqmtbgm#a55fc4
Last Reviewed 4/30/25