4. Background: Inverse problems
[Figure: measurement model diagram — the ground-truth image 𝐱 passes through the imaging system 𝒜 and is corrupted by noise 𝜼, yielding the measurement 𝐲]
• Problem: recover 𝐱 from noisy measurement 𝐲
• Ill-posed: Infinitely many solutions may exist
• We need to know the prior of the data distribution: what should the image look like?
Due to the equivalence between explicit score matching and denoising score matching, one can train the score function in the same manner as training a residual denoiser.
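As a minimal sketch of this equivalence (names such as `score_net` and `sigmas` are hypothetical placeholders, not the actual training code), the denoising score matching loss reduces to a rescaled residual-denoising objective:

```python
import torch

def dsm_loss(score_net, x0, sigmas):
    """Denoising score matching: for Gaussian perturbation x_t = x0 + sigma * eps,
    the conditional score is -(x_t - x0) / sigma^2 = -eps / sigma, so the network
    is effectively trained as a (rescaled) residual denoiser."""
    idx = torch.randint(0, len(sigmas), (x0.shape[0],))     # random noise level per sample
    sigma = sigmas[idx].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = x0 + sigma * eps                                  # forward noising
    target = -eps / sigma                                   # true conditional score
    # sigma^2 weighting balances the loss across noise levels
    return ((sigma * (score_net(x_t, sigma) - target)) ** 2).mean()
```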
When we train with multiple noise levels, we arrive at an interesting view of diffusion models: the data noising process can be seen as a linear forward SDE, and the data generating process as the corresponding reverse SDE, whose drift is governed by the score function. Hence, to sample data, one can discretize the reverse SDE and solve it numerically using the pre-trained score function.
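A minimal sketch of that sampling recipe, assuming a generic forward SDE dx = f(x, t) dt + g(t) dw and a pre-trained score network (all names here are placeholders):

```python
import torch

@torch.no_grad()
def sample_reverse_sde(score_net, f, g, shape, T=1.0, n_steps=1000):
    """Euler-Maruyama discretization of the reverse SDE
        dx = [f(x, t) - g(t)^2 * score(x, t)] dt + g(t) dw_bar,
    integrated backward from t = T to t = 0."""
    dt = T / n_steps
    x = torch.randn(shape)                                  # terminal (pure-noise) sample
    for i in range(n_steps):
        t = T - i * dt
        drift = f(x, t) - g(t) ** 2 * score_net(x, t)
        x = x - drift * dt + g(t) * (dt ** 0.5) * torch.randn_like(x)
    return x
```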
On the other hand, we are interested in solving inverse problems. In the inverse problem setting, our aim is to recover the ground truth x from the noisy measurement y, obtained through some imaging system A and corrupted by measurement noise η. The problem is inherently ill-posed: there exist infinitely many solutions consistent with the measurement. Hence, in order to single out the solution we actually want, we need to specify the prior of the data distribution, in other words, what the images should look like.
Examples of such inverse problems include inpainting, deconvolution, and compressed sensing MRI.
Let’s consider the measurement model from the diagram above, 𝐲 = 𝒜(𝐱) + 𝜼. Now, given y, what we want is to sample from the posterior distribution p(x|y).
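Writing this out (the standard step in diffusion-based posterior sampling; notation reconstructed from context), Bayes' rule splits the posterior score at noise level t into two terms:

```latex
\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t \mid \mathbf{y})
  = \underbrace{\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)}_{\text{prior score}}
  + \underbrace{\nabla_{\mathbf{x}_t} \log p_t(\mathbf{y} \mid \mathbf{x}_t)}_{\text{likelihood}}
```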
Then, we can approximate the former term with the pre-trained score function, while the latter simply depends on the measurement model, for example, Gaussian or Poisson. Note that these gradients can be computed analytically since their functional forms are known. For Gaussian noise, we are essentially performing gradient descent that minimizes the squared ℓ2 norm of the residual, whereas for Poisson measurements, we are minimizing a squared weighted norm of the residual.
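Concretely, with x̂0 denoting the denoised estimate plugged into the forward model (the DPS-style approximation; the weighting matrix Λ is our notation), the two cases read roughly as:

```latex
\text{Gaussian: } \nabla_{\mathbf{x}_t} \log p(\mathbf{y} \mid \mathbf{x}_t)
  \simeq -\frac{1}{\sigma^2}\,\nabla_{\mathbf{x}_t}
    \big\| \mathbf{y} - \mathcal{A}(\hat{\mathbf{x}}_0(\mathbf{x}_t)) \big\|_2^2,
\qquad
\text{Poisson: } \simeq -\nabla_{\mathbf{x}_t}
    \big\| \mathbf{y} - \mathcal{A}(\hat{\mathbf{x}}_0(\mathbf{x}_t)) \big\|_{\Lambda}^2
```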
Blind inverse problems consider the case where we do not know the forward operator.
Here, we specifically constrain ourselves to cases where we know the functional form of the forward operator, but do not know the parameters.
By far the most widely studied instance of this setting is blind deconvolution, or the blind deblurring problem, where we do not know which blur kernel generated the measurement.
In order to solve this problem with diffusion, let us write the posterior distribution for blind deconvolution. We see that the posterior is proportional to the product of the likelihood, the image prior, and the kernel prior, where the two priors factor out since the image and the kernel are independent.
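In symbols (reconstructed from the description above):

```latex
p(\mathbf{x}, \mathbf{k} \mid \mathbf{y}) \;\propto\;
  \underbrace{p(\mathbf{y} \mid \mathbf{x}, \mathbf{k})}_{\text{likelihood}}\,
  \underbrace{p(\mathbf{x})}_{\text{image prior}}\,
  \underbrace{p(\mathbf{k})}_{\text{kernel prior}}
```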
Now, in order to perform posterior sampling, we need to specify the prior model for the image distribution and for the kernel distribution. Note that much of the progress in blind deblurring can be attributed to building better priors that approximate the true distributions. Our choice here is to use a diffusion prior for both the image and the kernel.
And in order to use the diffusion prior, we can simply train two score functions independently with denoising score matching. For the image score function, we can easily take some pre-trained model. For the kernel score function, training is relatively easy, as the data distribution is much less complicated and the vectors are low-dimensional.
And specifically because the two are independent, our proposal is to construct two parallel diffusion processes, one for the image and one for the kernel.
Now, in order to do posterior sampling, as in the non-blind inverse problem case, we need to incorporate the gradient of the likelihood, which is again intractable.
However, we can again establish a theorem similar to that of DPS, which lets us approximate the intractable likelihood term by plugging in the denoised estimates.
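That is, the likelihood at noise level t is approximated by evaluating the forward model at the posterior-mean (Tweedie) denoised estimates of both streams (notation reconstructed from context):

```latex
p(\mathbf{y} \mid \mathbf{x}_t, \mathbf{k}_t) \;\approx\;
  p\big(\mathbf{y} \mid \hat{\mathbf{x}}_0(\mathbf{x}_t),\, \hat{\mathbf{k}}_0(\mathbf{k}_t)\big),
\qquad
\hat{\mathbf{x}}_0 = \mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t],\;\;
\hat{\mathbf{k}}_0 = \mathbb{E}[\mathbf{k}_0 \mid \mathbf{k}_t]
```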
Now, to visualize what the equation says, let us first recall what DPS does for the non-blind deblurring problem. As shown in the figure, we take the denoised estimate x̂0 and convolve it with the known kernel k to obtain the predicted measurement. We compute the residual against y and backpropagate the difference into the next iteration.
In this work on solving blind inverse problems, which we call BlindDPS, we simply add another diffusion model and run the two in parallel. At each intermediate step, we also use the kernel score function to form a denoised kernel estimate, convolve the two denoised estimates to compute the residual, and then apply the gradient update steps to the two streams separately.
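A schematic sketch of one such parallel update (all names hypothetical; step sizes, schedules, and the ancestral diffusion step itself are simplified away):

```python
import torch
import torch.nn.functional as F

def blind_dps_step(x_t, k_t, y, score_x, score_k, sigma, rho=1.0):
    """One parallel posterior-sampling step for blind deconvolution (schematic).
    Both streams are denoised, the residual y - k_hat * x_hat is formed once,
    and its gradient is backpropagated into each stream separately."""
    x_t = x_t.detach().requires_grad_(True)
    k_t = k_t.detach().requires_grad_(True)

    # Tweedie denoised estimates from the two independent score networks
    x0_hat = x_t + sigma ** 2 * score_x(x_t, sigma)
    k0_hat = k_t + sigma ** 2 * score_k(k_t, sigma)

    # Predicted measurement: convolve the denoised image with the denoised kernel
    y_pred = F.conv2d(x0_hat, k0_hat, padding="same")
    residual = ((y - y_pred) ** 2).sum()

    # One residual, two gradient streams
    grad_x, grad_k = torch.autograd.grad(residual, (x_t, k_t))
    x_next = x_t - rho * grad_x          # image stream update (diffusion step omitted)
    k_next = k_t - rho * grad_k          # kernel stream update
    return x_next.detach(), k_next.detach()
```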
Now, in practice, we found it more robust to augment the diffusion prior with a sparsity-promoting ℓ0 regularization.
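In equation form (our reconstruction; λ is a hypothetical weight, and in practice the non-differentiable ℓ0 term would be relaxed or applied as a projection), the kernel-stream gradient then targets:

```latex
\nabla_{\mathbf{k}_t}\Big[\,\big\| \mathbf{y} - \hat{\mathbf{k}}_0 * \hat{\mathbf{x}}_0 \big\|_2^2
  \;+\; \lambda\,\big\| \hat{\mathbf{k}}_0 \big\|_0 \Big]
```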
Until now, for the sake of brevity, I have focused my argument on blind deblurring, but the methodology extends to any case where we know the functional form of the forward operator and only its parameters need to be estimated. Another such case is imaging through turbulence, where the forward model can be roughly approximated as a tilt followed by a blur.
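Under that approximation (notation ours: 𝒯_φ a spatially varying tilt/warp with parameters φ, k a blur kernel):

```latex
\mathbf{y} = \mathbf{k} * \mathcal{T}_{\phi}(\mathbf{x}) + \boldsymbol{\eta}
```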
When we visualize the results, we see that we achieve state-of-the-art reconstructions even from blind measurements. Note that the image and the kernel are reconstructed jointly.
We also achieve state-of-the-art results when we apply our method to imaging through turbulence.