Mason Wang

SDEdit

Task - guided image synthesis - create/edit images w/ min. effort

previous = CGAN / GAN inversion, which require user annotations/training data or custom loss functions

SDEdit adds noise to the brush strokes, then denoises.
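A minimal sketch of the noise-then-denoise idea, not the paper's implementation. Assumptions (mine, not from the notes): a VE-SDE with a geometric noise schedule, and a toy analytic score for Gaussian data standing in for a trained score network. `sigma`, `toy_score`, and `sdedit` are hypothetical names.

```python
import numpy as np

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    """Hypothetical geometric noise schedule for a VE-SDE, t in [0, 1]."""
    return sigma_min * (sigma_max / sigma_min) ** t

def toy_score(x, sig, mu=0.0, data_std=1.0):
    """Exact score of N(mu, data_std^2 I) blurred with N(0, sig^2 I).
    Stand-in (assumption) for a learned score network s_theta(x, t)."""
    return -(x - mu) / (data_std ** 2 + sig ** 2)

def sdedit(guide, t0, n_steps=200, score_fn=toy_score, rng=None):
    """Perturb the guide to noise level sigma(t0), then integrate the
    reverse SDE from t0 back to 0 with Euler-Maruyama steps."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: add noise to the guide (the "brush strokes").
    x = guide + sigma(t0) * rng.standard_normal(guide.shape)
    dt = t0 / n_steps
    # Step 2: denoise by stepping from t0 down to 0.
    for n in range(n_steps, 0, -1):
        t = t0 * n / n_steps
        eps2 = sigma(t) ** 2 - sigma(t - dt) ** 2  # variance removed this step
        z = rng.standard_normal(guide.shape)
        x = x + eps2 * score_fn(x, sigma(t)) + np.sqrt(eps2) * z
    return x
```

With a guide far from the toy data (e.g. `np.full(2000, 5.0)`), a small `t0` returns something close to the guide, while a large `t0` pulls the result toward the data distribution: more noise = more realistic, less faithful.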

Can do stroke-based editing, or painting-to-image.

CGAN turns an image into its edited version via a trained network; GAN inversion turns the image into a latent code, then edits the code.

more noise = more realistic, less faithful.

stroke based synthesis: better in terms of realism and user satisfaction

compositing: better faithfulness, better user satisfaction

Experiments

Tasks

Figures

Tables

Baselines:

Metrics

Realism - measured by humans or a NN

Faithfulness - similarity to guide, L2 distance

User metrics on realism and faithfulness
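Rough sketch of the L2 faithfulness measure: distance between guide and output, lower = more faithful. The function name `l2_faithfulness` is hypothetical, and using the plain (unnormalized, unsquared) L2 norm over all pixels is an assumption; the notes do not say how the distance is normalized.

```python
import numpy as np

def l2_faithfulness(guide, output):
    """L2 distance between guide and output images (lower = more faithful).
    Assumption: plain Euclidean norm over all pixels, no normalization."""
    guide = np.asarray(guide, dtype=np.float64)
    output = np.asarray(output, dtype=np.float64)
    if guide.shape != output.shape:
        raise ValueError("guide and output must have the same shape")
    return float(np.linalg.norm((guide - output).ravel()))
```

An output identical to the guide scores 0, which is why faithfulness alone is not enough: the guide itself is maximally faithful but not realistic.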

More Notes

Stroke based image editing:

- theoretical upper bound on the L2 distance between guide and generation
- to have a shot at being realistic, need a high noise level; to be faithful, noise level should not be too high
- bad guides (e.g. a white image) need a higher noise level, since the closest realistic images to the input are quite far away
- binary search for t0 based on user preference
- same t0 usually works for all reasonable guides in the same task
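The binary search for t0 could look roughly like this. This is a sketch under assumptions, not the paper's procedure: `is_realistic_enough` is a hypothetical callback that generates an image at the given t0 and asks the user whether it looks realistic enough, and acceptance is assumed to be monotone in t0 (more noise never looks less realistic). The search returns roughly the smallest accepted t0, which keeps the result as faithful as possible.

```python
def search_t0(is_realistic_enough, lo=0.0, hi=1.0, n_rounds=6):
    """Binary search for (approximately) the smallest acceptable t0.

    is_realistic_enough(t0) -> bool is a hypothetical user-feedback
    callback. Assumes hi is acceptable and acceptance is monotone in t0.
    """
    for _ in range(n_rounds):
        mid = 0.5 * (lo + hi)
        if is_realistic_enough(mid):
            hi = mid   # accepted: try less noise to gain faithfulness
        else:
            lo = mid   # rejected: need more noise for realism
    return hi
```

Each round halves the interval, so a handful of user judgments is enough; since the same t0 usually works for all reasonable guides in a task, the search only needs to run once per task.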

Conditional GANs - trained on original and edited images; require data collection and model retraining.

GAN inversion - input is projected into latent space of a GAN, code is modified, image is resynthesized - all need different losses for different tasks.

other generative models - less used for editing; Choi 2021 does conditional image synthesis, assuming conditions can be measured from the underlying true image.

Other notes

unlike other inverse problems, the measurement function (image-to-sketch function) is not known

realism and faithfulness are not positively correlated: a random realistic image is realistic but not faithful to the guide

prior uses of score-based models to solve inverse problems, and methods requiring paired datasets, do not apply

key hyperparameter is t0

experimental details in appendix