Project | Proceedings | Arxiv | Supplementary materials
[ICML 2023] Official pytorch implementation of the paper: "SinDDM: A Single Image Denoising Diffusion Model"
With SinDDM, one can train a generative model from a single natural image, and then generate random samples from the given image, for example:
SinDDM can also be used for a line of image manipulation tasks, especially image manipluations guided by text, for example:
See section 4 in our paper for more details about our results and experiments.
If you use this code for your research, please cite our paper:
@inproceedings{kulikov2023sinddm,
title={Sinddm: A single image denoising diffusion model},
author={Kulikov, Vladimir and Yadin, Shahar and Kleiner, Matan and Michaeli, Tomer},
booktitle={International Conference on Machine Learning},
pages={17920--17930},
year={2023},
organization={PMLR}
}
- Random Samples from a Single Example
- SinDDM's Applications
- Citation
- Requirements
- Repository Structure
- Usage Examples
- Data and Pretrained Models
- Sources
python -m pip install -r requirements.txt
This code was tested with python 3.8 and torch 1.13.
├── SinDDM - training and inference code
├── clip - clip model code
├── datasets - the images used in the paper
├── imgs - images used in this repository readme.md file
├── results - pre-trained models
├── text2live_util - code for editing via text, based on text2live code
└── main.py - main python file for initiate model training and for model inference
To train a SinDDM model on your own image e.g. <training_image.png>
, put the desired training image under ./datasets/<training_image>/
, and run
python main.py --scope <training_image> --mode train --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/
This code will also generate random samples starting from the coarsest scale (s=0) of the trained model.
To generate random samples, please first train a SinDDM model on the desired image (as described above) or use a provided pretrained model, then run
python main.py --scope <training_image> --mode sample --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
To sample images in arbitrary sizes, one can add --scale_mul <y> <x>
argument to generate an image that is <y>
times as high and <x>
times as wide as the original image.
To guide the generation to create new content using a given text prompt <text_prompt>
, run
python main.py --scope <training_image> --mode clip_content --clip_text <text_prompt> --strength <s> --fill_factor <f> --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
Where strength and fill_factor are the required controllable parameters explained in the paper.
To guide the generation to create a new style for the image using a given text prompt <style_prompt>
, run
python main.py --scope <training_image> --mode clip_style_gen --clip_text <style_prompt> --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
Note: One can add the --scale_mul <y> <x>
argument to generate an arbitrary size sample with the given style.
To create a new style for a given image, without changing the original image global structure, run
python main.py --scope <training_image> --mode clip_style_trans --clip_text <text_style> --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
To modify an image in a specified ROI (Region Of Interest) with a given text prompt <text_prompt>
, run
python main.py --scope <training_image> --mode clip_roi --clip_text <text_prompt> --strength <s> --fill_factor <f> --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
Note: A Graphical prompt will open. The user need to select a ROI within the displayed image.
Here, the user can mark a specific training image ROI and choose where it should appear in the generated samples. If roi_n_tar is passed then the user will be able to choose several _target locations.
python main.py --scope <training_image> --mode roi --roi_n_tar <n__targets> --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
A graphical prompt will open and allow the user to choose a ROI from the training image. Then, the user need to choose where it should appear in the resulting samples.
Here as well, one can generate an image with arbitrary shapes using --scale_mul <y> <x>
To harmonize a pasted object into an image, place a naively pasted reference image and the selected mask into ./datasets/<training_image>/i2i/
and run
python main.py --scope <training_image> --mode harmonization --harm_mask <mask_name> --input_image <naively_pasted_image> --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
To transfer the style of the training image to a content image, place the content image into ./datasets/<training_image>/i2i/
and run
python main.py --scope <training_image> --mode style_transfer --input_image <content_image> --dataset_folder ./datasets/<training_image>/ --image_name <training_image.png> --results_folder ./results/ --load_milestone 12
We provide several pre-trained models for you to use under ./results/
directory. More models will be available soon.
We provide all the training images we used in our paper under the ./datasets/
directory. All the images we provide are in the dimensions we used for training and are in .png format.
The DDPM code was adapted from the following pytorch implementation of DDPM.
The modified CLIP model as well as most of the code in ./text2live_util/
directory was taken from the official Text2live repository.