ControlNet
What is ControlNet?
If you have been using Stable Diffusion for a while, you know how difficult it is to generate images with a precise composition or a desired pose. ControlNet, a model that works alongside Stable Diffusion, lets users control the placement and appearance of what is generated.
In this guide, we will learn how to install and use ControlNet models in Automatic1111.
Install ControlNet in Automatic1111
Below are the steps to install ControlNet in Automatic1111 stable-diffusion-webui.
- Navigate to the Extensions tab in Automatic1111.
- Click the Install from URL tab, then copy and paste the URL below into "URL for extension's git repository".
https://github.com/Mikubill/sd-webui-controlnet.git
- Press the Install button.
We have observed that after you click Install, you may not see any progress bar. Wait a few seconds to a few minutes until you see this message:
Installed into /home/stable-diffusion-webui/extensions/sd-webui-controlnet. Use Installed tab to restart.
Go to the "Installed" tab, click "Check for updates", and then click "Apply and restart UI".
If the extension is successfully installed, you will see a collapsible section named ControlNet in the txt2img tab, right above the Script drop-down menu.
When expanded, the section looks as below.
Let's download a few ControlNet model weights so we can create some awesome images in Automatic1111.
Download ControlNet model weights
We have listed the 13 ControlNet models available on Hugging Face below, with their links. Copy the link to the model you need and use wget to download its weights.
You can also check here for any updated models.
- To download the weights, use the following command.
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
- After downloading the weights, use the following command to move them to the required Automatic1111 folder.
mv *.pth /home/stable-diffusion-webui/extensions/sd-webui-controlnet/models/
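If you prefer scripting the download instead, below is a minimal Python sketch using the huggingface_hub library. The file names listed are an example subset of the v1.1 weights, and the destination path assumes the install location shown above.

```python
from huggingface_hub import hf_hub_download
import shutil

# Example subset of the v1.1 weights; add the other file names you need.
weights = [
    "control_v11p_sd15_canny.pth",
    "control_v11f1p_sd15_depth.pth",
    "control_v11p_sd15_openpose.pth",
]
dest = "/home/stable-diffusion-webui/extensions/sd-webui-controlnet/models/"

for name in weights:
    # Downloads into the local Hugging Face cache and returns the cached path.
    path = hf_hub_download(repo_id="lllyasviel/ControlNet-v1-1", filename=name)
    shutil.copy(path, dest)  # place the weights where the extension looks for them
```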
You are all set to use the ControlNet extension in the Automatic1111 webui.
Using ControlNet – a simple example
Now ControlNet is installed, and the required weights are downloaded and placed in the right path. Let's go through a simple example of generating an image using Canny edge detection.
You should have the ControlNet extension installed to follow this section. You can verify this by checking for the ControlNet section shown below.
Text-to-image settings:
ControlNet needs to be used with a Stable Diffusion model. In the Stable Diffusion checkpoint dropdown menu, select the model you want to use with ControlNet. Here, choose deliberate_v2.safetensors to use the Deliberate model.
In the txt2img tab, enter the prompt and an optional negative prompt for ControlNet to use. Below are the prompts I will be using.
Prompt: a cute cat in the garden, a masterpiece
Negative prompt: disfigured, ugly
Set the image settings such as height and width.
ControlNet settings:
Now, let's move on to the ControlNet settings. Upload the image to the image canvas. Check ✅ the Enable, Pixel Perfect, and Allow Preview checkboxes.
Select a Control Type; the corresponding model and its related preprocessors will be automatically filtered and loaded into the respective dropdowns, along with the other default settings.
Click Run Preprocessor 💥 to see a preview of the input image after it has been processed.
Now, click the Generate button to start generating images with ControlNet.
Finally, the GUI looks as below.
When you are done, uncheck the Enable checkbox to disable the ControlNet extension.
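If you would rather script this same workflow outside the web UI, here is a minimal sketch using the diffusers library. This is an assumption about tooling (the article itself works only in Automatic1111), and the model IDs and file names are placeholders you can adjust.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny ControlNet weights plus a base SD 1.5 checkpoint, both from Hugging Face.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# A preprocessed control image (e.g. a Canny edge map); the file name is hypothetical.
control_image = load_image("cat_edges.png")

image = pipe(
    prompt="a cute cat in the garden, a masterpiece",
    negative_prompt="disfigured, ugly",
    image=control_image,
    num_inference_steps=20,
).images[0]
image.save("output.png")
```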
Canny
Canny, a classic edge detector, uses a multi-stage algorithm to extract outlines from the input image. The preprocessing preserves the original composition of the input image and generates an output image containing the corresponding outlines.
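To get a feel for what the Canny preprocessor produces, here is a rough stand-in using OpenCV; the thresholds are illustrative, not the extension's exact values.

```python
import cv2

# Read the input image and run the Canny edge detector on it.
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
# The low/high hysteresis thresholds control how many edges survive;
# 100/200 are illustrative values, not the extension's defaults.
edges = cv2.Canny(img, 100, 200)
cv2.imwrite("canny_edges.png", edges)
```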
Depth
Depth preprocesses the input into a grayscale depth map, with black representing deep (far) areas and white representing shallow (near) areas.
There are multiple preprocessors available for the depth model; a code sketch follows the list below.
- depth_midas
- depth_leres
- depth_leres++
- depth_zoe
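As a sketch, the depth_midas preprocessor can be approximated with the controlnet_aux package (an assumption about tooling; the web UI bundles its own annotators). The input file name is hypothetical.

```python
from controlnet_aux import MidasDetector
from diffusers.utils import load_image

midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
image = load_image("milkman.png")  # hypothetical input file
depth_map = midas(image)           # grayscale map: white = shallow, black = deep
depth_map.save("depth_midas.png")
```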
Below are the images generated with the depth preprocessors: a 'woman cop' image produced from the given prompt, with a 'milkman' input image.
OpenPose
OpenPose detects human keypoints such as the head, shoulders, and legs. It is useful for copying human poses; in simple terms, it produces a skeleton view of the image.
The OpenPose preprocessors are:
- OpenPose
- OpenPose_face
- OpenPose_faceonly
- OpenPose_hand
- OpenPose_full
OpenPose
OpenPose serves as the base preprocessor that detects various human body parts such as the head, hands, legs, nose, ears, knees, ankles, shoulders, etc., from the provided input image.
It then generates an output image with the same pose.
OpenPose_face
OpenPose_face performs all the essential functions of the base preprocessor and extends its capabilities by detecting facial expressions.
OpenPose_faceonly
OpenPose_faceonly specializes in detecting facial expressions while excluding other keypoints. This feature is particularly useful for capturing and replicating facial expressions.
OpenPose_hand
In addition to the base preprocessor's keypoints, OpenPose_hand detects the keypoints of the hands and fingers.
OpenPose_full
OpenPose_full detects everything OpenPose_face and OpenPose_hand do.
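One way to sketch these variants in code is with the controlnet_aux annotator (an assumption about tooling; the web UI ships its own copy), where the include_hand and include_face flags roughly correspond to the preprocessor variants above.

```python
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image

pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
image = load_image("person.png")  # hypothetical input file

# include_hand / include_face roughly map onto the OpenPose_hand,
# OpenPose_face, and OpenPose_full variants described above.
skeleton = pose(image, include_hand=True, include_face=True)
skeleton.save("openpose_full.png")
```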
MLSD
MLSD is a straight-line detector used to detect straight lines and edges. This preprocessor is particularly useful for architectural elements such as room interiors, streets, and frames; any curves are ignored.
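A minimal MLSD sketch with controlnet_aux (again an assumption about tooling; the input file name is hypothetical):

```python
from controlnet_aux import MLSDdetector
from diffusers.utils import load_image

mlsd = MLSDdetector.from_pretrained("lllyasviel/Annotators")
image = load_image("studyroom.png")  # hypothetical input file
lines = mlsd(image)                  # keeps straight lines, discards curves
lines.save("mlsd_lines.png")
```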
Below is an image generated with the MLSD preprocessor: a 'classic room' produced from a study-room input image.
Scribble
Scribble preprocessors turn the image into a scribble, similar to one drawn by hand.
The available scribble preprocessors are:
- Scribble HED: the base preprocessor, suitable for recoloring and restyling an image (see the sketch after this list).
- Scribble PiDiNet: detects curves and straight edges in addition to what the base preprocessor finds, producing clearer, more detailed lines.
- Scribble XDoG: an edge-detection technique; you need to tune the XDoG threshold and observe the output.
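Here is a minimal Scribble HED sketch using controlnet_aux (an assumption about tooling); its scribble flag switches the HED detector to hand-drawn-style output.

```python
from controlnet_aux import HEDdetector
from diffusers.utils import load_image

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
image = load_image("input.png")       # hypothetical input file
scribble = hed(image, scribble=True)  # hand-drawn-style outlines
scribble.save("scribble_hed.png")
```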
Below are the images that have been generated using the different scribble preprocessors.
Seg
Segmentation labels the types of objects in the input image. These labels are used to replicate the shapes of objects in the generated images.
The available segmentation preprocessors are:
- seg_ofade20k
- seg_ofcoco
- seg_ufade20k
Below are the images that have been generated using the Seg preprocessors.
Normal
A normal map encodes the orientation of surfaces in the image: pixel values represent the direction a surface is facing rather than its color. This is used to replicate the 3D composition of the given image.
The normal map preprocessors are listed below, with a code sketch after the list:
- Normal Midas: good for isolating the subject from the background.
- Normal Bae: renders details of both the subject and the background.
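A minimal sketch of the Bae variant via controlnet_aux (an assumption about tooling; the input file name is hypothetical):

```python
from controlnet_aux import NormalBaeDetector
from diffusers.utils import load_image

bae = NormalBaeDetector.from_pretrained("lllyasviel/Annotators")
image = load_image("input.png")  # hypothetical input file
normal_map = bae(image)          # pixels encode surface orientation, not color
normal_map.save("normal_bae.png")
```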
Below are the images that have been generated using the normal map preprocessors.
Lineart
Lineart analyzes the image and generates a black-and-white sketch that looks like a scanned drawing.
The Lineart preprocessors are listed below, with a code sketch after the list:
- lineart_anime: extracts anime-style lines and allows coloring on top of them.
- lineart_anime_denoise: a denoised variant of lineart_anime that produces cleaner lines.
- lineart_realistic: extracts realistic-style lines.
- lineart_coarse: a coarser variant of lineart_realistic with heavier lines.
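A minimal sketch of the realistic and coarse variants via controlnet_aux (an assumption about tooling; the coarse flag is controlnet_aux's name for the coarse variant):

```python
from controlnet_aux import LineartDetector
from diffusers.utils import load_image

lineart = LineartDetector.from_pretrained("lllyasviel/Annotators")
image = load_image("input.png")       # hypothetical input file
fine = lineart(image)                 # lineart_realistic-style output
coarse = lineart(image, coarse=True)  # lineart_coarse-style output
fine.save("lineart_realistic.png")
coarse.save("lineart_coarse.png")
```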
Below are the images that have been generated using the Lineart preprocessors.