What is ControlNet
If you have been using Stable Diffusion for a while, you know how difficult it is to generate images with a precise composition or a desired pose. ControlNet, a model that works on top of Stable Diffusion, lets users control the placement and appearance of generated images.
In this guide, we will learn how to install and use ControlNet models in Automatic1111.
Install ControlNet in Automatic1111
Below are the steps to install ControlNet in Automatic1111 stable-diffusion-webui.
Navigate to the Extensions tab in Automatic1111. In the Install from URL tab, copy and paste the URL below into "URL for extension's git repository" and click Install.
We have observed that after you click Install, you may not see any progress bar. Wait a few seconds or minutes until you see this message:
Installed into /home/stable-diffusion-webui/extensions/sd-webui-controlnet. Use Installed tab to restart.
Go to "Installed" tab, click "Check for updates", and then click "Apply and restart UI".
If the extension is successfully installed, you will see a collapsible section named ControlNet in the txt2img tab, right above the Script drop-down menu.
When expanded, the section looks as below.
Let's download a few ControlNet model weights to create some awesome images in Automatic1111.
Download ControlNet model weights
We have listed the 13 ControlNet models available on Hugging Face below, with their links. You can copy the link to the required model and use wget to download the model weights.
You can also check here for any updated models.
- To download the weights, use the following command.
- After downloading the weights, use the following command to move them to the required Automatic1111 folder.
mv *.pth /home/stable-diffusion-webui/extensions/sd-webui-controlnet/models/
You are all set to use the ControlNet extension in the Automatic1111 webui.
Using ControlNet – a simple example
Now that ControlNet is installed and the required weights are downloaded and placed in the right path, let's go through a simple example of generating an image using canny edge.
You should have the ControlNet extension installed to follow this section. You can verify this by checking for the ControlNet section shown below.
ControlNet needs to be used with a Stable Diffusion model. In the Stable Diffusion checkpoint drop-down menu, select the model you want to use with ControlNet; for example, choose deliberate_v2.safetensors to use the Deliberate model.
In the txt2img tab, enter the prompt and, optionally, a negative prompt to be used by ControlNet. Below is the prompt I will be using.
a cute cat in the garden,a masterpiece
Set the image settings like height, width and other settings.
Now, let's move on to the ControlNet settings. Upload the input image to the ControlNet image canvas. Check ✅ the Enable checkbox.
When you select a Control Type, the corresponding model and its related preprocessors are automatically filtered and loaded into the respective drop-downs, along with other default settings.
By clicking on Run Preprocessor 💥, you can preview the input image after it has been processed.
Now, click the Generate button to start generating images using ControlNet.
Finally, the GUI looks as below.
When you are done, uncheck the Enable checkbox to disable the ControlNet extension.
Canny, a classic edge detector, uses a multi-stage algorithm to extract outlines from input images. The preprocessing preserves the original composition of the input image and generates an output image containing the corresponding outlines.
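The full multi-stage Canny pipeline (smoothing, non-maximum suppression, hysteresis) is involved, but the core idea of extracting outlines from intensity gradients can be sketched in a few lines. This is a simplified gradient-threshold sketch, not the actual implementation the preprocessor uses:

```python
import numpy as np

def edge_outline(img, threshold=0.2):
    """Simplified edge map: mark pixels whose gradient magnitude exceeds a
    threshold. (Real Canny adds smoothing, non-maximum suppression, and
    hysteresis on top of this idea.)"""
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold * magnitude.max()

# A toy image: dark square on a bright background
img = np.ones((8, 8))
img[2:6, 2:6] = 0.0
edges = edge_outline(img)  # True only along the square's outline
```

The output is a binary outline of the input; ControlNet then uses such an edge map to constrain where shapes appear in the generated image.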
The Depth preprocessor converts an input image to a grayscale depth map, with black representing deep (far) areas and white representing shallow (near) areas.
There are multiple preprocessors available for the depth model.
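The black-is-deep, white-is-shallow convention can be illustrated with a small NumPy sketch. The depth values here are made up; the real preprocessors estimate depth with neural networks such as MiDaS:

```python
import numpy as np

def depth_to_grayscale(depth):
    """Encode a depth map as grayscale: black (0) = deep/far areas,
    white (255) = shallow/near areas."""
    d = depth.astype(float)
    norm = (d - d.min()) / (d.max() - d.min())  # 0 = nearest, 1 = farthest
    return ((1.0 - norm) * 255).astype(np.uint8)

# Toy depth values in arbitrary units: smaller = closer to the camera
depth = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
gray = depth_to_grayscale(depth)  # nearest pixel -> 255, farthest -> 0
```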
Below are the images generated using the depth preprocessors: a woman-cop image based on the given prompt, with an input image of a milkman.
OpenPose detects human key points like the head, shoulders, legs, etc. It is useful for copying human poses; in simple terms, it is a skeleton view of the image.
The OpenPose preprocessors:
OpenPose serves as the base preprocessor that detects various human body parts such as the head, hands, legs, nose, ears, knees, ankles, shoulders, etc., from the provided input image.
It then generates an output image with the same pose.
OpenPose_face performs all the essential functions of the base preprocessor and extends its capabilities by detecting facial expressions.
OpenPose_faceonly specializes in detecting facial expressions while excluding other key points. This feature is particularly useful for capturing and replicating facial expressions.
In addition to the base preprocessor's key points, OpenPose_hand detects the key points of hands and fingers.
OpenPose_full detects everything OpenPose_face and OpenPose_hand do.
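Conceptually, the skeleton view is just a set of detected key points plus the limb connections drawn between them, and the variants differ only in which key points they keep. A sketch with made-up key-point names and coordinates (a real detector returns many more points):

```python
# Hypothetical key points (normalized x, y), as an OpenPose-style detector
# might return them; the names and coordinates are illustrative only.
keypoints = {
    "nose": (0.50, 0.10), "neck": (0.50, 0.25),
    "left_shoulder": (0.35, 0.27), "right_shoulder": (0.65, 0.27),
    "left_wrist": (0.25, 0.55), "right_wrist": (0.75, 0.55),
}

# The "skeleton" is these points plus the connections drawn between them.
limbs = [("nose", "neck"), ("neck", "left_shoulder"), ("neck", "right_shoulder"),
         ("left_shoulder", "left_wrist"), ("right_shoulder", "right_wrist")]

segments = [(keypoints[a], keypoints[b]) for a, b in limbs]

# Variants such as OpenPose_faceonly simply restrict which key points are kept.
face_only = {name: pt for name, pt in keypoints.items() if name == "nose"}
```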
MLSD is a straight-line detector used to detect straight lines and edges. This preprocessor is particularly useful for architectural elements like room interiors, streets, frames, etc. Any curves will be ignored.
Below is an image generated using the MLSD preprocessor: a classic room, with an input image of a study room.
Scribble preprocessors turn the image into a scribble, similar to one drawn by hand.
The available scribble preprocessors are:
- Scribble HED: This is the base preprocessor and is suitable for recoloring and restyling an image.
- Scribble Pidinet: This preprocessor detects curves and straight edges beyond the base preprocessor, producing cleaner lines with more detail.
- Scribble XDoG: An edge-detection technique based on an extended difference of Gaussians. You need to tune the XDoG threshold and observe the output.
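XDoG builds on the difference of two Gaussian blurs. Below is a simplified NumPy sketch of that idea, thresholding a plain difference of Gaussians; the real XDoG applies a softer thresholding function, which is the parameter you tune:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using a small 1-D kernel (pure NumPy)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    out = np.apply_along_axis(np.convolve, 1, img.astype(float), kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, kernel, mode="same")

def scribble_dog(img, sigma=1.0, k=1.6, threshold=0.01):
    """Simplified DoG sketch: keep pixels where two blurs of different
    widths disagree by more than the threshold."""
    dog = gaussian_blur(img, sigma) - gaussian_blur(img, k * sigma)
    return np.abs(dog) > threshold

# Toy image: bright square on a dark background
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
sketch = scribble_dog(img)
```

Raising `threshold` keeps only the strongest lines; lowering it pulls in fainter detail, which mirrors the tuning the Scribble XDoG preprocessor asks for.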
Below are the images generated using the various scribble preprocessors.
Segmentation labels the types of objects in the input image. It is used to replicate the shapes of objects in the generated images.
The available segmentation preprocessors are
Below are the images that have been generated using the Seg Preprocessor.
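Conceptually, a segmentation preprocessor assigns a class id to every pixel and renders each class as a fixed color. A sketch with a made-up three-entry palette (the real preprocessors use the full ADE20K or COCO palettes):

```python
import numpy as np

# Hypothetical palette: one color per object category (colors made up).
PALETTE = np.array([
    [120, 120, 120],   # class 0: e.g. wall
    [  4, 200,   3],   # class 1: e.g. grass
    [230, 230, 230],   # class 2: e.g. sky
], dtype=np.uint8)

def colorize_segmentation(label_map):
    """Turn a per-pixel class-id map into a color-coded segmentation image."""
    return PALETTE[label_map]  # fancy indexing: (H, W) ids -> (H, W, 3) colors

labels = np.array([[0, 1],
                   [1, 2]])
seg_image = colorize_segmentation(labels)
```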
A normal map provides information about the orientation of surfaces in the image. Image pixels represent the direction each surface is facing instead of color values. This is used to replicate the 3D composition of the given image.
The Normal map preprocessors:
- Normal Midas: The Midas preprocessor is good for isolating the subject from the background.
- Normal Bae: The Bae preprocessor is used to render details of both the subject and the background.
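The encoding behind a normal map is simple: each component of the per-pixel unit normal vector, which lies in [-1, 1], is mapped into one 8-bit color channel. A minimal NumPy sketch:

```python
import numpy as np

def normals_to_rgb(normals):
    """Encode per-pixel unit normals (components in [-1, 1]) as an RGB
    normal-map image: each channel stores one axis of the direction."""
    return ((normals + 1.0) / 2.0 * 255).astype(np.uint8)

# A flat surface facing the viewer: normal vector (0, 0, 1) everywhere,
# which encodes to the characteristic bluish normal-map color.
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
rgb = normals_to_rgb(normals)
```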
Below are the images that have been generated using the Normal map Preprocessor.
Lineart analyzes the image and generates a black-and-white sketch that appears like a scanned image.
The Lineart preprocessors are:
- lineart_anime: Analyzes anime images and allows coloring on top of them.
- lineart_anime_denoise: Analyzes anime images with more image detail.
- lineart_realistic: Analyzes images with realistic-style lines.
- lineart_coarse: Analyzes images with realistic-style lines and more image detail.
Below are the images that have been generated using the Lineart preprocessors.