Yesterday, Stability.ai finally released their Stable Diffusion model to the public. Having played with NightCafe and Midjourney, and gotten access to DALL-E 2, I was excited to run this locally on my own computer. I have an AMD Threadripper 2990WX with an AMD Radeon 6750XT video card. I figured this was plenty powerful for local generation, so I didn't have to pay others when I have plenty of horsepower at home eating up electricity as it is. So I am just going to dive in and keep this short and sweet.
Getting all the components
First, we need to get all the pieces of the puzzle. There aren't many, but it takes a bit to get everything figured out and sorted. I encourage you to clone or extract all of this into a central working directory for easy usage.
- Download Anaconda 3
  - For Fedora 36, I downloaded it from https://www.anaconda.com; just download the Linux installer.
  - Once downloaded, go to where it was downloaded and run the installer using the terminal.
  - After it is installed, you should have a `~/anaconda3` directory if using the default settings.
  - Add `~/anaconda3` to your current path or restart your terminal (the installer tries to add it to your .bashrc or .zshrc at the end of the run): `export PATH="$HOME/anaconda3/bin:$PATH"`. Note the use of `$HOME` instead of `~`; a tilde is not expanded inside double quotes.
  - You can test if it worked by running `conda` and seeing help output instead of an error.
- Clone the Stable Diffusion source code (a combined command sketch follows this list).
  - It can be found at https://github.com/CompVis/stable-diffusion
- Go to the Hugging Face repository for the model and download the model.
  - The model is located at https://huggingface.co/CompVis/stable-diffusion
  - You will need to sign up for Hugging Face so that you can accept the terms of the repo.
  - Once you have signed up and accepted, scroll down and click the repo for stable-diffusion-v-1-4-original. This link takes you directly to the v1.4 model; keep an eye on the main Hugging Face page for new versions and update accordingly.
  - At the top, hit the `Files and Versions` tab, and you will see what amounts to a Git repository. You can clone it through git, but for quickness, we will just download directly.
  - Find the file with the `LFS` label next to it; for v1.4 it's called `sd-v1-4.ckpt`. Click to download it. It is a large file, so go grab a drink or a snack depending on your download speeds.
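For reference, here is a rough sketch of the whole fetch in one shell session. The workspace path is just an example, and the direct checkpoint URL is my assumption of the usual Hugging Face LFS layout; the download may require you to be logged in, so the browser route above is the safe bet.

```bash
# example workspace; any directory works
mkdir -p ~/sd-workspace && cd ~/sd-workspace

# make sure conda is on the PATH (default install location assumed)
export PATH="$HOME/anaconda3/bin:$PATH"
conda --version

# clone the Stable Diffusion source
git clone https://github.com/CompVis/stable-diffusion.git

# direct LFS download of the v1.4 checkpoint (assumed URL; requires
# having accepted the license on huggingface.co first)
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
```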
There is an optimized version of the Stable Diffusion txt2img script that uses only 4GB of VRAM versus the 8GB+ of the primary script. This script was made by a researcher who had earlier access to the models and scripts. Stability.ai has mentioned that they will be releasing optimized scripts that will even run on a Raspberry Pi, but they have not yet, nor have they released official AMD instructions or support. The following information is extra if you need it. If you have an Nvidia card, you can skip the AMD stuff, and if you have a 3090, you can even skip the optimized script. Still, I recommend the optimized version: even though it takes a bit longer (15 seconds versus 7-9 seconds for 6 images), it gets us the default 512x512 size or even higher more easily, and works just as well.
- Download the optimized version from the link below (a clone command follows this list).
  - For posterity, I did not write any of this. I watched a video on how to set this up on Windows, where a Google Drive with the scripts was linked. The zip had batch files for Windows and used custom Windows paths for the model and inference yaml. I changed this for Linux compatibility.
  - You can get my optimized version from my GitHub repo with the code changes.
  - GitHub repository: https://github.com/Daegalus/stable-diffusion-optimized
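Assuming you are still in the same workspace directory, cloning it looks like this:

```bash
cd ~/sd-workspace   # or wherever your workspace is
git clone https://github.com/Daegalus/stable-diffusion-optimized.git
```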
Setting up the files and structure
For the purposes of examples, we will assume you cloned and downloaded everything into `$SD_WORKSPACE`, which can be any folder where you have everything located.
From here we want to navigate to the cloned Stable Diffusion repository.

```bash
cd $SD_WORKSPACE/stable-diffusion
```
Now we want to copy the model into the correct folder. This assumes you downloaded the `ckpt` checkpoint file into `$SD_WORKSPACE`; if you cloned it with git, it will probably be under its own subfolder.

```bash
mkdir -p models/ldm/stable-diffusion-v1
# the stock scripts look for the checkpoint at
# models/ldm/stable-diffusion-v1/model.ckpt, so rename it as we copy
cp ../sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
```
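A quick sanity check that the checkpoint landed where the scripts expect it:

```bash
# should show a single file of roughly 4GB
ls -lh models/ldm/stable-diffusion-v1/model.ckpt
```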
Now, if you have Nvidia and you chose not to use the optimized third-party scripts, you are done and can move on to the next section, Running Stable Diffusion.
Optimized Stable Diffusion
While still in the stable-diffusion folder, we can copy the new scripts into it.

```bash
cp ../stable-diffusion-optimized/* .
```
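If you want to double-check the copy, the wrapper scripts referenced later should now sit in the repo root:

```bash
ls low_ram.sh low_ram_nvidia.sh
```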
All set here.
Adding AMD ROCm packages
OK, this was the trickier part, and I had a lot of trial and error, but I finally got it all working, and it shouldn't be too hard for you.
I am using Fedora 36 (read the footnotes for why this over other distributions), so the commands below are for this distro. Please look up the appropriate packages for your distribution.
Since I am not deep in the AI dev world, I wasn't sure which components we needed, so I installed all the ROCm packages available. If you find better info, adjust as needed. AMD has official repos on their website.
```bash
sudo dnf install rocm-clinfo rocm-comgr rocm-device-libs rocm-opencl rocm-runtime rocm-smi rocminfo
```
This will install ROCm 5.2.1 as of this writing.
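Once installed, a quick way to confirm the GPU is visible to ROCm (both tools come from the packages above):

```bash
# list GPU agents and their gfx targets
rocminfo | grep -i gfx
# show utilization and temperature for detected GPUs
rocm-smi
```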
Setting up the rest of the environment
Now it is almost time to get it all hooked up. The first thing we need to do is edit the `environment.yaml` file (a shell sketch of the edits follows this list):

- Change the `pytorch` version to `1.12.1`
- Change the `pytorch-lightning` version to `1.5.2`
- Save and close.
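If you prefer making those edits from the shell, here is a sketch. The original version strings (`1.11.0` and `1.4.2`) are my assumption of what the CompVis `environment.yaml` pinned at the time, so verify against your copy first.

```bash
# assumed original pins; check environment.yaml before running
sed -i 's/pytorch=1.11.0/pytorch=1.12.1/' environment.yaml
sed -i 's/pytorch-lightning==1.4.2/pytorch-lightning==1.5.2/' environment.yaml
```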
Now we just run the following command to create our environment:

```bash
conda env create -f environment.yaml
```
Wait for it to finish. Then, finally, we need to activate the environment. In general, make sure this is activated when running further commands, as they all need to run in the environment.

```bash
conda activate ldm
```
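To confirm which environment is active, `conda env list` marks it with an asterisk:

```bash
conda env list   # the active env (ldm) is marked with *
```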
You can stop here if you are using Nvidia. You can now run `low_ram_nvidia.sh` if using the optimized scripts, or `scripts/txt2img.py` if not.
Installing AMD PyTorch
We are going to install PyTorch with ROCm 5.1.1 support, which will work fine on the ROCm 5.2.1 we have installed.
Go to https://pytorch.org/get-started/locally/ and select the following combination: Stable, Linux, Pip, Python, ROCm 5.1.1. (ROCm may be a different version if you are reading this in the future.)
IMPORTANT: Make sure you are inside your `ldm` conda environment with `conda activate ldm` before running the PyTorch install below.
This will give you a command to run; as of this writing, it is as follows:

```bash
# DON'T RUN THIS YET, READ ON.
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1
```
But on Fedora, `pip3` can resolve to the system Python's pip instead of the conda environment's, so make sure you change the command to use `pip` as follows:

```bash
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1
```
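Before the full GPU check in the next step, you can confirm the ROCm wheel was the one installed; with the command above, the version string should carry a `rocm` suffix:

```bash
python -c "import torch; print(torch.__version__)"
# expect something like: 1.12.1+rocm5.1.1
```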
Wait for it to finish, and we should be almost set. Let's verify our setup.
Run the following commands. The `conda activate ldm` sets up our conda environment's ENV variables, so the right Python installation inside conda is used.

```bash
conda activate ldm
python -c "import torch; print(torch.cuda.is_available())"
```
This should return `True` if everything is working. Otherwise, it will show an error or `False`.
If you get an error like the one below, and you are running an AMD graphics card, primarily a 6700XT or 6750XT, we will need to add an env variable for the above test to work.

```
hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
```
If we do an `export HSA_OVERRIDE_GFX_VERSION="10.3.0"` and rerun the Python command above, it should work.
The reasoning behind this is that most consumer cards aren't officially supported, and the only one of this generation included in ROCm's support is `gfx1030`. The 6700 and 6750 are `gfx1031`: compatible, but not directly supported, so we force it.
To figure out which `gfx` you have, you can run `rocminfo | grep gfx`.
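Putting it together, this is what the workaround looks like; add the export to your `.bashrc`/`.zshrc` if you want it to persist across sessions:

```bash
# see which gfx target the card reports (gfx1031 for a 6700/6750 XT)
rocminfo | grep -o 'gfx[0-9]*' | sort -u

# pretend to be the officially supported gfx1030
export HSA_OVERRIDE_GFX_VERSION="10.3.0"
python -c "import torch; print(torch.cuda.is_available())"   # should now print True
```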
This should get you to the point where you can run Stable Diffusion successfully.
Running Stable Diffusion
Moment of truth. Run the following commands depending on your setup (a full worked example follows the list):

- AMD, optimized scripts: `./low_ram.sh --prompt "A happy dog"`
  - If you see an error about a `gfx1030_20.kdb`, you can ignore it for now. Fedora doesn't ship the `miopen` package, and the official `rhel9` repo wouldn't load for me. This is a warning; it can be ignored.
- AMD, unoptimized scripts: `python scripts/txt2img.py --prompt "A happy dog"`
  - You might need to lower the dimensions if you get memory errors: add `--W 384 --H 384` to your command.
- Nvidia, no optimized scripts: `python scripts/txt2img.py --prompt "A happy dog"`
- Nvidia, with optimized scripts: `./low_ram_nvidia.sh --prompt "A happy dog"`
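For instance, a complete AMD run from a fresh terminal might look like this, assuming the workspace layout from earlier:

```bash
cd $SD_WORKSPACE/stable-diffusion
conda activate ldm
export HSA_OVERRIDE_GFX_VERSION="10.3.0"   # 6700/6750 XT workaround from above
./low_ram.sh --prompt "A happy dog"
```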
Happy generating!
If you find any errors in my commands above, feel free to reach out to me. Contact info is at the top right of the blog.