I’m trying to build a Docker container with a small server that I can use to run Stable Diffusion.
I’ve used the example code from banana.dev as a base and have uploaded my container to runpod.
In the server, I first call a function that initializes the model so it is available as soon as the server is running:
```python
from sanic import Sanic, response
import subprocess
import app as user_src
import torch

# We do the model load-to-GPU step on server startup
# so the model object is available globally for reuse
user_src.init()

# Create the http server app
server = Sanic("my_app")

@server.route('/', methods=["POST"])
def inference(request):
    try:
        model_inputs = response.json.loads(request.json)
    except:
        model_inputs = request.json

    output = user_src.inference(model_inputs)
    return response.json(output)

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('spawn', force=True)
    server.run(host='0.0.0.0', port=8000, workers=1)
```
The actual model is defined in app.py, where it is first initialized:
```python
def init():
    global model

    HF_AUTH_TOKEN = os.getenv("HF_AUTH_TOKEN")

    # this will substitute the default PNDM scheduler for K-LMS
    lms = LMSDiscreteScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear"
    )

    model = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        scheduler=lms,
        use_auth_token=HF_AUTH_TOKEN
    ).to('cuda')
```
The inference function then uses the global model variable to run inference:
```python
def inference(model_inputs: dict) -> dict:
    global model
    [...]
    # Run the model
    with autocast('cuda'):
        images = model(
            prompt,
            width=width,
            height=height,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale
        )["sample"]
    [...]
```
Every time I try to run inference, I get the following error:

```
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```

This happens even though I have (a) added the call to set the spawn start method in the server, and (b) am, to my understanding, not using multiprocessing at all. Can you help me understand how CUDA initializes under the hood and how I can fix this error?
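My current suspicion, based on the wording of the error, is that a worker process is being forked from the main process *after* CUDA has already been initialized by the import-time call to user_src.init(), and the forked child inherits that CUDA context. Here is a minimal, CUDA-free sketch of that inheritance behavior I put together to check my understanding (the dict is just a stand-in for a CUDA context, not code from my server):

```python
import multiprocessing as mp

# Stand-in for a CUDA context: module-level state that the parent
# mutates after import, much like user_src.init() loading the model
# at import time.
state = {"cuda_initialized": False}

def child_sees_parent_state(q):
    # Under the 'fork' start method the child inherits the parent's
    # memory wholesale, so it sees the already-mutated state. An
    # inherited CUDA context is exactly what CUDA refuses to reuse,
    # hence the RuntimeError.
    q.put(state["cuda_initialized"])

def demo_fork_inheritance():
    state["cuda_initialized"] = True  # "initialize CUDA" before forking
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    p = ctx.Process(target=child_sees_parent_state, args=(q,))
    p.start()
    p.join()
    return q.get()  # True: the fork-child saw the parent's state
```

With the 'spawn' start method each worker would instead start as a fresh interpreter and re-import the module, so nothing pre-initialized would be inherited, which is presumably why the error message recommends it. What I don't understand is why my set_start_method('spawn', force=True) call doesn't take effect.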
Best and thank you,