I’m trying to build a Docker container with a small server that I can use to run Stable Diffusion.
I’ve used the example code from banana.dev as a base and have uploaded my container to runpod.
In the server, I first call a function that initializes the model so it is available as soon as the server is running:
```python
from sanic import Sanic, response
import subprocess
import app as user_src
import torch

# We do the model load-to-GPU step on server startup
# so the model object is available globally for reuse
user_src.init()

# Create the http server app
server = Sanic("my_app")

@server.route('/', methods=["POST"])
def inference(request):
    try:
        model_inputs = response.json.loads(request.json)
    except:
        model_inputs = request.json

    output = user_src.inference(model_inputs)
    return response.json(output)

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('spawn', force=True)
    server.run(host='0.0.0.0', port=8000, workers=1)
```
The actual model is defined in app.py, where it is first initialized:
```python
def init():
    global model

    HF_AUTH_TOKEN = os.getenv("HF_AUTH_TOKEN")

    # this will substitute the default PNDM scheduler for K-LMS
    lms = LMSDiscreteScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear"
    )

    model = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        scheduler=lms,
        use_auth_token=HF_AUTH_TOKEN
    ).to('cuda')
```
The inference function then uses the global model variable to run inference:
```python
def inference(model_inputs: dict) -> dict:
    global model
    [...]
    # Run the model
    with autocast('cuda'):
        images = model(
            prompt,
            width=width,
            height=height,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale
        )["sample"]
    [...]
```
Every time I try to run inference, I get the following error:

```
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```

This happens even though I have (a) added the call to set the spawn start method in the server, and (b) am, to my understanding, not using multiprocessing at all. Can you help me understand how CUDA initializes under the hood and how I can fix this error?
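My current suspicion, based on the wording of the error, is that a worker process is being forked from the main process *after* CUDA has already been initialized by the import-time call to user_src.init(), and the forked child inherits that CUDA context. Here is a minimal, CUDA-free sketch of that inheritance behavior I put together to check my understanding (the dict is just a stand-in for a CUDA context, not code from my server):

```python
import multiprocessing as mp

# Stand-in for a CUDA context: module-level state that the parent
# mutates after import, much like user_src.init() loading the model
# at import time.
state = {"cuda_initialized": False}

def child_sees_parent_state(q):
    # Under the 'fork' start method the child inherits the parent's
    # memory wholesale, so it sees the already-mutated state. An
    # inherited CUDA context is exactly what CUDA refuses to reuse,
    # hence the RuntimeError.
    q.put(state["cuda_initialized"])

def demo_fork_inheritance():
    state["cuda_initialized"] = True  # "initialize CUDA" before forking
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    p = ctx.Process(target=child_sees_parent_state, args=(q,))
    p.start()
    p.join()
    return q.get()  # True: the fork-child saw the parent's state
```

With the 'spawn' start method each worker would instead start as a fresh interpreter and re-import the module, so nothing pre-initialized would be inherited, which is presumably why the error message recommends it. What I don't understand is why my set_start_method('spawn', force=True) call doesn't take effect.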
Best and thank you,