I am trying to load a Qwen model in 8-bit and fine-tune it with LoRA. I have 2x V100 32GB. Can someone point me in the right direction, please? (Sorry to disturb you, @ptrblck, but can you help?)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
Here is the code I am using (I didn't save the exact code that failed, but can someone suggest modifications to this code so that the model gets split properly and both GPUs are utilized?):
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    load_in_8bit=True,  # Load in 8-bit precision
    device_map="auto",
)
min_pixels = 224 * 28 * 28
max_pixels = 224 * 28 * 28
processor = AutoProcessor.from_pretrained(model_id, min_pixels=min_pixels, max_pixels=max_pixels)
processor.tokenizer.padding_side = "right"
Also, GPU 0 fills up completely and throws a CUDA out-of-memory error, while GPU 1 stays at only about 9 GB of usage.
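Not the original poster's solution, but a sketch of one common fix: with `device_map="auto"`, accelerate fills cuda:0 first unless you cap it, and the bare `load_in_8bit=True` kwarg is deprecated in recent transformers in favor of `BitsAndBytesConfig`. The `max_memory` values below are illustrative guesses (leaving headroom on GPU 0 for activations and LoRA optimizer state), not measured numbers, so they need tuning:

```python
def load_qwen_sharded(model_id="Qwen/Qwen2.5-VL-3B-Instruct", max_memory=None):
    """Load the model in 8-bit, sharded across both GPUs.

    Sketch under the assumption of 2x 32GB V100s; max_memory caps keep
    accelerate from piling everything onto cuda:0.
    """
    # Imports kept inside the function so the sketch is self-contained.
    import torch
    from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

    if max_memory is None:
        # Illustrative caps: reserve room on GPU 0 for activations,
        # gradients of LoRA adapters, and optimizer state.
        max_memory = {0: "20GiB", 1: "30GiB"}

    model = AutoModelForImageTextToText.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
        max_memory=max_memory,
    )
    return model


# With a sharded model, inputs must be moved to the device that holds the
# first shard rather than a hard-coded "cuda" (a mismatch here produces
# exactly the "found at least two devices" RuntimeError). With a loaded
# model you would do something like:
#   first_device = next(iter(model.hf_device_map.values()))
#   inputs = processor(...).to(first_device)

# The post's pixel budget "2242828" reads as 224*28*28 with the
# multiplication signs stripped by the forum formatting:
MIN_PIXELS = 224 * 28 * 28  # 175616
```

After loading, printing `model.hf_device_map` shows which layers landed on which GPU, which makes it easy to confirm the split is actually balanced.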