RuntimeError: Trying to create tensor with negative dimension at loss.backward()

I have a VAE that I want to train, but loss.backward() raises:
RuntimeError: Trying to create tensor with negative dimension -17146298352: [-17146298352]

Here is part of my code:

for epoch in range(NUM_EPOCHS):
    torch.cuda.empty_cache()
    loop = tqdm(enumerate(train_loader))
    for i, (image, x_tensor, y_tensor) in loop:
        x_latent = vae.encode(x_tensor.to('cuda').half())
        x_latent = x_latent.latent_dist.sample() * vae.config.scaling_factor

        y_reconstructed = vae.decode(x_latent, return_dict=False)[0]
        loss = loss_fn(y_reconstructed, y_tensor.to('cuda').half())
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loop.set_postfix(loss=loss.item())

Both y_reconstructed and y_tensor have the same shape, [1, 4, 1024, 1024].

This is the output of the loss variable:

tensor(0.4946, device='cuda:0', dtype=torch.float16,
       grad_fn=<MseLossBackward0>)

My model is Hugging Face's AutoencoderKL:

vae = AutoencoderKL.from_pretrained('madebyollin/sdxl-vae-fp16-fix', torch_dtype=torch.float16, use_safetensors=True)
vae.decoder.conv_out = torch.nn.Conv2d(128, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
vae.encoder.requires_grad_(False)
vae.to("cuda")

I'm hitting the same problem! Have you solved it? I am fine-tuning a VAE too.

Disabling mixed precision helped me

Do you mean disabling Accelerate's mixed precision, and loading sdxl-vae-fp16-fix in fp16?

I disabled Accelerate's mixed precision and loaded the model in fp32. That was for the VAE from SD 1.5, though, so I don't know whether it also applies to SDXL.
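
For anyone who wants to try the fp32 route, here is a minimal sketch of the same training step with everything kept in float32 (no .half() calls, no mixed precision). It is not the actual AutoencoderKL: the two small Conv2d layers, the tensor shapes, and the learning rate are hypothetical stand-ins so the snippet runs on CPU without downloading anything. With the real model, this would presumably mean passing torch_dtype=torch.float32 to from_pretrained and dropping the .half() calls, but I have not verified that this fixes the error on SDXL.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the VAE: a tiny encoder/decoder whose
# weights stay in float32 (PyTorch's default dtype).
encoder = nn.Conv2d(3, 4, kernel_size=3, padding=1)
decoder = nn.Conv2d(4, 3, kernel_size=3, padding=1)
encoder.requires_grad_(False)  # frozen encoder, as in the original post

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Inputs are left in float32: no .half() and no autocast.
x = torch.randn(1, 3, 32, 32)
y = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    latent = encoder(x)            # encoder is frozen, so no graph is needed
y_reconstructed = decoder(latent)
loss = loss_fn(y_reconstructed, y)

optimizer.zero_grad()
loss.backward()                    # gradients are computed in float32
optimizer.step()

print(loss.dtype)
```

The only differences from the loop in the question are the dtype of the weights and inputs; the order of zero_grad(), backward(), and step() is unchanged.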

Same error here; it appears when I use the accelerator.