RuntimeError: Trying to create tensor with negative dimension at loss.backward()

I have a VAE that I want to train, but loss.backward() raises:
RuntimeError: Trying to create tensor with negative dimension -17146298352: [-17146298352]

Here is part of my code:

for epoch in range(NUM_EPOCHS):
    torch.cuda.empty_cache()
    loop = tqdm(enumerate(train_loader))
    for i, (image, x_tensor, y_tensor) in loop:
        x_latent = vae.encode(x_tensor.to('cuda').half())
        x_latent = x_latent.latent_dist.sample() * vae.config.scaling_factor

        y_reconstructed = vae.decode(x_latent, return_dict=False)[0]
        loss = loss_fn(y_reconstructed, y_tensor.to('cuda').half())
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loop.set_postfix(loss=loss.item())

Both y_reconstructed and y_tensor have the same shape: [1, 4, 1024, 1024].

This is the output of the loss variable:

tensor(0.4946, device='cuda:0', dtype=torch.float16,
       grad_fn=<MseLossBackward0>)

My model is the Hugging Face AutoencoderKL:

vae = AutoencoderKL.from_pretrained('madebyollin/sdxl-vae-fp16-fix', torch_dtype=torch.float16, use_safetensors=True)
vae.decoder.conv_out = torch.nn.Conv2d(128, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
vae.encoder.requires_grad_(False)
vae.to("cuda")

I ran into the same problem! Have you solved it? I am fine-tuning a VAE too.

Disabling mixed precision helped me.

Do you mean disabling accelerate's mixed precision? And still loading sdxl-vae-fp16-fix in fp16?

Disabling accelerate's mixed precision and loading the model in fp32. That was for the VAE from SD 1.5, though, so I don't know whether it also applies to SDXL.
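
For anyone who wants to try the fp32 route, here is a minimal sketch of what a full-precision training step could look like. TinyAutoencoder and train_step_fp32 are hypothetical names used only for illustration. The tiny conv model is a stand-in for the real VAE so the snippet runs on CPU without downloading anything. It only shows the pattern (weights, inputs and loss all in float32, encoder frozen as in the question) and is not confirmed to fix the SDXL case.

```python
import torch
from torch import nn


class TinyAutoencoder(nn.Module):
    """Hypothetical stand-in for the VAE; not the real AutoencoderKL."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(4, 8, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(8, 4, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decoder(torch.relu(self.encoder(x)))


def train_step_fp32(model, optimizer, loss_fn, x, y):
    """Run one optimisation step with inputs and targets cast to float32."""
    device = next(model.parameters()).device
    # Cast to float32 instead of calling .half() on the batch.
    x = x.to(device=device, dtype=torch.float32)
    y = y.to(device=device, dtype=torch.float32)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = TinyAutoencoder().float()       # keep the weights in float32
model.encoder.requires_grad_(False)     # frozen encoder, as in the question
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

x = torch.randn(1, 4, 16, 16).half()    # a batch that arrives as fp16
loss_value = train_step_fp32(model, optimizer, nn.MSELoss(), x, x)
print(loss_value)
```

With the real model, the equivalent change would be passing torch_dtype=torch.float32 to AutoencoderKL.from_pretrained and dropping the .half() calls on x_tensor and y_tensor. Expect noticeably higher memory use at 1024x1024.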