I have a VAE that I want to train, but at loss.backward() it shows:
RuntimeError: Trying to create tensor with negative dimension -17146298352: [-17146298352]
Here is part of my code:
for epoch in range(NUM_EPOCHS):
    torch.cuda.empty_cache()
    loop = tqdm(enumerate(train_loader))
    for i, (image, x_tensor, y_tensor) in loop:
        x_latent = vae.encode(x_tensor.to('cuda').half())
        x_latent = x_latent.latent_dist.sample() * vae.config.scaling_factor
        y_reconstructed = vae.decode(x_latent, return_dict=False)[0]
        loss = loss_fn(y_reconstructed, y_tensor.to('cuda').half())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loop.set_postfix(loss=loss.item())
Both y_reconstructed and y_tensor have the same shape, [1, 4, 1024, 1024].
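For reference, a check along the following lines is one way to confirm that claim from inside the training loop. It is only a sketch: `describe` and `same_layout` are hypothetical helper names, not part of PyTorch or diffusers, and they only assume that the objects passed in expose `.shape` and `.dtype` attributes, as torch tensors do.

```python
def describe(name, t):
    """Return a one-line summary of a tensor-like object's shape, dtype, and device."""
    shape = tuple(t.shape)
    numel = 1
    for dim in shape:
        numel *= dim  # total element count, handy when chasing size overflows
    dtype = getattr(t, "dtype", "?")
    device = getattr(t, "device", "?")
    return f"{name}: shape={shape} numel={numel} dtype={dtype} device={device}"


def same_layout(a, b):
    """True when two tensor-like objects agree on both shape and dtype."""
    return tuple(a.shape) == tuple(b.shape) and a.dtype == b.dtype
```

Calling `print(describe("y_reconstructed", y_reconstructed))` and `same_layout(y_reconstructed, y_tensor.to('cuda').half())` right before `loss_fn` would show whether the two inputs really agree at the point where the loss is computed.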
This is the output of the loss variable:
tensor(0.4946, device='cuda:0', dtype=torch.float16,
grad_fn=<MseLossBackward0>)
My model is the Hugging Face diffusers AutoencoderKL:
vae = AutoencoderKL.from_pretrained('madebyollin/sdxl-vae-fp16-fix', torch_dtype=torch.float16, use_safetensors=True)
vae.decoder.conv_out = torch.nn.Conv2d(128, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
vae.encoder.requires_grad_(False)
vae.to("cuda")