Adding a sparsity penalty to the loss function in a GAN

I have implemented a GAN to generate graphs (basically, I am generating the adjacency matrix). Here is my GAN training implementation:

def train_gan(loader, discriminator, generator, optimizer_d, optimizer_g, _noise_dim=500, _epochs=100):
    lossesG = []
    lossesD = []
    for epoch in range(1, _epochs):
        for data in loader:
            data_real = data.to(device)
            #noise:
            data_fake = make_geometric_noise_data(_noise_dim).to(device).detach()
            fake_sample = generator(data_fake).detach()
            out_real = discriminator(data_real).detach()
            fake_sample.batch = torch.tensor([1]).to(device)
            out_fake = discriminator(fake_sample)
            #real: 1, fake: 0
            loss_real = F.binary_cross_entropy_with_logits(out_real, torch.ones_like(out_real))
            loss_fake = F.binary_cross_entropy_with_logits(out_fake, torch.zeros_like(out_fake))
            lossD = (loss_real + loss_fake)/2
            #discriminator.zero_grad()
            lossD.backward()
            optimizer_d.step()
            #train generator
            lossD_fakeSample = discriminator(fake_sample)
            lossG = F.binary_cross_entropy_with_logits(lossD_fakeSample, torch.ones_like(lossD_fakeSample))
            #generator.zero_grad()
            lossG.backward()
            optimizer_g.step()
        lossesG.append(lossG.item())
        lossesD.append(lossD.item())
        print(f'Epoch: {epoch:03d}, LossG: {lossG:.4f}, LossD: {lossD:.4f}')

The real data's adjacency matrices are sparse; however, the generated matrices are not. I want to add a penalty that pushes the generated sparsity toward the real one:

sparsity_fake = find_sparsity(fake_sample)
sparsity_real = find_sparsity(data_real)
criterion = nn.CrossEntropyLoss()
sparsity_loss = criterion(torch.tensor([sparsity_real]), torch.tensor([sparsity_fake]))

However, when I add this sparsity loss to the generator loss (lossG += sparsity_loss), I get this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

You are rewrapping the outputs into new tensors, which breaks the computation graph. Use the output tensors directly instead and make sure both tensors are attached to a computation graph and show a valid .grad_fn attribute.
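To illustrate with a minimal standalone example (not your code): wrapping a result via torch.tensor() creates a new leaf tensor without any history, so no gradient can reach the original inputs:

import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()
print(y.grad_fn)          # <SumBackward0 object at ...>: still attached to the graph

y_wrapped = torch.tensor([y.item()], requires_grad=True)
print(y_wrapped.grad_fn)  # None: a fresh leaf tensor, the history back to x is gone

y_wrapped.backward()
print(x.grad)             # None: no gradient reaches x through the rewrapped tensor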

Thanks! I solved the issue. However, I am not seeing any change in lossS (the sparsity loss). Is my loss detached from the computation graph?

def train_gan(loader, discriminator, generator, optimizer_d, optimizer_g, _noise_dim=100, _epochs=400, lambda_constraint=1):
    for epoch in range(1, _epochs):
        for data in loader:
            data_real = data.to(device)
            #noise:
            data_fake = make_geometric_noise_data(_noise_dim).to(device).detach()
            fake_sample_original = generator(data_fake)
            fake_sample = fake_sample_original.detach()
            out_real = discriminator(data_real).detach()
            fake_sample.batch = torch.tensor([1]).to(device)
            out_fake = discriminator(fake_sample)
            #real: 1, fake: 0
            sparsity_fake = find_sparsity(fake_sample_original)
            sparsity_real = find_sparsity(data_real)
            loss_real = F.binary_cross_entropy_with_logits(out_real, torch.ones_like(out_real))
            loss_fake = F.binary_cross_entropy_with_logits(out_fake, torch.zeros_like(out_fake))
            lossD = (loss_real + loss_fake)/2 
            lossD.backward()
            optimizer_d.step()
            lossD_fakeSample = discriminator(fake_sample)
            lossS = F.binary_cross_entropy_with_logits(torch.tensor([sparsity_real], requires_grad=True) , torch.tensor([sparsity_fake], requires_grad=True))
            lossG = F.binary_cross_entropy_with_logits(lossD_fakeSample, torch.ones_like(lossD_fakeSample))
            lossS.backward()
            lossG.backward()
            optimizer_g.step()

Since the sparsity depends on the generator network's parameters, optimizer_g and optimizer_d should be sufficient.

Yes, some tensors are still detached from the computation graph.
E.g. out_real, since you are explicitly calling .detach() on it:

out_real = discriminator(data_real).detach()

which makes loss_real a constant that is irrelevant to the gradient computation.
You are also still rewrapping tensors:

lossS = F.binary_cross_entropy_with_logits(torch.tensor([sparsity_real], requires_grad=True) , torch.tensor([sparsity_fake], requires_grad=True))

which will also detach them from the computation graph as previously explained.
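A small standalone example (using a plain nn.Linear as a stand-in for your discriminator) shows the effect of .detach():

import torch
import torch.nn as nn

lin = nn.Linear(4, 1)         # stand-in for the discriminator
out = lin(torch.randn(2, 4))
print(out.grad_fn)            # <AddmmBackward0 object at ...>: attached to the graph

out_detached = out.detach()
print(out_detached.grad_fn)   # None: any loss built from it is constant w.r.t. lin's parameters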

Thanks for your reply @ptrblck. Actually, the "sparsity" is a float. I have to make it a tensor to be able to pass it to the loss function. What is the correct way to use the sparsity (type: float) in my loss function with gradients enabled (and without getting AttributeError: 'float' object has no attribute 'backward')?

You won't be able to use plain Python floating point values, since PyTorch, and in particular Autograd, only tracks differentiable operations on tensors.
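As a generic illustration:

import torch

t = torch.tensor(2.0, requires_grad=True)
out = t * 3.0       # a Python float mixed into a tensor op is fine; the result is a tensor
print(out.grad_fn)  # <MulBackward0 object at ...>

plain = 2.0 * 3.0   # pure Python floats: no tensor, no graph, no .backward()
print(type(plain))  # <class 'float'>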
Could you describe how this floating point value was computed?

Sure.

#Graph: torch_geometric.data.data.Data
def find_sparsity(graph):
    #edge density: number of edges divided by the number of possible node pairs
    return graph.edge_index.shape[1] / (graph.num_nodes ** 2)

Assuming graph.num_nodes is an integer (not a tensor), no computation graph was created and thus no gradients can be computed.
To be able to call backward on a tensor you have to create this tensor from differentiable operations so that Autograd can then compute the corresponding gradients.
If you use plain floats and ints, there won’t be anything Autograd can compute.
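If you want a trainable penalty, one option (just a sketch, and assuming your generator internally produces a dense matrix of edge logits before thresholding it into edge_index) would be to compute a soft density from that dense output using differentiable tensor ops:

import torch

#Hypothetical: edge_logits stands in for the generator's raw dense output
#of shape [num_nodes, num_nodes], before it is thresholded into edges.
edge_logits = torch.randn(10, 10, requires_grad=True)

#Expected edge density: mean edge probability over all possible node pairs.
#Built from differentiable tensor ops, so it carries a grad_fn.
density_fake = torch.sigmoid(edge_logits).mean()
print(density_fake.grad_fn)  # <MeanBackward0 object at ...>

density_real = 0.05          # example target density computed from the real graphs
lossS = (density_fake - density_real) ** 2
lossS.backward()             # gradients now flow back into edge_logits

This way the penalty stays attached to the generator's computation graph, while the hard edge_index-based sparsity can still be used for monitoring.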

I see. Thanks a lot!