Is it possible to update the optimizer based on loss from a "lookup table"?

I am working on a GAN where the generator outputs a one-hot tensor.
Each index in the tensor has an associated array of characteristics that is not involved with the generator. I am trying to figure out whether it is possible to calculate a loss based on the sum of these characteristics, e.g. `sum(characteristic_tensor[where(onehot_tensor > 0)])`.

My current approach is to train one discriminator on the one-hot tensors and another on the summed characteristics. My goal is to update the generator based on the losses from both of these discriminators.

The problem I am hitting is that I cannot determine whether it is possible to convert the one-hot tensor into a summed tensor while keeping its graph connected to the generator, or whether this is even something that makes sense to do.

The other approach I can think of is detaching the tensor, summing it, calculating its loss, and then updating the optimizer with that gradient. However, I cannot figure out how to do this, or whether it is possible at all.

Is what I am trying to do feasible with how autograd works?

# This is how I am trying to update the generator after the discriminators have been trained
noise = torch.randn(b_size, 100, device=device)
fake_batch = generator(noise)  # generate fake data (b_size, 46)
# get one-hot loss
true_labels = torch.empty((b_size,), dtype=torch.float, device=device).uniform_(0.9, 1.00)
e_output = e_discriminator(fake_batch).view(-1)
# get sum loss
# convert the one-hot tensor to a sum of attributes (b_size, 30)
f_output = f_discriminator(sum_batch).view(-1)

No, once you index into the lookup table and get a tensor out of it, the graph disconnects, as there is no differentiable operation connecting the two.
The only coherent thing you can do is to obtain that tensor through a multiplication or some sort of attention.
You can set values < 0 to zero; thus, no gradient will flow through there.
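Concretely, the difference between the two is easy to check with autograd. A minimal sketch (the table `char_table`, the shapes, and the stand-in `logits` are made-up placeholders, not the poster's actual model):

```python
import torch

b_size, n_classes, n_chars = 4, 46, 30
char_table = torch.rand(n_classes, n_chars)          # fixed lookup table, requires no grad

# stand-in for the generator's pre-one-hot output
logits = torch.randn(b_size, n_classes, requires_grad=True)
probs = torch.softmax(logits, dim=1)                 # soft "one-hot"

# Index lookup: argmax is not differentiable, so the result is cut off from logits
hard = char_table[probs.argmax(dim=1)]               # (b_size, n_chars)
print(hard.grad_fn)                                  # None

# Matrix multiplication: each row is a probability-weighted sum of table rows,
# so the graph stays connected back to logits
soft = probs @ char_table                            # (b_size, n_chars)
soft.sum().backward()
print(logits.grad is not None)                       # True
```

With a nearly one-hot `probs`, the matmul row approximates the single looked-up row, so it can feed the sum-based discriminator while still carrying gradient to the generator.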

Another plausible option is to impose a loss on the probability of the lookup, then use the lookup and a layer to combine those characteristics with your output.
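A rough sketch of that second option (the `combine` layer and shapes are my own illustration of one way to wire it, not a prescribed design): the hard lookup stays non-differentiable, but gradient still reaches the generator through the probability path.

```python
import torch
import torch.nn as nn

b_size, n_classes, n_chars = 4, 46, 30
char_table = torch.rand(n_classes, n_chars)          # fixed lookup table

logits = torch.randn(b_size, n_classes, requires_grad=True)
probs = torch.softmax(logits, dim=1)

# hard lookup: no gradient flows through this branch
chars = char_table[probs.argmax(dim=1)].detach()     # (b_size, n_chars)

# small layer combining the differentiable probs with the looked-up characteristics
combine = nn.Linear(n_classes + n_chars, n_chars)
combined = combine(torch.cat([probs, chars], dim=1))

# the loss on `combined` (or directly on `probs`) backpropagates via the probs path
combined.sum().backward()
print(logits.grad is not None)                       # True
```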

Thank you. I was concerned that this would be the case.