I am working with A2C and A3C. In my environment, several actions can be taken at each step, and each action has its own advantage. I am trying to design a custom loss function in which the advantage (delta) varies per data point, but I ran into an issue when I tried it:
```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(10, 4)

    def forward(self, x):
        x = self.fc(x)
        x = nn.Softmax(dim=1)(x)
        return x


def custom_loss(delta):
    def loss(y_pred, y_true):
        y_pred_clamped = torch.clamp(y_pred, 1e-8, 1 - 1e-8)
        log_likelihood = y_true * torch.log(y_pred_clamped)
        return torch.sum(-log_likelihood * delta)
    return loss


batch_size = 32
n_sample = 10000

network = Net()
optimizer = optim.Adam(network.parameters(), lr=0.01)

x = torch.ones((1,10))
print(network(x), '\n')

target = [0.2, 0.4, 0.3, 0.1]

for i in range(int(n_sample/batch_size)):
    optimizer.zero_grad()
    # Each sample of the dataset has its own delta, but for simplicity I set them all to 1
    delta = torch.ones((batch_size))
    inputs = torch.ones((batch_size,10))
    targets = torch.FloatTensor(batch_size*[target])
    outputs = network(inputs)
    loss = custom_loss(delta)(outputs, targets)
    loss.backward()
    optimizer.step()

x = torch.ones((1,10), dtype=torch.float)
print(network(x))
```
RuntimeError: The size of tensor a (4) must match the size of tensor b (32) at non-singleton dimension 1
I understand why it doesn't work, and I realize this is an unusual question, but it would be very convenient if something like this could work. My guess is that the loss is computed as if one row were processed at a time (even though I assume it is parallelized). Maybe it is possible to vectorize the loss computation?
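To make the shape problem concrete, here is a minimal sketch of what I think is going on and what a vectorized version might look like. It assumes the per-element log-likelihood has shape `(batch_size, 4)` and `delta` has shape `(batch_size,)`; the random tensor is only a stand-in for the real network output, and I am not sure which of the two options is the more idiomatic one.

```python
import torch

batch_size, n_actions = 32, 4

# Stand-ins (assumptions): per-element log-likelihood and per-sample advantages.
log_likelihood = torch.randn(batch_size, n_actions)
delta = torch.ones(batch_size)

# (32, 4) * (32,) aligns the trailing dimensions (4 vs 32), so broadcasting fails.
try:
    log_likelihood * delta
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True

# Option 1: add a trailing singleton dimension so delta broadcasts over actions.
loss_a = torch.sum(-log_likelihood * delta.unsqueeze(1))

# Option 2: reduce over actions first, then weight each sample by its delta.
loss_b = torch.sum(-log_likelihood.sum(dim=1) * delta)
```

Both options should give the same scalar, since each sample's delta multiplies every action term of that sample either way.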
Any help would be appreciated!