I am working with A2C and A3C systems. In my environment, I can take several actions at each step, and each action has its own advantage. I am trying to design a custom loss function in which the advantage (delta) can vary for each data point, but I ran into an issue when I tried to do that:

```
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(10, 4)

    def forward(self, x):
        x = self.fc(x)
        x = nn.Softmax(dim=1)(x)
        return x


def custom_loss(delta):
    def loss(y_pred, y_true):
        y_pred_clamped = torch.clamp(y_pred, 1e-8, 1 - 1e-8)
        log_likelihood = y_true * torch.log(y_pred_clamped)
        return torch.sum(-log_likelihood * delta)
    return loss


batch_size = 32
n_sample = 10000

network = Net()
optimizer = optim.Adam(network.parameters(), lr=0.01)

x = torch.ones((1,10))
print(network(x), '\n')

target = [0.2, 0.4, 0.3, 0.1]

for i in range(int(n_sample/batch_size)):
    optimizer.zero_grad()
    # Each sample of the dataset has its own delta, but for simplicity I set them all to 1
    delta = torch.ones((batch_size))
    inputs = torch.ones((batch_size,10))
    targets = torch.FloatTensor(batch_size*[target])
    outputs = network(inputs)
    loss = custom_loss(delta)(outputs, targets)
    loss.backward()
    optimizer.step()

x = torch.ones((1,10), dtype=torch.float)
print(network(x))
```

This returns:

RuntimeError: The size of tensor a (4) must match the size of tensor b (32) at non-singleton dimension 1

I understand why it doesn’t work, and I realize this is an unusual question, but it would be very convenient for me if something like this could work. My guess is that the loss is written as if it were computed one row at a time (even though I assume it is parallelized in practice). Maybe it is possible to vectorize the loss computation?
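For reference, here is a minimal sketch of one way the loss might be vectorized. It assumes `delta` holds one scalar per sample, with shape `(batch_size,)`. The product `y_true * torch.log(y_pred_clamped)` has shape `(batch_size, 4)`, and broadcasting aligns trailing dimensions, so multiplying it by a `(batch_size,)` tensor tries to match 4 against 32. Summing over the action dimension first (or reshaping `delta` to a column with `unsqueeze(1)`) makes the shapes compatible. The random `y_pred` and `delta` values below are placeholders for illustration only.

```python
import torch

torch.manual_seed(0)

def custom_loss(delta):
    # Assumption: delta has shape (batch,), one advantage per sample.
    def loss(y_pred, y_true):
        y_pred_clamped = torch.clamp(y_pred, 1e-8, 1 - 1e-8)
        # Sum over the action dimension first -> shape (batch,)
        log_likelihood = (y_true * torch.log(y_pred_clamped)).sum(dim=1)
        # Both factors now have shape (batch,), so this is element-wise per sample
        return torch.sum(-log_likelihood * delta)
    return loss

batch_size, n_actions = 32, 4

# Placeholder data standing in for network outputs and per-sample advantages
y_pred = torch.softmax(torch.randn(batch_size, n_actions), dim=1)
y_true = torch.tensor([[0.2, 0.4, 0.3, 0.1]]).repeat(batch_size, 1)
delta = torch.rand(batch_size)

value = custom_loss(delta)(y_pred, y_true)

# Alternative with the same result: broadcast delta as a (batch, 1) column
y_pred_clamped = torch.clamp(y_pred, 1e-8, 1 - 1e-8)
alt = torch.sum(-(y_true * torch.log(y_pred_clamped)) * delta.unsqueeze(1))
```

Both variants weight each row's log-likelihood by that row's `delta` before the final sum, so either should drop into the training loop above without other changes.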

Any help would be appreciated!

Thomas,