# Stack tensor onto itself along one dimension

Hello,

I have a tensor of shape `(batch_size, bag_size, num_channels, height, width)` for multiple instance learning, where each batch consists of a certain number of bags (the batch size) and each bag consists of a certain number of images (the bag size).

I would like to take that tensor and stack all of the images from every bag along one dimension, resulting in a tensor of shape `(batch_size * bag_size, num_channels, height, width)`, without mixing the data. It seems that `view` is not the right tool here?

```
# x has shape (batch_size, bag_size, num_channels, height, width)
H = cnn_model(x.view(-1, num_channels, height, width))
```
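A small aside: `view` requires the tensor to be contiguous; if it ever isn't (e.g., after a `permute`), `reshape` does the same job and copies only when necessary. A minimal sketch with made-up sizes:

```
import torch

batch_size, bag_size, num_channels, height, width = 4, 5, 3, 32, 32
x = torch.randn(batch_size, bag_size, num_channels, height, width)

# view works here because x is contiguous; reshape also covers the
# non-contiguous case by copying if needed
flat = x.reshape(batch_size * bag_size, num_channels, height, width)
print(flat.shape)  # torch.Size([20, 3, 32, 32])
```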

When I compute the loss as a simple sum of the outputs, weighting only one bag in the batch and zero-weighting the rest, and then call `backward()`, the resulting gradients are still with respect to every input. What would be the solution? How else could I restack `(batch_size, bag_size, num_channels, height, width)` into `(batch_size * bag_size, num_channels, height, width)`?

```
# Select only the outputs of one bag position by zero-weighting the rest
weights = torch.zeros(H.size())
weights[1, :] = 1
loss = torch.dot(weights.view(-1), H.view(-1))
loss.backward()
```
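For completeness, here is a self-contained sketch of the setup, using a one-layer `nn.Conv2d` as a stand-in for `cnn_model` and made-up sizes; inspecting `x.grad` shows that the zero-weighted bags receive exactly zero gradient, even though the `.grad` tensor covers every input:

```
import torch
import torch.nn as nn

batch_size, bag_size, num_channels, height, width = 2, 3, 2, 4, 4
x = torch.randn(batch_size, bag_size, num_channels, height, width,
                requires_grad=True)
cnn_model = nn.Conv2d(num_channels, 1, kernel_size=1)  # placeholder model

H = cnn_model(x.view(-1, num_channels, height, width))

weights = torch.zeros(H.size())
weights[1, :] = 1  # keep only the output at flat index 1
loss = torch.dot(weights.view(-1), H.view(-1))
loss.backward()

# x.grad is populated for the whole input, but only the selected
# image (batch 0, bag 1 -> flat index 1) has nonzero entries
grad = x.grad.view(-1, num_channels, height, width)
print(grad[1].abs().sum())                        # nonzero
print(grad[0].abs().sum(), grad[2].abs().sum())   # both zero
```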

`view` should work here: since the flattened dimensions are consecutive, the elements won't be mixed.
Here is a small example to show what the operation is doing:

```
import torch

batch_size, bag_size, num_channels, height, width = 2, 3, 2, 2, 2
x = torch.arange(batch_size * bag_size * num_channels * height * width).view(
    batch_size, bag_size, num_channels, height, width)
print(x)
x = x.view(-1, num_channels, height, width)
print(x)
```
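To double-check that nothing is mixed, you can map a flat index back to its `(batch, bag)` pair and compare; a short check reusing the sizes from the example above (the original tensor is rebuilt since `x` was reassigned):

```
# Flat index i corresponds to batch i // bag_size, bag i % bag_size
x_orig = torch.arange(batch_size * bag_size * num_channels * height * width).view(
    batch_size, bag_size, num_channels, height, width)
flat = x_orig.view(-1, num_channels, height, width)
assert torch.equal(flat[1 * bag_size + 2], x_orig[1, 2])

# Reshaping back recovers the original tensor exactly
assert torch.equal(
    flat.view(batch_size, bag_size, num_channels, height, width), x_orig)
```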