# Stack tensor onto itself along one dimension

Hello,

I have a tensor of shape `(batch_size, bag_size, num_channels, height, width)` for multiple instance learning, where each batch consists of a certain number of bags (the batch size) and each bag consists of a certain number of images (the bag size).

I would like to take that tensor and stack all of the images from every bag along one dimension, resulting in a tensor of shape `(batch_size * bag_size, num_channels, height, width)`, without mixing the data. It seems that `view` is not the right tool here?

```
# x has shape (batch_size, bag_size, num_channels, height, width)
H = cnn_model(x.view(-1, num_channels, height, width))
```
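A small aside: `view` requires the tensor to be contiguous; if it ever isn't (e.g., after a `permute`), `reshape` does the same job and copies only when necessary. A minimal sketch with made-up sizes:

```
import torch

batch_size, bag_size, num_channels, height, width = 4, 5, 3, 32, 32
x = torch.randn(batch_size, bag_size, num_channels, height, width)

# view works here because x is contiguous; reshape also covers the
# non-contiguous case by copying if needed
flat = x.reshape(batch_size * bag_size, num_channels, height, width)
print(flat.shape)  # torch.Size([20, 3, 32, 32])
```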

When I compute the loss as a simple sum of the outputs, weighting only one bag in the batch and zero-weighting the rest, and then call `backward()`, the resulting gradients are still with respect to every input. What would be the solution? How else could I restack `(batch_size, bag_size, num_channels, height, width)` into `(batch_size * bag_size, num_channels, height, width)`?

```
# Select only the outputs of one bag position by zero-weighting the rest
weights = torch.zeros(H.size())
weights[1, :] = 1
loss = torch.dot(weights.view(-1), H.view(-1))
loss.backward()
```
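For completeness, here is a self-contained sketch of the setup, using a one-layer `nn.Conv2d` as a stand-in for `cnn_model` and made-up sizes; inspecting `x.grad` shows that the zero-weighted bags receive exactly zero gradient, even though the `.grad` tensor covers every input:

```
import torch
import torch.nn as nn

batch_size, bag_size, num_channels, height, width = 2, 3, 2, 4, 4
x = torch.randn(batch_size, bag_size, num_channels, height, width,
                requires_grad=True)
cnn_model = nn.Conv2d(num_channels, 1, kernel_size=1)  # placeholder model

H = cnn_model(x.view(-1, num_channels, height, width))

weights = torch.zeros(H.size())
weights[1, :] = 1  # keep only the output at flat index 1
loss = torch.dot(weights.view(-1), H.view(-1))
loss.backward()

# x.grad is populated for the whole input, but only the selected
# image (batch 0, bag 1 -> flat index 1) has nonzero entries
grad = x.grad.view(-1, num_channels, height, width)
print(grad[1].abs().sum())                        # nonzero
print(grad[0].abs().sum(), grad[2].abs().sum())   # both zero
```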

`view` should work here: since the flattened dimensions are consecutive, the elements won't be mixed.
Here is a small example to show what the operation is doing:

```
import torch

batch_size, bag_size, num_channels, height, width = 2, 3, 2, 2, 2
x = torch.arange(batch_size * bag_size * num_channels * height * width).view(
    batch_size, bag_size, num_channels, height, width)
print(x)
x = x.view(-1, num_channels, height, width)
print(x)
```
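To double-check that nothing is mixed, you can map a flat index back to its `(batch, bag)` pair and compare; a short check reusing the sizes from the example above (the original tensor is rebuilt since `x` was reassigned):

```
# Flat index i corresponds to batch i // bag_size, bag i % bag_size
x_orig = torch.arange(batch_size * bag_size * num_channels * height * width).view(
    batch_size, bag_size, num_channels, height, width)
flat = x_orig.view(-1, num_channels, height, width)
assert torch.equal(flat[1 * bag_size + 2], x_orig[1, 2])

# Reshaping back recovers the original tensor exactly
assert torch.equal(
    flat.view(batch_size, bag_size, num_channels, height, width), x_orig)
```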