Hello,

I have a tensor of shape `(batch_size, bag_size, num_channels, height, width)` for multiple instance learning, where each batch consists of a certain number of bags (`batch_size`) and each bag consists of a certain number of images (`bag_size`).

I would like to take that tensor and stack all of the images from every bag along one dimension, resulting in a tensor of shape `(batch_size * bag_size, num_channels, height, width)`, without mixing the data. It seems that `view` is not the right tool here?

```
# x has shape (batch_size, bag_size, num_channels, height, width)
H = cnn_model(x.view(-1, num_channels, height, width))
```
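For context, a minimal sketch (with made-up toy sizes) of what `view` does here: it collapses the first two dimensions so that image `j` of bag `i` lands at row `i * bag_size + j`, i.e. the bags are stacked in order and no data is mixed:

```python
import torch

# Toy sizes, purely for illustration
batch_size, bag_size, num_channels, height, width = 2, 3, 1, 4, 4
x = torch.arange(batch_size * bag_size * num_channels * height * width,
                 dtype=torch.float32).view(batch_size, bag_size,
                                           num_channels, height, width)

# Collapse (batch_size, bag_size) into one leading dimension
flat = x.view(-1, num_channels, height, width)

# Image j of bag i ends up at row i * bag_size + j
assert torch.equal(flat[1 * bag_size + 2], x[1, 2])
print(flat.shape)  # torch.Size([6, 1, 4, 4])
```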

I compute the loss as a simple sum of outputs, weighting just one bag in the batch while zero-weighting the rest. After calling `backward()`, the resulting gradients are still with respect to every input. What would be the solution? How else could I restack `(batch_size, bag_size, num_channels, height, width)` into `(batch_size * bag_size, num_channels, height, width)`?

```
weights = torch.zeros(H.size())  # no .data needed, weights is a plain tensor
weights[1, :] = 1                # weight only one output, zero out the rest
loss = torch.dot(weights.view(-1), H.view(-1))
loss.backward()
```
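For reference, a self-contained sketch of this masked-loss step, with a hypothetical `torch.nn.Linear` standing in for `cnn_model` and toy sizes: the gradient tensor covers the whole input (autograd always returns a gradient the same shape as the input), but the entries for zero-weighted bags come out as zeros.

```python
import torch

# Toy sizes and a stand-in model, purely for illustration
batch_size, bag_size, num_channels, height, width = 2, 3, 1, 4, 4
x = torch.randn(batch_size, bag_size, num_channels, height, width,
                requires_grad=True)
linear = torch.nn.Linear(num_channels * height * width, 1)

flat = x.view(-1, num_channels, height, width)
H = linear(flat.view(flat.size(0), -1))  # shape (batch_size * bag_size, 1)

weights = torch.zeros(H.size())
weights[1, :] = 1  # weight only one output, zero out the rest
loss = torch.dot(weights.view(-1), H.view(-1))
loss.backward()

# x.grad has the full input shape, but only the weighted entry is nonzero
per_image_grad = x.grad.abs().sum(dim=(2, 3, 4))  # shape (batch_size, bag_size)
print(per_image_grad)
```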