DataLoader: generate sequences of images from FashionMNIST

Hi All,

I have to implement a model that uses videos as inputs (i.e. sequences of images).
However, before using the actual data, I am supposed to test the model with the FashionMNIST dataset.
But since the FashionMNIST dataset only contains single images, I need to generate the sequences. Does anyone know how to adjust the DataLoader so that it returns lists of images/tensors instead of single images? For example, if my sequence length is 10 and my batch_size is 16, I think my DataLoader should return 16 lists of 10 images.

Many thanks in advance!

A simple way would be to set the batch size to 16*10 and reshape the tensors inside the DataLoader loop into the desired shape.
If you need a specific order of images, you could use e.g. a BatchSampler and apply a sliding window approach inside the __getitem__, but I don’t think this is necessary for your test, since the images are unordered.
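If you do want consecutive images in each sequence, a sliding window inside `__getitem__` could look roughly like the following. This is a minimal sketch, not code from the thread: the wrapper name `SlidingWindowDataset` is hypothetical, it skips the BatchSampler and simply returns `seq_len` consecutive samples per index, and a random `TensorDataset` stands in for `torchvision.datasets.FashionMNIST` so the example runs without a download.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class SlidingWindowDataset(Dataset):
    """Hypothetical wrapper: returns seq_len consecutive samples of a base dataset."""

    def __init__(self, base, seq_len):
        self.base = base
        self.seq_len = seq_len

    def __len__(self):
        # number of complete windows (consecutive windows overlap)
        return len(self.base) - self.seq_len + 1

    def __getitem__(self, idx):
        items = [self.base[i] for i in range(idx, idx + self.seq_len)]
        images = torch.stack([img for img, _ in items])        # [seq_len, C, H, W]
        labels = torch.tensor([int(lbl) for _, lbl in items])  # [seq_len]
        return images, labels


# Random stand-in for FashionMNIST(..., transform=ToTensor()): 100 images of 1x28x28
base = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))
loader = DataLoader(SlidingWindowDataset(base, seq_len=10), batch_size=16)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # torch.Size([16, 10, 1, 28, 28]) torch.Size([16, 10])
```

With the real data you would pass e.g. `FashionMNIST(root, train=True, transform=ToTensor())` as `base`; the default collation then stacks the windows into `[batch_size, seq_len, C, H, W]`.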

Thanks a lot! Could you give me some hint how to adjust the DataLoader accordingly? I am new to PyTorch and couldn’t find any matching information.

This should work:

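```python
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=16*10, drop_last=True)  # drop the last incomplete batch so the reshape always works
for data, target in loader:
    data = data.view(16, 10, 1, 28, 28)  # [batch_size, seq_len, C, H, W], in case you want to use 5-dimensional tensors as the input
    target = target.view(16, 10)  # one label per image in each sequence
```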

Thanks a lot! I was able to adjust it.
However, I am still running into an error. I am using a CNN-LSTM and I want to return the LSTM output for every time step of the sequence. When trying to do that, I get the following error:

Expected target size (20, 20), got torch.Size([20, 10])

This error is raised in the criterion when the model output shapes do not match the expected target shape.
Based on the error message I assume that a reshaping operation in the model might be wrong and the model output shape is thus also wrong.
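The thread doesn't show which criterion was used. Assuming FashionMNIST class labels and `nn.CrossEntropyLoss`, this kind of message usually appears because the loss expects the class dimension in dim1, i.e. an input of `[batch_size, num_classes, seq_len]` with a target of `[batch_size, seq_len]`. A sketch of two ways to line the shapes up, with random tensors standing in for the real model output:

```python
import torch
import torch.nn as nn

batch_size, seq_len, num_classes = 16, 10, 10
output = torch.randn(batch_size, seq_len, num_classes)  # e.g. per-time-step logits from the LSTM head
target = torch.randint(0, num_classes, (batch_size, seq_len))

criterion = nn.CrossEntropyLoss()

# Option 1: fold the time dimension into the batch dimension
loss_flat = criterion(output.reshape(-1, num_classes), target.reshape(-1))

# Option 2: move the class dimension to dim1 -> [batch_size, num_classes, seq_len]
loss_perm = criterion(output.permute(0, 2, 1), target)

print(torch.allclose(loss_flat, loss_perm))  # True
```

Note that when `seq_len` happens to equal `num_classes` (both 10 here), passing the unpermuted `[batch_size, seq_len, num_classes]` output raises no error at all but computes the loss over the wrong dimension, so it's worth checking the layout explicitly.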

Thank you so much! I was able to solve it.
I have one more question. Currently, my model only works with a fixed sequence length and batch size. But when I want to predict values, I will sometimes have batch_size = 1. I am initializing the LSTM with the batch size, like this:

```python
model = Combine(input_dim, hidden_dim, layer_dim, output_dim)
```

where hidden_dim corresponds to the batch_size. I get an error when using a different batch_size, since I use hidden_dim when initializing the hidden and cell states. Is there a way to make this work for variable batch sizes?

You could initialize the states using the batch size of the current input in the forward pass.
Depending on the memory layout you are using, it should be either x.size(0) or x.size(1).
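A minimal sketch of that idea, assuming `batch_first=True` so the batch size is `x.size(0)`. `SeqModel` is a hypothetical stand-in for the `Combine` model from the question and the feature sizes are arbitrary. Note that `hidden_dim` here is the LSTM hidden size and is independent of the batch size:

```python
import torch
import torch.nn as nn


class SeqModel(nn.Module):
    """Hypothetical stand-in for Combine: LSTM + linear head, any batch size."""

    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # x: [batch_size, seq_len, input_dim] because batch_first=True
        batch_size = x.size(0)
        # states are always [num_layers, batch_size, hidden_dim], even with batch_first=True
        h0 = torch.zeros(self.layer_dim, batch_size, self.hidden_dim, device=x.device)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out)  # [batch_size, seq_len, output_dim]


model = SeqModel(input_dim=32, hidden_dim=64, layer_dim=2, output_dim=10)
print(model(torch.randn(16, 10, 32)).shape)  # torch.Size([16, 10, 10])
print(model(torch.randn(1, 10, 32)).shape)   # torch.Size([1, 10, 10])
```

Since `nn.LSTM` defaults the initial hidden and cell states to zeros when they are not passed, `self.lstm(x)` alone would behave the same; the explicit version is only needed if you want a different initialization.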

Thanks for this, but I get an 'out of memory' error when I use a batch size greater than 64. I have a CNN-LSTM model where the outputs of the CNN are fed to the LSTM. The inputs to the CNN are batches of images, e.g. (64, 3, 299, 299) -> (batch size, number of channels, H, W). However, I want the sequence length of the LSTM to be much greater than the batch size, e.g. 600. Since I can't set the batch size to 600 due to the 'out of memory' error, how can I solve this?

I don’t know how you are reshaping the input or activation before feeding it into the nn.LSTM, but note that the sequence length is defined in the temporal dimension not the batch dimension.
I.e. by default the nn.LSTM module expects an input in the shape [seq_len, batch_size, nb_features], while it seems you are trying to increase the batch size and are running out of memory.
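To illustrate the default layout, here is a sketch with arbitrary sizes where a sequence of 600 time steps is processed with a batch size of only 4, because time lives in dim0 and the batch in dim1:

```python
import torch
import torch.nn as nn

seq_len, batch_size, nb_features, hidden_size = 600, 4, 128, 64

lstm = nn.LSTM(nb_features, hidden_size)  # batch_first=False by default
x = torch.randn(seq_len, batch_size, nb_features)  # [seq_len, batch_size, nb_features]

output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([600, 4, 64]): one output per time step
print(h_n.shape)     # torch.Size([1, 4, 64]): last hidden state per sequence
```

Passing `batch_first=True` to `nn.LSTM` switches the expected input to `[batch_size, seq_len, nb_features]`; the states `h_n`/`c_n` keep the `[num_layers * num_directions, batch_size, hidden_size]` layout either way.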

Okay. So if I understand you correctly, I should arrange the sequences of images into 5-dimensional batches before passing them to the CNN, i.e. (batch_size, seq_len, C, H, W). That way, the output of the CNN, (batch_size, seq_len, cnn_output_features), can easily be rearranged into the shape `[seq_len, batch_size, nb_features]`. Am I correct?

That might be correct. However, I'm currently unsure what your actual model looks like. I assume you are using a loop to create the output features of the CNN, concatenate these activations, and pass them afterwards to the RNN?

Hello @ptrblck, thanks so much for this. I think I am implementing it differently, and maybe that's why my loss increases instead. Would you use a separate loss for the CNN and the RNN? Could you please demonstrate what you are describing with code? Here is my model. Note: it is an image regression problem:

```python
def CNN(num_outputs):
    preNet = models.inception_v3(pretrained=True,aux_logits=False)
    num_ftrs = preNet.fc.in_features
    preNet.fc = nn.Linear(num_ftrs, num_outputs)
    return preNet

class BiGRU(nn.Module):
    def __init__(self, input_features, hidden_size,num_layers,output_dim,seq_len,batch_size):
        super(BiGRU, self).__init__()
        self.cnn = CNN(input_features)
        self.gru = nn.GRU(input_features, hidden_size, num_layers=num_layers, bidirectional=True)
        self.out1 = nn.Linear(hidden_size*2, output_dim)

    def forward(self, input):
        input1 = self.cnn(input.view(batch_size*seq_len,C,H,W))
        output, hidden = self.gru(input1.view(seq_len,-1,input_features))
        pred = self.out1(output.view(-1,seq_len,hidden_size*2))
        return pred
```

model = BiGRU(input_features, hidden_size,num_layers,output_dim,seq_len,batch_size)
loss = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

So the input to the model has the shape (batch_size, seq_len, C, H, W) and the output has the shape (batch_size, seq_len, output_dim). However, the model is not learning, as the loss and error keep increasing.

Below is my Dataset code, which creates sliding sequences of images. These sequences are later grouped into batches by a DataLoader with batch_size = batch_size, shuffle = False, drop_last = False to produce batches of shape (batch_size, seq_len, C, H, W).

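```python
import os

import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, img_labels, img_dir, seq_length, transform=None):
        self.img_labels = img_labels  # DataFrame: file name in column 0, regression targets in columns 1:
        self.img_dir = img_dir
        self.seq_length = seq_length
        self.transform = transform

    def __len__(self):
        # number of complete windows of seq_length consecutive images
        return len(self.img_labels) - self.seq_length + 1

    def __getitem__(self, idx):
        start = idx
        end = idx + self.seq_length
        print('Getting images from {} to {}'.format(start, end))
        images = []
        labels = []
        for i in range(start, end):
            img_path = os.path.join(self.img_dir, self.img_labels.iloc[i, 0])
            image = read_image(img_path)
            if self.transform:
                image = self.transform(image)
            images.append(image)  # was missing, so torch.stack(images) got an empty list
            labels.append(torch.tensor(self.img_labels.iloc[i, 1:].to_numpy(dtype='float32')))
        x = torch.stack(images)  # [seq_length, C, H, W]
        y = torch.stack(labels)  # [seq_length, num_targets]
        return x, y
```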

I think the view operations are wrong, as they would interleave the tensor:

```python
output, hidden = self.gru(input1.view(seq_len,-1,input_features))
pred = self.out1(output.view(-1,seq_len,hidden_size*2))
```

as shown in this code snippet:

```python
import torch

batch_size, seq_len, c, h, w = 2, 3, 4, 2, 2
x = torch.zeros(batch_size, seq_len, c, h, w)
# set the second sample in the batch (index 1) to ones
x[1] = 1.

y = x.view(seq_len, batch_size, -1)
print(y[:, 1])  # not all ones, as y is interleaved

z = x.permute(1, 0, 2, 3, 4).view(seq_len, batch_size, -1)
print(z[:, 1])  # all ones
```

@ptrblck wow! I clearly see it now. Let me try the permute function and see if training improves. What about my implementation of the CNN and RNN as a single model, and the custom Dataset? Do they make sense, or would you suggest a different implementation? From your previous suggestion it sounded like you would train the CNN separately from the RNN, but I am confused about how you would implement the loss for the CNN and the RNN.

I think your approach of training these modules end-to-end makes sense, and I wouldn't try to train them separately.
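A minimal end-to-end training sketch: one model containing both the CNN and the GRU, one `nn.MSELoss`, and one optimizer over `model.parameters()`, so a single `backward()` updates both parts. The small CNN replaces `inception_v3` only to keep the example light; the class name `CNNGRU`, all sizes, and the random data are hypothetical, and `batch_first=True` is used so no permute is needed:

```python
import torch
import torch.nn as nn
import torch.optim as optim


class CNNGRU(nn.Module):
    def __init__(self, in_channels, input_features, hidden_size, num_layers, output_dim):
        super().__init__()
        # Hypothetical small CNN in place of inception_v3: [N, C, H, W] -> [N, input_features]
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, input_features),
        )
        self.gru = nn.GRU(input_features, hidden_size, num_layers=num_layers,
                          bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden_size * 2, output_dim)

    def forward(self, x):
        b, s, c, h, w = x.shape  # read sizes from the input instead of globals
        feats = self.cnn(x.view(b * s, c, h, w)).view(b, s, -1)  # [b, s, input_features]
        output, _ = self.gru(feats)  # [b, s, 2 * hidden_size]
        return self.out(output)      # [b, s, output_dim]


model = CNNGRU(in_channels=3, input_features=32, hidden_size=16, num_layers=1, output_dim=2)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # one optimizer for CNN and GRU

x = torch.randn(4, 5, 3, 32, 32)  # dummy batch [batch, seq_len, C, H, W]
y = torch.randn(4, 5, 2)          # dummy regression targets [batch, seq_len, output_dim]

for step in range(3):
    optimizer.zero_grad()
    pred = model(x)
    loss = criterion(pred, y)  # single loss; gradients flow into both GRU and CNN
    loss.backward()
    optimizer.step()
    print(step, loss.item())
```

Reading `b, s, c, h, w` from `x.shape` also removes the dependence on global `batch_size`/`seq_len` variables, so the same model works for any batch size.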

@ptrblck So looking at what you proposed, it works for formatting a 5-dimensional input into a 3-dimensional one for the RNN. However, my architecture has a CNN before the RNN that only accepts 4-dimensional inputs, not 5-dimensional ones. Also, the CNN produces a 2-dimensional output, which is the input to the RNN. So how do I use the permute operation to turn the 2-dimensional CNN output into a 3-dimensional RNN input, or how do I use the permute operation in my model?

For example,

```python
output_features = 100  # number of output features of the CNN
batch_size, seq_len, c, h, w = 2, 3, 4, 2, 2
x = torch.zeros(batch_size, seq_len, c, h, w)

input1 = self.cnn(x.view(batch_size * seq_len, c, h, w))  # input1 shape = (6, 100)

output, hidden = self.gru(input1.permute ?????
```

In your example you are already flattening the batch and temporal dimensions into the batch dimension before feeding the input to the model and would need to revert this operation before permuting the output and feeding it to the RNN.
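Continuing the example above, a sketch of reverting the flattening before the permute. `cnn` is a hypothetical stand-in (`nn.Flatten` + `nn.Linear`) for the real CNN, and `hidden_size` is an arbitrary choice:

```python
import torch
import torch.nn as nn

batch_size, seq_len, c, h, w = 2, 3, 4, 2, 2
output_features, hidden_size = 100, 16

# Hypothetical stand-in CNN: [N, C, H, W] -> [N, output_features]
cnn = nn.Sequential(nn.Flatten(), nn.Linear(c * h * w, output_features))
gru = nn.GRU(output_features, hidden_size, bidirectional=True)  # expects [seq_len, batch, features]

x = torch.randn(batch_size, seq_len, c, h, w)
feats = cnn(x.view(batch_size * seq_len, c, h, w))  # [batch*seq_len, output_features] = [6, 100]
feats = feats.view(batch_size, seq_len, -1)         # undo the flattening: [2, 3, 100]
feats = feats.permute(1, 0, 2)                      # [seq_len, batch, output_features] = [3, 2, 100]
output, hidden = gru(feats)                         # output: [seq_len, batch, 2*hidden_size]
pred = output.permute(1, 0, 2)                      # back to [batch, seq_len, 2*hidden_size]
print(pred.shape)  # torch.Size([2, 3, 32])
```

Alternatively, creating the GRU with `batch_first=True` lets you feed the `[batch, seq_len, features]` tensor directly and skip both permutes.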

Okay, thanks. I would still love to see how to use a for loop to concatenate the outputs of the CNN for input into the RNN, as you first suggested. Could you kindly provide some simple code showing what the training would look like, especially the loss and optimizers? I would love to compare its performance with my model, because my model still doesn't seem to be training well.

The proposed for loop would execute the CNN once per time step (slicing the input in dim1), which would be a workaround in case you would otherwise run out of memory. Since your flattening operation works fine, I would prefer your approach; there won't be a difference, except that my for loop approach would be slower.
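For comparison, a sketch of the loop approach next to the flattened one, using the same kind of hypothetical stand-in CNN. The loop runs the CNN on `x[:, t]` for each time step and stacks the results in the temporal dimension:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, c, h, w = 2, 3, 4, 2, 2
output_features = 100

# Hypothetical stand-in CNN: [N, C, H, W] -> [N, output_features]
cnn = nn.Sequential(nn.Flatten(), nn.Linear(c * h * w, output_features))
x = torch.randn(batch_size, seq_len, c, h, w)

# Loop approach: one CNN call per time step, stacked along a new dim0
loop_feats = torch.stack([cnn(x[:, t]) for t in range(seq_len)], dim=0)  # [seq_len, batch, features]

# Flattened approach: one CNN call on batch*seq_len images, then restore the layout
flat_feats = (cnn(x.view(batch_size * seq_len, c, h, w))
              .view(batch_size, seq_len, -1)
              .permute(1, 0, 2))                                          # [seq_len, batch, features]

print(torch.allclose(loop_feats, flat_feats, atol=1e-6))  # True
```

One caveat: inception_v3 contains BatchNorm layers, which in training mode normalize with the statistics of the current call, so the two approaches give identical features only in `eval()` mode (or for CNNs without BatchNorm); in training mode they can differ.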