Vision Transformer for image classification

Hello! I’m implementing a ViT for (76, 50, 50, 116) images, where the possible classes are [0, 1, 2, 3]. When I run the script with a reduced dataset (e.g. 5 images) it works, but when I use the entire dataset I get "IndexError: too many indices for tensor of dimension 1" in the following part:

import torch
import torch.nn as nn


class MyMSA(nn.Module):
    def __init__(self, dim, n_heads=2):
        super(MyMSA, self).__init__()
        self.dim = dim
        self.n_heads = n_heads
        assert dim % n_heads == 0, f"Can't divide dimension {dim} into {n_heads} heads"
        d_heads = dim // n_heads
        # one independent linear projection per head for q, k and v
        self.q_map = nn.ModuleList([nn.Linear(d_heads, d_heads) for _ in range(self.n_heads)])
        self.k_map = nn.ModuleList([nn.Linear(d_heads, d_heads) for _ in range(self.n_heads)])
        self.v_map = nn.ModuleList([nn.Linear(d_heads, d_heads) for _ in range(self.n_heads)])
        self.d_heads = d_heads
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, sequences):
        # sequences is expected to be (batch, tokens, dim)
        result = []
        for sequence in sequences:                 # sequence: (tokens, dim)
            seq_result = []
            for head in range(self.n_heads):
                q_map = self.q_map[head]
                k_map = self.k_map[head]
                v_map = self.v_map[head]
                # take this head's slice of the embedding dimension
                seq = sequence[:, head * self.d_heads: (head + 1) * self.d_heads]  # Error here
                q, k, v = q_map(seq), k_map(seq), v_map(seq)
                attention = self.softmax(q @ k.T / (self.d_heads ** 0.5))
                seq_result.append(attention @ v)
            result.append(torch.hstack(seq_result))   # concatenate heads: (tokens, dim)
        return torch.cat([torch.unsqueeze(r, dim=0) for r in result])  # (batch, tokens, dim)

I use a batch size of 4. Does anyone have an idea of the cause and how to solve it?
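For context, here is a minimal shape check of the module above; the sizes are only illustrative, using dim=4 and 2501 tokens as in my setup:

import torch

msa = MyMSA(dim=4, n_heads=2)
dummy = torch.randn(4, 2501, 4)   # (batch, tokens, dim)
out = msa(dummy)
print(out.shape)                  # torch.Size([4, 2501, 4])

With a 3D input like this the slicing works; the error message suggests that at some point a sequence is only 1D.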

Could you check which line of code raises the error and post the shapes of all used tensors, please?

The error is at:

seq = sequence[:, head * self.d_heads: (head + 1) * self.d_heads]

The images are (76, 116, 50, 50) and the labels are (76,). During the training loop the images and the labels become (4, 116, 50, 50) and (4,) respectively, while y_predicted is (4, 4). q, k, v and seq are (2501, 2), and sequence is (2501, 4).
As the loss I used nn.CrossEntropyLoss, so I transformed the labels this way:

labelSet = torch.from_numpy(labels['a']).squeeze().long() - 2  # from (2,3,4,5) to (0,1,2,3)
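For clarity, a small self-contained sketch of that shift together with nn.CrossEntropyLoss (the raw tensor below is just a stand-in for labels['a']):

import torch
import torch.nn as nn

raw = torch.tensor([2, 3, 4, 5, 2])     # stand-in for labels['a']
labelSet = raw.long() - 2               # class indices 0..3
criterion = nn.CrossEntropyLoss()

logits = torch.randn(5, 4)              # same layout as y_predicted: (batch, num_classes)
loss = criterion(logits, labelSet)      # targets must be Long class indices in [0, 3]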

Also, I noticed that with, for example, 8 images the error appears, while with 12 images it works.
To work around it I tried:

sequence = sequence.reshape(-1, 4)

But then I get a "RuntimeError: CUDA out of memory".

Thanks for posting the line of code. Based on the error message it seems sequence has a single dimension during training, for a so far unknown reason.

Also, this sounds a bit weird: it seems you are changing the batch size from 76 to 4 at one point. Could you explain if this is expected and how to interpret the change in the number of samples in the batch?

Thank you! Actually, the images and the labels are 76 during data loading (the entire dataset); then I used torch.utils.data.DataLoader with a batch_size of 4, so during training it takes 4 images at a time. I think it should be OK. Another thing is that I used 3 convolutional layers in the model to reduce the number of channels from 116 to 10.
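Roughly like this, where the tensors are random stand-ins for the real data:

import torch
from torch.utils.data import TensorDataset, DataLoader

images = torch.randn(76, 116, 50, 50)    # stand-in for the real images
labelSet = torch.randint(0, 4, (76,))    # stand-in for the shifted labels

dataset = TensorDataset(images, labelSet)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for x, y in loader:
    print(x.shape, y.shape)              # (4, 116, 50, 50) and (4,) for each full batch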

Ah OK, yes, the batch size is expected to be smaller; I misunderstood the post, thinking 76 was already the batch size.
Could you print the shape of sequence during the training and see how it changes?
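E.g. a temporary print at the top of MyMSA.forward (just for debugging, to be removed afterwards) should show when and where the shapes degenerate:

def forward(self, sequences):
    print(sequences.shape)           # expected: (batch, tokens, dim)
    result = []
    for sequence in sequences:
        print(sequence.shape)        # expected: (tokens, dim)
        ...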

Yes, indeed sequence changes from (2501, 4) to (4,) during training. I don’t understand why.

Based on your code you are passing sequences to your model’s forward method and are then iterating over this object, so check where sequences comes from and how its shape is defined.

Actually, I made a really silly mistake: I set a training set size that is not a multiple of the batch_size. You helped me get to the critical problem, so thank you very much!
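As an aside, another common safeguard, as an alternative to resizing the training set (not what I did above), is to pass drop_last=True to the DataLoader; reusing the dataset from the sketch above:

from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
# any incomplete final batch is skipped instead of being yielded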
