Hi everyone,

I’m currently fine-tuning the RETFound model (a ViT for retinal images) for a classification task and am encountering a persistent error related to the `LayerNorm` layer. Below are the details of my setup and the problem:

Environment:

Platform: Google Colab

Libraries: using Colab's pre-installed versions; I'm unable to upgrade or downgrade packages.

Model Details:

Model: RETFound (a Vision Transformer model)

Final Layers Configuration:

- `fc_norm`: `LayerNorm((1024,), eps=1e-06, elementwise_affine=True)`
- `head`: `Linear(in_features=1024, out_features=2, bias=True)`

Issue:

I receive the following error during training:

```
RuntimeError: Given normalized_shape=[1024], expected input with shape [*, 1024], but got input of size[16]
```
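
For context, this is exactly the error `LayerNorm` raises when the trailing dimensions of its input don't match `normalized_shape`; the stray `[16]` matches my batch size, which suggests a 1-D tensor is reaching the norm somewhere. A minimal standalone repro (plain PyTorch, nothing RETFound-specific):

```
import torch
import torch.nn as nn

fc_norm = nn.LayerNorm(1024, eps=1e-6)

ok = torch.randn(16, 1024)   # [batch, features] matches normalized_shape
print(fc_norm(ok).shape)     # torch.Size([16, 1024])

bad = torch.randn(16)        # 1-D tensor of size 16, e.g. a label batch
fc_norm(bad)                 # RuntimeError: Given normalized_shape=[1024],
                             # expected input with shape [*, 1024],
                             # but got input of size[16]
```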

This error occurs in the following block, where `fc_norm` and the `head` layer are applied:

```
# Apply fc_norm on the pooled output
outcome = self.fc_norm(x)
print(f"After fc_norm: {outcome.shape}")
# Apply head layer
outcome = self.head(outcome)
```
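
Assuming this snippet is the only place `fc_norm` is called, a hard assertion just before it (my own debugging addition, not RETFound code) should fail fast with the offending shape instead of the opaque `LayerNorm` message:

```
# Debugging guard (my addition, not part of RETFound): fail fast with
# the offending shape if anything other than [batch, 1024] arrives here
assert x.dim() == 2 and x.shape[-1] == 1024, (
    f"expected [batch, 1024] before fc_norm, got {tuple(x.shape)}"
)
outcome = self.fc_norm(x)
```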

Here is what I have tried so far to debug this:

- Tensor Shapes: After applying `fc_norm`, the output tensor shape is `[16, 1024]`, which seems correct. I print the tensor shapes at various stages and confirm they are as expected.
- Device Check: Both the model and the data are confirmed to be on the same CUDA device.
- Initialization: The `head` layer is initialized with:

```
Parameter containing:
tensor([[ 1.1735e-05,  5.8067e-06,  1.3148e-05,  ...,  1.8168e-05,
         -1.0532e-05,  9.9731e-06],
        [ 1.4166e-05, -1.3691e-05, -1.1060e-05,  ...,  9.7297e-06,
          5.3208e-05,  1.3818e-05]], requires_grad=True)
Parameter containing:
tensor([0., 0.], requires_grad=True)
```

This confirms that the `head` layer is correctly initialized.

- LayerNorm Behavior: The error indicates a mismatch between the expected and actual input shape for `LayerNorm`. Even though `fc_norm` seems to be applied correctly, producing the shape `[16, 1024]`, the issue persists (see the hook sketch below).
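
Since my prints show the correct shape but the error still fires, I want to rule out that `fc_norm` is being invoked from somewhere other than the snippet above (a second path in `forward`, a duplicated module, etc.). A forward pre-hook logs every input that actually reaches the layer; this sketch assumes the model instance is named `model`:

```
# Sketch: log every input that actually reaches fc_norm, regardless of
# which call site invokes it (assumes the instance is named `model`)
def log_fc_norm_input(module, inputs):
    print(f"fc_norm received input of shape {tuple(inputs[0].shape)}")

hook = model.fc_norm.register_forward_pre_hook(log_fc_norm_input)
# ... run a single training step here, then inspect the printed shapes ...
hook.remove()
```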

Question:

What could be causing the shape mismatch with `LayerNorm`, given that:

- The `fc_norm` output tensor has the shape `[16, 1024]`, which is correct according to the model configuration.
- Both the model and the data are correctly placed on the same CUDA device.
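
For reference, my understanding is that RETFound inherits the MAE-style ViT head, where `fc_norm` is applied only to the mean-pooled patch tokens. The following is paraphrased from the MAE reference implementation, so treat the details as an assumption rather than the exact RETFound code:

```
# Paraphrased MAE-style pooled head (assumption: RETFound follows this)
if self.global_pool:
    x = x[:, 1:, :].mean(dim=1)   # average patch tokens -> [batch, 1024]
    outcome = self.fc_norm(x)
else:
    x = self.norm(x)
    outcome = x[:, 0]             # CLS token -> [batch, 1024]
```

Note that if the already-pooled `[16, 1024]` tensor were averaged over `dim=1` a second time anywhere before `fc_norm`, the result would be a `[16]` tensor, which is exactly the size in the error message.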

Any insights or suggestions on resolving this issue would be greatly appreciated.