Hi everyone,
I’m currently fine-tuning the RETFound model (a Vision Transformer for retinal images) for a classification task and I keep hitting a persistent error related to the LayerNorm layer. Below are the details of my setup and the problem:
Environment:
Platform: Google Colab
Libraries: Using pre-installed versions; unable to upgrade or downgrade versions.
Model Details:
Model: RETFound (a Vision Transformer model)
Final Layers Configuration:
fc_norm: LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
head: Linear(in_features=1024, out_features=2, bias=True)
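For context, here is a minimal sketch of how I understand the final forward path (modelled on the MAE-style ViT that RETFound builds on; the pooling details may differ from the actual RETFound code):

import torch
import torch.nn as nn

# Toy shapes for ViT-large: batch of 16, 197 tokens (1 CLS + 196 patches), dim 1024
tokens = torch.randn(16, 197, 1024)

fc_norm = nn.LayerNorm(1024, eps=1e-6)
head = nn.Linear(1024, 2)

x = tokens[:, 1:, :].mean(dim=1)  # global average pool over patch tokens -> [16, 1024]
x = fc_norm(x)                    # LayerNorm over the last (1024) dim -> [16, 1024]
logits = head(x)                  # classification head -> [16, 2]
print(logits.shape)               # torch.Size([16, 2])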
Issue:
I receive the following error during training:
RuntimeError: Given normalized_shape=[1024], expected input with shape [*, 1024], but got input of size[16]
This error occurs at the line where the head layer is applied:
# Print the input shape, then apply fc_norm on the pooled output
print(f"Before fc_norm: {x.shape}")
outcome = self.fc_norm(x)
print(f"After fc_norm: {outcome.shape}")
# Apply head layer
outcome = self.head(outcome)
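For reference, the same error message can be reproduced in isolation whenever LayerNorm((1024,)) receives a tensor whose last dimension is not 1024 (a toy example, not my actual data):

import torch
import torch.nn as nn

fc_norm = nn.LayerNorm(1024, eps=1e-6)

ok = fc_norm(torch.randn(16, 1024))  # fine: last dim matches normalized_shape
print(ok.shape)                      # torch.Size([16, 1024])

fc_norm(torch.randn(16))             # RuntimeError: Given normalized_shape=[1024],
                                     # expected input with shape [*, 1024],
                                     # but got input of size[16]

That suggests that in the failing call some LayerNorm received a 1-D tensor with 16 elements (matching my batch size), rather than the [16, 1024] tensor my prints show.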
Here is what I have tried so far to debug this:
1. Tensor Shapes: After applying fc_norm, the output tensor shape is [16, 1024], which seems correct. I print the tensor shapes at various stages and confirm that they are as expected.
2. Device Check: Both the model and the data are confirmed to be on the same CUDA device.
3. Initialization: The head layer is initialized with:
Parameter containing:
tensor([[ 1.1735e-05, 5.8067e-06, 1.3148e-05, ..., 1.8168e-05,
-1.0532e-05, 9.9731e-06],
[ 1.4166e-05, -1.3691e-05, -1.1060e-05, ..., 9.7297e-06,
5.3208e-05, 1.3818e-05]], requires_grad=True)
Parameter containing:
tensor([0., 0.], requires_grad=True)
This confirms that the head layer is correctly initialized.
4. LayerNorm Behavior: The error indicates a mismatch between the expected and actual input shape for LayerNorm. Even though fc_norm seems to be correctly applied with the shape [16, 1024], the issue persists. As a next step, I'm planning to capture the exact shape that reaches fc_norm with a forward pre-hook, sketched below.
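Here is that sketch (model stands for my fine-tuned RETFound instance; the hook function name is just a placeholder):

import torch

def log_fc_norm_input(module, inputs):
    # inputs is the tuple of positional arguments passed to fc_norm
    print(f"fc_norm received input of shape {inputs[0].shape}")

hook = model.fc_norm.register_forward_pre_hook(log_fc_norm_input)
# ... run a single training step here to trigger the error ...
hook.remove()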
Question:
What could be causing the shape mismatch with LayerNorm, given that:
- The fc_norm output tensor has the shape [16, 1024], which is correct according to the model configuration.
- Both the model and the data are correctly placed on the same CUDA device.
Any insights or suggestions on resolving this issue would be greatly appreciated.