Issues with LayerNorm in Fine-Tuning RETFound Model (ViT) on Google Colab

Hi everyone,

I’m currently fine-tuning the RETFound model (a ViT for retinal images) for a classification task and encountering a persistent error related to the LayerNorm layer. Below are the details of my setup and the problem:

Environment:

Platform: Google Colab
Libraries: Using pre-installed versions; unable to upgrade or downgrade versions.

Model Details:
Model: RETFound (a Vision Transformer model)
Final Layers Configuration:

  • fc_norm: LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
  • head: Linear(in_features=1024, out_features=2, bias=True)

Issue:
I receive the following error during training:

RuntimeError: Given normalized_shape=[1024], expected input with shape [*, 1024], but got input of size[16]

This error occurs at the line where the head layer is applied:

# Apply fc_norm on the pooled output
outcome = self.fc_norm(x)
print(f"After fc_norm: {outcome.shape}")
# Apply head layer
outcome = self.head(outcome)
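For reference, nn.LayerNorm((1024,)) requires the input's last dimension to be 1024; a 1-D tensor of length 16 produces exactly this error. A minimal sketch reproducing both cases:

```python
import torch
import torch.nn as nn

fc_norm = nn.LayerNorm((1024,), eps=1e-6, elementwise_affine=True)

x = torch.randn(16, 1024)   # [batch, features] -> last dim matches 1024, OK
print(fc_norm(x).shape)     # torch.Size([16, 1024])

bad = torch.randn(16)       # 1-D tensor of length 16, as in the error
try:
    fc_norm(bad)
except RuntimeError as e:
    # reproduces the "expected input with shape [*, 1024]" error
    print(e)
```

So the tensor actually reaching fc_norm at the failing step must be 1-D with 16 elements, despite the [16, 1024] shape seen in earlier prints.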

Here is what I have tried while debugging:

  1. Tensor Shapes: I print tensor shapes at various stages; after applying fc_norm, the output shape is [16, 1024], which matches the model configuration.
  2. Device Check: Both the model and data are confirmed to be on the same CUDA device.
  3. Initialization: The head layer is initialized with:
Parameter containing:
tensor([[ 1.1735e-05,  5.8067e-06,  1.3148e-05,  ...,  1.8168e-05,
         -1.0532e-05,  9.9731e-06], 
        [ 1.4166e-05, -1.3691e-05, -1.1060e-05,  ...,  9.7297e-06,
         5.3208e-05,  1.3818e-05]], requires_grad=True)
Parameter containing:
tensor([0., 0.], requires_grad=True)

This confirms that the head layer is correctly initialized.
  4. LayerNorm Behavior: The error indicates a mismatch between the expected and actual input shapes for LayerNorm. Even though fc_norm seems to be correctly applied with the shape [16, 1024], the issue persists.
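One way to pinpoint where the unexpected 1-D tensor appears is a forward pre-hook that logs the input shape of every LayerNorm just before it runs. A debugging sketch, using a toy module in place of the real ViT:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; in practice iterate over your ViT.
model = nn.Sequential(nn.Linear(1024, 1024), nn.LayerNorm(1024))

def log_shapes(module, inputs):
    # inputs is a tuple of the positional args passed to forward()
    print(f"{module.__class__.__name__} input shape: {tuple(inputs[0].shape)}")

for m in model.modules():
    if isinstance(m, nn.LayerNorm):
        m.register_forward_pre_hook(log_shapes)

out = model(torch.randn(16, 1024))  # prints: LayerNorm input shape: (16, 1024)
```

Running this on the actual model would show the shape of the tensor LayerNorm receives at the moment of failure, rather than the shapes printed at other points in the forward pass.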

Question:
What could be causing the shape mismatch with LayerNorm given that:

  • The fc_norm output tensor has the shape [16, 1024], which is correct according to the model configuration.
  • Both the model and data are correctly placed on the same CUDA device.

Any insights or suggestions on resolving this issue would be greatly appreciated.

I’m struggling with this… Could it be caused by the library versions?

Is the code you are using from the GitHub repo rmaphoh/RETFound_MAE (RETFound: a foundation model for retinal images)? Are you using the official Colab notebook?

Thank you so much for your reply, Tony!

Yes, I am using the code from the GitHub repository rmaphoh/RETFound_MAE.

I have reviewed the official Colab notebook, but I find it quite challenging to navigate. My preference is to write and edit code in Jupyter Notebook. However, the Colab notebook sets up a virtual environment and doesn’t utilize Jupyter for editing, which complicates things for me.

In my attempts to install the required libraries using !pip install in Colab, I’ve encountered compatibility issues with Python 3.10. As a result, I’ve had to use different library versions, which might have contributed to the errors I’m experiencing (though I am not certain of this).

Here’s the code I’m using to load the model and its weights, as outlined in the repository’s README:

import torch
import models_vit
from util.pos_embed import interpolate_pos_embed
from timm.models.layers import trunc_normal_

# Call the model
model = models_vit.__dict__['vit_large_patch16'](
    num_classes=2,
    drop_path_rate=0.2,
    global_pool=True,
)

# Load RETFound weights
checkpoint = torch.load('RETFound_cfp_weights.pth', map_location='cpu')
checkpoint_model = checkpoint['model']
state_dict = model.state_dict()
for k in ['head.weight', 'head.bias']:
    if k in checkpoint_model and checkpoint_model[k].shape != state_dict[k].shape:
        print(f"Removing key {k} from pretrained checkpoint")
        del checkpoint_model[k]

# Interpolate position embedding
interpolate_pos_embed(model, checkpoint_model)

# Load pre-trained model
msg = model.load_state_dict(checkpoint_model, strict=False)

assert set(msg.missing_keys) == {'head.weight', 'head.bias', 'fc_norm.weight', 'fc_norm.bias'}

# Manually initialize fc layer
trunc_normal_(model.head.weight, std=2e-5)

print("Model = %s" % str(model))

Given these challenges, do you think I should abandon using Jupyter Notebook and follow the official Colab notebook closely? I’m new to this setup and unsure how to initialize the head or adapt the model for different tasks, such as switching from classification to regression.
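On the regression question: assuming the model exposes a `head` Linear layer as printed above, one common approach is to swap the 2-class head for a single-output layer and train with an MSE loss instead of cross-entropy. A hedged sketch, with a random tensor standing in for the ViT's pooled 1024-dim features (names here are illustrative, not from the RETFound code):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pooled 1024-dim features from the backbone.
features = torch.randn(16, 1024)

# Replace the 2-class classification head with a 1-output regression head
# (in practice: model.head = nn.Linear(1024, 1)).
head = nn.Linear(1024, 1)
criterion = nn.MSELoss()  # instead of CrossEntropyLoss

preds = head(features).squeeze(-1)  # [16, 1] -> [16]
targets = torch.randn(16)           # continuous regression targets
loss = criterion(preds, targets)
print(preds.shape, loss.item())
```

The backbone and its pretrained weights stay unchanged; only the head and the loss function differ from the classification setup.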

I appreciate your insights, Tony. It’s comforting to know that I’m not alone in navigating the complexities of Colab notebooks/RETFound.

I found a more recent model, RETFound-Green, which uses PyTorch 2.2. It may be better to use that model if possible; its model weights are available as described in its README.

With RETFound-Green, I didn’t encounter this error. The RETFound-Green preprint mentions that “RETFound-MEH (which is the original) specifically requires a five-year-old version of the Python programming language (version 3.7.5), which is now considered ‘end-of-life’ and no longer officially supported [30].”

Thank you so much, Tony! I’ll proceed with working on the RETFound-Green model.