RuntimeError: Error(s) in loading state_dict for SimCLR:

I am trying to replicate the SimCLR model from a Google Colab notebook using my own dataset and model. When I run it with my model, I get the error below. I cannot work out where the problem is.

RuntimeError                              Traceback (most recent call last)
<ipython-input-34-4b3055af976b> in <module>
      4                             temperature=0.07,
      5                             weight_decay=1e-4,
----> 6                             max_epochs=100)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1603         if len(error_msgs) > 0:
   1604             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1605                                self.__class__.__name__, "\n\t".join(error_msgs)))
   1606         return _IncompatibleKeys(missing_keys, unexpected_keys)
   1607 

RuntimeError: Error(s) in loading state_dict for SimCLR:
	Missing key(s) in state_dict: "convnet.conv1.bias". 
	Unexpected key(s) in state_dict: "convnet.bn1.weight", "convnet.bn1.bias", "convnet.bn1.running_mean", "convnet.bn1.running_var", "convnet.bn1.num_batches_tracked", "convnet.layer1.0.conv1.weight", "convnet.layer1.0.bn1.weight", "convnet.layer1.0.bn1.bias", "convnet.layer1.0.bn1.running_mean", "convnet.layer1.0.bn1.running_var", "convnet.layer1.0.bn1.num_batches_tracked", "convnet.layer1.0.conv2.weight", "convnet.layer1.0.bn2.weight", "convnet.layer1.0.bn2.bias", "convnet.layer1.0.bn2.running_mean", "convnet.layer1.0.bn2.running_var", "convnet.layer1.0.bn2.num_batches_tracked", "convnet.layer1.1.conv1.weight", "convnet.layer1.1.bn1.weight", "convnet.layer1.1.bn1.bias", "convnet.layer1.1.bn1.running_mean", "convnet.layer1.1.bn1.running_var", "convnet.layer1.1.bn1.num_batches_tracked", "convnet.layer1.1.conv2.weight", "convnet.layer1.1.bn2.weight", "convnet.layer1.1.bn2.bias", "convnet.layer1.1.bn2.running_mean", "convnet.layer1.1.bn2.running_var", "convnet.layer1.1.bn2.num_batches_tracked", "convnet.layer2.0.conv1.weight", "convnet.layer2.0.bn1.weight", "convnet.layer2.0.bn1.bias", "convnet.layer2.0.bn1.running_mean", "convnet.layer2.0.bn1.running_var", "convnet.layer2.0.bn1.num_batches_tracked", "convnet.layer2.0.conv2.weight", "convnet.layer2.0.bn2.weight", "convnet.layer2.0.bn2.bias", "convnet.layer2.0.bn2.running_mean", "convnet.layer2.0.bn2.running_var", "convnet.layer2.0.bn2.num_batches_tracked", "convnet.layer2.0.downsample.0.weight", "convnet.layer2.0.downsample.1....
	size mismatch for convnet.conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([3, 1, 1]).
	size mismatch for convnet.fc.0.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1, 31488]).
	size mismatch for convnet.fc.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1]).
	size mismatch for convnet.fc.2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([128, 256]).

My model is:

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.dropout = nn.Dropout(0.1)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # 28 time steps
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        x, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
#        x = self.dropout(x)

        # Index hidden state of last time step
        x = self.fc(x[:, -1, :])
        return x

input_dim = 16
hidden_dim = 100
layer_dim = 1
output_dim = 1

model = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)

Below is the SimCLR class in which the model is used:

class SimCLR(pl.LightningModule):
    
    def __init__(self, hidden_dim, lr, temperature, weight_decay, max_epochs=100):
        super().__init__()
        self.save_hyperparameters()
        assert self.hparams.temperature > 0.0, 'The temperature must be a positive float!'
        # Base model f(.)
        self.convnet = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
      #  self.convnet = Net()
      #  self.convnet =  torchvision.models.resnet18(pretrained=False, 
      #                                             num_classes=2*hidden_dim)
        # The MLP for g(.) consists of Linear->ReLU->Linear 
        self.convnet.fc = nn.Sequential(
            self.convnet.fc,  # Linear(ResNet output, 4*hidden_dim)
            nn.ReLU(inplace=True),
            nn.Linear(2*hidden_dim, hidden_dim)
        )

Based on the error message, it seems you are trying to load the state_dict of a ResNet-like model into your custom model.
Make sure you are storing the state_dict from the same model class you want to load it into again to avoid these missing and unexpected key errors.
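A minimal sketch reproducing this class of error, assuming two hypothetical toy models (`ConvNet` and `LSTMNet`, not from this thread): a state_dict saved from one class loads cleanly into the same class, but raises the "Missing key(s) / Unexpected key(s)" RuntimeError when loaded into a different class. Passing `strict=False` instead returns the mismatching key names, which helps to see what differs.

```python
import torch
import torch.nn as nn


class ConvNet(nn.Module):
    """Hypothetical stand-in for a ResNet-like checkpoint source."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3)
        self.fc = nn.Linear(8, 4)


class LSTMNet(nn.Module):
    """Hypothetical stand-in for the custom LSTM model."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(16, 8, batch_first=True)
        self.fc = nn.Linear(8, 4)


def try_load(src: nn.Module, dst: nn.Module) -> str:
    """Load src's state_dict into dst and report the first line of any error."""
    try:
        dst.load_state_dict(src.state_dict())
        return "ok"
    except RuntimeError as err:
        return str(err).splitlines()[0]


print(try_load(ConvNet(), ConvNet()))  # same class: loads fine
print(try_load(ConvNet(), LSTMNet()))  # different class: key mismatch

# strict=False does not raise for missing/unexpected keys, it reports them
result = LSTMNet().load_state_dict(ConvNet().state_dict(), strict=False)
print(result.missing_keys)     # LSTM parameters the checkpoint lacks
print(result.unexpected_keys)  # conv1.* entries the model does not have
```

Note that `strict=False` leaves any missing parameters at their random initialization, so it is a diagnostic aid rather than a fix.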

In the code below, a pretrained model is loaded. Where should I make the change to load my own model's state dict?

def train_simclr(batch_size, max_epochs=100, **kwargs):
    trainer = pl.Trainer(default_root_dir=os.path.join(CHECKPOINT_PATH, 'SimCLR'),
                         gpus=1 if str(device)=='cuda:0' else 0,
                         max_epochs=max_epochs,
                         callbacks=[ModelCheckpoint(save_weights_only=True, mode='max', monitor='val_acc_top5'),
                                    LearningRateMonitor('epoch')],
                         )
    trainer.logger._default_hp_metric = None # Optional logging argument that we don't need

    # Check whether pretrained model exists. If yes, load it and skip training
    pretrained_filename = os.path.join(CHECKPOINT_PATH, 'SimCLR.ckpt')
    if os.path.isfile(pretrained_filename):
        print(f'Found pretrained model at {pretrained_filename}, loading...')
        model = SimCLR.load_from_checkpoint(pretrained_filename) # Automatically loads the model with the saved hyperparameters
    else:
        train_ldr = T.utils.data.DataLoader(X_train, batch_size=batch_size, shuffle=True, 
                                       drop_last=True, pin_memory=True, num_workers=NUM_WORKERS)
        val_ldr = T.utils.data.DataLoader(X_val, batch_size=batch_size, shuffle=False, 
                                     drop_last=False, pin_memory=True, num_workers=NUM_WORKERS)
        pl.seed_everything(42) # To be reproducible
        model = SimCLR(max_epochs=max_epochs, **kwargs)
        trainer.fit(model, train_ldr, val_ldr)
        model = SimCLR.load_from_checkpoint(trainer.checkpoint_callback.best_model_path) # Load best checkpoint after training

    return model

I’m not familiar enough with Lightning and don’t know what exactly:

model = SimCLR.load_from_checkpoint(trainer.checkpoint_callback.best_model_path) # Load best checkpoint after training

is doing. Usually you would just store and load the state_dict from an nn.Module.
E.g. if you’ve created a custom nn.Module called MyModel, this would be the general workflow:

model = MyModel()
# train your model
...

# save state_dict
torch.save(model.state_dict(), path)

# in another script: create a new model object and load the state_dict
model = MyModel()
model.load_state_dict(torch.load(path))
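The workflow above can be made fully runnable as a small sketch, assuming a hypothetical two-layer `MyModel` and a temporary directory for the file. It saves the state_dict, rebuilds a fresh object of the same class, loads the weights and checks that both models produce identical outputs.

```python
import os
import tempfile

import torch
import torch.nn as nn


class MyModel(nn.Module):
    # Hypothetical toy model standing in for any custom nn.Module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


torch.manual_seed(0)
model = MyModel()
# ... train the model here ...

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pt")
    # Save only the parameters/buffers, not the whole Python object
    torch.save(model.state_dict(), path)

    # Later (or in another script): build a fresh object of the SAME class, then load
    restored = MyModel()
    restored.load_state_dict(torch.load(path, map_location="cpu"))
    restored.eval()

x = torch.randn(3, 4)
same = torch.equal(model(x), restored(x))
print(same)  # True
```

The key point is that the object passed to `load_state_dict` must be constructed from the same class with the same constructor arguments as the one that was saved.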

How should I pass the path to torch.save(..., path) and torch.load(path) so that my model is saved and loaded again on Windows? When I use

filepath = "model.pt"

where model = SimCLR() is the model I want to save and load, I get the error:
FileNotFoundError: [Errno 2] No such file or directory: 'model.pt'

Also, when I give the path as r'C:\Users\anupa\Desktop\model.pth', I get:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\anupa\\Desktop\\model.pth'

torch.save would accept any valid path with a (new) file name. Make sure the directories exist as torch.save will not recursively create them.
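A small sketch of that advice, assuming a hypothetical helper `save_state_dict` (not a PyTorch function): create the parent directory with `os.makedirs(..., exist_ok=True)` before calling `torch.save`. A temporary directory stands in for a real location such as a folder on the Desktop.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_state_dict(model: nn.Module, path: str) -> str:
    """Hypothetical helper: create the parent directory if needed, then save."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)  # torch.save won't create directories
    torch.save(model.state_dict(), path)
    return os.path.abspath(path)


model = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp_dir:
    # Nested folders that do not exist yet
    target = os.path.join(tmp_dir, "checkpoints", "simclr", "model.pt")
    saved_path = save_state_dict(model, target)
    exists = os.path.isfile(saved_path)
    print("Saved to:", saved_path)
```

A relative name such as 'model.pt' resolves against `os.getcwd()`, so printing the current working directory and checking `os.path.isfile(path)` before `torch.load(path)` can show whether the file was actually written where you expect.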

Thank you for all the solutions, I have applied them. Now an error occurs while training my model. I understand that it is caused by the tensor shape, but I need the input shape to be [batch size, rows, cols] for my model to train. After a few iterations I get the following error:

torch.Size([1, 256, 41])
torch.Size([256, 1, 41])
TRAINING SAMPLES tensor([[[-0.3023, -0.2928, -0.1115,  ..., -0.1473, -0.0746, -0.1150]],

        [[-0.3025, -0.2955, -0.0908,  ..., -0.1473, -0.0655, -0.0479]],

        [[-0.3028, -0.2956, -0.0986,  ..., -0.1473, -0.0746, -0.0624]],

        ...,

        [[-0.3024, -0.2932, -0.1101,  ..., -0.1473, -0.0746, -0.1060]],

        [[-0.3024, -0.2939, -0.1127,  ..., -0.1473, -0.0746, -0.1102]],

        [[-0.3024, -0.2935, -0.1058,  ..., -0.1473, -0.0708, -0.1162]]])
output tensor([[-0.2123, -0.0592, -0.3308,  ...,  0.1562, -0.0809,  0.1073],
        [-0.2112, -0.0593, -0.3305,  ...,  0.1537, -0.0814,  0.1066],
        [-0.2117, -0.0592, -0.3306,  ...,  0.1547, -0.0812,  0.1068],
        ...,
        [-0.2123, -0.0592, -0.3308,  ...,  0.1563, -0.0809,  0.1073],
        [-0.2111, -0.0593, -0.3305,  ...,  0.1534, -0.0815,  0.1065],
        [-0.2014, -0.0610, -0.3274,  ...,  0.1304, -0.0855,  0.1002]],
       grad_fn=<AddmmBackward>)
X tensor([[-0.0189, -0.0053, -0.0294,  ...,  0.0139, -0.0072,  0.0095],
        [-0.0188, -0.0053, -0.0294,  ...,  0.0137, -0.0072,  0.0095],
        [-0.0188, -0.0053, -0.0294,  ...,  0.0138, -0.0072,  0.0095],
        ...,
        [-0.0189, -0.0053, -0.0294,  ...,  0.0139, -0.0072,  0.0096],
        [-0.0188, -0.0053, -0.0294,  ...,  0.0137, -0.0073,  0.0095],
        [-0.0179, -0.0054, -0.0290,  ...,  0.0116, -0.0076,  0.0089]],
       grad_fn=<DivBackward0>)
X.t tensor([[-0.0189, -0.0188, -0.0188,  ..., -0.0189, -0.0188, -0.0179],
        [-0.0053, -0.0053, -0.0053,  ..., -0.0053, -0.0053, -0.0054],
        [-0.0294, -0.0294, -0.0294,  ..., -0.0294, -0.0294, -0.0290],
        ...,
        [ 0.0139,  0.0137,  0.0138,  ...,  0.0139,  0.0137,  0.0116],
        [-0.0072, -0.0072, -0.0072,  ..., -0.0072, -0.0073, -0.0076],
        [ 0.0095,  0.0095,  0.0095,  ...,  0.0096,  0.0095,  0.0089]],
       grad_fn=<TBackward>)
sim_matrix tensor([[1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9991],
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9993],
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9992],
        ...,
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9991],
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9993],
        [0.9991, 0.9993, 0.9992,  ..., 0.9991, 0.9993, 1.0000]],
       grad_fn=<MmBackward>)
Training [100%]	Loss: 5.5426
torch.Size([1, 83, 41])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-33-436bf227754e> in <module>
     17    print(X_tr.shape)
     18 
---> 19    X_tr = X_tr.view(256, 1, 41)
     20    print(X_tr.shape)
     21    print('TRAINING SAMPLES', X_tr)

RuntimeError: shape '[256, 1, 41]' is invalid for input of size 3403

Code:

model = SimCLR()

batch_size = 256
loss_func = N_XENT()
# filepath = '/Desktop/Untitled Folder/'
# writer = SummaryWriter(log_dir="checkpoints")
optimizer = torch.optim.SGD(model.parameters(), lr=0.3 * (batch_size / 256), momentum=0.9)
epochs: int = 1
loss_list = []
model.train()
# device = model.device
for epoch in range(epochs):
    total_loss = []

    for batch_idx, (X_tr, Y_tr) in enumerate(train_ldr):
        X_tr = X_tr.unsqueeze(0)
        print(X_tr.shape)

        X_tr = X_tr.view(256, 1, 41)
        print(X_tr.shape)
        print('TRAINING SAMPLES', X_tr)

        Y_tr = Y_tr.type(torch.LongTensor)

        optimizer.zero_grad()

        # Forward pass
        output = model(X_tr)
        print('output', output)

        loss = loss_func(output)
        # Backward pass
        loss.backward()
        # Optimize the weights
        optimizer.step()

        total_loss.append(loss.item())

        loss_list.append(sum(total_loss) / len(total_loss))
        print('Training [{:.0f}%]\tLoss: {:.4f}'.format(
            100. * (epoch + 1) / epochs, loss_list[-1]))
        if batch_idx % 10 == 0:
            # torch.save(model.state_dict(), os.path.join("checkpoints", f"model-{batch_idx}.pt"))
            torch.save(model.state_dict(), 'model.pt')

The view operation fails since the number of elements in the tensor doesn't match the number of elements in the desired view: the failing batch has shape [1, 83, 41], i.e. 1 * 83 * 41 = 3403 elements, while view(256, 1, 41) requires 256 * 1 * 41 = 10496 elements.

Could the last batch be smaller and contain e.g. only 83 samples?
If so, you might want to use X_tr = X_tr.view(X_tr.size(0), 1, 41) instead.
I'm also unsure what this view operation is supposed to do, so make sure my suggestion fits your actual use case.
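A smaller final batch is what a DataLoader yields by default when the dataset size is not a multiple of the batch size. A minimal sketch, assuming a hypothetical dataset of 339 samples (256 + 83) so that the last batch matches the 83 samples seen in the traceback, shows the batch sizes with and without `drop_last=True`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 256 + 83 = 339 samples with 41 features each
X = torch.randn(339, 41)
y = torch.zeros(339, dtype=torch.long)
dataset = TensorDataset(X, y)

# Default: the final, smaller batch is kept
sizes_keep = [xb.size(0) for xb, _ in DataLoader(dataset, batch_size=256)]
# drop_last=True discards the incomplete final batch
sizes_drop = [xb.size(0) for xb, _ in DataLoader(dataset, batch_size=256, drop_last=True)]

print(sizes_keep)  # [256, 83]
print(sizes_drop)  # [256]
```

Dropping the last batch loses a few samples per epoch; the alternative is to derive the batch dimension from the tensor itself instead of hard-coding 256.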

Thank you for the reply. But when I use

X_tr = X_tr.view(X_tr.size(0), 1, 41)

it throws the error:
RuntimeError: shape '[1, 1, 41]' is invalid for input of size 10496

I want to use a batch size of 256, not 1, for training. My model is defined as:

class SimCLR(nn.Module):
    def __init__(self, device = "cpu", out_dim=128, input_shape=(256,1,41)):
        super(SimCLR,self).__init__()
        self.input_shape=input_shape
#        self.f = Net()
        self.f = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
        self.f.maxpool = nn.Identity()
   #     self.fc1=nn.Identity()
        self.f.fc =nn.Identity()
        self.g = nn.Sequential(nn.Linear( in_features=1, out_features=2048), nn.ReLU(),
        nn.Linear(in_features=2048, out_features = 2048))
        self.f.to(device)
        self.g.to(device)

    def forward(self,x):
        h = self.f(x)
        return self.g(h)

LSTM model:

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.dropout = nn.Dropout(0.1)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # 28 time steps
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        x, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
#        x = self.dropout(x)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        x = self.fc(x[:, -1, :]) 
        # out.size() --> 100, 10
       
        return (x)
LSTM model dimensions
input_dim = 41, hidden_dim = 1, layer_dim = 1, output_dim = 1

Do I need to make further changes to my model parameters? I'm not sure.

I still don't know why the view is used, but in case your data has the batch dimension in dim1 and you would like to permute it into dim0, use:

X_tr = X_tr.permute(1, 0, 2)

instead.
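A short sketch of why `permute` is the safer choice, using hypothetical tensors shaped like the thread's batches. For the `[1, B, 41]` tensor produced by `unsqueeze(0)`, `permute(1, 0, 2)`, `unsqueeze(1)` on the original batch, and `view(-1, 1, 41)` all give the same `[B, 1, 41]` result. Once the leading dimension is larger than 1, reshaping only reinterprets the memory order, while `permute` actually swaps the axes.

```python
import torch

batch = torch.arange(83 * 41, dtype=torch.float32).view(83, 41)  # [batch, features]

x = batch.unsqueeze(0)        # [1, 83, 41], what the training loop produces
a = x.permute(1, 0, 2)        # [83, 1, 41]: batch moved to dim0
b = batch.unsqueeze(1)        # [83, 1, 41]: same result in one step
c = x.view(-1, 1, 41)         # [83, 1, 41]: -1 infers the batch size

print(a.shape, torch.equal(a, b), torch.equal(a, c))

# view/reshape only match permute here because dim0 has size 1.
# With a real leading dimension they regroup elements differently:
y = torch.arange(6).view(2, 3, 1)    # [2, 3, 1]
swapped = y.permute(1, 0, 2)         # [3, 2, 1], axes swapped
reshaped = y.reshape(3, 2, 1)        # [3, 2, 1], same memory order
print(torch.equal(swapped, reshaped))  # False
```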

@ptrblck sir, for the error below I followed your suggestion to save the model's state dict during training and load it again. But loading the state dict still fails. It is not clear to me what needs to be corrected, as I am now saving and loading my own model, not a ResNet.

RuntimeError: Error(s) in loading state_dict for SimCLR:
	Missing key(s) in state_dict: "projector.0.weight", "projector.2.weight". 
	Unexpected key(s) in state_dict: "f.lstm.weight_ih_l0", "f.lstm.weight_hh_l0", "f.lstm.bias_ih_l0", "f.lstm.bias_hh_l0", "g.0.weight", "g.0.bias", "g.2.weight", "g.2.bias". 

Model:

class SimCLR(nn.Module):
    def __init__(self, device = "cpu", out_dim=128, input_shape=(256,1,41)):
        super(SimCLR,self).__init__()
        self.input_shape=input_shape
   #     self.f = Net()
        self.f = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
        self.f.maxpool = nn.Identity()
   #     self.fc1=nn.Identity()
        self.f.fc =nn.Identity()
        self.g = nn.Sequential(nn.Linear( in_features=1, out_features=2048), nn.ReLU(),
        nn.Linear(in_features=2048, out_features = 2048))
        self.f.to(device)
        self.g.to(device)

    def forward(self,x):
        h = self.f(x)
        return self.g(h)

I load the state dict using a model object:

class LinearHeadModel(nn.Module):
    def __init__(self,simclr_model_dict, num_classes=2):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        model = SimCLR(LSTMModel, 128, 41)
        if simclr_model_dict:
            print("loading feature extractor")
       
        smclr = SimCLR(LSTMModel, 128, 41)
        
        smclr.load_state_dict(torch.load('model.pt', map_location='cpu'))
        
        
    def forward(self,x):
        
        self.features = self.f(x)
                
        self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))
        h = self.features(x)
        out = self.g(h)
        return out    

The error points to a key mismatch while loading the state_dict as your model object seems to have a projector attribute, which seems to be an nn.Sequential container while the state_dict stored the f and g attributes.
Your code also doesn’t fit the error message as projector is undefined, so please share a minimal, executable code snippet which would reproduce the error.
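To narrow down such mismatches before calling `load_state_dict`, the keys and shapes can be compared directly. A minimal sketch, assuming a hypothetical helper `compare_state_dict` and toy containers built to mimic the reported error (a checkpoint with `f.lstm` and `g` entries loaded into a model that defines a `projector` head):

```python
import torch
import torch.nn as nn


def compare_state_dict(model: nn.Module, state_dict: dict) -> dict:
    """Hypothetical helper: report key and shape differences."""
    model_sd = model.state_dict()
    missing = sorted(set(model_sd) - set(state_dict))      # model expects, file lacks
    unexpected = sorted(set(state_dict) - set(model_sd))   # file has, model lacks
    mismatched = sorted(
        k for k in set(model_sd) & set(state_dict)
        if model_sd[k].shape != state_dict[k].shape
    )
    return {"missing": missing, "unexpected": unexpected, "shape_mismatch": mismatched}


# Toy checkpoint laid out like the saved SimCLR (f.lstm + g)
saved = nn.Module()
saved.f = nn.Module()
saved.f.lstm = nn.LSTM(41, 1, batch_first=True)
saved.g = nn.Sequential(nn.Linear(1, 2048), nn.ReLU(), nn.Linear(2048, 2048))

# Toy model that instead defines a `projector` head without biases
current = nn.Module()
current.projector = nn.Sequential(nn.Linear(1, 8, bias=False), nn.ReLU(), nn.Linear(8, 8, bias=False))

report = compare_state_dict(current, saved.state_dict())
print(report["missing"])     # ['projector.0.weight', 'projector.2.weight']
print(report["unexpected"])
```

Running the same helper with the actual model object and `torch.load('model.pt')` should show which class definition produced the checkpoint.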

After loading the data, I pass the LSTM model to the SimCLR model:

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.dropout = nn.Dropout(0.1)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # 28 time steps
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        x, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
#        x = self.dropout(x)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        x = self.fc(x[:, -1, :]) 
        # out.size() --> 100, 10
       
        return (x)
LSTM model dimensions
input_dim = 41, hidden_dim = 1, layer_dim = 1, output_dim = 1

Loss function:

class N_XENT(nn.Module):

    def forward(self, X, T=0.5):
        X = nn.functional.normalize(X,dim=1)
        print('X', X)
        print('X.t', X.t())
        sim_matrix = torch.mm(X,X.t())
        print('sim_matrix', sim_matrix)
        sim_matrix = sim_matrix.clamp(min=1e-7) / T

        sim_matrix = sim_matrix - torch.eye(sim_matrix.shape[0],sim_matrix.shape[1]).to(sim_matrix.device) * 1e5

        ## Make array indicating positive samples

        pos = torch.arange(X.shape[0])
        pos[1::2] -=1
        pos[::2] +=1 
        return nn.functional.cross_entropy(input=sim_matrix,target=pos.long().to(sim_matrix.device))
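The `pos` targets above assume that rows 2k and 2k+1 of the batch are the two views of the same sample. A small sketch reproducing the `N_XENT` computation (without the debug prints, as a hypothetical standalone `nt_xent` function) shows the target pattern for a batch of 6 and checks that the loss is lower when the intended pairs actually match.

```python
import torch
import torch.nn.functional as F


def positive_index(n: int) -> torch.Tensor:
    """Index of the positive partner for each row, as built in N_XENT."""
    pos = torch.arange(n)
    pos[1::2] -= 1   # odd rows point to the previous row
    pos[::2] += 1    # even rows point to the next row
    return pos


def nt_xent(X: torch.Tensor, T: float = 0.5) -> torch.Tensor:
    """Same computation as N_XENT.forward, without the debug prints."""
    X = F.normalize(X, dim=1)
    sim = torch.mm(X, X.t()).clamp(min=1e-7) / T
    sim = sim - torch.eye(X.shape[0], device=X.device) * 1e5  # mask self-similarity
    return F.cross_entropy(sim, positive_index(X.shape[0]).to(X.device))


print(positive_index(6))  # tensor([1, 0, 3, 2, 5, 4])

# Rows (0, 1) and (2, 3) are the intended pairs
aligned = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
misaligned = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(nt_xent(aligned).item() < nt_xent(misaligned).item())  # True
```

Two consequences follow from this construction: the batch size must be even (for an odd batch such as 83 the last row gets an out-of-range target), and if consecutive rows are not two augmentations of the same sample, the targets do not correspond to true positive pairs.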

Training:

model = SimCLR()

batch_size = 256
i = 0
loss_func = N_XENT()

optimizer = torch.optim.SGD(model.parameters(), lr=0.3 * (batch_size / 256), momentum=0.9)
epochs: int = 1
loss_list = []
acc = 0.0
total_num = 0
model.train()

for epoch in range(epochs):
    total_loss = []

    for batch_idx, (X_tr, Y_tr) in enumerate(train_ldr):
        X_tr = X_tr.unsqueeze(0)
        print(X_tr.shape)

        X_tr = X_tr.permute(1, 0, 2)
        print(X_tr.shape)
        print('TRAINING SAMPLES', X_tr)

        Y_tr = Y_tr.type(torch.LongTensor)

        optimizer.zero_grad()
        output = model(X_tr)
        print('output', output)

        loss = loss_func(output)
        loss.backward()

        optimizer.step()

        total_loss.append(loss.item())

        loss_list.append(sum(total_loss) / len(total_loss))
        print('Training [{:.0f}%]\tLoss: {:.4f}'.format(
            100. * (epoch + 1) / epochs, loss_list[-1]))

    if epoch % 10 == 0:
        torch.save(model.state_dict(), 'model.pt')

To test my data samples with the SimCLR method using the linear head model, I load my SimCLR model's state dict:

class LinearHeadModel(nn.Module):
    def __init__(self,simclr_model_dict, num_classes=2):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        model = SimCLR(LSTMModel, 128, 41)
        if simclr_model_dict:
            print("loading feature extractor")
       
        smclr = SimCLR(LSTMModel, 128, 41)
        
        smclr.load_state_dict(torch.load('model.pt', map_location='cpu'))
        
        
    def forward(self,x):
        
        self.features = self.f(x)
                
        self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))
        h = self.features(x)
        out = self.g(h)
        return out    

This is where the error occurs during testing.

Any inputs on this will be greatly helpful to me.

Your code is unfortunately not executable, but after fixing it, saving and loading the model works:

model = SimCLR()
torch.save(model.state_dict(), 'model.pt')
loaded = LinearHeadModel(None)

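E.g. in your current code you are using smclr = SimCLR(LSTMModel, 128, 41), which does not match the previous definition of SimCLR: its positional arguments are device, out_dim and input_shape, so LSTMModel would be passed as device and 41 as input_shape.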

Thank you sir for the reply. Is there a way to compute the accuracy of the SimCLR() model directly after training, without using the linear head model? My code runs fine up to the end of training. The accuracy in the code below uses out, which is returned by the forward() of class LinearHeadModel.

Linear head model:

class LinearHeadModel(nn.Module):
    def __init__(self,simclr_model_dict, num_classes=2):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        model = SimCLR(LSTMModel, 128, 41)
        if simclr_model_dict:
            print("loading feature extractor")
       
        smclr = SimCLR()
        
        smclr.load_state_dict(torch.load('model.pt', map_location='cpu'))
        
        
    def forward(self,x):
        
        self.features = self.f(x)
                
        self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))
        h = self.features(x)
        out = self.g(h)
        return out    

Compute accuracy:

 model = LinearHeadModel(simclr_model_dict=model.state_dict(), num_classes=2)
#    model = LinearHeadModel(None)
    parameters = [param for param in model.parameters() if param.requires_grad is True]  # trainable parameters
    optimizer = torch.optim.SGD(
    parameters,
    0.1,  
    momentum=0.9,
    weight_decay=0.,
    nesterov=True)
    for epoch in range(epochs):
        total_loss = []
        batch_size=256

        loss_func = N_XENT()
        optimizer = torch.optim.SGD(model.parameters(),lr=0.3* (batch_size/256), momentum=0.9)
        epochs: int = 1
        loss_list = []  
        acc = 0.0
        total_num = 0
          
    for batch_idx, (X_ts, Y_ts) in enumerate(test_ldr):
        X_ts = X_ts.unsqueeze(0)
        print(X_ts.shape)
        X_ts = X_ts.view(256, 1, 41)
        print(X_ts.shape)
#        print('TESTING SAMPLES', X_ts) 
     
        Y_ts = Y_ts.type(torch.LongTensor)

        optimizer.zero_grad()

      
        output = model(X_ts)
        print('output', output)
         
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(out,target.to(device))
        total_num +=X_ts.size(0)
        loss.backward()
        optimizer.step()

        total_loss += loss.detach().item() * X_ts.size(0)
        correct = (torch.argmax(out.to("cpu").data,1) == target.data).float().sum()
        acc += float(100.0*(correct))
        loader.set_description(f"Epoch: {i}, training_loss: {loss}, accuracy :{acc/total_num}")
        print(f"Epoch {i} training loss: {total_loss/total_num} acc : {acc/total_num}")

I don't know how exactly you want to use LinearHeadModel, as its definition is also wrong.
In its forward you are using the undefined self.f and self.features attributes, creating new randomly initialized linear layers in every call, etc.

If you want to compute the accuracy from the SimCLR model you should be able to directly execute its forward pass.
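Note that the SimCLR forward returns the 2048-dimensional output of its projection head g, not class scores, so an accuracy needs a model whose output has one logit per class (such as a linear head). A minimal evaluation sketch, assuming a hypothetical helper `evaluate_accuracy` and a model that returns logits of shape `[batch, num_classes]`:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate_accuracy(model: nn.Module, loader: DataLoader) -> float:
    """Percentage of samples whose argmax logit equals the label.

    Assumes model(x) returns logits of shape [batch, num_classes].
    """
    model.eval()  # disable dropout / use running BN stats
    correct, total = 0, 0
    for x, y in loader:
        preds = model(x).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.size(0)
    return 100.0 * correct / total


# Hypothetical check: an identity "model", so the inputs themselves are the logits
logits = torch.tensor([[2.0, 1.0], [0.0, 3.0], [5.0, 0.0]])
labels = torch.tensor([0, 1, 1])
loader = DataLoader(TensorDataset(logits, labels), batch_size=2)
print(evaluate_accuracy(nn.Identity(), loader))  # about 66.67 (2 of 3 correct)
```

If the goal is evaluation only, there is no `loss.backward()` or `optimizer.step()` in the loop; calling those on the test loader, as the "Compute accuracy" code above does, would train on the test data.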

I am trying to replicate simclr/eval.py from the larsh0103/simclr repository on GitHub, with my LSTM model in place of the ResNet in SimCLR(). Now when I save my SimCLR() model and try to test on my dataset using the linear head model, I get the error.
In LinearHeadModel, I am trying to reuse the layers of the SimCLR model by creating a SimCLR object.
I simply want to compute my accuracy, with or without the linear head. How should I modify my LinearHeadModel for testing?

Thanks a lot for your earlier replies. How can I access self.f and self.g of SimCLR() from LinearHeadModel, through the SimCLR object created there? Right now they are inaccessible.

In the original code:

class LinearHeadModel(nn.Module):
    def __init__(self, simclr_model_dict,num_classes=10):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        self.device = self._get_device()
        if simclr_model_dict:
            print("loading feature extractor")
            smclr = SimCLR(out_dim=128)
            smclr.load_state_dict(torch.load(simclr_model_dict, map_location=torch.device(self.device)))
            self.features = smclr.f

            # ## Freeze feature extractor
            # for param in self.features.parameters():
            #     param.requires_grad = False
            

            self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))

    def _get_device(self):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print("Running on:", device)
        return device    

    def forward(self,x):
        h = self.features(x)
        out = self.g(h)
        return out

LinearHeadModel initializes self.features as smclr.f, so make sure you are also assigning the corresponding attribute. It can be smclr.f or any other attribute name, since you've changed the SimCLR model.
Later, self.g is initialized with a new Linear layer, so you might want to do the same.
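A sketch of how that could look for the LSTM-based SimCLR from this thread, under stated assumptions: the hyperparameters (input_dim=41, hidden_dim=1, layer_dim=1, output_dim=1) are passed explicitly instead of read from globals, `LSTMModel` is condensed, and the freezing option mirrors the commented-out block in the original eval code. Because `self.f.fc` is replaced by `nn.Identity()`, the encoder outputs `hidden_dim` features, so the new head needs `in_features=hidden_dim` rather than 512.

```python
import os
import tempfile

import torch
import torch.nn as nn


class LSTMModel(nn.Module):
    # Condensed version of the LSTMModel from this thread
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.hidden_dim, self.layer_dim = hidden_dim, layer_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])  # last time step


class SimCLR(nn.Module):
    # Same layout as the thread's SimCLR: encoder f, projection head g
    def __init__(self, input_dim=41, hidden_dim=1, layer_dim=1, output_dim=1):
        super().__init__()
        self.f = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
        self.f.fc = nn.Identity()  # f now returns [batch, hidden_dim]
        self.g = nn.Sequential(nn.Linear(hidden_dim, 2048), nn.ReLU(), nn.Linear(2048, 2048))

    def forward(self, x):
        return self.g(self.f(x))


class LinearHeadModel(nn.Module):
    def __init__(self, simclr_path, num_classes=2, hidden_dim=1, freeze=False):
        super().__init__()
        smclr = SimCLR(hidden_dim=hidden_dim)  # must match the saved architecture
        if simclr_path:
            smclr.load_state_dict(torch.load(simclr_path, map_location="cpu"))
        self.features = smclr.f  # reuse the trained encoder
        if freeze:
            for p in self.features.parameters():
                p.requires_grad = False
        # New classifier created once in __init__, sized to the encoder output
        self.g = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        return self.g(self.features(x))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pt")
    src = SimCLR()
    torch.save(src.state_dict(), path)
    head = LinearHeadModel(path, num_classes=2)

out = head(torch.randn(8, 1, 41))  # [batch, seq_len=1, features=41]
print(out.shape)  # torch.Size([8, 2])
```

The head is randomly initialized, so it still has to be trained on labeled data before its accuracy means anything.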