RuntimeError: Error(s) in loading state_dict for SimCLR:

AP_M · August 26, 2022, 5:43pm

I am trying to replicate the SimCLR model from a Google Colab notebook using my own dataset and model. When I run it with my model, I get the error below, and I can’t figure out where the problem is.

RuntimeError                              Traceback (most recent call last)
<ipython-input-34-4b3055af976b> in <module>
      4                             temperature=0.07,
      5                             weight_decay=1e-4,
----> 6                             max_epochs=100)

4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1603         if len(error_msgs) > 0:
   1604             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1605                                self.__class__.__name__, "\n\t".join(error_msgs)))
   1606         return _IncompatibleKeys(missing_keys, unexpected_keys)
   1607 

RuntimeError: Error(s) in loading state_dict for SimCLR:
	Missing key(s) in state_dict: "convnet.conv1.bias". 
	Unexpected key(s) in state_dict: "convnet.bn1.weight", "convnet.bn1.bias", "convnet.bn1.running_mean", "convnet.bn1.running_var", "convnet.bn1.num_batches_tracked", "convnet.layer1.0.conv1.weight", "convnet.layer1.0.bn1.weight", "convnet.layer1.0.bn1.bias", "convnet.layer1.0.bn1.running_mean", "convnet.layer1.0.bn1.running_var", "convnet.layer1.0.bn1.num_batches_tracked", "convnet.layer1.0.conv2.weight", "convnet.layer1.0.bn2.weight", "convnet.layer1.0.bn2.bias", "convnet.layer1.0.bn2.running_mean", "convnet.layer1.0.bn2.running_var", "convnet.layer1.0.bn2.num_batches_tracked", "convnet.layer1.1.conv1.weight", "convnet.layer1.1.bn1.weight", "convnet.layer1.1.bn1.bias", "convnet.layer1.1.bn1.running_mean", "convnet.layer1.1.bn1.running_var", "convnet.layer1.1.bn1.num_batches_tracked", "convnet.layer1.1.conv2.weight", "convnet.layer1.1.bn2.weight", "convnet.layer1.1.bn2.bias", "convnet.layer1.1.bn2.running_mean", "convnet.layer1.1.bn2.running_var", "convnet.layer1.1.bn2.num_batches_tracked", "convnet.layer2.0.conv1.weight", "convnet.layer2.0.bn1.weight", "convnet.layer2.0.bn1.bias", "convnet.layer2.0.bn1.running_mean", "convnet.layer2.0.bn1.running_var", "convnet.layer2.0.bn1.num_batches_tracked", "convnet.layer2.0.conv2.weight", "convnet.layer2.0.bn2.weight", "convnet.layer2.0.bn2.bias", "convnet.layer2.0.bn2.running_mean", "convnet.layer2.0.bn2.running_var", "convnet.layer2.0.bn2.num_batches_tracked", "convnet.layer2.0.downsample.0.weight", "convnet.layer2.0.downsample.1....
	size mismatch for convnet.conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([3, 1, 1]).
	size mismatch for convnet.fc.0.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1, 31488]).
	size mismatch for convnet.fc.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1]).
	size mismatch for convnet.fc.2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([128, 256]).

My model is as follows:

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.dropout = nn.Dropout(0.1)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # 28 time steps
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        x, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
#        x = self.dropout(x)

input_dim = 16

hidden_dim = 100

layer_dim = 1

output_dim = 1

model = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)

Below is the SimCLR class where the model is used:

class SimCLR(pl.LightningModule):
    
    def __init__(self, hidden_dim, lr, temperature, weight_decay, max_epochs=100):
        super().__init__()
        self.save_hyperparameters()
        assert self.hparams.temperature > 0.0, 'The temperature must be a positive float!'
        # Base model f(.)
        self.convnet = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
      #  self.convnet = Net()
      #  self.convnet =  torchvision.models.resnet18(pretrained=False, 
      #                                             num_classes=2*hidden_dim)
        # The MLP for g(.) consists of Linear->ReLU->Linear 
        self.convnet.fc = nn.Sequential(
            self.convnet.fc,  # Linear(ResNet output, 4*hidden_dim)
            nn.ReLU(inplace=True),
            nn.Linear(2*hidden_dim, hidden_dim)
        )

ptrblck · August 27, 2022, 2:16am

Based on the error message it seems you are trying to load a state_dict of a resnet-like model into your custom model.
Make sure you are storing the state_dict from the same model class you want to load it into again to avoid these missing and unexpected key errors.
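One way to spot such a mismatch before calling load_state_dict is to compare the key sets of the checkpoint and of the freshly built model. A minimal sketch, in which the two toy modules `Saved` and `Current` are hypothetical stand-ins for the ResNet-like checkpoint and the custom model:

```python
import torch.nn as nn

class Saved(nn.Module):    # hypothetical stand-in for the model that produced the checkpoint
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, bias=False)
        self.bn1 = nn.BatchNorm2d(8)

class Current(nn.Module):  # hypothetical stand-in for the model being loaded into
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, bias=True)

checkpoint_keys = set(Saved().state_dict().keys())
model_keys = set(Current().state_dict().keys())

# Keys the model expects but the checkpoint lacks, and vice versa.
missing = sorted(model_keys - checkpoint_keys)
unexpected = sorted(checkpoint_keys - model_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists are non-empty, as here, the checkpoint was most likely written by a different model definition than the one being instantiated.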

AP_M · August 27, 2022, 5:56am

The code below uses a pretrained model. Where should I make the change so that my own model’s state dictionary is loaded?

def train_simclr(batch_size, max_epochs=100, **kwargs):
    trainer = pl.Trainer(default_root_dir=os.path.join(CHECKPOINT_PATH, 'SimCLR'),
                         gpus=1 if str(device)=='cuda:0' else 0,
                         max_epochs=max_epochs,
                         callbacks=[ModelCheckpoint(save_weights_only=True, mode='max', monitor='val_acc_top5'),
                                    LearningRateMonitor('epoch')],
                         )
    trainer.logger._default_hp_metric = None # Optional logging argument that we don't need

    # Check whether pretrained model exists. If yes, load it and skip training
    pretrained_filename = os.path.join(CHECKPOINT_PATH, 'SimCLR.ckpt')
    if os.path.isfile(pretrained_filename):
        print(f'Found pretrained model at {pretrained_filename}, loading...')
        model = SimCLR.load_from_checkpoint(pretrained_filename) # Automatically loads the model with the saved hyperparameters
    else:
        train_ldr = T.utils.data.DataLoader(X_train, batch_size=batch_size, shuffle=True, 
                                       drop_last=True, pin_memory=True, num_workers=NUM_WORKERS)
        val_ldr = T.utils.data.DataLoader(X_val, batch_size=batch_size, shuffle=False, 
                                     drop_last=False, pin_memory=True, num_workers=NUM_WORKERS)
        pl.seed_everything(42) # To be reproducable
        model = SimCLR(max_epochs=max_epochs, **kwargs)
        trainer.fit(model, train_ldr, val_ldr)
        model = SimCLR.load_from_checkpoint(trainer.checkpoint_callback.best_model_path) # Load best checkpoint after training

    return model

ptrblck · August 27, 2022, 7:12am

I’m not familiar enough with Lightning and don’t know what exactly:

model = SimCLR.load_from_checkpoint(trainer.checkpoint_callback.best_model_path) # Load best checkpoint after training

is doing. Usually you would just store and load the state_dict from an nn.Module.
E.g. if you’ve created a custom nn.Module called MyModel, this would be the general workflow:

model = MyModel()
# train your model
...

# save state_dict
torch.save(model.state_dict(), path)

# in another script: create a new model object and load the state_dict
model = MyModel()
model.load_state_dict(torch.load(path))

AP_M · September 2, 2022, 7:01am

How should I specify the path in torch.save(…, path) and torch.load(path) so that my model is saved and loaded again on Windows? When I use

 filepath = "model.pt" 
where model = SimCLR() is the model I want to save and load, it gives the error:
FileNotFoundError: [Errno 2] No such file or directory: 'model.pt'

Also, when I give the path as r'C:\Users\anupa\Desktop\model.pth', I get:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\anupa\\Desktop\\model.pth'

ptrblck · September 2, 2022, 7:04am

torch.save would accept any valid path with a (new) file name. Make sure the directories exist as torch.save will not recursively create them.
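A minimal sketch of creating the target directory before saving. The directory name `checkpoints` and the `nn.Linear` stand-in model are hypothetical, and a temporary base directory is used only so the snippet runs anywhere:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for the real model

base_dir = tempfile.mkdtemp()  # replace with your own folder
save_path = os.path.join(base_dir, "checkpoints", "model.pt")

# torch.save does not create parent directories, so create them first.
os.makedirs(os.path.dirname(save_path), exist_ok=True)
torch.save(model.state_dict(), save_path)

# Load into a new instance of the same class.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(save_path, map_location="cpu"))
print(os.path.isfile(save_path))
```

A relative path such as 'model.pt' is resolved against the current working directory (os.getcwd()), so torch.load will also raise FileNotFoundError if the file was written somewhere else or was never written at all.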

AP_M · September 7, 2022, 6:27am

Thank you for all the solutions; I have applied them. An error now occurs while training my model. I understand that it is caused by the shape, but I need to pass the input as [batch size, rows, cols] for my model to train. After a few iterations I get the following error:

torch.Size([1, 256, 41])
torch.Size([256, 1, 41])
TRAINING SAMPLES tensor([[[-0.3023, -0.2928, -0.1115,  ..., -0.1473, -0.0746, -0.1150]],

        [[-0.3025, -0.2955, -0.0908,  ..., -0.1473, -0.0655, -0.0479]],

        [[-0.3028, -0.2956, -0.0986,  ..., -0.1473, -0.0746, -0.0624]],

        ...,

        [[-0.3024, -0.2932, -0.1101,  ..., -0.1473, -0.0746, -0.1060]],

        [[-0.3024, -0.2939, -0.1127,  ..., -0.1473, -0.0746, -0.1102]],

        [[-0.3024, -0.2935, -0.1058,  ..., -0.1473, -0.0708, -0.1162]]])
output tensor([[-0.2123, -0.0592, -0.3308,  ...,  0.1562, -0.0809,  0.1073],
        [-0.2112, -0.0593, -0.3305,  ...,  0.1537, -0.0814,  0.1066],
        [-0.2117, -0.0592, -0.3306,  ...,  0.1547, -0.0812,  0.1068],
        ...,
        [-0.2123, -0.0592, -0.3308,  ...,  0.1563, -0.0809,  0.1073],
        [-0.2111, -0.0593, -0.3305,  ...,  0.1534, -0.0815,  0.1065],
        [-0.2014, -0.0610, -0.3274,  ...,  0.1304, -0.0855,  0.1002]],
       grad_fn=<AddmmBackward>)
X tensor([[-0.0189, -0.0053, -0.0294,  ...,  0.0139, -0.0072,  0.0095],
        [-0.0188, -0.0053, -0.0294,  ...,  0.0137, -0.0072,  0.0095],
        [-0.0188, -0.0053, -0.0294,  ...,  0.0138, -0.0072,  0.0095],
        ...,
        [-0.0189, -0.0053, -0.0294,  ...,  0.0139, -0.0072,  0.0096],
        [-0.0188, -0.0053, -0.0294,  ...,  0.0137, -0.0073,  0.0095],
        [-0.0179, -0.0054, -0.0290,  ...,  0.0116, -0.0076,  0.0089]],
       grad_fn=<DivBackward0>)
X.t tensor([[-0.0189, -0.0188, -0.0188,  ..., -0.0189, -0.0188, -0.0179],
        [-0.0053, -0.0053, -0.0053,  ..., -0.0053, -0.0053, -0.0054],
        [-0.0294, -0.0294, -0.0294,  ..., -0.0294, -0.0294, -0.0290],
        ...,
        [ 0.0139,  0.0137,  0.0138,  ...,  0.0139,  0.0137,  0.0116],
        [-0.0072, -0.0072, -0.0072,  ..., -0.0072, -0.0073, -0.0076],
        [ 0.0095,  0.0095,  0.0095,  ...,  0.0096,  0.0095,  0.0089]],
       grad_fn=<TBackward>)
sim_matrix tensor([[1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9991],
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9993],
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9992],
        ...,
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9991],
        [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 0.9993],
        [0.9991, 0.9993, 0.9992,  ..., 0.9991, 0.9993, 1.0000]],
       grad_fn=<MmBackward>)
Training [100%]	Loss: 5.5426
torch.Size([1, 83, 41])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-33-436bf227754e> in <module>
     17    print(X_tr.shape)
     18 
---> 19    X_tr = X_tr.view(256, 1, 41)
     20    print(X_tr.shape)
     21    print('TRAINING SAMPLES', X_tr)

RuntimeError: shape '[256, 1, 41]' is invalid for input of size 3403

Code:

 model = SimCLR()
    
    batch_size=256
    loss_func = N_XENT()
#    filepath =  '/Desktop/Untitled Folder/'
#    writer = SummaryWriter(log_dir="checkpoints")
    optimizer = torch.optim.SGD(model.parameters(),lr=0.3* (batch_size/256), momentum=0.9)
    epochs: int = 1
    loss_list = []  
    model.train()
#    device = model.device
    for epoch in range(epochs):
       total_loss = []
    
    for batch_idx, (X_tr, Y_tr) in enumerate(train_ldr):
       X_tr = X_tr.unsqueeze(0)
       print(X_tr.shape)
       
       X_tr = X_tr.view(256, 1, 41)
       print(X_tr.shape)
       print('TRAINING SAMPLES', X_tr) 
     
       Y_tr = Y_tr.type(torch.LongTensor)

       optimizer.zero_grad()

        # Forward pass
       output = model(X_tr)
       print('output', output)
         
       loss = loss_func(output)
        # Backward pass
       loss.backward()
        # Optimize the weights
       optimizer.step()
        
       total_loss.append(loss.item())


       loss_list.append(sum(total_loss)/len(total_loss))
       print('Training [{:.0f}%]\tLoss: {:.4f}'.format(
        100. * (epoch + 1) / epochs, loss_list[-1]))
       if i % 10 == 0:
        # torch.save(model.state_dict(),os.path.join("checkpoints",f"model-{i}.pt"))
         torch.save(model.state_dict(), 'model.pt')

ptrblck · September 7, 2022, 6:35am

The view operation fails since the number of elements in the tensor doesn’t match the number of elements in the desired view:

torch.Size([1, 83, 41])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-33-436bf227754e> in <module>
     17    print(X_tr.shape)
     18 
---> 19    X_tr = X_tr.view(256, 1, 41)
     20    print(X_tr.shape)
     21    print('TRAINING SAMPLES', X_tr)

RuntimeError: shape '[256, 1, 41]' is invalid for input of size 3403

Could the last batch be smaller and contain e.g. only 83 samples?
If so, you might want to use X_tr = X_tr.view(X_tr.size(0), 1, 41) instead.
I’m also unsure what this view operation is supposed to do, so make sure my suggestion fits your actual use case.
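A minimal sketch of the smaller-last-batch situation, assuming the loader yields batches of shape [batch, 41] (which the printed shapes suggest) and using hypothetical random data with 595 samples. The batch size is read from the tensor as it comes out of the loader, before any unsqueeze(0) is applied:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 595 samples with 41 features, so the batches hold 256, 256 and 83 samples.
data = torch.randn(595, 41)
labels = torch.zeros(595)
loader = DataLoader(TensorDataset(data, labels), batch_size=256, shuffle=False)

shapes = []
for x, y in loader:
    # Infer the batch size from the tensor instead of hard-coding 256.
    x = x.view(x.size(0), 1, 41)
    shapes.append(tuple(x.shape))
print(shapes)

# Alternative: drop the incomplete last batch so every batch has 256 samples.
full_only = DataLoader(TensorDataset(data, labels), batch_size=256, drop_last=True)
print(len(full_only))
```

If unsqueeze(0) is called first, X_tr.size(0) is 1 rather than the batch size, and the view fails again.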

AP_M · September 7, 2022, 6:45am

Thank you for the reply. But when I use

X_tr = X_tr.view(X_tr.size(0), 1 , 41)
it throws the error:

RuntimeError: shape '[1, 1, 41]' is invalid for input of size 10496

I want to train my model with a batch size of 256, not 1. My model is defined as:

class SimCLR(nn.Module):
    def __init__(self, device = "cpu", out_dim=128, input_shape=(256,1,41)):
        super(SimCLR,self).__init__()
        self.input_shape=input_shape
#        self.f = Net()
        self.f = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
        self.f.maxpool = nn.Identity()
   #     self.fc1=nn.Identity()
        self.f.fc =nn.Identity()
        self.g = nn.Sequential(nn.Linear( in_features=1, out_features=2048), nn.ReLU(),
        nn.Linear(in_features=2048, out_features = 2048))
        self.f.to(device)
        self.g.to(device)

    def forward(self,x):
        h = self.f(x)
        return self.g(h)

LSTM model:

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.dropout = nn.Dropout(0.1)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # 28 time steps
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        x, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
#        x = self.dropout(x)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        x = self.fc(x[:, -1, :]) 
        # out.size() --> 100, 10
       
        return (x)

LSTM model dimensions
input_dim = 41, hidden_dim = 1, layer_dim = 1, output_dim = 1

Do I need to change anything else in my model parameters? I am not sure.

ptrblck · September 7, 2022, 6:59am

I still don’t know why the view is used, but in case your data has the batch dimension in dim1 and you would like to permute the batch dimension into dim0 use:

X_tr = X_tr.permute(1, 0, 2)

instead.
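A minimal sketch of the shapes involved, assuming a hypothetical batch of shape [256, 41] coming out of the loader. It also shows that unsqueeze(1) reaches the same [batch, 1, features] layout in one step:

```python
import torch

# Hypothetical batch as it might come out of the loader: [batch, features].
x = torch.arange(256 * 41, dtype=torch.float32).view(256, 41)

a = x.unsqueeze(0)      # [1, 256, 41]: batch dimension ends up in dim1
b = a.permute(1, 0, 2)  # [256, 1, 41]: batch dimension moved to dim0
c = x.unsqueeze(1)      # [256, 1, 41]: same layout without the extra step

print(a.shape, b.shape, c.shape)
print(torch.equal(b, c))
```

Neither variant hard-codes the batch size, so a smaller last batch goes through unchanged.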

AP_M · September 8, 2022, 5:13pm

@ptrblck sir, regarding the error below: I followed the suggested steps to save the model’s state dict during training and load it again, but loading still fails. It is not clear to me what needs to be corrected, since I am now saving my own model and not the ResNet.

RuntimeError: Error(s) in loading state_dict for SimCLR:
	Missing key(s) in state_dict: "projector.0.weight", "projector.2.weight". 
	Unexpected key(s) in state_dict: "f.lstm.weight_ih_l0", "f.lstm.weight_hh_l0", "f.lstm.bias_ih_l0", "f.lstm.bias_hh_l0", "g.0.weight", "g.0.bias", "g.2.weight", "g.2.bias".

Model

class SimCLR(nn.Module):
    def __init__(self, device = "cpu", out_dim=128, input_shape=(256,1,41)):
        super(SimCLR,self).__init__()
        self.input_shape=input_shape
   #     self.f = Net()
        self.f = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
        self.f.maxpool = nn.Identity()
   #     self.fc1=nn.Identity()
        self.f.fc =nn.Identity()
        self.g = nn.Sequential(nn.Linear( in_features=1, out_features=2048), nn.ReLU(),
        nn.Linear(in_features=2048, out_features = 2048))
        self.f.to(device)
        self.g.to(device)

    def forward(self,x):
        h = self.f(x)
        return self.g(h)

Loading the state dict through a model object:

class LinearHeadModel(nn.Module):
    def __init__(self,simclr_model_dict, num_classes=2):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        model = SimCLR(LSTMModel, 128, 41)
        if simclr_model_dict:
            print("loading feature extractor")
       
        smclr = SimCLR(LSTMModel, 128, 41)
        
        smclr.load_state_dict(torch.load('model.pt', map_location='cpu'))
        
        
    def forward(self,x):
        
        self.features = self.f(x)
                
        self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))
        h = self.features(x)
        out = self.g(h)
        return out    

ptrblck · September 8, 2022, 8:53pm

The error points to a key mismatch while loading the state_dict as your model object seems to have a projector attribute, which seems to be an nn.Sequential container while the state_dict stored the f and g attributes.
Your code also doesn’t fit the error message as projector is undefined, so please share a minimal, executable code snippet which would reproduce the error.
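For diagnosis, load_state_dict can be called with strict=False, in which case it returns the missing and unexpected keys instead of raising on them. A minimal sketch with two hypothetical toy classes, `Trained` (attributes f and g, like the saved file) and `Other` (a differing definition with a projector attribute):

```python
import torch.nn as nn

class Trained(nn.Module):  # hypothetical: attributes named f and g, as in the saved state_dict
    def __init__(self):
        super().__init__()
        self.f = nn.LSTM(41, 1, batch_first=True)
        self.g = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 8))

class Other(nn.Module):    # hypothetical: a different definition that uses `projector`
    def __init__(self):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(1, 8, bias=False), nn.ReLU(), nn.Linear(8, 8, bias=False)
        )

state = Trained().state_dict()

# strict=False reports key mismatches instead of raising on them.
result = Other().load_state_dict(state, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

This is only a debugging aid: with strict=False the mismatched parameters stay at their random initialization, so the actual fix is still to instantiate the same class that was saved.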

AP_M · September 9, 2022, 1:35am

After loading the data, I am passing the LSTM model to the SimCLR model:

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.dropout = nn.Dropout(0.1)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # 28 time steps
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        x, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
#        x = self.dropout(x)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        x = self.fc(x[:, -1, :]) 
        # out.size() --> 100, 10
       
        return (x)

LSTM model dimensions
input_dim = 41, hidden_dim = 1, layer_dim = 1, output_dim = 1

Loss function:

class N_XENT(nn.Module):

    def forward(self, X, T=0.5):
        X = nn.functional.normalize(X,dim=1)
        print('X', X)
        print('X.t', X.t())
        sim_matrix = torch.mm(X,X.t())
        print('sim_matrix', sim_matrix)
        sim_matrix = sim_matrix.clamp(min=1e-7) / T

        sim_matrix = sim_matrix - torch.eye(sim_matrix.shape[0],sim_matrix.shape[1]).to(sim_matrix.device) * 1e5

        ## Make array indicating positive samples

        pos = torch.arange(X.shape[0])
        pos[1::2] -=1
        pos[::2] +=1 
        return nn.functional.cross_entropy(input=sim_matrix,target=pos.long().to(sim_matrix.device))

Training:

 model = SimCLR()
    
    batch_size=256
    i = 0
    loss_func = N_XENT()

    optimizer = torch.optim.SGD(model.parameters(),lr=0.3* (batch_size/256), momentum=0.9)
    epochs: int = 1
    loss_list = []  
    acc = 0.0
    total_num = 0
    model.train()

    for epoch in range(epochs):
       total_loss = []
    
    for batch_idx, (X_tr, Y_tr) in enumerate(train_ldr):
       X_tr = X_tr.unsqueeze(0)
       print(X_tr.shape)
       

       X_tr = X_tr.permute(1, 0, 2)
       print(X_tr.shape)
       print('TRAINING SAMPLES', X_tr) 
     
       Y_tr = Y_tr.type(torch.LongTensor)

       optimizer.zero_grad()
       output = model(X_tr)
       print('output', output)
         
       loss = loss_func(output)
       loss.backward()
      
       optimizer.step()
        
       total_loss.append(loss.item())


       loss_list.append(sum(total_loss)/len(total_loss))
       print('Training [{:.0f}%]\tLoss: {:.4f}'.format(
        100. * (epoch + 1) / epochs, loss_list[-1]))
       
       if epoch % 10 == 0:
        
         torch.save(model.state_dict(), 'model.pt')

To test my data samples with the SimCLR method using the LinearHeadModel, I am loading my SimCLR model’s state dict:

class LinearHeadModel(nn.Module):
    def __init__(self,simclr_model_dict, num_classes=2):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        model = SimCLR(LSTMModel, 128, 41)
        if simclr_model_dict:
            print("loading feature extractor")
       
        smclr = SimCLR(LSTMModel, 128, 41)
        
        smclr.load_state_dict(torch.load('model.pt', map_location='cpu'))
        
        
    def forward(self,x):
        
        self.features = self.f(x)
                
        self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))
        h = self.features(x)
        out = self.g(h)
        return out

where it gives the error during testing.

AP_M · September 9, 2022, 9:33am

Any inputs on this will be greatly helpful to me.

ptrblck · September 9, 2022, 5:28pm

Your code is unfortunately not executable and after fixing it, saving and loading the models works:

model = SimCLR()
torch.save(model.state_dict(), 'model.pt')
loaded = LinearHeadModel(None)

E.g. in your current code you are using smclr = SimCLR(LSTMModel, 128, 41) which does not match the previous definition of SimCLR.

AP_M · September 10, 2022, 5:04am

Thank you, sir, for the reply. Is there a way to compute the accuracy of the SimCLR() model directly after training, without using the linear head model? My code runs up to and including training. The accuracy computation in the code below uses “out”, which is returned by the forward() of class LinearHeadModel.

Linear head model:

class LinearHeadModel(nn.Module):
    def __init__(self,simclr_model_dict, num_classes=2):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        model = SimCLR(LSTMModel, 128, 41)
        if simclr_model_dict:
            print("loading feature extractor")
       
        smclr = SimCLR()
        
        smclr.load_state_dict(torch.load('model.pt', map_location='cpu'))
        
        
    def forward(self,x):
        
        self.features = self.f(x)
                
        self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))
        h = self.features(x)
        out = self.g(h)
        return out

Compute accuracy:

 model = LinearHeadModel(simclr_model_dict=model.state_dict(), num_classes=2)
#    model = LinearHeadModel(None)
    parameters = [param for param in model.parameters() if param.requires_grad is True]  # trainable parameters
    optimizer = torch.optim.SGD(
    parameters,
    0.1,  
    momentum=0.9,
    weight_decay=0.,
    nesterov=True)
    for epoch in range(epochs):
        total_loss = []
        batch_size=256

        loss_func = N_XENT()
        optimizer = torch.optim.SGD(model.parameters(),lr=0.3* (batch_size/256), momentum=0.9)
        epochs: int = 1
        loss_list = []  
        acc = 0.0
        total_num = 0
          
    for batch_idx, (X_ts, Y_ts) in enumerate(test_ldr):
        X_ts = X_ts.unsqueeze(0)
        print(X_ts.shape)
        X_ts = X_ts.view(256, 1, 41)
        print(X_ts.shape)
#        print('TESTING SAMPLES', X_ts) 
     
        Y_ts = Y_ts.type(torch.LongTensor)

        optimizer.zero_grad()

      
        output = model(X_ts)
        print('output', output)
         
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(out,target.to(device))
        total_num +=X_ts.size(0)
        loss.backward()
        optimizer.step()

        total_loss += loss.detach().item() * X_ts.size(0)
        correct = (torch.argmax(out.to("cpu").data,1) == target.data).float().sum()
        acc += float(100.0*(correct))
        loader.set_description(f"Epoch: {i}, training_loss: {loss}, accuracy :{acc/total_num}")
        print(f"Epoch {i} training loss: {total_loss/total_num} acc : {acc/total_num}")

ptrblck · September 10, 2022, 5:56am

I don’t know how exactly you want to use LinearHeadModel as its definition is also wrong.
In its forward you are using undefined self.f and self.features attributes, are creating randomly initialized linear layers etc.

If you want to compute the accuracy from the SimCLR model you should be able to directly execute its forward pass.

AP_M · September 10, 2022, 8:32am

I am trying to replicate “simclr/eval.py at main · larsh0103/simclr · GitHub” with my LSTM model in place of the ResNet inside SimCLR(). When I save my SimCLR() model and try to test my dataset using the linear head model, it gives me the error.
In LinearHeadModel, I am trying to use the layers of the SimCLR model by creating an instance of it.
I simply want to compute the accuracy, with or without the linear head. How should I modify my LinearHeadModel for testing?

AP_M · September 12, 2022, 5:32am

Thanks a lot for your earlier replies. How should I reference self.f and self.g of SimCLR() inside the linear head model, through the object created there? Right now they are inaccessible.

ptrblck · September 12, 2022, 6:49am

In the original code:

class LinearHeadModel(nn.Module):
    def __init__(self, simclr_model_dict,num_classes=10):
        super(LinearHeadModel,self).__init__()
        self.num_classes=num_classes
        self.device = self._get_device()
        if simclr_model_dict:
            print("loading feature extractor")
            smclr = SimCLR(out_dim=128)
            smclr.load_state_dict(torch.load(simclr_model_dict, map_location=torch.device(self.device)))
            self.features = smclr.f

            # ## Freeze feature extractor
            # for param in self.features.parameters():
            #     param.requires_grad = False
            

            self.g = nn.Sequential(nn.Linear(512, out_features=self.num_classes, bias=True))

    def _get_device(self):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print("Running on:", device)
        return device    

    def forward(self,x):
        h = self.features(x)
        out = self.g(h)
        return out

LinearHeadModel initializes self.features as smclr.f, so make sure you are also assigning the corresponding attribute to it. It can be smclr.f or any other attribute name, since you’ve changed the SimCLR model.
Later, self.g is initialized with a new Linear layer, so you might want to do the same.
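Put together, a linear head on top of an LSTM-based SimCLR encoder could be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the class bodies are simplified, and the sizes (hidden_dim=32, projection width 256, out_dim=128) are hypothetical values chosen only so the example runs.

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out, _ = self.lstm(x)          # initial states default to zeros
        return self.fc(out[:, -1, :])  # features of the last time step

class SimCLR(nn.Module):
    def __init__(self, input_dim=41, hidden_dim=32, out_dim=128):  # hypothetical sizes
        super().__init__()
        self.f = LSTMModel(input_dim, hidden_dim, 1, 1)
        self.f.fc = nn.Identity()      # encoder now returns hidden_dim features
        self.g = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.ReLU(), nn.Linear(256, out_dim)
        )

    def forward(self, x):
        return self.g(self.f(x))

class LinearHeadModel(nn.Module):
    def __init__(self, simclr_state_dict=None, hidden_dim=32, num_classes=2):
        super().__init__()
        smclr = SimCLR(hidden_dim=hidden_dim)  # same signature as the saved model
        if simclr_state_dict is not None:
            smclr.load_state_dict(simclr_state_dict)
        self.features = smclr.f                      # keep the trained encoder as a submodule
        self.g = nn.Linear(hidden_dim, num_classes)  # new head, created once in __init__

    def forward(self, x):
        return self.g(self.features(x))

pretrained = SimCLR()  # stands in for the model after contrastive training
head = LinearHeadModel(pretrained.state_dict())
logits = head(torch.randn(8, 1, 41))  # [batch, seq_len, features]
print(logits.shape)
```

The two points that matter are that the encoder is stored on self inside __init__ and that the classification layer is created there as well, rather than inside forward, where it would be re-initialized on every call and never trained.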