Model is not updating

Hi! I am a beginner in deep learning and have implemented the model below. The problem I am facing is that the weights and biases are not being updated. When I plot the losses, they are periodic (they look the same every epoch). Can anyone please help me figure out what is causing this? Thank you!

class MyModel(torch.nn.Module):

    def __init__(self):
        """
        In the constructor we instantiate the pooling, convolution, and linear
        modules and assign them as member variables.
        """
        super(MyModel, self).__init__()

        # Pooling layers
        self.pool = torch.nn.MaxPool1d(20, stride=2)
        self.pool_avg = torch.nn.AvgPool1d(127)
        # Time (left) convolution
        self.time1 = torch.nn.Conv1d(in_channels=1, out_channels=64, kernel_size=16, stride=1, padding=17)
        self.time2 = torch.nn.Conv1d(in_channels=64, out_channels=128, kernel_size=16, stride=1, padding=17)
        self.time3 = torch.nn.Conv1d(in_channels=128, out_channels=256, kernel_size=16, stride=1, padding=17)
        # Frequency (right) convolution
        self.freq1 = torch.nn.Conv1d(in_channels=1, out_channels=64, kernel_size=16, stride=1, padding=17)
        self.freq2 = torch.nn.Conv1d(in_channels=64, out_channels=128, kernel_size=16, stride=1, padding=17)
        self.freq3 = torch.nn.Conv1d(in_channels=128, out_channels=256, kernel_size=16, stride=1, padding=17)
        # Fully connected layers
        self.linear1 = torch.nn.Linear(512, 256)
        self.linear2 = torch.nn.Linear(256, 128)
        self.linear3 = torch.nn.Linear(128, 64)
        # Final layer
        self.final = torch.nn.Softmax(dim=1)

    def forward(self, time_domain, freq_domain, clean_result):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        # Add a channel dimension: (batch, length) -> (batch, 1, length)
        time_domain = time_domain.unsqueeze(1)
        freq_domain = freq_domain.unsqueeze(1)

        # Time (left) convolution
        time1_out = self.time1(time_domain)
        time1_out = self.pool(time1_out)
        time2_out = self.time2(time1_out)
        time2_out = self.pool(time2_out)
        time3_out = self.time3(time2_out)
        time3_out = self.pool(time3_out)

        # Frequency (right) convolution
        freq1_out = self.freq1(freq_domain)
        freq1_out = self.pool(freq1_out)
        freq2_out = self.freq2(freq1_out)
        freq2_out = self.pool(freq2_out)
        freq3_out = self.freq3(freq2_out)
        freq3_out = self.pool(freq3_out)

        # Concatenate the two branches along the channel dimension
        conv_out = torch.cat((time3_out, freq3_out), dim=1)
        conv_out_ave = torch.squeeze(self.pool_avg(conv_out))

        # Fully connected layers
        fc1_out = self.linear1(conv_out_ave).clamp(min=0)    # relu
        fc2_out = self.linear2(fc1_out).clamp(min=0)         # relu
        fc3_out = self.linear3(fc2_out).clamp(min=0)         # relu

        # Final layer
        final_out = torch.max(self.final(fc3_out), dim=1)[0]
        return final_out

The torch.max operation in:

final_out = torch.max(self.final(fc3_out), dim=1)[0]

will only let the gradient flow back through the maximum value and will give all other values a zero gradient, so I’m not sure if that’s really what you want.
If you are dealing with e.g. a multi-class classification, pass the raw logits to nn.CrossEntropyLoss instead of using torch.max or applying softmax.
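For example, a rough sketch of that idea (the batch size and class count are made up, not taken from your model; the logits tensor stands in for fc3_out):

import torch

# Toy example: raw logits go straight into CrossEntropyLoss, no softmax and no torch.max.
num_classes = 64                                           # assumption: one output per class
logits = torch.randn(8, num_classes, requires_grad=True)  # stand-in for the model output
y = torch.randint(0, num_classes, (8,))                    # class indices, dtype torch.long

loss = torch.nn.CrossEntropyLoss()(logits, y)              # applies log-softmax internally
loss.backward()                                            # gradients flow to every logit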

Do you mean passing “fc3_out” directly to loss_fn?

Also, the code below is how I am performing forward and backward propagation. I wonder if this part of the code is preventing the gradients from updating:

    y_pred = model(time, freq, clean_result=False)
    # convert to 1 & -1
    for i in range(len(y_pred)):
        if y_pred[i] >= 0.5:
            y_pred[i] = 1
        else:
            y_pred[i] = 0
    # Compute loss
    loss = loss_fn(y_pred, y)
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

What loss function are you using?

The loop in which you are converting probabilities into predictions is causing the problem!!!

Can you please elaborate on “the loop in which you are converting probabilities into predictions is causing the problem”? Thank you!

I am using the following loss function:

loss_fn = torch.nn.BCELoss()

# define learning rate and optimizer
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

So you need to remove

    for i in range(len(y_pred)):
        if y_pred[i] >= 0.5:
            y_pred[i] = 1
        else:
            y_pred[i] = 0

because by doing this you are converting probabilities into labels, but BCELoss expects probabilities in “y_pred”.

Also, for this you need to apply a sigmoid function to the last layer’s output,
and the last layer should be a linear layer that transforms the data to n × 1 dimensions.
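Something like this rough sketch (made-up batch size, reusing the 64-dim fc3_out from your model; it is not a drop-in patch):

import torch

# Sketch: the head ends in Linear(64, 1), sigmoid gives a probability, and that
# probability goes straight into BCELoss -- no thresholding loop before the loss.
final = torch.nn.Linear(64, 1)                      # would replace the Softmax "final" layer

fc3_out = torch.randn(8, 64, requires_grad=True)    # stand-in for your fc3_out
y = torch.randint(0, 2, (8,)).float()               # binary targets, 0. or 1.

y_pred = torch.sigmoid(final(fc3_out)).squeeze(1)   # probabilities in (0, 1)
loss = torch.nn.BCELoss()(y_pred, y)
loss.backward()                                     # gradients can now reach the weights

# Only threshold afterwards, e.g. to compute accuracy, never before the loss.
hard_labels = (y_pred >= 0.5).long()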

Also, are you doing binary classification or multi-class?
If binary, your last layer should be a linear layer, i.e. after the 128 → 64 mapping add a 64 → 1 layer.
If multi-class:
use CrossEntropyLoss as the loss function,
remove the step which is taking torch.max(self.final(...)),
and make sure that y contains the encoded target labels and that its dtype is torch.long.
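To make both options concrete, a small sketch with placeholder shapes (the batch size and the 10-class count are just examples, not from your data):

import torch

# Two possible heads after the 128-dim features, depending on the task
features = torch.randn(8, 128, requires_grad=True)   # stand-in for fc2_out

# Binary: 128 -> 64 -> 1, sigmoid + BCELoss, float targets in {0., 1.}
binary_head = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
y_binary = torch.randint(0, 2, (8,)).float()
binary_loss = torch.nn.BCELoss()(torch.sigmoid(binary_head(features)).squeeze(1), y_binary)

# Multi-class: 128 -> 64 -> num_classes, raw logits + CrossEntropyLoss, targets as torch.long indices
num_classes = 10                                      # placeholder value
multi_head = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, num_classes))
y_multi = torch.randint(0, num_classes, (8,), dtype=torch.long)
multi_loss = torch.nn.CrossEntropyLoss()(multi_head(features), y_multi)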