Inconsistent prediction result in CNN

sanipanwala · January 3, 2023, 10:51am

Hello,

I have done training using text classification CNN. I’m getting one issue during prediction.

When I ran prediction with my trained model using 3 sentiments every time I got different prediction results.

    prediction = model(processed_input)
    probabilities = F.softmax(prediction, dim=1)
    probabilities = probabilities.detach().cpu().numpy()

Here is the code that predicts the result. Let me know if anything needs to be changed.

Thanks,
Sani

ptrblck · January 3, 2023, 5:18pm

Maybe your model uses random operations, such as nn.Dropout layers in its forward, which you should disable during inference? Call model.eval() and check if this helps getting the same predictions.

sanipanwala · January 4, 2023, 6:43am

Hello @ptrblck,

Thanks for your response.

I have already used model.eval() for prediction but got different results.

Thanks,
Sani

ptrblck · January 4, 2023, 7:24am

Could you post the model definition so that I could try to reproduce it, please?

sanipanwala · January 5, 2023, 9:34am

Hello @ptrblck

Here is my model definition. Please guide me for better results.

class CharacterLevelCNN(nn.Module):
    def __init__(self, args, number_of_classes):
        super(CharacterLevelCNN, self).__init__()

        # define conv layers
	# RLJ changed to 1d due to runtime warnings
        self.dropout_input = nn.Dropout1d(args.dropout_input)
        
        #RLJ introduced the large feature size via arg modelfeaturesize
        convlayerfeaturesize = 256
        fclayerfeaturesize=1024
        if args.modelfeaturesize=="large":
            convlayerfeaturesize=1024
            fclayerfeaturesize=2048
        print("Conv Layer Feature Size {}".format(convlayerfeaturesize))
        print("FC Layer Feature Size {}".format(fclayerfeaturesize))
        
        self.conv1 = nn.Sequential(
            nn.Conv1d(
                args.number_of_characters + len(args.extra_characters),
                convlayerfeaturesize,
                kernel_size=7,
                padding=0,
            ),
            nn.ReLU(),
            nn.MaxPool1d(3),
        )

        self.conv2 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=7, padding=0), nn.ReLU(), nn.MaxPool1d(3)
        )

        self.conv3 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU()
        )

        self.conv4 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU()
        )

        self.conv5 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU()
        )

        self.conv6 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU(), nn.MaxPool1d(3)
        )

        # compute the  output shape after forwarding an input to the conv layers

        input_shape = (
            args.batch_size,
            args.max_length,
            args.number_of_characters + len(args.extra_characters),
        )
        print("FC Input Shape {}".format(input_shape))
        self.output_dimension = self._get_conv_output(input_shape)
        print("FC Input Size {}".format(self.output_dimension))

        # define linear layers

        self.fc1 = nn.Sequential(
            nn.Linear(self.output_dimension, fclayerfeaturesize), nn.ReLU(), nn.Dropout(0.5)
        )

        self.fc2 = nn.Sequential(nn.Linear(fclayerfeaturesize, fclayerfeaturesize), nn.ReLU(), nn.Dropout(0.5))

        self.fc3 = nn.Linear(fclayerfeaturesize, number_of_classes)

        # initialize weights
        #RLJ changed from default values of 0.0 and 0.05 to the mean and std per the Yan Lecun paper based on whether it is a large or small model
        #self._create_weights()

        if args.modelfeaturesize=="small":
            self._create_weights(0, 0.05)
        else:
            self._create_weights(0, 0.02)

    # utility private functions

    def _create_weights(self, mean=0.0, std=0.05):
        for module in self.modules():
            if isinstance(module, nn.Conv1d) or isinstance(module, nn.Linear):
                module.weight.data.normal_(mean, std)

    def _get_conv_output(self, shape):
        x = torch.rand(shape)
        x = x.transpose(1, 2)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = x.view(x.size(0), -1)
        output_dimension = x.size(1)
        return output_dimension

    # forward

    def forward(self, x):
        x = self.dropout_input(x)
        x = x.transpose(1, 2)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
		
		
================================
In the Above logic I have passed the below parameter:
1. dropout= 0.5 
2. model size = small
3. batch_size = 16
4. max_length = 10000
5. number_of_characters = 69
6. extra_characters = ""

Thanks,
Sani

ptrblck · January 5, 2023, 9:47am

Thanks for the code.
I cannot reproduce the issue and get the same outputs after calling model.eval():

class CharacterLevelCNN(nn.Module):
    def __init__(self, number_of_classes):
        super(CharacterLevelCNN, self).__init__()

        # define conv layers
	# RLJ changed to 1d due to runtime warnings
        self.dropout_input = nn.Dropout1d(0.5)
        
        #RLJ introduced the large feature size via arg modelfeaturesize
        convlayerfeaturesize = 256
        fclayerfeaturesize=1024
        modelfeaturesize = "small"
        if modelfeaturesize=="large":
            convlayerfeaturesize=1024
            fclayerfeaturesize=2048
        print("Conv Layer Feature Size {}".format(convlayerfeaturesize))
        print("FC Layer Feature Size {}".format(fclayerfeaturesize))
        
        number_of_characters = 69
        self.conv1 = nn.Sequential(
            nn.Conv1d(
                number_of_characters + len(""),
                convlayerfeaturesize,
                kernel_size=7,
                padding=0,
            ),
            nn.ReLU(),
            nn.MaxPool1d(3),
        )

        self.conv2 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=7, padding=0), nn.ReLU(), nn.MaxPool1d(3)
        )

        self.conv3 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU()
        )

        self.conv4 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU()
        )

        self.conv5 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU()
        )

        self.conv6 = nn.Sequential(
            nn.Conv1d(convlayerfeaturesize, convlayerfeaturesize, kernel_size=3, padding=0), nn.ReLU(), nn.MaxPool1d(3)
        )

        # compute the  output shape after forwarding an input to the conv layers

        input_shape = (
            16,
            10000,
            number_of_characters + len(""),
        )
        print("FC Input Shape {}".format(input_shape))
        self.output_dimension = self._get_conv_output(input_shape)
        print("FC Input Size {}".format(self.output_dimension))

        # define linear layers

        self.fc1 = nn.Sequential(
            nn.Linear(self.output_dimension, fclayerfeaturesize), nn.ReLU(), nn.Dropout(0.5)
        )

        self.fc2 = nn.Sequential(nn.Linear(fclayerfeaturesize, fclayerfeaturesize), nn.ReLU(), nn.Dropout(0.5))

        self.fc3 = nn.Linear(fclayerfeaturesize, number_of_classes)

        # initialize weights
        #RLJ changed from default values of 0.0 and 0.05 to the mean and std per the Yan Lecun paper based on whether it is a large or small model
        #self._create_weights()

        if modelfeaturesize=="small":
            self._create_weights(0, 0.05)
        else:
            self._create_weights(0, 0.02)

    # utility private functions

    def _create_weights(self, mean=0.0, std=0.05):
        for module in self.modules():
            if isinstance(module, nn.Conv1d) or isinstance(module, nn.Linear):
                module.weight.data.normal_(mean, std)

    def _get_conv_output(self, shape):
        x = torch.rand(shape)
        x = x.transpose(1, 2)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = x.view(x.size(0), -1)
        output_dimension = x.size(1)
        return output_dimension

    # forward

    def forward(self, x):
        x = self.dropout_input(x)
        x = x.transpose(1, 2)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
    
model = CharacterLevelCNN(10)
x = torch.randn(16, 10000, 69)

# all are different since dropout is still active
for _ in range(10):
    out = model(x)
    print(out.double().abs().sum())
# tensor(19179.5103, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(19980.1256, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(19489.9541, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(21702.8869, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(19419.2389, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(19746.8942, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(17694.0387, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(19148.0841, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(18543.3931, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(18682.2872, dtype=torch.float64, grad_fn=<SumBackward0>)    


# now all outputs are equal
model.eval()
for _ in range(10):
    out = model(x)
    print(out.double().abs().sum())
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)
# tensor(5934.1699, dtype=torch.float64, grad_fn=<SumBackward0>)

sanipanwala · January 9, 2023, 9:13am

Hi @ptrblck ,

Thank you for your response.

I’m still not getting good results when I’m training with a few strings as well as a few random CSV data.
Below is an example of training data and testing data.

MY TRAINING DATA

sentiment,text
"1","[a]"
"0","[b]"
"1","[c]"
"0","[d]"
"1","[e]"
"0","[f]"
"1","[g]"
"0","[h]"
"1","[i]"

MY TESTING DATA

sentiment,text
"0","[A]"
"1","[B]"
"1","[C]"
"0","[D]"

tensor(0.0494, device='cuda:0', dtype=torch.float64)
tensor(0.0485, device='cuda:0', dtype=torch.float64)
tensor(0.0474, device='cuda:0', dtype=torch.float64)
tensor(0.0488, device='cuda:0', dtype=torch.float64)
tensor(0.0492, device='cuda:0', dtype=torch.float64)
tensor(0.0484, device='cuda:0', dtype=torch.float64)
tensor(0.0469, device='cuda:0', dtype=torch.float64)

Thanks,
Sani

ptrblck · January 9, 2023, 5:35pm

I’m unsure what “good” results mean in this context as you’ve described to see “inconsistent” predictions (i.e. different or random outputs) even after calling model.eval().
Are you still getting these inconsistent results using your own minimal code snippet?

sanipanwala · January 10, 2023, 8:13am

Hi @ptrblck ,

Sorry for the confusion.

Your code snippet is providing perfect results but when I provide CSV data as I mentioned above during training I’m getting inconsistent results yet.

Thanks,
Sani