Model barely detects anything when using .eval() during inference

I trained and validated my model on my own custom dataset. Everything went fine, with great results on both the train and val subsets.

Note: I use BatchNorm layers and mixed precision during training in both of the cases described below.

Now I want to deploy the model on a device and use my pre-trained weights to run inference only. This is how I load the model:

    def get_model(self, device):
        model = get_seg_model(base_path=self.base_path)
        # NOTE: having .eval() on causes the model to not detect anything when using our weights
        model = model.eval()
        model = model.to(device)  # Load model onto GPU
        return model
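
For context, .eval() itself only flips the training flag on the submodules; in this model the layers whose behavior actually changes are the BatchNorm ones (they switch from per-batch statistics to their stored running statistics), plus any Dropout the architecture might have. A small sketch (a hypothetical helper, not part of my actual code) that lists the affected layers:

    import torch.nn as nn

    def modules_affected_by_eval(model):
        """List submodules whose forward pass changes between train() and eval()."""
        # BatchNorm switches to stored running statistics, Dropout stops dropping.
        affected_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.Dropout)
        return [
            (name, type(module).__name__)
            for name, module in model.named_modules()
            if isinstance(module, affected_types)
        ]

And this is the function that loads the pre-trained weights onto the model: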

    def load_weights(self, model):
        """A simple function to load the pre-trained weights onto the model.

        Args:
            model (torch.nn.Module): A PyTorch module containing the layers and branches of the model
        """
        print("LOADING MODEL WEIGHTS")
        checkpoint = torch.load(self.weights_path, map_location=self.device)
        state_dict = checkpoint["state_dict"]
        new_state_dict = OrderedDict()
        for k, v in state_dict.items():
            k = k[7:]  # strip the 7-character 'module.' prefix added by DDP
            new_state_dict[k] = v
        model.load_state_dict(new_state_dict)
        print("CHECKPOINT LOADED")

First I load the model and then the weights. The reason I rewrite the state-dict keys is that the model was trained using DDP, so I need to remove the 'module.' prefix from each key. Anyway, after I get my input and turn it into a tensor, I run it through the model:

        with torch.no_grad():
            with torch.cuda.amp.autocast():
                # Pass input frame through model to get prediction
                output_img = self.model(input_frame)

Then I process the output image using the exact same code.

I have two different weight files, each trained on a different dataset: Weights_A was trained on an online dataset and Weights_B was trained on my own dataset.

  • Let's say I load Weights_A from the online dataset. Having model.eval() makes no difference: in either case the model performs fine, with similar predictions.

  • On the other hand, when I load Weights_B from my dataset, having model.eval() enabled causes the model to predict almost nothing, whereas skipping model.eval() lets the model perform fine.

So the only differences between these two cases are the weights and the datasets they were trained on. Everything else is the same (same loss function, batch size, momentum, optimizer, etc.).

What got me thinking is that my dataset is not diverse enough (it's large, but every image is very similar), meaning that the statistics learned during training may not transfer to real-world data, compared to the online dataset, which contains many different kinds of images.
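
One way to check this hypothesis would be to inspect the running statistics the BatchNorm layers stored during training, since those stored values are exactly what model.eval() makes the model use instead of the per-batch statistics. A hypothetical inspection snippet (the helper name is mine, not from my script):

    import torch.nn as nn

    def print_bn_running_stats(model):
        """Print a summary of each BatchNorm layer's stored running statistics."""
        for name, module in model.named_modules():
            if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                print(
                    f"{name}: running_mean ~ {module.running_mean.mean().item():.4f}, "
                    f"running_var ~ {module.running_var.mean().item():.4f}"
                )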

Another note on the inference-only script: because the model will be used in many different real-world situations, I do not normalize the input data (video frames). The only processing I apply to the input is this:

    def transform_input(self, frame):
        """Transform the input frame by turning it into a tensor.

        Args:
            frame (numpy.ndarray or PIL.Image): The frame that will go through the model for inference (a cv2/NumPy image or a PIL Image).

        Returns:
            tensor: The image to be processed by the model
        """
        tensor_transform = transforms.ToTensor()

        return tensor_transform(frame).unsqueeze(0)
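
Since I skip normalization, the tensor entering the model is just the raw [0, 1] output of ToTensor. A quick sanity check (a hypothetical helper, not in my script) would be to print the per-channel statistics of an inference tensor and compare them with a batch coming out of the training pipeline:

    def describe_input(tensor):
        """Print per-channel mean/std of a (1, C, H, W) tensor."""
        for c in range(tensor.shape[1]):
            channel = tensor[0, c]
            print(f"channel {c}: mean={channel.mean().item():.3f}, "
                  f"std={channel.std().item():.3f}")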

This is how I run inference:

    def run_inference(
        self,
        input_frame_raw,
    ):
        """Run inference on a given frame

        Args:
            input_frame_raw: Raw frame coming in from the video stream
        """
        height, width, channels = input_frame_raw.shape
        input_frame = self.transform_input(input_frame_raw)
        if self.onCuda:
            input_frame = input_frame.cuda()

        assert not self.model.training, "WARNING: Model is in training mode."

        with torch.no_grad():
            with torch.cuda.amp.autocast():
                # Pass input frame through model to get prediction
                output_img = self.model(input_frame)
            # Run prediction through the LMDS algorithm
            pred_count, pred_points = self.count_algorithm(input_img=output_img)
            out_frame, out_points_array = self.plot_points(
                points=pred_points, img=input_frame_raw
            )  # Draw points on the original frame

        self.total_count.append(pred_count)
        avg_count = int(np.mean(self.total_count))

Sorry for the long post, thank you for reading 🙂

I found the problem. It's rather silly, but I will post it here in case anyone else runs into it. During training I was normalizing my data with a mean and std, but in my separate inference script I forgot to add that line:

    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

So the frames at inference time were not normalized the same way the training data was.
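
For completeness, the corrected transform_input would look roughly like this (same torchvision transforms as above, with the ImageNet mean/std used during training):

    def transform_input(self, frame):
        """Turn the input frame into a normalized tensor, matching training."""
        tensor_transform = transforms.Compose([
            transforms.ToTensor(),  # HWC image -> CHW float tensor in [0, 1]
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),  # same stats as training
        ])
        return tensor_transform(frame).unsqueeze(0)  # add the batch dimension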