RuntimeError during evaluation: "upsample_bilinear2d_out_frame" not implemented for 'Byte'

I have trained a custom object detection model using the steps described in this link. I am able to train my model, but when I try to evaluate it at the end of an epoch, I get the following error:

Epoch: [0] Total time: 0:00:06 (0.2223 s / it)
creating index...
index created!
Traceback (most recent call last):
  File "train.py", line 106, in <module>
    evaluate(model, data_loader_test, device=device)
  File "/home/sarvani/anaconda3/envs/flir_env/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/sarvani/Desktop/flir/test_frcnn/custom/engine.py", line 107, in evaluate
    outputs = model(image)
  File "/home/sarvani/anaconda3/envs/flir_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sarvani/anaconda3/envs/flir_env/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 47, in forward
    images, targets = self.transform(images, targets)
  File "/home/sarvani/anaconda3/envs/flir_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sarvani/anaconda3/envs/flir_env/lib/python3.7/site-packages/torchvision/models/detection/transform.py", line 41, in forward
    image, target = self.resize(image, target)
  File "/home/sarvani/anaconda3/envs/flir_env/lib/python3.7/site-packages/torchvision/models/detection/transform.py", line 70, in resize
    image[None], scale_factor=scale_factor, mode='bilinear', align_corners=False)[0]
  File "/home/sarvani/anaconda3/envs/flir_env/lib/python3.7/site-packages/torch/nn/functional.py", line 2503, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: "upsample_bilinear2d_out_frame" not implemented for 'Byte'

My code to load the data is as follows:

class CustomDataset(torch.utils.data.Dataset):
    
    def __init__(self, root_dir, transform=None):
        self.root = root_dir
        self.rgb_imgs = list(sorted(os.listdir(os.path.join(root_dir, "rgb/"))))
        self.annotations = list(sorted(os.listdir(os.path.join(root_dir, "annotations/"))))


        self._classes = ('__background__',  # always index 0
                         'car','person','bicycle','dog','other')

        self._class_to_ind = {'car': 1, 'person': 2, 'bicycle': 3, 'dog': 4, 'other': 5}

    def __len__(self):
        return len(self.rgb_imgs)

    def __getitem__(self, idx):
        self.num_classes = 6

        img_rgb_path = os.path.join(self.root, "rgb/", self.rgb_imgs[idx])   
        img = Image.open(img_rgb_path)
        img = np.array(img)
        img = img.transpose((2, 0, 1))
        img = torch.from_numpy(img)

        filename = os.path.join(self.root,'annotations',self.annotations[idx])
        tree = ET.parse(filename)
        objs = tree.findall('object')

        num_objs = len(objs)
        labels = np.zeros((num_objs,), dtype=np.int64)  # class indices must be integers
        seg_areas = np.zeros((num_objs), dtype=np.float32)
        
        boxes = []
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text)
            y2 = float(bbox.find('ymax').text)

            cls = self._class_to_ind[obj.find('name').text.lower().strip()]
            boxes.append([x1, y1, x2, y2])
            labels[ix] = cls
        
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        image_id = torch.tensor([idx])
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        labels = torch.as_tensor(labels, dtype=torch.int64)
    
        target = {'boxes': boxes,
                  'labels': labels,
                  'area': area,
                  'image_id': image_id,
                  'iscrowd': iscrowd}
        return img, target

My train.py is as follows:

num_classes = 6
model = fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

device = torch.device('cuda')
model = model.to(device)

dataset_train = CustomDataset('FLIR/images/train')
dataset_val = CustomDataset('FLIR/images/val')


data_loader_train = torch.utils.data.DataLoader(
    dataset_train, batch_size=4, shuffle=True, collate_fn=utils.collate_fn)

data_loader_test = torch.utils.data.DataLoader(
    dataset_val, batch_size=4, shuffle=False, collate_fn=utils.collate_fn)

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params,lr=0.05,weight_decay=0.0005)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)

num_epochs = 30

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, data_loader_train, device, epoch, print_freq=1)
    lr_scheduler.step()
    evaluate(model, data_loader_test, device=device)

The evaluation function, along with the necessary helper files, is at this link.

Can someone please help me out?

Could you check the tensor.type() of the input to your model during evaluation?
It seems you are passing uint8 tensors, while FloatTensors are expected.
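
Something like this inside __getitem__ should avoid the Byte error (a minimal sketch; the division by 255 assumes you want inputs in the usual [0, 1] range of the torchvision detection models):

img = Image.open(img_rgb_path)
img = np.array(img)                           # uint8 array, shape (H, W, C)
img = img.transpose((2, 0, 1))                # to (C, H, W)
img = torch.from_numpy(img).float() / 255.0   # Byte -> Float, scaled to [0, 1]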

Hello!
I am following the same tutorial (but using my own dataset).
Within my transforms, I cast the image tensors to torch.FloatTensor:

class ToTensor(object):
    def __call__(self, image, target):
        # HWC numpy array -> CHW
        image = image.transpose((2, 0, 1))
        # flip the channel axis (e.g. BGR -> RGB) and cast to float32
        return torch.from_numpy(np.flip(image, axis=0).copy()).type(torch.FloatTensor), target

Nevertheless, during training:

num_epochs = 10
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, train_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()

I obtain the following error:

Loss is nan, stopping training
{'loss_classifier': tensor(2.0606, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(0., device='cuda:0', grad_fn=<DivBackward0>), 'loss_objectness': tensor(213.0078, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>)}
An exception has occurred, use %tb to see the full traceback.

SystemExit: 1

Why is this happening?
Moreover, the image visualization after the tensor cast is very poor.

Thank you

Your error message seems to point to a NaN loss, or do you also see the "not implemented for 'Byte'" error?
In the former case, could you create a new topic with a description of your model and use case (e.g. are you using mixed-precision training, what type of model, etc.) and tag me there, please?

If you are visualizing with matplotlib, you might have to check the current range, as matplotlib might try to normalize and clip the image.
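
For example, a quick sanity check before plotting (a sketch, assuming img is a CHW float tensor):

import matplotlib.pyplot as plt

img_np = img.permute(1, 2, 0).cpu().numpy()   # CHW tensor -> HWC array
print(img_np.min(), img_np.max())             # check the value range first
plt.imshow(img_np.clip(0, 1))                 # matplotlib clips floats outside [0, 1]
plt.show()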

Hello!

Finally I was able to solve the problem.
My boxes were in the format [xmin, ymin, w, h], while they should be [xmin, ymin, xmax, ymax].
Moreover, the image values fed to the network were between (0, 255), and they should be between (-1, 1), shouldn't they?
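
For anyone hitting the same issue, the box conversion is a one-liner per box (assuming the annotation stores [xmin, ymin, w, h]):

xmin, ymin, w, h = box                    # annotation format: [xmin, ymin, w, h]
box = [xmin, ymin, xmin + w, ymin + h]    # torchvision format: [xmin, ymin, xmax, ymax]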

Also, when I create the model, I should pass the mean and std of the image set:

"""#Finetuning the model"""

num_classes  = 2
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, image_mean = mean, image_std = std)

Otherwise, fasterrcnn_resnet50_fpn uses a default mean and std (from ImageNet, maybe?); see the relevant part of the fasterrcnn_resnet50_fpn code below:

if image_mean is None:
    image_mean = [0.485, 0.456, 0.406]
if image_std is None:
    image_std = [0.229, 0.224, 0.225]
transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)
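
In case it helps anyone, here is a rough sketch of how the per-channel mean and std could be estimated from a dataset of (C, H, W) float tensors (the dataset name is a placeholder):

channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
n_pixels = 0
for img, _ in dataset:                                 # img: (C, H, W) float tensor in [0, 1]
    channel_sum += img.sum(dim=(1, 2))
    channel_sq_sum += (img ** 2).sum(dim=(1, 2))
    n_pixels += img.shape[1] * img.shape[2]
mean = channel_sum / n_pixels                          # E[X] per channel
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()   # sqrt(E[X^2] - E[X]^2)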

Thank you so much for your time!!

Moreover, I originally saw the "not implemented for 'Byte'" error; after I cast the images to FloatTensors as you suggested, the remaining error was the NaN loss.

Usually you would normalize the input images to have zero mean and unit variance via transforms.Normalize, which is beneficial for training.
The expected input dtype is float32 by default, which would explain the byte dtype error you saw with uint8 inputs.
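
For example, a typical preprocessing pipeline would look something like this (the ImageNet statistics here are just a stand-in for your own dataset's values):

import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),                              # uint8 HWC image -> float32 CHW in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],    # per-channel mean
                std=[0.229, 0.224, 0.225]),    # per-channel std
])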

That’s not good. Could you post your complete code (if possible), so that I could take a look, please?

Hello,

At the moment I cannot post the code, not until I have my lab's permission…
Nevertheless, in the end the problem wasn't solved just by passing the mean and std to the backbone.

Thanks a lot for your help

Sure, in case you want to debug a bit further:

  • Add torch.autograd.set_detect_anomaly(True) at the beginning of your script (see the sketch after this list). This will yield a stack trace pointing to the operation that caused the first NaN output.
  • If you are using mixed-precision training (via native amp, apex, or a manual implementation), disable it for the sake of debugging.
  • Update to the latest PyTorch version, if you haven't already.
  • Check all torch.log, torch.sqrt, torch.pow, etc. calls, which can easily create Infs/NaNs.
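
A minimal sketch of the first point, reusing the training loop from above:

import torch

torch.autograd.set_detect_anomaly(True)   # enable before any training code runs

for epoch in range(num_epochs):
    # the backward pass will now raise an error at the operation that
    # produced the first NaN, with a stack trace of its forward call
    train_one_epoch(model, optimizer, train_loader, device, epoch, print_freq=10)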

Let me know if you can release the model or a smaller fake example which would reproduce this issue.

I found the pre-trained model was saved in HalfTensor (fp16) using PyTorch autocast, while my input data was in fp32. When I used with torch.inference_mode(): immediately after the dataloader, the problem was solved. I am guessing both the input tensor and the model are now in fp16, but interestingly, this is not the case when I print the dtype of the input tensor.

torch.amp.autocast is a context manager that applies mixed precision by casting input activations to lower dtypes where it is possible and safe. It will not change any parameter dtype, and thus autocast is not responsible for your model's float16 parameters.
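
A small example of that behavior (the printed dtypes assume a CUDA device):

import torch

model = torch.nn.Linear(4, 4).cuda()
x = torch.randn(2, 4, device='cuda')

with torch.autocast(device_type='cuda', dtype=torch.float16):
    out = model(x)

print(model.weight.dtype)   # torch.float32 - parameters are untouched
print(out.dtype)            # torch.float16 - activations inside the region are cast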