Loss problem in net finetuning

You could add a print statement with the image path to your Dataset to debug which images are throwing this error.
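For example, right where the image is opened (a minimal sketch; img_path stands for however you store the file path in your Dataset):

print('Loading', img_path)  # the last path printed before the crash is the problematic file
img = Image.open(img_path)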

I’m going to drive home from the office now; later tonight I will try this and update you on my progress.
Thank you very much for your help so far, I’ll write back here later tonight.


OK, so I’m finally back at it. Last night I tried with some new images, but I got the same error.
I also tried printing which image is being loaded by the DataLoader when the error is thrown, but it’s never the same image, so I guess the problem is not in the data itself but in my Dataset class.

Is there something wrong in this code?

import os
import torch.utils.data
from PIL import Image
from PIL import ImageFile


class MyDataset(torch.utils.data.Dataset):

    def __init__(self, root_dir_img, root_dir_gt, transform=None):

        self.root_dir_img = root_dir_img
        self.root_dir_gt = root_dir_gt
        self.transform = transform

        img_names = [os.path.join(root_dir_img, name) for name in os.listdir(root_dir_img)]

        gt_names = [os.path.join(root_dir_gt, name) for name in os.listdir(root_dir_gt)]

        self.img_files = []
        self.gt_files = []

        for i in range(len(img_names)):
            self.img_files.append(Image.open(img_names[i]))
            self.gt_files.append(Image.open(gt_names[i]))

    def __len__(self):
        return len(self.img_files)

    def __getitem__(self, idx):

        img = self.img_files[idx]
        gt = self.gt_files[idx]

        sample = {'image': img, 'mask': gt}

        if self.transform:
            sample = self.transform(sample)
            img = sample['image']
            gt = sample['mask']

        return img, gt

I’m not sure how many images you have, but could you move the Image.open calls to __getitem__?
Usually you should get a warning if too many files are open, so this shouldn’t be an issue, but we could try that.
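Roughly something like this (just a sketch; store the paths in __init__, e.g. as self.img_names and self.gt_names, and open the files lazily):

def __getitem__(self, idx):
    # open the image pair here instead of keeping all PIL handles open in __init__
    img = Image.open(self.img_names[idx])
    gt = Image.open(self.gt_names[idx])
    ...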

Also, I still don’t know what your self.transform function is. It can’t be the train_transform you posted, since you are using a dict, which shouldn’t work there.

Could you post the code of transform?

This is my new dataset class:

class MyDataset(Dataset):

    def __init__(self, root_dir_img, root_dir_gt, transform=None):

        self.root_dir_img = root_dir_img
        self.root_dir_gt = root_dir_gt
        self.transform = transform

        self.img_names = [os.path.join(root_dir_img, name) for name in os.listdir(root_dir_img)]
        self.gt_names = [os.path.join(root_dir_gt, name) for name in os.listdir(root_dir_gt)]

        self.img_names.sort()
        self.gt_names.sort()

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, idx):

        img = Image.open(self.img_names[idx])
        gt = Image.open(self.gt_names[idx])

        sample = {'image': img, 'mask': gt}

        if self.transform:
            sample = self.transform(sample)
            # img = sample['image']  # can I remove these lines?
            # gt = sample['mask']

        return img, gt

And this is the code I’m using for the transforms (I’m only posting the parts I modified from the standard PyTorch transforms.py; tell me if you need anything more):

class ColorJitter(object):
    """Randomly change the brightness, contrast and saturation of an image.

    Args:
        brightness (float): How much to jitter brightness. brightness_factor
            is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
        contrast (float): How much to jitter contrast. contrast_factor
            is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
        saturation (float): How much to jitter saturation. saturation_factor
            is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
        hue(float): How much to jitter hue. hue_factor is chosen uniformly from
            [-hue, hue]. Should be >=0 and <= 0.5.
    """

    def __init__(self, brightness=0, contrast=0, saturation=0, hue=0):
        self.brightness = brightness
        self.contrast = contrast
        self.saturation = saturation
        self.hue = hue

    @staticmethod
    def get_params(brightness, contrast, saturation, hue):
        """Get a randomized transform to be applied on image.

        Arguments are same as that of __init__.

        Returns:
            Transform which randomly adjusts brightness, contrast and
            saturation in a random order.
        """
        transforms = []
        if brightness > 0:
            brightness_factor = np.random.uniform(max(0, 1 - brightness), 1 + brightness)
            transforms.append(Lambda(lambda img: adjust_brightness(img, brightness_factor)))

        if contrast > 0:
            contrast_factor = np.random.uniform(max(0, 1 - contrast), 1 + contrast)
            transforms.append(Lambda(lambda img: adjust_contrast(img, contrast_factor)))

        if saturation > 0:
            saturation_factor = np.random.uniform(max(0, 1 - saturation), 1 + saturation)
            transforms.append(Lambda(lambda img: adjust_saturation(img, saturation_factor)))

        if hue > 0:
            hue_factor = np.random.uniform(-hue, hue)
            transforms.append(Lambda(lambda img: adjust_hue(img, hue_factor)))

        np.random.shuffle(transforms)
        transform = Compose(transforms)

        return transform

    def __call__(self, sample):
        """
        Args:
            img (PIL Image): Input image.

        Returns:
            PIL Image: Color jittered image.
        """
        img, mask = sample['image'], sample['mask']
        transform = self.get_params(self.brightness, self.contrast,
                                    self.saturation, self.hue)
        img = transform(img)

        return {'image': img, 'mask': mask}


# ...


class RandomResizedCrop(object):
    """Crop the given PIL Image to random size and aspect ratio.

    A crop of random size of (0.08 to 1.0) of the original size and a random
    aspect ratio of 3/4 to 4/3 of the original aspect ratio is made. This crop
    is finally resized to given size.
    This is popularly used to train the Inception networks.

    Args:
        size: expected output size of each edge
        interpolation: Default: PIL.Image.BILINEAR
    """

    def __init__(self, size, interpolation=Image.BILINEAR):
        self.size = (size, size)
        self.interpolation = interpolation

    @staticmethod
    def get_params(img):
        """Get parameters for ``crop`` for a random sized crop.

        Args:
            img (PIL Image): Image to be cropped.

        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for a random
                sized crop.
        """
        for attempt in range(10):
            area = img.size[0] * img.size[1]
            target_area = random.uniform(0.08, 1.0) * area
            aspect_ratio = random.uniform(3. / 4, 4. / 3)

            w = int(round(math.sqrt(target_area * aspect_ratio)))
            h = int(round(math.sqrt(target_area / aspect_ratio)))

            if random.random() < 0.5:
                w, h = h, w

            if w <= img.size[0] and h <= img.size[1]:
                i = random.randint(0, img.size[1] - h)
                j = random.randint(0, img.size[0] - w)
                return i, j, h, w

        # Fallback
        w = min(img.size[0], img.size[1])
        i = (img.size[1] - w) // 2
        j = (img.size[0] - w) // 2
        return i, j, w, w

    def __call__(self, sample):
        """
        Args:
            img (PIL Image): Image to be cropped and resized.

        Returns:
            PIL Image: Randomly cropped and resized image.
        """
        i, j, h, w = self.get_params(sample['image'])
        return resized_crop(sample, i, j, h, w, self.size, self.interpolation)


# ...


class RandomHorizontalFlip(object):
    """Horizontally flip the given PIL Image randomly with a probability of 0.5."""

    def __call__(self, sample):
        """
        Args:
            img (PIL Image): Image to be flipped.

        Returns:
            PIL Image: Randomly flipped image.
        """
        if random.random() < 0.5:
            return hflip(sample)
        return sample


class RandomVerticalFlip(object):
    """Vertically flip the given PIL Image randomly with a probability of 0.5."""

    def __call__(self, sample):
        """
        Args:
            img (PIL Image): Image to be flipped.

        Returns:
            PIL Image: Randomly flipped image.
        """
        if random.random() < 0.5:
            return vflip(sample)
        return sample


def hflip(sample):
    """Horizontally flip the given PIL Image.

    Args:
        sample (PIL Image): Image to be flipped.

    Returns:
        PIL Image:  Horizontally flipped image.
    """

    img, mask = sample['image'], sample['mask']

    if not _is_pil_image(img):
        raise TypeError('img should be PIL Image. Got {}'.format(type(img)))

    img = img.transpose(Image.FLIP_LEFT_RIGHT)
    mask = mask.transpose(Image.FLIP_LEFT_RIGHT)

    return {'image': img, 'mask': mask}


def vflip(sample):
    """Vertically flip the given PIL Image.

    Args:
        img (PIL Image): Image to be flipped.

    Returns:
        PIL Image:  Vertically flipped image.
    """

    img, mask = sample['image'], sample['mask']

    if not _is_pil_image(img):
        raise TypeError('img should be PIL Image. Got {}'.format(type(img)))

    img = img.transpose(Image.FLIP_TOP_BOTTOM)
    mask = mask.transpose(Image.FLIP_TOP_BOTTOM)

    return {'image': img, 'mask': mask}


# ...


class ToTensor(object):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return to_tensor(pic)


def to_tensor(sample):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
    See ``ToTensor`` for more details.
    Args:
        pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
    Returns:
        Tensor: Converted image.
    """

    pic, mask = sample['image'], sample['mask']
    if not(_is_pil_image(pic) or _is_numpy_image(pic)):
        raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))

    if isinstance(pic, np.ndarray):
        # handle numpy array
        img = torch.from_numpy(pic.transpose((2, 0, 1)))
        # backward compatibility
        if isinstance(img, torch.ByteTensor):
            img = img.float()

        return {'image': pic, 'mask': mask}

    if accimage is not None and isinstance(pic, accimage.Image):
        nppic = np.zeros([pic.channels, pic.height, pic.width], dtype=np.float32)
        pic.copyto(nppic)
        pic = torch.from_numpy(nppic)
        return {'image': pic, 'mask': mask}

    # handle PIL Image
    if pic.mode == 'I':
        img = torch.from_numpy(np.array(pic, np.int32, copy=False))
    elif pic.mode == 'I;16':
        img = torch.from_numpy(np.array(pic, np.int16, copy=False))
    elif pic.mode == 'F':
        img = torch.from_numpy(np.array(pic, np.float32, copy=False))
    else:
        img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
    # PIL image mode: 1, L, P, I, F, RGB, YCbCr, RGBA, CMYK
    if pic.mode == 'YCbCr':
        nchannel = 3
    elif pic.mode == 'I;16':
        nchannel = 1
    else:
        nchannel = len(pic.mode)
    img = img.view(pic.size[1], pic.size[0], nchannel)
    # put it from HWC to CHW format
    # yikes, this transpose takes 80% of the loading time/CPU
    img = img.transpose(0, 1).transpose(0, 2).contiguous()
    if isinstance(img, torch.ByteTensor):
        img = img.float() / 255.0
        # img = img.float()

    # handle PIL Image
    if mask.mode == 'I':
        img2 = torch.from_numpy(np.array(mask, np.int32, copy=False))
    elif mask.mode == 'I;16':
        img2 = torch.from_numpy(np.array(mask, np.int16, copy=False))
    elif mask.mode == 'F':
        img2 = torch.from_numpy(np.array(mask, np.float32, copy=False))
    else:
        img2 = torch.ByteTensor(torch.ByteStorage.from_buffer(mask.tobytes()))
    # PIL image mode: 1, L, P, I, F, RGB, YCbCr, RGBA, CMYK
    if mask.mode == 'YCbCr':
        nchannel = 3
    elif mask.mode == 'I;16':
        nchannel = 1
    else:
        nchannel = len(mask.mode)
    img2 = img2.view(mask.size[1], mask.size[0], nchannel)
    # put it from HWC to CHW format
    # yikes, this transpose takes 80% of the loading time/CPU
    img2 = img2.transpose(0, 1).transpose(0, 2).contiguous()
    if isinstance(img2, torch.ByteTensor):
        img2 = img2.float()

    return {'image': img, 'mask': img2}

The code looks beautiful. At least while skimming through it I couldn’t find any issues.

Since the error appears on different images, we should find the reason for it.
Could you remove the transformations from all images and try it again? Also, try to remove all multiprocessing (i.e. set num_workers=0 in your DataLoader).
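For example, something like this, which bypasses the DataLoader (and thus the worker processes) entirely, just to see whether every sample loads cleanly (a quick debugging sketch):

debug_dataset = MyDataset(root_dir_img, root_dir_gt, transform=None)  # no transformations

for idx in range(len(debug_dataset)):
    img, gt = debug_dataset[idx]
    # with no workers involved, any failure points straight at the offending index
    print(idx, img.size, gt.size, img.mode, gt.mode)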

OK, so I’m back again with this problem.
I put together some toy code to understand what I should fix.
This is the Dataset code:

import os
from PIL import Image
from torch.utils.data import Dataset

class MyDataset(Dataset):

    def __init__(self, root_dir_img, root_dir_gt, transform=None):

        self.root_dir_img = root_dir_img
        self.root_dir_gt = root_dir_gt
        self.transform = transform

        self.img_names = [os.path.join(root_dir_img, name) for name in os.listdir(root_dir_img)]
        self.gt_names = [os.path.join(root_dir_gt, name) for name in os.listdir(root_dir_gt)]

        self.img_names.sort()
        self.gt_names.sort()

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, idx):

        img = Image.open(self.img_names[idx])
        gt = Image.open(self.gt_names[idx])

        sample = {'image': img, 'mask': gt}

        if self.transform:
            sample = self.transform(sample)
            # img = sample['image']
            # gt = sample['mask']

        return img, gt

This is the toy code, just to check whether the DataLoader is working correctly:

from data import MyDataset
import matplotlib.pyplot as plt
import transforms
import torch

img_size = 224
root_dir_img='./path/to/dataset/images'
root_dir_gt='./path/to/dataset/gt'

transform_train = transforms.Compose([
    transforms.ToTensor()
])

train_mydataset = MyDataset(root_dir_img, root_dir_gt, transform_train)
train_loader = torch.utils.data.DataLoader(
    train_mydataset,
    batch_size=24,
    shuffle=True,
    num_workers=1,
    pin_memory=True
)

for batch_idx, (img, gt) in enumerate(train_loader):

    img = img.cuda(async=True)
    gt = gt.cuda(async=True)

    fig = plt.figure()
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)
    ax1.imshow(img)
    ax2.imshow(gt, 'gray')
    plt.show()

And this is the error I get:

Traceback (most recent call last):
  File ".../test.py", line 29, in <module>
    for batch_idx, (img, gt) in enumerate(train_loader):
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 55, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File ".../data.py", line 77, in __getitem__
    sample = self.transform(sample)
  File ".../transforms.py", line 584, in __call__
    sample = t(sample)
  File ".../transforms.py", line 603, in __call__
    return to_tensor(pic)
  File ".../transforms.py", line 101, in to_tensor
    img2 = img2.view(mask.size[1], mask.size[0], nchannel)
RuntimeError: invalid argument 2: size '[224 x 224 x 1]' is invalid for input with 6272 elements at /pytorch/torch/lib/TH/THStorage.c:41

This is the to_tensor function that seems to be giving me problems:

def to_tensor(sample):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
    See ``ToTensor`` for more details.
    Args:
        pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
    Returns:
        Tensor: Converted image.
    """

    pic, mask = sample['image'], sample['mask']
    if not(_is_pil_image(pic) or _is_numpy_image(pic)):
        raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))

    if isinstance(pic, np.ndarray):
        # handle numpy array
        img = torch.from_numpy(pic.transpose((2, 0, 1)))
        # backward compatibility
        if isinstance(img, torch.ByteTensor):
            img = img.float()

        return {'image': pic, 'mask': mask}

    if accimage is not None and isinstance(pic, accimage.Image):
        nppic = np.zeros([pic.channels, pic.height, pic.width], dtype=np.float32)
        pic.copyto(nppic)
        pic = torch.from_numpy(nppic)
        return {'image': pic, 'mask': mask}

    # handle PIL Image
    if pic.mode == 'I':
        img = torch.from_numpy(np.array(pic, np.int32, copy=False))
    elif pic.mode == 'I;16':
        img = torch.from_numpy(np.array(pic, np.int16, copy=False))
    elif pic.mode == 'F':
        img = torch.from_numpy(np.array(pic, np.float32, copy=False))
    else:
        img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
    # PIL image mode: 1, L, P, I, F, RGB, YCbCr, RGBA, CMYK
    if pic.mode == 'YCbCr':
        nchannel = 3
    elif pic.mode == 'I;16':
        nchannel = 1
    else:
        nchannel = len(pic.mode)
    img = img.view(pic.size[1], pic.size[0], nchannel)
    # put it from HWC to CHW format
    # yikes, this transpose takes 80% of the loading time/CPU
    img = img.transpose(0, 1).transpose(0, 2).contiguous()
    if isinstance(img, torch.ByteTensor):
        img = img.float() / 255.0
        # img = img.float()

    # handle PIL Image
    if mask.mode == 'I':
        img2 = torch.from_numpy(np.array(mask, np.int32, copy=False))
    elif mask.mode == 'I;16':
        img2 = torch.from_numpy(np.array(mask, np.int16, copy=False))
    elif mask.mode == 'F':
        img2 = torch.from_numpy(np.array(mask, np.float32, copy=False))
    else:
        img2 = torch.ByteTensor(torch.ByteStorage.from_buffer(mask.tobytes()))
    # PIL image mode: 1, L, P, I, F, RGB, YCbCr, RGBA, CMYK
    if mask.mode == 'YCbCr':
        nchannel = 3
    elif mask.mode == 'I;16':
        nchannel = 1
    else:
        nchannel = len(mask.mode)
    img2 = img2.view(mask.size[1], mask.size[0], nchannel)
    # put it from HWC to CHW format
    # yikes, this transpose takes 80% of the loading time/CPU
    img2 = img2.transpose(0, 1).transpose(0, 2).contiguous()
    if isinstance(img2, torch.ByteTensor):
        img2 = img2.float()

    return {'image': img, 'mask': img2}

I’m currently debugging this function and it seems the error is thrown while converting the mask:

    # handle PIL Image
    if mask.mode == 'I':
        img2 = torch.from_numpy(np.array(mask, np.int32, copy=False))
    elif mask.mode == 'I;16':
        img2 = torch.from_numpy(np.array(mask, np.int16, copy=False))
    elif mask.mode == 'F':
        img2 = torch.from_numpy(np.array(mask, np.float32, copy=False))
    else:
        img2 = torch.ByteTensor(torch.ByteStorage.from_buffer(mask.tobytes()))

All the conditions fail and the else branch is called, returning a tensor of shape [6328], which doesn’t fit the expected resolution of [1, 218, 226]: the mask has mode '1', so tobytes() returns bit-packed rows padded to whole bytes instead of one byte per pixel.
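As a quick standalone check of where the 6328 comes from (a throwaway snippet with a dummy 218x226 mask in mode '1'):

from PIL import Image
import numpy as np

mask = Image.new('1', (218, 226))        # 1-bit mask, width 218, height 226
print(len(mask.tobytes()))               # 6328: each row is packed into ceil(218/8) = 28 bytes, times 226 rows
print(np.array(mask, np.uint8).shape)    # (226, 218): one value per pixel, which is what view() expects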

Add the following line:

    elif mask.mode == '1':
        img2 = torch.from_numpy(np.array(mask, np.uint8, copy=False))

I modified this part of the code:

    # ...
    if mask.mode == 'I':
        img2 = torch.from_numpy(np.array(mask, np.int32, copy=False))
    elif mask.mode == 'I;16':
        img2 = torch.from_numpy(np.array(mask, np.int16, copy=False))
    elif mask.mode == 'F':
        img2 = torch.from_numpy(np.array(mask, np.float32, copy=False))
    elif mask.mode == '1':      # line added
        img2 = torch.from_numpy(np.array(mask, np.uint8, copy=False))
    else:
        img2 = torch.ByteTensor(torch.ByteStorage.from_buffer(mask.tobytes()))

    if mask.mode == 'YCbCr':
        nchannel = 3
    elif mask.mode == 'I;16':
        nchannel = 1
    else:
        nchannel = len(mask.mode)
    img2 = img2.view(mask.size[1], mask.size[0], nchannel)
    # ...

But this is the new error:

Traceback (most recent call last):
  File ".../test.py", line 29, in <module>
    for batch_idx, (img, gt) in enumerate(train_loader):
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 135, in default_collate
    return [default_collate(samples) for samples in transposed]
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 135, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File ".../venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 112, in default_collate
    return torch.stack(batch, 0, out=out)
  File ".../venv/lib/python3.6/site-packages/torch/functional.py", line 66, in stack
    return torch.cat(inputs, dim, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 249 and 264 in dimension 2 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897

I think this error is thrown because your image sizes differ, so torch.cat cannot concatenate them into one batch.
Add a resize to your transformations and try it again.
Your example image was [218, 226], so img_size=224 from your code won’t work. Set it to a lower value just for debugging purposes, e.g. img_size=100.
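If you want to keep the dict-based transforms, a minimal Resize could look roughly like this (just a sketch; using NEAREST for the mask is my assumption, so that label values don’t get interpolated):

class Resize(object):
    """Resize image and mask to a fixed square size (sketch for dict-style samples)."""

    def __init__(self, size, interpolation=Image.BILINEAR):
        self.size = (size, size)
        self.interpolation = interpolation

    def __call__(self, sample):
        img, mask = sample['image'], sample['mask']
        img = img.resize(self.size, self.interpolation)
        mask = mask.resize(self.size, Image.NEAREST)  # keep mask values discrete
        return {'image': img, 'mask': mask}

and then add it in front of ToTensor (assuming you drop the class into your transforms.py):

transform_train = transforms.Compose([
    transforms.Resize(100),  # img_size=100 for debugging
    transforms.ToTensor()
])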

That looks good! I will try using this dataloader with my net and I’ll let you know!

Looks like it works! Thank you very much!
Is there a way for me to donate to help pytorch development? It’s an amazing framework and the effort you put in this forum is really something else!


Is your original code working now? I.e. is the loss decreasing the whole time without the jump after one epoch?
If so, it was really a nasty bug. :wink:

I don’t know anything about donations, but it would be great if you would like to participate in this awesome community (forum, github issues, etc.) :slight_smile:

These are my results now:

They are not good results, but the dataset is very, very small (~200 images), so it’s kinda hard to finetune the final layer with so few new images. Anyway, the “one-epoch-jump” bug is solved! That was a really strange behaviour.

That’s good to hear!

You can check whether your model/preprocessing etc. is right by picking a small sample of your training set (e.g. 10 images and their targets) and overfitting massively on it, i.e. the accuracy should be approx. 100%.
If your model is not able to overfit on such a small dataset, something else could be broken.

Do you mean I should try to run a training session with a few images (10), many epochs (50) and a fairly high learning rate (0.01) to overfit on it? Would this be a good method to check my model?

Yes, exactly! I’m not sure about the learning rate; you should play around with it a bit.
Since the dataset is really small, the training should only take very little time.

If you can overfit really badly and achieve nearly perfect accuracy, your model is at least capable of learning the small dataset, so you know there are no obvious errors.
If that’s not the case, we would have to look back into the training procedure.
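Something along these lines (a sketch that just reuses your MyDataset and trims the file lists to 10 entries; adapt it to your actual training loop):

tiny_dataset = MyDataset(root_dir_img, root_dir_gt, transform_train)
tiny_dataset.img_names = tiny_dataset.img_names[:10]  # keep only 10 image/mask pairs
tiny_dataset.gt_names = tiny_dataset.gt_names[:10]

tiny_loader = torch.utils.data.DataLoader(tiny_dataset, batch_size=2, shuffle=True, num_workers=0)

for epoch in range(100):
    for img, gt in tiny_loader:
        # forward pass, loss, backward pass and optimizer step exactly as in your normal training
        pass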

I tried with 10 images, 100 epochs and learning rate 0.01 (*0.1 every 30 epochs).
Is this a problem?

I’m finetuning only the last layer of SegNet, starting from some pretrained weights.
Should I try with more layers?