I have a lot of one-channel images and wanted to make use of transfer learning. All the pretrained models available in torchvision expect 3-channel images, so for the time being I have zero-padded the extra channels to turn each 1-channel image into a 3-channel one, but it doesn’t seem to be helping, as my loss isn’t going down. Is there a better way to deal with this? I am using ResNet50 for this purpose.
Instead of filling the additional channels with zeros, you could convert the grayscale image to an RGB one by replicating the single channel three times (e.g. with `image.convert('RGB')` on the PIL image). This might work in most use cases.
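If you prefer to do the replication on the tensor rather than on the PIL image, a minimal sketch (assuming NCHW batches):

```python
import torch

x = torch.randn(4, 1, 224, 224)      # a batch of single-channel images
x_rgb = x.repeat(1, 3, 1, 1)         # replicate the channel dim: (4, 3, 224, 224)

# expand() gives the same values without copying memory, since the
# channel dimension is a singleton:
x_rgb2 = x.expand(-1, 3, -1, -1)
```

Either tensor can then be fed to an unmodified 3-channel model.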
Alternatively, you could keep your grayscale image and add a conv layer in front of your model, which outputs 3 channels. For this approach you would need to train this layer though.
I don’t have images; I have a dataframe which contains strokes, so I’m not sure how to use your first approach. The function which converts strokes to images is:
```python
from PIL import Image, ImageDraw
import numpy as np

def get_ims(raw_strokes):
    # raw_strokes is a string like "[[[x0, x1, ...], [y0, y1, ...]], ...]",
    # i.e. a list of strokes, each holding its x and y coordinate lists.
    image = Image.new("P", (28, 28), color=255)
    image_draw = ImageDraw.Draw(image)
    for stroke in eval(raw_strokes):  # ast.literal_eval would be safer
        for i in range(len(stroke[0]) - 1):
            image_draw.line(
                [stroke[0][i], stroke[1][i], stroke[0][i + 1], stroke[1][i + 1]],
                fill=0, width=6,
            )
    return np.array(image)
```
I’m sure someone will be able to tell you if there’s a smarter way to do this, but I think what you can do is:
```python
RGB_image = [image] + [image] + [image]
# or, equivalently:
RGB_image = [image] * 3
```
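Note that `[image] * 3` by itself is just a Python list holding three references to the same array; to get an actual H x W x 3 image you still need to stack along a new channel axis. A sketch, assuming `image` is the 28x28 NumPy array returned by `get_ims`:

```python
import numpy as np

image = np.zeros((28, 28), dtype=np.uint8)   # stand-in for a get_ims(...) result

rgb = np.stack([image] * 3, axis=-1)         # shape (28, 28, 3), channels-last
print(rgb.shape)                             # (28, 28, 3)

# torchvision models expect channels-first tensors, so the channel axis
# would later be moved to the front (e.g. by transforms.ToTensor()).
```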
You can make your own transform that accomplishes it for you:
```python
transform_rgb = transforms.Lambda(lambda image: image.convert('RGB'))
```
I’m not sure how performant this is, though, given the lambda call and the fact that it operates on the PIL Image rather than the tensor.
What is the difference between:

- converting the image to RGB with `image.convert('RGB')`, and
- replicating the channel with `RGB_image = [image] * 3`?

And how do PyTorch’s pretrained models manage this?
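A quick empirical check, not an authoritative answer: for a mode "L" image, `convert('RGB')` simply replicates the gray channel into R, G and B, so the two approaches produce the same array. For a palette ("P") image, like the one `get_ims` creates, `convert('RGB')` instead looks each value up in the palette, so the results can differ depending on the palette.

```python
from PIL import Image
import numpy as np

gray = Image.fromarray((np.arange(784) % 256).astype(np.uint8).reshape(28, 28),
                       mode="L")

via_convert = np.array(gray.convert("RGB"))          # PIL copies L into R, G and B
via_stack = np.stack([np.array(gray)] * 3, axis=-1)  # manual channel replication

print(np.array_equal(via_convert, via_stack))        # True for mode "L"
```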