How to get a test image back at the original size after it goes through a CNN

Hi everyone!! :smiley:

My problem is the following: I am working on an image colorization problem in which a grayscale image is the input to a CNN. The final dimension of the feature maps from the CNN is (313 channels x 64 width x 64 height).

So, when I test the model, I get an RGB image of size (3 x 64 x 64) as the result, but the original image has a larger width and height.

Does anyone know how to generate the final test image at the original size? I already tried applying a resize; it worked, but the image was blurry. Does anyone know a strategy for keeping the original image's size in the final image without it getting blurry?

Best regards,

Matheus Santos.

Instead of resizing (and thus interpolating) you could try to use nn.ConvTranspose2d layers to increase the spatial resolution.
Alternatively, you could also try to keep the spatial size constant throughout the model, but that might not work that well.
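
Something like this could work as a small learned upsampling head (just a sketch; the channel counts and target size are made up and not taken from any particular paper):

import torch
import torch.nn as nn

# hypothetical decoder head: each ConvTranspose2d doubles the spatial size
up = nn.Sequential(
    nn.ConvTranspose2d(313, 128, kernel_size=4, stride=2, padding=1),  # 64 -> 128
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 128 -> 256
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 2, kernel_size=3, padding=1),                        # predict the 2 ab channels
)

x = torch.randn(1, 313, 64, 64)  # fake network output
print(up(x).shape)               # torch.Size([1, 2, 256, 256])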

I see! I will try!

But are you suggesting placing these nn.ConvTranspose2d layers in the model during the training step?
I ask because the architecture would then be different from the one in the article.

Or would I use them only in the test step? In the test step I would feed a grayscale image to the trained model and it would return the 313x64x64 feature maps.
After obtaining the feature maps, would I use the nn.ConvTranspose2d layers to enlarge the image? The problem with this is that the weights in these nn.ConvTranspose2d layers would not be trained, right?

You would need to train these layers, otherwise you would get random outputs.
Is the original paper also returning the output in the original shape?
If so, could you explain their architecture a bit or link the paper here?

Yes. The original implementation, in the test case, returns the color image at approximately the original size. It uses scipy.ndimage.zoom() to increase the image size. I tried it here, but it didn't work; the image was blurry.
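
Here is roughly how I experimented with it (a sketch; the sizes are made-up stand-ins, and the order argument selects the spline order, with 0 = nearest, 1 = linear, and the default 3 = cubic):

import numpy as np
import scipy.ndimage as ndi

ab_small = np.random.rand(64, 64, 2)         # stand-in for the (H_out, W_out, 2) ab output
H_orig, W_orig = 256, 256                    # assumed original size

factors = (H_orig / 64.0, W_orig / 64.0, 1)  # zoom factor per axis (H, W, channels)
for order in (0, 1, 3):                      # nearest, linear, cubic spline
    ab_up = ndi.zoom(ab_small, factors, order=order)
    print(order, ab_up.shape)                # (256, 256, 2) in every case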

I’ll send you the article link: https://arxiv.org/pdf/1603.08511.pdf

The architecture configuration is on page 24.

Thanks for the information.
Do you know which setting (order) the authors used for the spline interpolation in the zoom call?
Was the smaller image also blurry in your case?

No, the smaller image (the size of the model output) is not blurry, but it is hard to compare with the original image and to measure the colorization performance because it is too small; in some cases we can't see the details ehehehehhehe :laughing:

I will put the test code here. The original implementation was made using Caffe. The line of code where the authors call zoom is near the end of the script.
They use 1.*H_orig/H_out, 1.*W_orig/W_out, 1 because these values are the zoom factors that will be applied to each dimension (height, width, channels).

import numpy as np
import os
import skimage.color as color
import matplotlib.pyplot as plt
import scipy.ndimage.interpolation as sni
import caffe
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description='iColor: deep interactive colorization')
    parser.add_argument('-img_in',dest='img_in',help='grayscale image to read in', type=str)
    parser.add_argument('-img_out',dest='img_out',help='colorized image to save off', type=str)
    parser.add_argument('--gpu', dest='gpu', help='gpu id', type=int, default=0)
    parser.add_argument('--prototxt',dest='prototxt',help='prototxt filepath', type=str, default='./models/colorization_deploy_v2.prototxt')
    parser.add_argument('--caffemodel',dest='caffemodel',help='caffemodel filepath', type=str, default='./models/colorization_release_v2.caffemodel')

    args = parser.parse_args()
    return args

if __name__ == '__main__':
	args = parse_args()

	caffe.set_mode_gpu()
	caffe.set_device(args.gpu)

	# Select desired model
	net = caffe.Net(args.prototxt, args.caffemodel, caffe.TEST)

	(H_in,W_in) = net.blobs['data_l'].data.shape[2:] # get input shape
	(H_out,W_out) = net.blobs['class8_ab'].data.shape[2:] # get output shape

	pts_in_hull = np.load('./resources/pts_in_hull.npy') # load cluster centers
	net.params['class8_ab'][0].data[:,:,0,0] = pts_in_hull.transpose((1,0)) # populate cluster centers as 1x1 convolution kernel
	# print 'Annealed-Mean Parameters populated'

	# load the original image
	img_rgb = caffe.io.load_image(args.img_in)

	img_lab = color.rgb2lab(img_rgb) # convert image to lab color space
	img_l = img_lab[:,:,0] # pull out L channel
	(H_orig,W_orig) = img_rgb.shape[:2] # original image size

	# create grayscale version of image (just for displaying)
	img_lab_bw = img_lab.copy()
	img_lab_bw[:,:,1:] = 0
	img_rgb_bw = color.lab2rgb(img_lab_bw)

	# resize image to network input size
	img_rs = caffe.io.resize_image(img_rgb,(H_in,W_in)) # resize image to network input size
	img_lab_rs = color.rgb2lab(img_rs)
	img_l_rs = img_lab_rs[:,:,0]

	net.blobs['data_l'].data[0,0,:,:] = img_l_rs-50 # subtract 50 for mean-centering
	net.forward() # run network

	ab_dec = net.blobs['class8_ab'].data[0,:,:,:].transpose((1,2,0)) # this is our result
	ab_dec_us = sni.zoom(ab_dec,(1.*H_orig/H_out,1.*W_orig/W_out,1)) # upsample to match size of original image L
	img_lab_out = np.concatenate((img_l[:,:,np.newaxis],ab_dec_us),axis=2) # concatenate with original image L
	img_rgb_out = (255*np.clip(color.lab2rgb(img_lab_out),0,1)).astype('uint8') # convert back to rgb

	plt.imsave(args.img_out, img_rgb_out)

I was looking around and found this function:
torch.nn.functional.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)

Do you think it will work for resizing the final image?

I am thinking of applying this function to the output that the CNN generates, which is a tensor of shape (1, 313, 56, 56). I would call it on this CNN output with the original dimensions of the image as the target size.
Do you think it would work?
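
Roughly like this (just a sketch; newer PyTorch versions expose this as torch.nn.functional.interpolate, and the original size below is a made-up stand-in):

import torch
import torch.nn.functional as F

out = torch.randn(1, 313, 56, 56)  # stand-in for the CNN output
H_orig, W_orig = 224, 224          # assumed original image size

# upsample the class scores back to the original resolution
out_up = F.interpolate(out, size=(H_orig, W_orig), mode='bilinear', align_corners=False)
print(out_up.shape)                # torch.Size([1, 313, 224, 224])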

upsample would also interpolate the image, similar to transforms.Resize. You could of course compare different interpolation techniques and choose the best looking one.
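
For example, a quick loop like this (just a sketch; the shapes are made up) makes it easy to eyeball the modes side by side:

import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 64, 64)  # stand-in for the tensor you want to upsample
for mode in ('nearest', 'bilinear', 'bicubic'):
    # align_corners is only accepted by the interpolating modes
    kwargs = {} if mode == 'nearest' else {'align_corners': False}
    y = F.interpolate(x, size=(256, 256), mode=mode, **kwargs)
    print(mode, y.shape)  # torch.Size([1, 2, 256, 256]) for every mode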

Yeah! I see!
I will try this.

Thanks!!