Change VGG16 layers for retraining with (1, 512, 512) grayscale images

So I read through this thread (among many others).

However, I can't seem to find an answer to my specific problem. I would like to train VGG16 from scratch on a large number of my own black-and-white images, at sizes other than (224x224), with a total of 15 classes. I figured out how to change the output class count and change the input channel count to 1 (by simply modifying the original source code, which I know is not really how it should be done). Changing the input image size, however, gives me a size mismatch error. How should I do this correctly? Specifically, I want (1, 512, 512) for the input layer and 15 classes for the output layer. Note: I actually want to change the number of channels on the input layer of the VGG network, not fake the grayscale images into having 3 channels.

Thanks in advance!


First, I personally think modifying the source code (well, a copy) is a great way to do this!
Typically, the size mismatch happens when you move from the conv layers (which don't care about height and width, unless you have too few pixels left) to the Linear (aka fully connected) layers. These need to know the size of their inputs.
My not terribly elegant way of finding out is to put a print(x.size()) right before the first call to a linear layer, read off the size, then remove the print and adjust the network definition.
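The trick above can be sketched as follows. The layer sizes here are made up for illustration; the point is only where the print goes and that the Linear's in_features gets hard-coded from what it reports:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 512x512 -> 256x256
        )
        # Hard-coded from the printed size: 8 channels * 256 * 256.
        self.fc = nn.Linear(8 * 256 * 256, 15)

    def forward(self, x):
        x = self.conv(x)
        x = x.flatten(1)
        print(x.size())  # read this off once, then delete the line
        return self.fc(x)

net = Net()
out = net(torch.randn(2, 1, 512, 512))  # prints torch.Size([2, 524288])
```

Running a single dummy batch through the network is enough to read off the number.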

Best regards



Thanks! Your suggestion seems like the best way to figure out the size needed for each layer. I was curious whether there was another option, but no one else has suggested an alternative.