Can ResNet50 output a 2D label?

I want to use ResNet50 to predict something, and I hope it can output a label with a dimension of 17*100. How can I change the fc layer to match this dimension? It is confusing.

Could you explain your use case a bit and what these dimensions would represent?
The standard ResNet already outputs a 2D tensor in the shape [batch_size, nb_classes], but I assume you want an additional dimension?
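For example, a quick check like this shows the default output shape (assuming you are using torchvision's resnet50):

```python
import torch
from torchvision import models

model = models.resnet50()
x = torch.randn(2, 3, 224, 224)   # [batch_size, channels, height, width]
out = model(x)
print(out.shape)                  # torch.Size([2, 1000]) -> [batch_size, nb_classes]
```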

Sure. I want to use ResNet50 to predict the EEG response to image stimuli. The input to the DNN is image data (500*500*3), and the output should be the EEG response (17*100). I don’t know how to modify the fc layer to make it output something like that.

The image data is 500*500*3.

I’m still unsure about the output shape. If you would like to get an output in the shape [batch_size, 17*100=1700], you could define the out_features of the last linear layer to be 1700.
The input shape should work, since the ResNets in torchvision use adaptive pooling layers before feeding the activation to the linear layer, which makes the input shape more flexible.
Note, however, that the input should have the shape [batch_size, channels, height, width], so you might need to permute it.
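A minimal sketch of this, assuming torchvision's resnet50 (I read in_features off the existing fc layer instead of hardcoding 2048):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50()
# Replace the last linear layer so the model outputs 17*100 = 1700 features
model.fc = nn.Linear(model.fc.in_features, 17 * 100)

x = torch.randn(2, 3, 500, 500)   # small batch just for the shape check
out = model(x)                    # [2, 1700]
out = out.view(-1, 17, 100)       # optional reshape to [2, 17, 100]
print(out.shape)
```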

Thanks for your reply. Sorry for the unclear description. Actually, I am a beginner with PyTorch and DNNs.

  1. In the image classification problem, I would have 1654*10 images and 1654 labels. Images are fed into the ResNet50 and the label is predicted. The batch size is 64.

  2. Then the problem changes: the label is replaced with EEG data (a recording of neural activity). The EEG data corresponding to an image has size 17*100, a matrix, which is quite different from the label in (1).

  3. In (1) the out_features of the fc layer should be something like 1000? But in (2), how do I modify the out_features of the fc layer? I am confused.

P.S. The image size is 500*500*3 (height*width*channels), and I would stack all the images into a matrix of size 16540*500*500*3. The corresponding EEG data matrix would be 16540*17*100. Then I batch the images with a batch size of 64. In my case, how do I permute the input and output?

  1. Do the posted numbers represent the number of samples for the images and targets? If so, do 10 images correspond to a single label?
  2. I still don’t know how the output would be treated, but one approach would be to use a linear layer with out_features=1700 and reshape the output to [batch_size, 17, 100]. However, it depends on your actual use case whether this would be the right approach.
  3. The out_features are passed as the second argument to nn.Linear, as described in the docs.

To permute the input from [16540, 500, 500, 3] to a channels-first format you could use x = x.permute(0, 3, 1, 2).
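Putting it together, here is a rough sketch of the whole pipeline. The tensor names and the MSELoss are just assumptions for illustration, and I'm using small random tensors as a stand-in for your real data:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
from torchvision import models

# Stand-ins for your real data (your full tensors would have 16540 samples)
images = torch.randn(16, 500, 500, 3)    # [N, height, width, channels]
eeg = torch.randn(16, 17, 100)           # [N, 17, 100]

images = images.permute(0, 3, 1, 2)      # -> [N, channels, height, width]

# batch_size=8 here just to keep the sketch light; you would use 64
loader = DataLoader(TensorDataset(images, eeg), batch_size=8, shuffle=True)

model = models.resnet50()
model.fc = nn.Linear(model.fc.in_features, 17 * 100)
criterion = nn.MSELoss()                 # assuming a regression-like objective fits your use case
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for data, target in loader:
    optimizer.zero_grad()
    output = model(data).view(-1, 17, 100)  # reshape to match the EEG target shape
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
```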

  1. Yes, there are 1654 categories and each category has 10 images.

  2. Yeah, this also confuses me.

  3. Thank you for telling me about that.