I am trying to get this sorted once and for all. How does Pytorch read dimensions? If I get my size of MNIST, and I see torch.tensor([28, 28]), I read width, height. What does Pytorch read if I feed that into a network (since I have to unsqueeze() it for it to work)? Does it read batch_size: 28, of a 1d tensor of [28] values?
Can someone help me with this or point me to some solid documentation/articles about it?
# get hypothetical tensor.size()
torch.tensor([28]) # [?]
torch.tensor([1, 16]) # [?, ?]
torch.tensor([12, 1, 6]) # [?, ?, ?]
torch.tensor([32, 1, 12, 12]) # [batch_size, channels, height, width]
torch.tensor([32, 3, 2, 16, 28]) # [batch_size, channels, depth, height, width]
Is the first dimension always batch_size? Are the last 2 always height x width?
I just can’t find the logic anywhere to understand what the clean systematic way of thinking of this is, or what the design choices were. Your help is greatly appreciated.
Lastly, when I am coming out of a conv layer, and want to pass into a Linear layer, which view() parameter should be -1, and which dimensions (using the terminology above) should be multiplied together?
example
# x.size() = torch.tensor([32, 21, 12, 12])
# should I flatten it like this?
x = x.view(-1, 21 * 12 * 12)
# or like this:
x = x.view(-1, 32 * 12 * 12)
#or like this:
x = x.view(32, -1)
# or other?
@ptrblck, you’re always great at bringing clarity to these sorts of things…