# Why do some inputs need an axis for batch size but some don't?

I’m building a simple network that takes in two numbers and learns how to add them.

```python
import torch
```

This is a pretty simple network:

```python
from torch import nn
from torch.nn import functional as F

class Net(nn.Module):

    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(2, 20)
        self.linear2 = nn.Linear(20, 1)

    def forward(self, x1, x2):
        inp = torch.cat((x1[None], x2[None])).float()
        out = self.linear1(inp)
        out = F.relu(out)
        out = self.linear2(out)

        return out
```
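As a quick sanity check of the shapes involved (assuming `x1` and `x2` are 0-d scalar tensors): `x[None]` adds a leading dimension, so the `cat` in `forward` builds a 1-d tensor of shape `(2,)`, i.e. a single unbatched sample with no batch dimension:

```python
import torch

x1, x2 = torch.tensor(3.0), torch.tensor(4.0)  # 0-d (scalar) tensors
print(x1[None].shape)                          # torch.Size([1])
inp = torch.cat((x1[None], x2[None]))          # one unbatched sample of 2 features
print(inp.shape)                               # torch.Size([2])
```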

Here’s the training loop:

```python
net = Net()
criterion = nn.MSELoss()
optim = torch.optim.SGD(net.parameters(), lr=0.001)

for i in range(10000):
    # two random scalar inputs; the target is their sum
    x1, x2 = torch.rand(()), torch.rand(())
    out = net(x1, x2)
    loss = criterion(out, (x1 + x2)[None])

    optim.zero_grad()
    loss.backward()
    optim.step()

    if i % 500 == 0: print(loss)
```
1. The input here doesn’t have a batch dimension, yet it works. But sometimes PyTorch inference doesn’t work without one. Why is that?
2. If I’m trying to input a vector of n features to an NN that starts with a linear layer, should the shape be (n,), (n,1), or (1,n)? I’m pretty confused about that.
3. Is the way I’m handling the input the right way? Or is there a better way to do it?

Essentially, the broader question is: is there a guide on the shapes and types that tensors have to be for different models and loss functions?

Yes, the docs mention the expected shape for each layer, and I would stick to it. My general rule is that a batch dimension is expected in `nn.Module`s. While e.g. linear layers can work with an input having a single dimension, you would have to verify what’s applied internally (which dimension is broadcast, etc.), so I would prefer the documented approach.
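To illustrate (a minimal sketch): `nn.Linear(n, m)` only constrains the *last* dimension of the input to be `n`, so both `(n,)` and `(1, n)` work, while `(n, 1)` fails; the usual batched form is `(batch, n)`:

```python
import torch
from torch import nn

lin = nn.Linear(2, 20)              # requires the last input dim to be 2

print(lin(torch.rand(2)).shape)     # torch.Size([20])    -- (n,): one unbatched sample
print(lin(torch.rand(1, 2)).shape)  # torch.Size([1, 20]) -- (1, n): one sample with a batch dim
print(lin(torch.rand(5, 2)).shape)  # torch.Size([5, 20]) -- (batch, n): the usual batched form

try:
    lin(torch.rand(2, 1))           # (n, 1): last dim is 1, not 2
except RuntimeError as e:
    print("shape (n, 1) fails:", e)
```

For the `forward` in the question, one batched option would be to pass `x1` and `x2` of shape `(batch,)` and build the input with `torch.stack((x1, x2), dim=1)`, which yields `(batch, 2)`.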
