Why can a model defined to take in a [784] tensor and output a [10] tensor accept a [100, 28, 28] tensor and output a [100, 10] tensor?

Here is myNet, defined as follows:

import torch

class myMLP(torch.nn.Module):

    def __init__(self):
        super(myMLP, self).__init__()

        self.model = torch.nn.Sequential(
            torch.nn.Linear(784, 200),
            torch.nn.Dropout(0.3),  # drop 30%
            torch.nn.LeakyReLU(inplace=True),
            torch.nn.Linear(200, 200),
            torch.nn.Dropout(0.4),  # drop 40%
            torch.nn.LeakyReLU(inplace=True),
            torch.nn.Linear(200, 10),
        )

    def forward(self, x):
        x = self.model(x)
        return x

which is defined to take in a [784] tensor and output a [10] tensor.
But when I use it in the following way, it works without error:

for epoch in range(maxepoch):
    myNet.train()
    for batch_idx, (data, target) in enumerate(train_loader):

        data = data.view(-1, 28*28)
        print(data.size())
        data, target = data.to(device), target.to(device)
        logits = myNet(data)
        print(logits.size())
        loss = loss_function(logits, target)

It took in [100, 28, 28] data each time without reporting an error, and output [100, 10] logits.

How does that work? Thanks.

Additionally, you are flattening the input to [-1, 28*28=784] in this line of code:

data = data.view(-1, 28*28)

which passes the input to the model in the expected shape.
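For example, here is a minimal sketch of that reshape (assuming a typical MNIST DataLoader batch of shape [100, 1, 28, 28] with batch_size=100; the exact input shape depends on your dataset and transforms):

import torch

# Dummy batch standing in for what an MNIST DataLoader typically yields:
# 100 images, each 1x28x28.
data = torch.randn(100, 1, 28, 28)

# view(-1, 28*28) keeps all elements and flattens each image into one row,
# so the result is [100, 784] -- exactly what Linear(784, 200) expects.
data = data.view(-1, 28 * 28)
print(data.size())  # torch.Size([100, 784])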

Thanks.

But what about the 100 in [100, 784]? Is that some built-in automatic extension, i.e. the broadcasting principle? I didn’t expect broadcasting could go that far!

The -1 tells the view operation to move “everything else” into this dimension; broadcasting is not used.
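In other words, -1 is just a placeholder that view fills in from the total element count. A small sketch with a dummy tensor (not your real data):

import torch

x = torch.randn(100, 28, 28)   # 100 * 28 * 28 = 78,400 elements in total
a = x.view(-1, 28 * 28)        # -1 is inferred as 78,400 / 784 = 100
b = x.view(100, 28 * 28)       # writing 100 explicitly gives the same result
print(a.size(), b.size())      # torch.Size([100, 784]) torch.Size([100, 784])
print(torch.equal(a, b))       # True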

I don’t know how to interpret this statement.

Look, the class is defined to take in [784], yet I sent in a batch of 100 such [784] samples by using [-1, 784], and the class processed the whole batch, which is not a behavior I defined. I thought it was broadcasting at work here, just in a confusing way.

It is exactly the defined behavior.
The linear layer expects inputs of shape [batch_size, in_features]. Your input has 100 samples (batch_size=100) and you are flattening it to [batch_size=100, in_features=784], so the layer will process this batch as specified. No broadcasting is used in this case, as the batch dimension is expected (by all layers).
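Here is a minimal sketch of that behavior with a stand-alone layer (not your myMLP): nn.Linear applies the same weight matrix to every row along the leading dimension, so the batch dimension simply passes through.

import torch

layer = torch.nn.Linear(784, 10)

batch = torch.randn(100, 784)    # [batch_size=100, in_features=784]
print(layer(batch).size())       # torch.Size([100, 10]) -- one output row per sample

single = torch.randn(784)        # a single unbatched sample also works
print(layer(single).size())      # torch.Size([10])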

Yes, you are right, I confused myself. Thanks! :sweat_smile: