Hi. First off, thanks for all the work you all constantly put in. I appreciate it.

I am currently training a relatively simple CNN for a classification task (75 classes, a few thousand training samples). The network trains nicely, and testing gives a reasonable accuracy of around 60-70%, depending on a few hyperparameters and the loss function. However, I ran into a problem with inference when I tried to include the network in a pipeline.

Here is the forward pass, so you know the network:

```
def forward(self, x):
    x = self.noise(x)
    x = self.rotation(x)
    x = self.conv1(x)
    if self.bn:
        x = self.BN1(x)
    x = nnf.relu(x)
    x = self.dropout_conv(x)
    x = nnf.avg_pool2d(x, kernel_size=4)
    x = self.conv2(x)
    if self.bn:
        x = self.BN2(x)
    x = nnf.relu(x)
    x = nnf.max_pool2d(x, kernel_size=2)
    x = self.conv3(x)
    if self.bn:
        x = self.BN3(x)
    x = nnf.relu(x)
    x = nnf.max_pool2d(x, kernel_size=2)
    x = self.conv4(x)
    if self.bn:
        x = self.BN4(x)
    x = nnf.relu(x)
    x = nnf.max_pool2d(x, kernel_size=2)
    x = x.view(-1, self.n_feature*16*16*24)
    x = self.dropout_fc(x)
    x = self.fc1(x)
    x = nnf.relu(x)
    x = self.dropout_fc(x)
    x = self.fc2(x)
    x = nnf.relu(x)
    x = self.dropout_fc(x)
    x = self.fc3(x)
    return x
```

`BN1` to `BN4` are BatchNorm layers, and `fc1` to `fc3` are fully connected layers.

Probability estimation during testing in my original notebook:

```
with torch.no_grad():
    for data, target, dindex in test_loader:
        output = model(data)
        lsm = nnf.log_softmax(output/model.temperature, dim=1).to(output.device)
        sm = nnf.softmax(output/model.temperature, dim=1).to(output.device)
        test_loss += nnf.nll_loss(lsm, target.to(output.device), reduction='sum').item()
        pred = lsm.data.max(1, keepdim=True)[1]
        prob = sm.data.max(1, keepdim=True)[0]
        ...
```
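
Since the test loop and the pipeline repeat the same post-processing, that step can be factored into one function that both notebooks call, so the two code paths cannot drift apart. This is a minimal sketch: `predict_with_temperature` is a hypothetical name, and it assumes `temperature` is a plain float. It takes the arg-max of the softmax, which gives the same class as the arg-max of `log_softmax` because the logarithm is monotonic.

```python
import torch
import torch.nn.functional as nnf


def predict_with_temperature(logits: torch.Tensor, temperature: float):
    """Return (predicted class, max probability) for each row of `logits`.

    Hypothetical helper mirroring the test-loop code: scale the logits by
    1/temperature, apply softmax over the class dimension, then take the
    largest probability and its index.
    """
    probs = nnf.softmax(logits / temperature, dim=1)
    # Tensor.max(dim) returns (values, indices); keepdim keeps shape (N, 1).
    prob, pred = probs.max(dim=1, keepdim=True)
    return pred, prob


# Usage with made-up logits for two samples and three classes.
logits = torch.tensor([[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]])
pred, prob = predict_with_temperature(logits, temperature=1.0)
print(pred.squeeze(1).tolist())  # [0, 2]
```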

Inference in the pipeline (currently another notebook):

```
...
representation = df.make_representation_from_unknown(current_image=sitk_image, target_size=(512,512,512))
# add batch dimension to image
tensor_representation = torch.unsqueeze(torch.Tensor(representation), 0)
with torch.no_grad():
    # load network
    network = torch.load(network)
    # set to eval mode
    network.eval()
    # collect results
    logits = network(tensor_representation)
    if verbose:
        print(logits)
    lsm = torch.nn.functional.log_softmax(logits/network.temperature, dim=1)
    sm = torch.nn.functional.softmax(logits/network.temperature, dim=1)
    prediction = lsm.data.max(1, keepdim=True)[1].item()
    probability = sm.data.max(1, keepdim=True)[0].item()
    ...
```
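
For reference, a softmax output of 1 for one class and (near) 0 for the rest is what you get whenever the gap between the top logit and the others is large compared with `temperature`. The sketch below uses made-up logits, not output from the network. It only shows what the printed `logits` would have to look like to produce that pattern, not why the pipeline would produce such logits.

```python
import torch
import torch.nn.functional as nnf

# Hypothetical logits for one sample and four classes (not from the post).
logits_small = torch.tensor([[2.0, 1.0, 0.5, 0.0]])
# Same ranking of classes, but the gaps between logits are 50 times larger.
logits_large = logits_small * 50

for temperature in (1.0, 50.0):
    p_small = nnf.softmax(logits_small / temperature, dim=1)
    p_large = nnf.softmax(logits_large / temperature, dim=1)
    # With T=1 the large logits give a max probability that rounds to 1.0;
    # dividing by T=50 brings them back to the same probabilities as the
    # small logits at T=1.
    print(f"T={temperature}: small max={p_small.max().item():.4f}, "
          f"large max={p_large.max().item():.4f}")
```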

The original code produces sensible probabilities. The ported version does not: it only outputs a 1 for one class and 0s for all others. Since the entire model should be loaded, the only difference I can see is the batch size: `test_loader` uses a batch size of 24, while the pipeline has to make single predictions. My only guess so far is that BatchNorm misbehaves because of the change in batch size.
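
The BatchNorm hypothesis can be checked in isolation. The sketch below uses a stand-alone `nn.BatchNorm2d` layer, not the network above, with arbitrary shapes and values. It compares one sample's output when it is normalized inside a batch of 24 versus on its own, in training mode and in eval mode. In training mode BatchNorm normalizes with the current batch's statistics; in eval mode it uses the running mean and variance accumulated during training, so the batch size should not change a sample's output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-alone layer with 3 channels (arbitrary choice, not the post's model).
bn = nn.BatchNorm2d(num_features=3)

# A few training-mode passes so the running statistics move away from
# their initial values (mean 0, variance 1).
bn.train()
with torch.no_grad():
    for _ in range(10):
        bn(torch.randn(24, 3, 8, 8) * 2.0 + 5.0)

batch = torch.randn(24, 3, 8, 8) * 2.0 + 5.0
single = batch[:1]  # the first sample on its own, shape (1, 3, 8, 8)

# Training mode: statistics come from the current batch, so the first
# sample's output depends on what else is in the batch.
bn.train()
with torch.no_grad():
    out_batch_train = bn(batch)[:1]
    out_single_train = bn(single)
print("train mode, max difference:",
      (out_batch_train - out_single_train).abs().max().item())

# Eval mode: the stored running statistics are used, so the output for a
# sample is the same whatever the batch size.
bn.eval()
with torch.no_grad():
    out_batch_eval = bn(batch)[:1]
    out_single_eval = bn(single)
print("eval mode, max difference:",
      (out_batch_eval - out_single_eval).abs().max().item())
```

The same comparison can be repeated on the real network after `model.eval()` by feeding one test sample alone and as part of a batch; this only isolates BatchNorm and does not rule out other differences between the two notebooks.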

Is that intuition correct? How do I solve it?