> That is not how batchnorm 1d works. Batchnorm 1d assumes an optional first batch dimension, then a channel dimension, then an actual dimension. So the input is 2d without batch and 3d with batch.
As defined in the BatchNorm1d documentation, the input is expected to be of size (N, L) or (N, C, L), with the batch dimension first. What is optional is the additional channel dimension C, not the batch dimension.
This is because you define your batchnorm as having 100 channels, but the input you feed it has only 2. The shape (1, 2, 20) comes from the suggestion to add .unsqueeze(0) to your input, but that resulting shape is not what was originally intended. By definition, whether the 100 in the previous example is taken as C or as L, BatchNorm1d produces the same results for (N, 100) and (N, 100, 1). (2, 100) is already a batched input of 2 one-dimensional features and matches the input accepted by BatchNorm1d. We should be on the same page on this.
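For instance, a quick standalone check (a minimal sketch, not part of the original report) should confirm that both layouts go through the very same normalization:

import torch

bn = torch.nn.BatchNorm1d(100).eval()  # eval() so the running stats are used in both calls
x = torch.randn(2, 100)                # (N, L) = (2, 100)

out_NL = bn(x)                 # interpreted as (N, C) with C=100
out_NCL = bn(x.unsqueeze(-1))  # interpreted as (N, C, L) with C=100, L=1
print(torch.allclose(out_NL, out_NCL.squeeze(-1)))  # expected: True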
Now, back to the issue with the SyncBatchNorm conversion. Two questions:
- Does a SyncBatchNorm-wrapped BatchNorm1d behave the same as before the conversion?
The original BatchNorm1d accepts both (N, L) and (N, C, L) and produces the same results, as the following revised code segment shows. However, the conversion to SyncBatchNorm CHANGES the interface to ONLY accept input of size (N, C, L). The conversion is therefore unlikely to work transparently with existing models that use BatchNorm1d on input of size (N, L).
import os
import copy
import torch
from torch import nn

with torch.no_grad():
    inputNL = torch.randn(2, 20).cuda()
    module = torch.nn.Sequential(
        torch.nn.Linear(20, 100),
        torch.nn.BatchNorm1d(100)
    ).cuda()
    moduleC = copy.deepcopy(module).cuda()
    moduleL = copy.deepcopy(module).cuda()
    moduleC.eval()
    moduleL.eval()

    # XXX: BatchNorm1d accepts (N, C, L)
    outputNL = moduleC[0](inputNL)
    outputNCL = moduleC[1](outputNL.unsqueeze(-1))
    print('BatchNorm1d NCL:', outputNCL.shape, round(outputNCL.mean().item(), 7))

    # XXX: BatchNorm1d accepts (N, L) too
    outputNL = moduleL[0](inputNL)
    outputNL = moduleL[1](outputNL)
    print('BatchNorm1d NL:', outputNL.shape, round(outputNL.mean().item(), 7))

    os.environ['RANK'] = '0'
    os.environ['WORLD_SIZE'] = '1'
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '25791'
    torch.distributed.init_process_group(backend='nccl')
    moduleC = copy.deepcopy(module)
    moduleL = copy.deepcopy(module)
    moduleC = nn.SyncBatchNorm.convert_sync_batchnorm(moduleC)
    moduleL = nn.SyncBatchNorm.convert_sync_batchnorm(moduleL)
    moduleC.eval()
    moduleL.eval()

    # XXX: converted BatchNorm1d ONLY accepts (N, C, L)
    outputNL = moduleC[0](inputNL)
    outputNCL = moduleC[1](outputNL.unsqueeze(-1))
    print('SyncBatchNorm NCL:', outputNCL.shape, round(outputNCL.mean().item(), 7))

    # FIXME: Converted BatchNorm1d never accepts (N, L)
    outputNL = moduleL[0](inputNL)
    outputNL = moduleL[1](outputNL)
    print('SyncBatchNorm NL:', outputNL.shape, round(outputNL.mean().item(), 7))
Sample output:
BatchNorm1d NCL: torch.Size([2, 100, 1]) 0.0683341
BatchNorm1d NL: torch.Size([2, 100]) 0.0683341
SyncBatchNorm NCL: torch.Size([2, 100, 1]) 0.0683341
Traceback (most recent call last):
  File "syncBN.py", line 45, in <module>
    outputNL = moduleL[1](outputNL)
  File "/home/ml/farleylai/miniconda3/envs/sinet37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ml/farleylai/miniconda3/envs/sinet37/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 429, in forward
    self._check_input_dim(input)
  File "/home/ml/farleylai/miniconda3/envs/sinet37/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 417, in _check_input_dim
    .format(input.dim()))
ValueError: expected at least 3D input (got 2D input)
- If not, what is the justification, or what workaround does not require changing the existing model to be wrapped?
One workaround is to reshape/unsqueeze(-1) the immediate input of size (N, L) to (N, C=L, L=1) right before the converted BatchNorm1d, as demonstrated by @bonzogondo. Unfortunately, this may not scale if uses of BatchNorm1d are scattered all over existing models, and there is no reshape layer in PyTorch to automate the unsqueeze, though a tiny custom wrapper module could fill that gap, as sketched right below.
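The Unsqueeze/Squeeze helpers here are made-up names rather than existing PyTorch layers; this is only a sketch of how the unsqueeze could be hidden behind a module so the call sites stay untouched:

from torch import nn

class Unsqueeze(nn.Module):
    # hypothetical helper: (N, L) -> (N, C=L, L=1)
    def forward(self, x):
        return x.unsqueeze(-1)

class Squeeze(nn.Module):
    # hypothetical helper: (N, C, 1) -> (N, C)
    def forward(self, x):
        return x.squeeze(-1)

# e.g. wrap a converted SyncBatchNorm `bn` that used to see (N, L):
# bn = nn.Sequential(Unsqueeze(), bn, Squeeze())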
An alternative could be to detect whether the BatchNorm being wrapped is 1D, so that SyncBatchNorm._check_input_dim(…) checks the same criteria as BatchNorm1d, as sketched in the following. There may be some other exceptions, but the goal should be to wrap existing models transparently.
class SyncBatchNorm(nn.SyncBatchNorm):
    def _check_input_dim(self, input):
        if self._1d:
            if input.dim() != 2 and input.dim() != 3:
                raise ValueError('expected 2D or 3D input (got {}D input)'
                                 .format(input.dim()))
        elif input.dim() <= 2:
            raise ValueError('expected at least 3D input (got {}D input)'
                             .format(input.dim()))

    @classmethod
    def convert_sync_batchnorm(cls, module, process_group=None):
        ...
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module_output = SyncBatchNorm(module.num_features,
                                          module.eps, module.momentum,
                                          module.affine,
                                          module.track_running_stats,
                                          process_group)
            module_output._1d = isinstance(module, nn.modules.batchnorm.BatchNorm1d)
        ...
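Assuming the elided parts are copied unchanged from the stock convert_sync_batchnorm, applying this sketch to the earlier script would look something like the following (untested; whether the synchronized training path then handles (N, L) correctly is a separate question):

# reuse `module` and `inputNL` from the script above
moduleL = SyncBatchNorm.convert_sync_batchnorm(copy.deepcopy(module))
moduleL.eval()
outputNL = moduleL(inputNL)  # (N, L) should now pass _check_input_dim
print('patched SyncBatchNorm NL:', outputNL.shape)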