import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = torch.nn.DataParallel(model)  # uncomment for the multi-GPU case
model.to(device)

loss_criterion = torch.nn.CrossEntropyLoss()
inp = torch.randn(5, 3, 224, 224).to(device)  # batch of 5 images
target = torch.empty(5, dtype=torch.long).random_(1000).to(device)  # random labels in [0, 1000)
output = model(inp)
loss = loss_criterion(output, target)
loss.backward()
I observed that backward() fails when the input batch size is less than 5. Furthermore, when using nn.DataParallel, the model also requires at least 5 samples per batch, otherwise it raises an error.
I am attaching the screenshot of the error below.
I have tested both cases, using a single GPU and multiple GPUs. In both cases, I found that each GPU needs at least 5 data points for backpropagation to succeed.
Collecting environment information…
PyTorch version: 1.8.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
Nvidia driver version: 455.32.00
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7
/usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7
HIP runtime version: N/A
MIOpen runtime version: N/A