Hello,
I’m trying to replicate the ViT paper: [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
I’ve reproduced the architecture, but some strange things happen when I increase the size of the nn.Linear() layers.
For example, when I try to run an nn.Linear() layer with more than 3000 hidden units, my PC crashes and turns off immediately.
It works with smaller sizes such as:
- 768 - works
- 1024 - works
- 2048 - works
But anything over 3000 seems to crash my PC without any kind of error or warning.
Sometimes it’ll work with 3001 hidden units but then won’t work with 3002.
The code I’m using replicates the MLP block from Table 1 of the paper linked above:
import torch
from torch import nn

# Could also call this "FeedForward"
class MLPBlock(nn.Module):
    """Creates an MLPBlock of the Vision Transformer architecture."""
    def __init__(self,
                 embedding_dim, # embedding dimension (Hidden Size D in Table 1)
                 mlp_size,      # MLP size in Table 1
                 dropout=0):    # "Dropout... is applied to every dense layer..." (Appendix B.1)
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_features=embedding_dim,
                      out_features=mlp_size),
            nn.GELU(), # "The MLP contains two layers with a GELU non-linearity" (section 3.1)
            nn.Dropout(p=dropout),
            nn.Linear(in_features=mlp_size, # needs to take same in_features as out_features of layer above
                      out_features=embedding_dim), # take back to embedding_dim
            nn.Dropout(p=dropout)
        )

    def forward(self, x):
        return self.mlp(x)
# Create random tensor (same shape as paper)
z = torch.randn((1, 196, 768))
print(z.shape)

# Set MLP size
mlp_size = 1024

# No CUDA
cpu_device = "cpu"
print(f"\nUsing device: {cpu_device}")
print(f"Using MLP size: {mlp_size}")
mlp_block = MLPBlock(embedding_dim=768,
                     mlp_size=mlp_size).to(cpu_device)
z_through_mlp_block = mlp_block(z.to(cpu_device))
print(z_through_mlp_block.shape)

# With CUDA
cuda_device = "cuda"
print(f"\nUsing device: {cuda_device}")
print(f"Using MLP size: {mlp_size}")
mlp_block = MLPBlock(embedding_dim=768,
                     mlp_size=mlp_size).to(cuda_device)
z_through_mlp_block_cuda = mlp_block(z.to(cuda_device))
print(z_through_mlp_block_cuda.shape)
Output:
torch.Size([1, 196, 768])
Using device: cpu
Using MLP size: 1024
torch.Size([1, 196, 768])
Using device: cuda
Using MLP size: 1024
torch.Size([1, 196, 768])
If I set mlp_size to anything over 3000 in the code above, it crashes my whole PC.
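To narrow down the exact size where it fails, this is a minimal sweep I could run (a sketch that reuses the MLPBlock class above; the list of sizes is just an example):

import torch

# Sweep MLP sizes around the point where the crash seems to start.
# torch.cuda.synchronize() forces the GPU work to finish before printing,
# so the last size printed is the one that actually triggered the problem.
for mlp_size in [2048, 2900, 3000, 3001, 3002, 3100, 4096]:
    block = MLPBlock(embedding_dim=768, mlp_size=mlp_size).to("cuda")
    out = block(torch.randn((1, 196, 768), device="cuda"))
    torch.cuda.synchronize()
    print(f"mlp_size={mlp_size:>5} | out shape: {tuple(out.shape)} | "
          f"allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")
    del block, out
    torch.cuda.empty_cache()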
Tests I’ve done
I’m not quite sure what’s going on, because I’ve tested the same code on Google Colab with a P100 GPU (~16GB memory) and it works fine at various mlp_size values, including values of 5000+.
But if I run the same code on my local machine with an NVIDIA TITAN RTX (~24GB memory), it crashes immediately.
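To rule out memory limits as the difference between the two machines, this is the quick check I’d run on both (a sketch using standard torch.cuda calls):

import torch

# Print the GPU PyTorch actually sees plus its total memory, to confirm the
# TITAN RTX (~24GB) really is the device being used and how much is in use.
props = torch.cuda.get_device_properties(0)
print(f"Device: {props.name}")
print(f"Total memory: {props.total_memory / 1e9:.1f} GB")
print(f"Currently allocated: {torch.cuda.memory_allocated(0) / 1e6:.1f} MB")
print(f"Currently reserved:  {torch.cuda.memory_reserved(0) / 1e6:.1f} MB")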
My hardware
I ran the collect_env.py script from the PyTorch GitHub repo to show the hardware/software I’ve got:
Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA TITAN RTX
Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.3
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.11.0
[pip3] torch-tb-profiler==0.3.1
[pip3] torchaudio==0.11.0
[pip3] torchinfo==1.7.0
[pip3] torchmetrics==0.7.2
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.21.2 py39h20f2e39_0
[conda] numpy-base 1.21.2 py39h79a1101_0
[conda] pytorch 1.11.0 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-tb-profiler 0.3.1 pypi_0 pypi
[conda] torchaudio 0.11.0 py39_cu113 pytorch
[conda] torchinfo 1.7.0 pypi_0 pypi
[conda] torchmetrics 0.7.2 pypi_0 pypi
[conda] torchvision 0.12.0 py39_cu113 pytorch
My thoughts
Is this potentially something to do with the max memory values set on my GPU?
I’m not sure where I’d look to find those.
And temperature-wise, this happens regardless of whether it’s a warm start or a cold start.
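For reference, this is how I was planning to watch the GPU’s power limit, power draw and temperature while the forward pass runs (a sketch that just shells out to nvidia-smi; run it repeatedly or in a second terminal):

import subprocess

# One-shot query of the enforced power limit, current power draw and
# temperature; watching this while the large forward pass runs should show
# whether anything spikes right before the machine turns off.
query = "power.limit,power.draw,temperature.gpu"
result = subprocess.run(
    ["nvidia-smi", f"--query-gpu={query}", "--format=csv"],
    capture_output=True, text=True, check=True)
print(result.stdout)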