Windows uses twice as much VRAM as Linux for the same code

I am running Tacotron2 on Windows in a Conda environment with an RTX 3090.

I can get it to run, but on Windows it seems to use twice as much VRAM as my Linux counterparts.

I’ve tried Python 3.6 and 3.7, installing PyTorch through both pip and conda, and using cudatoolkit.
cudatoolkit versions before 11 tell me they are not sm_86 compatible and then freeze.
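For context on the sm_86 message: the RTX 3090 is compute capability 8.6, and toolkits before CUDA 11 cannot generate code for any 8.x device. The sketch below is a rough way to reason about whether a build covers a GPU. It encodes the usual CUDA compatibility rules (an sm_XY cubin runs on devices with the same major version and an equal or higher minor version; compute_XY PTX can be JIT-compiled for any equal or newer device). The helper name is made up and the arch lists are illustrative, not taken from any real build.

```python
import re

# Only plain entries like 'sm_86' or 'compute_80' are understood by this sketch
ARCH_RE = re.compile(r"^(sm|compute)_(\d{2,})$")


def build_can_run(arch_list, capability):
    """Rough check: can code built for arch_list run on a GPU with (major, minor)?"""
    major, minor = capability
    for entry in arch_list:
        match = ARCH_RE.match(entry)
        if match is None:
            continue  # skip anything not understood here (e.g. 'sm_90a')
        kind, digits = match.groups()
        arch = (int(digits[:-1]), int(digits[-1]))
        if kind == "sm" and arch[0] == major and arch[1] <= minor:
            return True  # cubin: binary-compatible within the same major version
        if kind == "compute" and arch <= (major, minor):
            return True  # PTX: can be JIT-compiled for newer GPUs
    return False


# Hypothetical arch lists, chosen only to illustrate the rules
print(build_can_run(["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"], (8, 6)))  # False
print(build_can_run(["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80"], (8, 6)))  # True
```

On a real install, recent PyTorch versions expose torch.cuda.get_arch_list() and torch.cuda.get_device_capability(), which could feed this helper; treat the result as a hint rather than a guarantee.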

I didn’t always have this issue, but going back to my conda env backups doesn’t help.

Are there any scripts that can tell me which CUDA version it’s using? I removed all the CUDA paths, but somehow my conda env still knew where to find it and ran.

My Colab graphics card has 16GB and can handle a batch size of 48, while my Windows graphics card has 24GB and can only handle a batch size of 32.
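A quick way to put a number on the "twice as much" observation, assuming memory use grows roughly linearly with batch size. This ignores fixed costs such as model weights and the CUDA context, so it is only a rough ratio; the helper name is made up for illustration.

```python
def samples_per_gb(batch_size, vram_gb):
    """Largest batch that fit, divided by the card's total VRAM."""
    return batch_size / vram_gb


colab = samples_per_gb(48, 16)    # 3.0 samples per GB
windows = samples_per_gb(32, 24)  # ~1.33 samples per GB
print(f"Colab fits {colab / windows:.2f}x as many samples per GB")  # 2.25x
```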

Colab Link:

I installed cuDNN 8.1.0, but when I run this code I get: (‘11.0’, 8004)

import torch

def get_cuda_version():
    # CUDA runtime and cuDNN versions that this PyTorch build ships with
    cuda = torch.version.cuda
    cudnn = torch.backends.cudnn.version()
    return cuda, cudnn
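The second value in that tuple is cuDNN’s integer version, so 8004 reads as 8.0.4 rather than the 8.1.0 installed locally. A small decoder, assuming the CUDNN_VERSION encoding from cudnn.h (MAJOR * 1000 + MINOR * 100 + PATCH for cuDNN 7/8, MAJOR * 10000 + MINOR * 100 + PATCH from cuDNN 9); decode_cudnn_version is a hypothetical helper name:

```python
def decode_cudnn_version(version):
    """Turn torch.backends.cudnn.version() (an int) into (major, minor, patch)."""
    if version >= 10000:
        # cuDNN 9+ encodes MAJOR * 10000 + MINOR * 100 + PATCH
        return version // 10000, (version % 10000) // 100, version % 100
    # cuDNN 7/8 encode MAJOR * 1000 + MINOR * 100 + PATCH
    return version // 1000, (version % 1000) // 100, version % 100


print(decode_cudnn_version(8004))  # (8, 0, 4)
print(decode_cudnn_version(8005))  # (8, 0, 5)
```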

The conda binaries and pip wheels will use their own CUDA runtime (which you specify during the installation) as well as cudnn, NCCL etc., so your local CUDA toolkit won’t be used.
The local CUDA toolkit (and in particular the compiler) will be used if you are building a custom CUDA extension or PyTorch from source.
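To see this on a given machine, here is a sketch that prints the CUDA runtime and cuDNN bundled with the installed PyTorch next to whatever nvcc happens to be on PATH. The function name cuda_report is made up for this example; only the torch attributes it reads are real API. A missing or different nvcc should not change what the prebuilt binaries use.

```python
import shutil
import subprocess


def cuda_report():
    """Collect the CUDA details that matter for a prebuilt PyTorch install."""
    report = {}
    try:
        import torch

        report["torch"] = torch.__version__
        report["bundled_cuda_runtime"] = torch.version.cuda  # None for CPU-only builds
        report["bundled_cudnn"] = torch.backends.cudnn.version()
        report["gpu_available"] = torch.cuda.is_available()
        if report["gpu_available"]:
            report["gpu"] = torch.cuda.get_device_name(0)
            report["capability"] = torch.cuda.get_device_capability(0)
    except ImportError:
        report["torch"] = None

    # The local toolkit only matters when compiling extensions or PyTorch itself
    nvcc = shutil.which("nvcc")
    report["local_nvcc"] = nvcc
    if nvcc:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        report["local_nvcc_version"] = out.stdout.strip().splitlines()[-1] if out.stdout else None
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```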

Are both machines using the same PyTorch, CUDA, cudnn installation (e.g. through conda) as well as the same GPU?
If so, are both GPUs free before you start the training or is e.g. Windows using it to display the desktop (which will use GPU memory)?

Thanks ptrblck,

When I run get_cuda_version() on Ubuntu 18.04, I get (‘11.0’, 8005), so a slightly different cuDNN.

I followed the same conda install steps on Ubuntu 18.04 as on Windows.

With the same conda and PyTorch setup on the same GPU, Ubuntu gives me the same results as Windows.

Colab gives me a P100 or V100, and it’s using pytorch+cu101. Is CUDA 10 less memory-intensive?

I am using the GPU for my desktop display, but that takes less than 500MB.

That’s good to hear. So the different memory behavior was observed between your local Windows machine and Colab?

It might be, but it would also depend on the device used.
Could you create a single CUDA tensor and check the memory used by the CUDA context via nvidia-smi? Make sure that the GPU memory is empty before running this test, or subtract the already used memory.
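A sketch of that test, assuming a single GPU and nvidia-smi on PATH: it records the used memory before any CUDA call, allocates a tiny tensor, then subtracts both the baseline and the caching allocator’s reservation, so what remains is roughly the CUDA context. The helper names are hypothetical; the nvidia-smi query flags are standard options.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse 'nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits' (MiB, one line per GPU)."""
    return [int(line) for line in csv_text.splitlines() if line.strip()]


def gpu_memory_used_mib():
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_used(out.stdout)[0]  # assumes a single GPU


def estimate_context_mib():
    import torch  # importing torch does not create a CUDA context yet

    baseline = gpu_memory_used_mib()  # desktop, other processes, etc.
    x = torch.ones(1024, device="cuda")  # first CUDA call creates the context
    torch.cuda.synchronize()
    used = gpu_memory_used_mib()
    reserved_mib = torch.cuda.memory_reserved() / 2**20  # block the caching allocator holds for x
    del x
    return used - baseline - reserved_mib
```

Calling estimate_context_mib() on each machine should give numbers that are comparable across machines, because the baseline (for example the Windows desktop) is already subtracted.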


In Google Colab I got a V100 today. I switched the PyTorch build from cu101 to cu110.

However, I didn’t use conda. I was able to run a batch size of 48 on 16GB without crashing, as I always have on Colab.

pip install torch==1.7.1+cu110 -f

get_cuda_version() in Colab gave me (‘11.0’, 8005).

I’ll run the tensor test tonight.

Hope this is right. Both were close.

import torch

# 100000 x 10000 = 1e9 elements; an integer fill value gives int64 in PyTorch 1.7+ (~7.45 GiB)
a = torch.full((100000, 10000), 3, device=torch.device("cuda"))

Windows 10: 8422 MiB
Colab: 8851 MiB
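A rough breakdown of those two numbers, assuming the tensor is int64 (which PyTorch 1.7 infers from the integer fill value 3; a float32 tensor would be half the size). The remainder covers everything else on the GPU: the CUDA context plus anything already in use. So if the baseline was not subtracted, it is only an upper bound on the context overhead.

```python
MIB = 2**20

numel = 100_000 * 10_000  # elements in the (100000, 10000) tensor
bytes_per_element = 8     # int64, assuming torch.full inferred it from the integer fill value
tensor_mib = numel * bytes_per_element / MIB

reported_mib = {"Windows 10": 8422, "Colab": 8851}  # figures from the post above
for machine, used in reported_mib.items():
    print(f"{machine}: tensor {tensor_mib:.0f} MiB, remainder {used - tensor_mib:.0f} MiB")
# Windows 10: tensor 7629 MiB, remainder 793 MiB
# Colab: tensor 7629 MiB, remainder 1222 MiB
```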