Out of Memory by Torch (CUDA)

PyTorch won’t move the entire dataset onto the GPU behind your back, as that wouldn’t make sense.

@somedays are you forgetting the forward activations? Depending on the model architecture, the parameters and inputs can be tiny compared to the activations. This post describes it in more detail for a ResNet.
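To put rough numbers on that, here is a back-of-the-envelope sketch in plain Python. The layer (a single 3x3 convolution, 3 to 64 channels, stride 1, padding 1), the batch size of 32, the 224x224 input and float32 storage are all assumptions picked for illustration, not values from this thread, and the helper names are made up.

```python
# Rough memory estimate for one hypothetical conv layer (all sizes are assumptions).
BYTES_PER_FLOAT32 = 4


def conv2d_param_count(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel of a square-kernel conv."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes needed to store a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Assumed setup: batch of 32 RGB images, 224x224; stride 1 and padding 1
# keep the spatial size unchanged.
batch, height, width = 32, 224, 224
in_channels, out_channels, kernel_size = 3, 64, 3

param_bytes = conv2d_param_count(in_channels, out_channels, kernel_size) * BYTES_PER_FLOAT32
input_bytes = tensor_bytes((batch, in_channels, height, width))
activation_bytes = tensor_bytes((batch, out_channels, height, width))

MIB = 1024 ** 2
print(f"parameters: {param_bytes / MIB:.3f} MiB")
print(f"input:      {input_bytes / MIB:.3f} MiB")
print(f"activation: {activation_bytes / MIB:.3f} MiB")
```

Under these assumptions the output of this one layer is several hundred MiB while its parameters take a few KiB, and a deep network holds many such outputs for the backward pass during training. To measure instead of estimating, torch.cuda.max_memory_allocated() reports the peak memory allocated by tensors after a forward/backward pass.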
