Insufficient GPU memory because the deployment machine has less memory than the training machine

Forward propagation uses a large amount of GPU memory during training, but the deployment machine does not have nearly as much. Is there any way to reduce GPU memory usage by an order of magnitude during deployment?
The batch size is already 1.
The main reason for the large memory usage is that the data is three-dimensional.

Since you won’t be calculating gradients during deployment, you should wrap the forward pass in a with torch.no_grad() or with torch.inference_mode() guard. This prevents autograd from storing the intermediate forward activations (which are only needed for the backward pass) and will thus decrease memory usage. The actual reduction depends on the model architecture.
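To make the effect concrete, here is a minimal sketch (using a small hypothetical nn.Sequential model in place of the real 3D network) showing that outputs produced under the guard carry no autograd graph, while an unguarded forward pass does:

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the real network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
model.eval()  # also switches dropout/batchnorm to eval behavior

x = torch.randn(1, 16)

# Without the guard, autograd records the graph (and keeps
# intermediate activations alive) for a potential backward pass.
out_train = model(x)

# With the guard, no graph is built and activations are freed
# as soon as the next layer has consumed them.
with torch.no_grad():
    out_infer = model(x)

print(out_train.requires_grad)  # True
print(out_infer.requires_grad)  # False
```

The memory savings scale with the size of the intermediate activations, which is why they can be substantial for 3D (volumetric) inputs.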

Do you mean using with torch.no_grad() or with torch.inference_mode() like this?

    def forward(self, x):
        with torch.no_grad():
            x1 = self.conv1_1(x)
            x1 = self.conv1_2(x1)
            x1 = self.conv1_3(x1)

Even while training? Should it be added directly inside the model?

No, you shouldn’t use it during training, as it would disable the gradient calculation, as previously explained. Instead, wrap the forward pass of the entire model in the guard during inference:

# inference
with torch.no_grad():
    out = model(x)
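If you prefer, torch.inference_mode() can also be applied as a decorator on an inference helper, which keeps the guard out of the model code entirely. A minimal sketch (the predict function and the tiny model are hypothetical, for illustration):

```python
import torch
import torch.nn as nn

@torch.inference_mode()
def predict(model, x):
    # Everything in here runs without autograd tracking; tensors
    # created inside are "inference tensors", which are slightly
    # cheaper than tensors created under no_grad().
    return model(x)

# Hypothetical tiny model for illustration.
model = nn.Sequential(nn.Linear(4, 2))
model.eval()

out = predict(model, torch.randn(1, 4))
print(out.requires_grad)  # False
```

This way the training code path stays untouched, and gradients remain available whenever you call the model directly.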

Very useful! Thank you