During training, nn.DataParallel() works perfectly, but in eval mode it throws the error below:
RuntimeError: CUDA out of memory. Tried to allocate 29.21 GiB (GPU 0; 7.93 GiB total capacity; 361.73 MiB already allocated; 6.15 GiB free; 34.27 MiB cached)
Example:

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(1)

class model(nn.Module):
    def __init__(self):
        super(model, self).__init__()
```
Did you also call model.module.feature during training? It seems to be a custom method, so I think nn.DataParallel won't be applied.
Also, wrap your code in a with torch.no_grad() block to save some memory during evaluation.
" Did you also call model.module.feature during training?" , No, I am using normal training loss. It is working perfectly.
Do I need to wrap the input in a DataLoader during testing?
nn.DataParallel will call the wrapped module directly (__call__ first, which then invokes forward).
If you call custom methods on the model, data parallel won't be used, so call your model directly instead.
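To make that concrete, here is a minimal sketch (the layer names and shapes are made up for illustration, not taken from your code): returning the feature from forward means nn.DataParallel scatters the input and gathers both outputs, while calling a custom method through model.module runs on a single device:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.feature_extractor = nn.Linear(784, 128)  # hypothetical layers
        self.classifier = nn.Linear(128, 1)

    def forward(self, x):
        feature = self.feature_extractor(x)
        validity = self.classifier(feature)
        return validity, feature       # both outputs are gathered by DataParallel

model = nn.DataParallel(Net()).to(device)
x = torch.randn(64, 784, device=device)
validity, feature = model(x)           # goes through DataParallel.forward
# model.module.feature_extractor(x)    # would bypass DataParallel entirely
```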
@ptrblck, "If you call custom methods on the model, data parallel won't be used, so call your model directly instead." I am planning to return two outputs from the forward method of the model, i.e. validity and feature. Would that help solve the problem? I am confused.
Same error: "RuntimeError: CUDA out of memory. Tried to allocate 29.21 GiB (GPU 0; 7.93 GiB total capacity; 361.89 MiB already allocated; 6.17 GiB free; 14.11 MiB cached)"
I have tried the approach below:

```python
model.eval()
with torch.no_grad():
    # moves the entire 1e7 x 784 tensor to the GPU in one shot
    input = torch.randn(10000000, 784).to(device)
    _, feature = model(input)
    print(feature.size())
    feature_data = feature.data.cpu().numpy()
```
I am currently using 4 GPUs with 8 GB each (4 × 8 = 32 GB).
"were you able to feed 1e7 samples before?" No. This is a simplified example of my original code, where I need to feed almost 20 GB of data into the model to extract features. Is it possible with 4 GPUs? Is there any way to parallelize the data over all the GPUs and extract the features on the default device (cuda:0)?
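For reference, a 1e7 × 784 float32 tensor is about 29.2 GiB (1e7 × 784 × 4 bytes), which matches the failed allocation in the error message, so the whole input can never fit on a single 8 GiB GPU at once. The usual fix is to feed the data in batches through a DataLoader: nn.DataParallel splits each batch across the GPUs and gathers the outputs on cuda:0, and you accumulate the features on the CPU. A minimal sketch, assuming forward returns (validity, feature) as planned above and that the data fits in host RAM (otherwise stream it from disk with a custom Dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# model and device as defined earlier in the thread
data = torch.randn(10_000_000, 784)        # stays on the CPU (~29.2 GiB of host RAM)
loader = DataLoader(TensorDataset(data), batch_size=4096)

model.eval()
features = []
with torch.no_grad():
    for (batch,) in loader:
        batch = batch.to(device)           # only one batch lives on the GPUs at a time
        _, feature = model(batch)          # DataParallel scatters the batch across GPUs
        features.append(feature.cpu())     # move results off the GPU right away
feature_data = torch.cat(features).numpy()
print(feature_data.shape)
```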