How do I put all the data into nn.DataParallel during eval() mode?

During training, nn.DataParallel works perfectly, but in eval mode it raises the error below:
RuntimeError: CUDA out of memory. Tried to allocate 29.21 GiB (GPU 0; 7.93 GiB total capacity; 361.73 MiB already allocated; 6.15 GiB free; 34.27 MiB cached)

Example:
import numpy as np
import torch
import torch.nn as nn

img_shape = (1, 28, 28)  # assumed shape; np.prod(img_shape) == 784, matching the input below

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(1)

class model(nn.Module):
    def __init__(self):
        super(model, self).__init__()

        self.feature = nn.Sequential(
            nn.Linear(int(np.prod(img_shape)), 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 10),
            nn.LeakyReLU(0.2, inplace=True))
        self.linear = nn.Sequential(nn.Linear(10, 1))


    def forward(self, img):
        img_flat = img.view(img.shape[0], -1)
        img_flat = self.feature(img_flat)
        validity = self.linear(img_flat)
        # print("Discriminator model")
        # print("\tIn Model: input size", img.size(),
        #       "output size", validity.size())
        # print("#########################################################")
        return validity

model = model()  # instantiate the module before wrapping it
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0: [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)
model = model.to(device)
Training works perfectly.
######################## Eval mode
model.eval()
input = torch.randn(10000000, 784).to(device)
feature = model.module.feature(input)
print(feature)

I am currently using 4 GPUs with 8GB RAM each.

Did you also call model.module.feature during training? It seems to be a custom method, so I think nn.DataParallel won't be applied.
Also, wrap your code in a with torch.no_grad() block to save some memory during evaluation.

" Did you also call model.module.feature during training?" , No, I am using normal training loss. It is working perfectly.
Do I need to wrap the “input” with dataloader during testing?

nn.DataParallel calls the wrapped module directly (__call__ first, which then calls forward).
If you call custom methods on the model, data parallelism won't be used, so call your model directly instead.

Sorry, I couldn't get it. Please give me a simple example so that I can understand.
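A minimal sketch of the difference, assuming the model defined above wrapped in nn.DataParallel:

with torch.no_grad():
    x = torch.randn(64, 784).to(device)

    # Goes through DataParallel.__call__ -> forward: the batch is split
    # along dim 0 and scattered across all visible GPUs.
    validity = model(x)

    # Bypasses DataParallel entirely: the whole batch runs on GPU 0 only,
    # which is why the huge eval input runs out of memory.
    feature = model.module.feature(x)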

@ptrblck, "If you call custom methods on the model, data parallelism won't be used, so call your model directly instead." I am planning to return two outputs from the forward method of the model, i.e. validity and feature. Would that help solve the problem? I am confused.
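For example, something like this (a sketch of the change I have in mind):

def forward(self, img):
    img_flat = img.view(img.shape[0], -1)
    feature = self.feature(img_flat)
    validity = self.linear(feature)
    return validity, feature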

Same error: "RuntimeError: CUDA out of memory. Tried to allocate 29.21 GiB (GPU 0; 7.93 GiB total capacity; 361.89 MiB already allocated; 6.17 GiB free; 14.11 MiB cached)"
I have tried the approach below:

model.eval()
with torch.no_grad():
    input = torch.randn(10000000, 784).to(device)
    _, feature = model(input)
    print(feature.size())
    feature_data = feature.data.cpu().numpy()

In that case your input might just be too large. How many GPUs are you using, and were you able to feed 1e7 samples before?

I am currently using 4 GPUs with 8GB RAM each (4*8=32GB).
"were you able to feed 1e7 samples before?" No. This is a simplified example of my original code, where I need to feed almost 20GB of data into the model to extract features. Is that possible with 4 GPUs? Is there any way to parallelize the data over all four GPUs and extract features from the model (cuda:0)?
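One way that works regardless of the GPU count is to feed the data in chunks and collect the features on the CPU, so no single tensor ever has to fit in GPU memory. A sketch, assuming forward returns (validity, feature) as above; the batch size of 4096 is an arbitrary assumption:

from torch.utils.data import DataLoader, TensorDataset

data = torch.randn(10000000, 784)  # ~30GB in float32, kept in host RAM
loader = DataLoader(TensorDataset(data), batch_size=4096)

model.eval()
features = []
with torch.no_grad():
    for (batch,) in loader:
        _, feature = model(batch.to(device))  # DataParallel splits the batch across the GPUs
        features.append(feature.cpu())        # move results off the GPU right away
features = torch.cat(features)
print(features.size())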