Hi,
I want to extract all features from the first convolution layer of a ResNet model.
So I wrote the code below:
# Model file
class ResNet(nn.Module):
    def __init__(self):
        super().__init__()
        .....
        # First Convolution Layer
        self.conv1 = nn.Conv2d(...)
        ....
        # Fully-Connected Layer
        self.fc = nn.Linear(...)

    def forward(self, data):
        .....

    def extract_features(self, data):
        # Extract First Convolution features
        return self.conv1(data)
# Implement code
model = ResNet()
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        outputs = model.extract_features(inputs)  # <- It causes memory leakage
Running this code, I got the following error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.91 GiB total capacity; 11.33 GiB already allocated; 2.88 MiB free; 11.34 GiB reserved in total by PyTorch)
I know that torch.no_grad() disables gradient tracking during inference,
but in my code GPU memory usage still increases every iteration.
Yes, inside the torch.no_grad() context no backward graph is built, so GPU memory usage should be lower than without it. Can you share the code that stores the outputs tensor? Also, check whether any other tensors previously created on the GPU were freed properly.
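One way to check this is to print the allocated GPU memory at each iteration. The snippet below is only a minimal sketch: the `gpu_mem_mib` helper, the stand-in `conv` layer and the random batches are assumptions for illustration, not the code from this thread. `torch.cuda.memory_allocated()` is the PyTorch call that reports the bytes currently held by tensors.

```python
import torch


def gpu_mem_mib():
    """Return MiB currently allocated by tensors on the default GPU (0.0 without CUDA)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 2**20


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Stand-in for the model's conv1 (hypothetical layer sizes).
conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).to(device)

kept = []      # references that stay alive across iterations
readings = []  # allocated memory after each step
with torch.no_grad():
    for step in range(3):
        batch = torch.randn(8, 3, 32, 32, device=device)  # stand-in for a real batch
        kept.append(conv(batch))  # keeping a reference keeps its memory allocated
        readings.append(gpu_mem_mib())
        print(f"step {step}: {readings[-1]:.1f} MiB allocated")
```

If the reading keeps growing from step to step on a GPU, some tensor created inside the loop is still referenced somewhere.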
features = []
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        outputs = model.extract_features(inputs)  # <- It causes memory leakage
        features.append(outputs)

# Concat all features
npy_data = torch.cat(features)
npy_data = npy_data.cpu().numpy()
np.save('data_npy/data.npy', npy_data)
When I extract the last convolution layer's features, GPU memory does not rise.
I think the reason is the output size: the first convolution layer's output has shape [-1, 64, 32, 32], while the last convolution layer's output has shape [-1, 512, 4, 4].
That is 65,536 values per sample versus 8,192, so the first layer's features are 8 times larger. I think that is why the problem occurs.
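The size difference can be worked out with plain arithmetic. This is a rough sketch under two assumptions not stated in the thread: the features are float32 (4 bytes each), and the dataset has 50,000 samples (`NUM_SAMPLES` is a hypothetical figure chosen only for illustration).

```python
BYTES_PER_FLOAT32 = 4
NUM_SAMPLES = 50_000  # assumption: the real dataset size is not given in the thread
GIB = 2**30


def feature_bytes(channels, height, width, n=1):
    """Bytes needed to hold n float32 feature maps of shape [channels, height, width]."""
    return n * channels * height * width * BYTES_PER_FLOAT32


first = feature_bytes(64, 32, 32)  # first convolution layer, one sample
last = feature_bytes(512, 4, 4)    # last convolution layer, one sample

print(first // 1024, "KiB per sample (first conv)")  # 256
print(last // 1024, "KiB per sample (last conv)")    # 32
print(first // last, "times larger")                 # 8

first_total_gib = feature_bytes(64, 32, 32, NUM_SAMPLES) / GIB
last_total_gib = feature_bytes(512, 4, 4, NUM_SAMPLES) / GIB
print(f"{first_total_gib:.2f} GiB vs {last_total_gib:.2f} GiB for all samples")  # 12.21 vs 1.53
```

Under these assumptions, keeping every first-layer output on the GPU would need more than the 11.91 GiB reported in the error message, while the last-layer outputs would fit easily.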
Try moving the outputs tensor to CPU memory (RAM) before you append it, for example:
features = []
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        outputs = model.extract_features(inputs)
        features.append(outputs.cpu())
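Put together, the whole extract-and-save flow might look like the sketch below. It is not the original poster's model: `TinyNet`, `collect_features`, the synthetic dataset and the temporary output path are all hypothetical stand-ins so the example runs on its own, with or without a GPU.

```python
import os
import tempfile

import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TinyNet(nn.Module):
    """Hypothetical stand-in for the ResNet in the thread; only conv1 matters here."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)

    def extract_features(self, data):
        return self.conv1(data)


def collect_features(model, loader, device):
    """Run extract_features over the loader and return one CPU tensor."""
    model.eval()
    features = []
    with torch.no_grad():
        for inputs, _targets in loader:
            outputs = model.extract_features(inputs.to(device))
            # Copy to host memory; the GPU copy can be freed once `outputs` is reassigned.
            features.append(outputs.cpu())
    return torch.cat(features)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyNet().to(device)

# Synthetic data in place of train_loader: 10 RGB images of size 32x32.
dataset = TensorDataset(torch.randn(10, 3, 32, 32), torch.zeros(10, dtype=torch.long))
loader = DataLoader(dataset, batch_size=4)

all_features = collect_features(model, loader, device)
out_path = os.path.join(tempfile.mkdtemp(), "data.npy")  # temporary location for the demo
np.save(out_path, all_features.numpy())
print(all_features.shape, all_features.device)
```

With this pattern the list only holds CPU tensors, so GPU usage stays roughly at one batch of features at a time. The full feature array still has to fit in host RAM.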