I tried to del unused variables and call torch.cuda.empty_cache() to release the GPU memory, but I found that the used GPU memory keeps changing while the maximum value stays the same.
I built ResNet-18 in my own way, but the GPU memory usage is noticeably larger than the official implementation in torchvision. How can I find the reason?
Thanks for your attention!
def forward(self, bottoms):
    # string -> feature_map list
    feature_pool = dict()
    bottoms = bottoms if isinstance(bottoms, list) else [bottoms]
    if self.get_device_id() >= 0:
        for idx, bottom in enumerate(bottoms):
            bottoms[idx] = bottoms[idx].cuda(device=self.get_device_id())
    for id, i_idx in enumerate(self.input_idx):
        feature_pool["bottom_{}".format(id)] = [bottoms[i_idx]]
    del bottoms
    torch.cuda.empty_cache()
    for idx, dag_node in enumerate(self.dag_list):
        local_bottoms = []
        for x in dag_node.bottoms:
            local_bottoms.extend(feature_pool[x])
            is_x_depended = any(x in dag.bottoms for dag in self.dag_list[(idx + 1):])
            if (not is_x_depended) and (x not in self.top_names):
                del feature_pool[x]
                torch.cuda.empty_cache()
        local_tops = self._modules[dag_node.scope].forward(local_bottoms)
        assert len(dag_node.tops) == len(local_tops)
        for i in range(len(dag_node.tops)):
            feature_pool[dag_node.tops[i]] = [local_tops[i]]
        del local_tops, local_bottoms
        torch.cuda.empty_cache()
    feature_list = []
    for name in self.top_names:
        feature_list.extend(feature_pool[name])
        del feature_pool[name]
        torch.cuda.empty_cache()
    return feature_list
To release the memory, you would have to make sure that all references to the tensor are deleted and call torch.cuda.empty_cache() afterwards.
E.g. del bottoms will only delete the local bottoms reference inside forward, while the tensor passed in from the outside will still be alive.
Also, note that torch.cuda.empty_cache() will not avoid out of memory issues, since the cache is reused, not lost.
Removing the local reference will not delete the global tensor.
If you cannot free the cache, then a reference is still pointing to the tensor as shown here:
def fun(tensor):
    print(torch.cuda.memory_allocated() / 1024**2)
    # Delete the local reference
    del tensor
    print(torch.cuda.memory_allocated() / 1024**2)
    return
# Check that memory is empty
print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_cached())
> 0
# Create tensor
x = torch.randn(1024 * 1024, device='cuda')
print(torch.cuda.memory_allocated() / 1024**2)
> 4.0
print(torch.cuda.memory_cached() / 1024**2)
> 20.0
# Call fun and check if x is still alive
fun(x)
> 4.0
> 4.0
print(x.device) # still alive
> cuda:0
print(torch.cuda.memory_allocated() / 1024**2)
> 4.0
print(torch.cuda.memory_cached() / 1024**2)
> 20.0
# Delete global tensor
del x
print(torch.cuda.memory_allocated() / 1024**2)
> 0.0
print(torch.cuda.memory_cached() / 1024**2)
> 20.0
# Now empty cache
torch.cuda.empty_cache()
print(torch.cuda.memory_cached() / 1024**2)
> 0.0
If I understand your issue correctly, you are trying to empty the cache, which doesn't seem to be working, right?
If that’s the case, you would have to delete all references to the tensors you would like to delete so that the cache can be emptied.
Yes, I tried to avoid using temporary variables and to delete unused variables. In the forward function of each module, I delete the other feature_map tensors before returning the result.
You won’t avoid the max. memory usage by clearing the cache.
As explained before, torch.cuda.empty_cache() will only release the cache, so PyTorch will have to reallocate the necessary memory, which might slow down your code.
The memory usage will be the same, i.e. if your training has a peak memory usage of 12GB, it will stay at this value.
You will only temporarily reduce the allocated memory, which will then be reallocated if necessary.
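For example, here is a minimal sketch (assuming a fresh process and an available CUDA device) showing that emptying the cache releases the cached memory, but leaves the peak reported by torch.cuda.max_memory_allocated() unchanged:
import torch

# allocate a large tensor to create a memory peak
x = torch.randn(1024 * 1024 * 64, device='cuda')
print(torch.cuda.max_memory_allocated() / 1024**2)  # peak includes x, ~256 MB

del x
torch.cuda.empty_cache()  # returns the cached memory to the driver

print(torch.cuda.memory_allocated() / 1024**2)      # ~0.0, nothing is allocated anymore
print(torch.cuda.max_memory_allocated() / 1024**2)  # peak value is still ~256 MB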
I agree with your opinion. But when I define the CNN model with the code shown in the question, the peak memory usage is 4 times larger than the official ResNet model in torchvision.
I would recommend adding debug statements using print(torch.cuda.max_memory_allocated()) to try to narrow down which operations are wasting the memory.
Just by skimming through the code, it seems that some lists and dicts are used temporarily and freed later. This might increase the peak memory, e.g. if you are storing the complete feature maps first and deleting them one by one later. A possible debugging sketch is shown below.
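E.g. something like this (just a sketch; log_mem and the commented loop are placeholders matching the code from your question):
import torch

def log_mem(tag):
    # print the currently allocated and the peak allocated memory in MB
    print('{}: allocated {:.1f} MB, peak {:.1f} MB'.format(
        tag,
        torch.cuda.memory_allocated() / 1024**2,
        torch.cuda.max_memory_allocated() / 1024**2))

# e.g. inside the forward method of the custom model:
# for idx, dag_node in enumerate(self.dag_list):
#     ...
#     local_tops = self._modules[dag_node.scope].forward(local_bottoms)
#     log_mem('after node {} ({})'.format(idx, dag_node.scope))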
When I delete tensors and use empty_cache, the memory usage only decreases once the one-batch training process is done, rather than at the point where I call torch.cuda.empty_cache().
That might be expected and PyTorch will reallocate the memory, if needed.
You can clear the cache, but you won't be able to reduce the peak memory, and you might just slow down the code by doing so.
If your custom ResNet implementation uses more memory than the torchvision implementation, I would still recommend comparing both implementations by adding the mentioned print statements to narrow down which part of your code uses more memory.
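A rough comparison could look like this (just a sketch; MyResNet18 is a placeholder for the custom model from your question and might need its list-based input format):
import torch
import torchvision

def peak_memory(model, input_shape=(1, 3, 224, 224)):
    # measure the peak allocated memory of one forward/backward pass in MB
    torch.cuda.empty_cache()
    torch.cuda.reset_max_memory_allocated()
    model = model.cuda()
    x = torch.randn(*input_shape, device='cuda')
    out = model(x)
    out.mean().backward()
    return torch.cuda.max_memory_allocated() / 1024**2

print('torchvision resnet18: {:.1f} MB'.format(
    peak_memory(torchvision.models.resnet18())))
# print('custom resnet18: {:.1f} MB'.format(peak_memory(MyResNet18())))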