I'm observing some behavior here and just want to make sure it's expected:
I am running my model on dual GPUs. During training, each GPU uses about 4GB.
But during validation (i.e. training is not finished yet; I just switch to model.eval() to run validation and switch back to model.train() afterwards), I observe GPU 1's memory usage grow to 7GB while GPU 2 stays the same.
Then after validation finishes and I switch back to training, GPU 1's memory usage does not drop back to 4GB… it stays locked at 7GB…
Is this normal??? I would expect it to go back to 4GB after validation is done…
What does model.eval() actually do? Does it clear the memory after it finishes? How does PyTorch assign memory when it switches from train() to eval()? I am so curious!
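From what I understand so far (this is just a minimal sketch I ran to check, with a toy model I made up), model.eval() only flips the `training` flag on every submodule, which changes the behavior of layers like Dropout and BatchNorm; it does not allocate or free any GPU memory by itself:

```python
import torch.nn as nn

# Hypothetical toy model, only to inspect what eval() toggles.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5), nn.BatchNorm1d(8))

model.eval()
# eval() recursively sets training=False on all submodules;
# no CUDA memory is touched by this call.
assert all(not m.training for m in model.modules())

model.train()
# train() flips the same flag back to True.
assert all(m.training for m in model.modules())
```

So if eval() itself doesn't manage memory, the extra 3GB must come from what the validation loop does, not from the mode switch.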
training epoch 1:  GPU1: 4GB        GPU2: 4GB
validation:        GPU1: 7GB        GPU2: 4GB
training epoch 2:  GPU1: 7GB (???)  GPU2: 4GB
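One thing I noticed while testing (a minimal CPU-only sketch, with made-up sizes): eval() alone does not disable autograd, so if the validation forward pass is not wrapped in torch.no_grad(), PyTorch still builds the graph and keeps activations alive, which could explain the extra memory:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512)
x = torch.randn(64, 512)

model.eval()
with torch.no_grad():
    out = model(x)
# Inside no_grad() no autograd graph is built, so intermediate
# activations are not retained.
assert not out.requires_grad

# Without no_grad(), eval() alone still records the graph,
# because the model's parameters require gradients.
out2 = model(x)
assert out2.requires_grad

model.train()
```

And I suspect the 7GB staying "locked" is PyTorch's caching allocator: freed tensors go back into a reserved pool (what torch.cuda.memory_reserved() reports, and what nvidia-smi shows) rather than back to the OS, so torch.cuda.memory_allocated() might already be back near 4GB even though nvidia-smi still says 7GB. Is that the right way to read it?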