I solved this problem.
The reason is that torch.cuda.empty_cache() allocates memory on gpu0 by default, about 500 MB.
When I hit this problem, my gpu0 was already fully occupied, so I wrapped the call in a device context:
with torch.cuda.device('cuda:1'):
    torch.cuda.empty_cache()
and then no memory allocation occurs on gpu0.
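For a slightly fuller picture, here is a minimal sketch of the same idea, assuming at least two visible CUDA devices; 'cuda:1' is only an example index, not something fixed by the original problem:

import torch

# Free cached blocks on a non-default GPU without touching gpu0.
# 'cuda:1' is an example index; point it at the GPU you actually want to clear.
if torch.cuda.device_count() > 1:
    with torch.cuda.device('cuda:1'):
        torch.cuda.empty_cache()
    # memory_reserved() reports how much memory the caching allocator
    # still holds on that device after the cache is emptied.
    print(torch.cuda.memory_reserved('cuda:1'))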