When you say “machine details” do you mean like hardware? Or software?
Do you mean like a Docker container? If so, no, I’m just running this on a desktop with the conda environment stated previously. Just straight from the command line with
I’m trying to think of the best way to get a code snippet together so you can reproduce the error. The only issue is that the code is spread over a few files and all brought together in a main script. While waiting for your response, I went through the code and rewrote it from scratch to remove anything I could see being a potential memory issue, and the ‘new’ version still has the memory leak - so I’m a little confused still!
To add to the confusion, I played around with the size of my network and the batch size to see whether they affected the memory leak. Interestingly enough, I’ve found one thing that may correspond to the leak. To give a brief overview, my network is a feed-forward network that takes N inputs and returns 2 outputs (sign and logabsdet from torch.slogdet), which are used to calculate a scalar loss value that I subsequently minimize. I’ve noticed that if I have N=8 (for the network input) the memory leak seems to go away and the memory usage just fluctuates around 0.5 GB, but if I have, say, N=12, the leak is present and memory increases by around 0.1 GB per epoch.
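Since the real code is spread over several files, here is a minimal self-contained sketch of the setup as described above, in case it helps reproduce things. All names, layer sizes, and the loss are my own placeholders, not the original code; the only assumption carried over is that the network’s output is turned into an N×N matrix so that torch.slogdet can produce the two outputs.

```python
import torch
import torch.nn as nn

N = 12  # the leak was reported with N=12 but not N=8

class SlogdetNet(nn.Module):
    """Placeholder feed-forward net: N inputs -> an N x N matrix -> slogdet."""
    def __init__(self, n):
        super().__init__()
        self.n = n
        self.body = nn.Sequential(
            nn.Linear(n, 64), nn.Tanh(), nn.Linear(64, n * n)
        )

    def forward(self, x):
        m = self.body(x).view(-1, self.n, self.n)  # batch of n x n matrices
        sign, logabsdet = torch.slogdet(m)         # the two network outputs
        return sign, logabsdet

model = SlogdetNet(N)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(3):
    x = torch.randn(32, N)                # stand-in for real input data
    sign, logabsdet = model(x)
    loss = (logabsdet ** 2).mean()        # placeholder scalar loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Common pitfall worth ruling out: appending `loss` itself (rather
    # than `loss.item()`) to a history list keeps the whole autograd
    # graph alive and grows memory every epoch.
```

One thing worth checking in the real code is whether any tensor that is still attached to the graph (the loss, or an intermediate) is being stored across epochs, since that would grow with the graph size and could plausibly scale with N.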
I haven’t been able to check with valgrind yet. I’ve only used valgrind once, and that was to debug some Fortran95 code, so I’m not 100% sure it would interface well with an interpreted language like Python. I can have a look online and see how to use it!
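As a lighter-weight first step than valgrind (which is awkward with CPython because most allocations go through Python’s own allocator), the standard library’s tracemalloc module can diff heap snapshots between epochs and point at the file and line where memory is growing. A minimal sketch, with a deliberately leaky loop standing in for a training epoch:

```python
import tracemalloc

tracemalloc.start()
previous = tracemalloc.take_snapshot()

leaky = []
for epoch in range(3):
    leaky.extend(range(10000))  # stand-in for one training epoch
    snapshot = tracemalloc.take_snapshot()
    # Top 3 allocation sites, ranked by growth since the last snapshot
    top = snapshot.compare_to(previous, "lineno")[:3]
    for stat in top:
        print(stat)  # shows file:lineno plus the size delta
    previous = snapshot
```

Running something like this once per epoch in the real training loop should show whether the growth is on the Python side at all; note that it only sees Python-level allocations, so memory held inside C extensions (e.g. tensor storage) won’t appear in the diff.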
Thank you for the help!