Hi all,
I’m trying to run the same PyTorch training script with different arguments (parsed with argparse) from another Python script, using os.system().
Here’s what I’m trying to do -
train.py => the script that contains the training loop.
runner.py => the script that runs train.py in a loop.
# runner.py
import os

# hyperparams is the list of values to sweep over (defined elsewhere)
for hp in hyperparams:
    os.system(f"CUDA_VISIBLE_DEVICES=1 python train.py --arg1 {hp}")
A few models train successfully, but I eventually get a CUDA out-of-memory error. My guess is that GPU memory is not being freed after each run. What can I do to mitigate this?
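For reference, here is a rough sketch of how runner.py could be written with subprocess.run instead of os.system. Both calls block until the child process exits; this version just records each exit code, which makes it easier to check that one run has fully finished before the next one starts. The helper names (build_command, build_env, run_all) and the example values in the last comment are placeholders I made up for illustration, not part of my real script.

```python
import os
import subprocess
import sys


def build_command(hp):
    # List form avoids shell quoting issues; --arg1 is the flag used in runner.py above.
    return [sys.executable, "train.py", "--arg1", str(hp)]


def build_env(gpu_id="1"):
    # Copy the parent environment and restrict the child process to one GPU.
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def run_all(hyperparams, run=subprocess.run):
    # subprocess.run waits for the child to exit, so the runs are sequential.
    return_codes = []
    for hp in hyperparams:
        result = run(build_command(hp), env=build_env())
        print(f"hp={hp} exited with code {result.returncode}")
        return_codes.append(result.returncode)
    return return_codes


# Example call (placeholder values, assumed for illustration):
# run_all([0.1, 0.01, 0.001])
```

The run parameter exists only so the loop can be exercised without launching real training jobs; in normal use it stays at its default.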