torch.save() like open(mode='a')

I am running a training script and I want to save the output tensors of my validation set after each epoch.

My script runs for an arbitrary number of epochs, so I would like to append the tensors to a file after each epoch.

What is the best way to go about this?

  • I could torch.save() to one new file every epoch, but that will create a lot of files.
  • I could torch.save() to a single file each epoch, but then I would need to torch.load() that file each epoch to append to the single data structure and re-save it.
  • I could add to an ever-increasing list inside my script and torch.save() that each epoch, but that would use up more and more memory.

Are there better alternatives, such as appending a text representation of the output tensor to a text file?
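One option that behaves much like open(mode='a') is to append pickled objects to a single binary file and read them back one at a time. Below is a minimal sketch using only the standard library. The helper names append_record and read_records and the file name val_outputs.pkl are made up for illustration, and plain nested lists stand in for the validation tensors. Since torch.save() uses pickle by default, a tensor moved to the CPU should be storable the same way, but treat that as an assumption to check in your own setup.

```python
import os
import pickle
import tempfile


def append_record(path, record):
    """Append one pickled object to the end of the file (binary append mode)."""
    with open(path, "ab") as f:
        pickle.dump(record, f)


def read_records(path):
    """Yield every stored object in the order it was written."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # No more pickled objects in the file
                break


# Hypothetical file name; a temp dir keeps the demo self-contained
path = os.path.join(tempfile.mkdtemp(), "val_outputs.pkl")

for epoch in range(3):
    # Stand-in for the validation output of this epoch
    fake_output = [[epoch * 1.0, epoch * 2.0]]
    append_record(path, {"epoch": epoch, "output": fake_output})

records = list(read_records(path))
```

Each epoch only writes the new record, so nothing has to be loaded or kept in memory during training. The trade-off is that there is no random access: reading epoch N means unpickling the records before it. Also, only unpickle files you trust.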

The following code seems to be quite efficient for my use case:

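from pathlib import Path

import tables
import torch

# Save the validation output tensors after each epoch
netG.eval()
with torch.no_grad():
    pred_val = netG(fixed_target)
netG.train()

f_path = Path("./progress/batch_outputs.h5")
f_path.parent.mkdir(parents=True, exist_ok=True)  # make sure ./progress exists
if not f_path.exists():
    # First epoch: create an extendable array; the axis of length 0 grows on append
    f = tables.open_file(str(f_path), mode='w')
    atom = tables.Float64Atom()  # values are stored as float64
    f.create_earray(f.root, 'batches', atom, shape=(0, *pred_val.shape))
else:
    f = tables.open_file(str(f_path), mode='a')
# Append in both cases so the first epoch's output is stored as well
f.root.batches.append(pred_val.unsqueeze(0).cpu().numpy())
f.close()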

Adapted from the Stack Overflow thread "save numpy array in append mode".
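For reading the outputs back later, here is a rough sketch of the same PyTables layout (an extendable array named 'batches' under the file root), wrapped in two helpers. The names append_batch and read_batches are hypothetical, and random-free NumPy arrays stand in for pred_val so the example runs without a model. With real tensors you would pass pred_val.cpu().numpy() instead.

```python
import os
import tempfile

import numpy as np
import tables


def append_batch(path, batch):
    """Append one array along the first (extendable) axis, creating the file if needed."""
    mode = "a" if os.path.exists(path) else "w"
    with tables.open_file(path, mode=mode) as f:
        if "batches" not in f.root:
            # The axis of length 0 is the one that grows on each append
            f.create_earray(f.root, "batches", tables.Float64Atom(),
                            shape=(0, *batch.shape))
        f.root.batches.append(batch[np.newaxis, ...])


def read_batches(path):
    """Load all stored epochs as one array of shape (n_epochs, *batch.shape)."""
    with tables.open_file(path, mode="r") as f:
        return f.root.batches[:]


# Demo with stand-in data: three "epochs" of a (2, 4) output
path = os.path.join(tempfile.mkdtemp(), "batch_outputs.h5")
for epoch in range(3):
    append_batch(path, np.full((2, 4), float(epoch)))

all_outputs = read_batches(path)
```

If the file gets large, slicing the array inside the open file (for example f.root.batches[-1] for the latest epoch) reads only that part from disk instead of the whole array.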