Torch.save() like open(mode='a')

ZimoNitrome · November 17, 2021, 9:30am

I am running a training script and I want to save the output tensors of my validation set after each epoch.

My script runs for an arbitrary number of epochs, so I would like to append tensors to a file after each epoch.

What is the best way to go about this?

I could torch.save() to one new file every epoch, but that will create a lot of files.
I could torch.save() to a single file each epoch, but then I would need to torch.load() that file each epoch to append to the single data structure and re-save it.
I could add to an ever-increasing list inside my script and torch.save() that each epoch, but that would use up more and more memory.

Are there better alternatives? For example, appending a text representation of the output tensor to a text file?

ZimoNitrome · November 17, 2021, 10:51am

The following code seems to be quite efficient for my use case:

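# Save the validation output tensor after each epoch
from pathlib import Path

import tables  # PyTables

netG.eval()
with torch.no_grad():
    pred_val = netG(fixed_target)
netG.train()

f_path = Path("./progress/batch_outputs.h5")
if not f_path.exists():
    # First epoch: create the file and an extendable array that grows along axis 0
    f = tables.open_file(str(f_path), mode='w')
    atom = tables.Float64Atom()
    f.create_earray(f.root, 'batches', atom, shape=(0, *pred_val.shape))
else:
    # Later epochs: reopen the existing file in append mode
    f = tables.open_file(str(f_path), mode='a')
# Append in both cases so that the first epoch's output is stored as well
f.root.batches.append(pred_val.unsqueeze(0).cpu().numpy())
f.close()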

Adapted from the Stack Overflow question "save numpy array in append mode".
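If PyTables is not available, a lighter alternative might be to append `.npy` records to a single file opened in binary append mode, which is close to the `open(mode='a')` behaviour asked about. The following is only a minimal sketch: the helper names `append_array` and `load_all`, the file name, and the fake per-epoch arrays are hypothetical, and a real tensor would first need converting with something like `pred_val.cpu().numpy()`.

```python
import os
import tempfile

import numpy as np


def append_array(path, array):
    """Append one array to `path` as a new .npy record."""
    with open(path, "ab") as f:
        np.save(f, np.asarray(array))


def load_all(path):
    """Read back every appended record, in the order it was written."""
    arrays = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Each np.load call consumes exactly one record from the open file.
        while f.tell() < size:
            arrays.append(np.load(f))
    return arrays


# Hypothetical stand-in for the per-epoch validation output.
path = os.path.join(tempfile.mkdtemp(), "val_outputs.npy")
for epoch in range(3):
    fake_output = np.full((2, 4), epoch, dtype=np.float32)
    append_array(path, fake_output)

outputs = load_all(path)
stacked = np.stack(outputs)  # one leading axis entry per epoch
```

Only the current epoch's array is held in memory while writing, and nothing is re-loaded or re-saved. Unlike the HDF5 array above, this file cannot be sliced without reading the records sequentially.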

qpwo · November 30, 2024, 6:24pm

Thank you for sharing your solution!