Save torch model in Databricks Cluster

Hi. I’m using a Databricks CPU cluster to train my NN in PyTorch. The problem is that when I try to save my model, I can only save it to the cluster’s local storage (which is inaccessible from outside), and when I restart the cluster I lose the saved model.
How could I save it to a local path (Windows)? If that is not possible, could I save the file in a table-like format and then convert it back into a PyTorch model?

Thanks 🙂

I’m not sure I understand this sentence correctly.
Are you able to access some folders on this machine?
If so, you should be able to store the state_dict there (similar to your script files).
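
Something like this minimal sketch (the model and the folder are just placeholders):

import torch
import torch.nn as nn

# placeholder model just for illustration
model = nn.Linear(10, 2)

# save only the learned parameters (the state_dict), not the whole module
torch.save(model.state_dict(), '/accessible/folder/filename.pt')

# later: recreate the model and load the parameters back
model = nn.Linear(10, 2)
model.load_state_dict(torch.load('/accessible/folder/filename.pt'))
model.eval()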

I can access some folders on the cluster, but the model cannot be saved to those paths. It seems like I’m saving the .pt file to an inaccessible path.

Where are you storing the script files? Could you save the state_dict to the same location?

Any path I try other than the bare filename ('filename.pt') returns this error:

FileNotFoundError: [Errno 2] No such file or directory:

Are you getting this error from torch.save(model.state_dict(), 'filename.pt')?

This saves correctly and can be loaded, but when the cluster is restarted it disappears:

torch.save(model.state_dict(), 'filename.pt')

And this returns the error mentioned before:

torch.save(model.state_dict(), 'dbfs:/existingfilepath/filename.pt')

So the files disappear after a restart?
This seems to be a Databricks issue and I’m unfortunately not familiar with their cluster.
Could you add some persistent user folders where you could store data, or maybe mount another drive to the machine?
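
E.g. something along these lines, if the dbutils utilities are available in your notebook (I’m not familiar with Databricks, so treat this as a rough sketch; all names are placeholders):

# rough sketch, run once per workspace: mount external cloud storage under /mnt
# (dbutils is only available inside Databricks notebooks;
#  see the Databricks docs for the exact auth configs)
dbutils.fs.mount(
    source='wasbs://<container>@<storage-account>.blob.core.windows.net',
    mount_point='/mnt/models',
    extra_configs={'<auth-config-key>': '<auth-config-value>'}
)
# files written under the mount persist across cluster restarts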

The path should be like:
'/dbfs/existingfilepath/filename.pt'
(note: the local file API uses the /dbfs/ FUSE prefix, without the dbfs: URI scheme, which only works with Spark/dbutils APIs)
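
i.e. with the FUSE prefix, saving and reloading should work directly (a minimal sketch reusing the placeholder path from above):

import torch

# '/dbfs/...' is the local-file-API (FUSE) view of DBFS,
# so torch.save can write to it directly
torch.save(model.state_dict(), '/dbfs/existingfilepath/filename.pt')

# the file survives cluster restarts and loads back the same way
model.load_state_dict(torch.load('/dbfs/existingfilepath/filename.pt'))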