Debugging training from the remote server

Hi,

This may seem like a basic question to some developers, but this is the first time I am dealing with around 50 GB of training data. Until now, I have only worked on homework assignments, where I could store the data on my local machine.

Now I want to debug the baseline code and visualize the data and the training and validation curves using TensorBoard (if you use a different tool, I am open to suggestions), but the code runs on a GPU on the remote server, and the dataset is stored there as well. Since I am using a Mac, I cannot run the code or store the dataset locally.

I am wondering how you handle this kind of situation in larger-scale projects. In particular:

  • How do you visualize the data?
  • How does implementation work? (Do you implement locally, and if so, how do you test?)

I am looking for general advice. Could you please help me?

Thanks a lot

If you are working on a remote server (with Docker), you could open a specific port (e.g. 6006) and use it for the TensorBoard visualization.
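
One common way to reach that port from a laptop is SSH local port forwarding. The sketch below only builds the `ssh` command for such a tunnel; it does not run it. The function name `tensorboard_tunnel_command` and the user and host names are hypothetical placeholders, and the sketch assumes TensorBoard is already running on the server on port 6006 (for example via `tensorboard --logdir <your log dir> --port 6006`). If TensorBoard runs inside a Docker container, the container port would presumably also need to be published to the host (e.g. `-p 6006:6006` on `docker run`).

```python
import shlex


def tensorboard_tunnel_command(user, host, remote_port=6006, local_port=6006):
    """Build an ssh command that forwards local_port to remote_port on the server.

    user and host are placeholders for your own login and server address.
    """
    return [
        "ssh",
        "-N",  # only forward the port, do not start a remote shell
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical user and host, for illustration only.
    cmd = tensorboard_tunnel_command("alice", "gpu-server.example.com")
    print(shlex.join(cmd))
```

Running the printed command in a local terminal and then opening http://localhost:6006 in a browser should show the TensorBoard instance from the server, provided the assumptions above hold.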

The workflow might differ, but you could of course work directly on the server with a code editor in the terminal, or you could use a remote session with e.g. VSCode and develop from your laptop.
I'm not familiar with Mac, but Atom should also have this remote feature.
