Can we discuss tooling btw? (If there is a better channel for this, let me know, or maybe we should even create one?)
The general question:
I wrote a pure pytorch prototype last night using wandb logging, and saved JUST the model checkpoint as artifacts. It crashed in the middle of the night and I am manually restarting from that checkpoint.
Should I adopt pytorch-lightning? I’ve used it in the past, but I ran into complications with it on less standard models like GANs etc. On the other hand, it was nice for DDP on multiple GPUs on a single node. Should I use replicate.ai? Are there other tools people prefer for this sort of thing?
The more specific setting I’m working in:
I adapted the facebook vicreg code to PRETRAIN a noncontrastive vicreg model that maps torchsynth synth1b1 audio synthesis PARAMETERS embeddings to match torchsynth synth1b1 audio synthesis AUDIO embeddings. It has some weird facebook pure LARS optimizer I couldn’t get working, so I used good old SGD instead. I am currently training it on a single GPU. It crashed after 60K batches for reasons I’m still investigating. The training loss is still going down.
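For context, the LARS-to-SGD swap is a one-liner; here's roughly what I did (the model and the hyperparameters here are placeholders, not the values from my run):

```python
import torch

# Placeholder stand-in for the vicreg projector/backbone.
model = torch.nn.Linear(16, 8)

# Drop-in replacement for the facebook LARS optimizer:
# plain SGD with momentum and weight decay. Note LARS does
# per-layer learning-rate scaling, so the LR that worked for
# LARS likely needs retuning for vanilla SGD.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-6
)
```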
Next, I am going to write a DOWNSTREAM model using held-out dev and test batches: it takes the vicreg model and learns an embedding-to-parameter model on dev, either freezing or fine-tuning the vicreg model, and evaluates on test. I will do QUALITATIVE evaluation by listening to the torchsynth audio from predicted parameters versus torchsynth audio from true parameters. I will do QUANTITATIVE evaluation, I think, using l1 melspectrogram distance of torchsynth audio from predicted parameters versus torchsynth audio from true parameters. (I want to use MSSTFT distance via auraloss, but lambdalabs ships a funny numba version, and auraloss’s dependency on librosa, and thus on numba, is a huge pain. So instead I think I’ll code my own audio distance using torchaudio.) I will first pretrain vicreg, then learn the downstream model. Later I might interleave the two training stages (even though they have separate optimizers etc.).
I’ll share code and models later when it’s slightly more mature.