How to train a PyTorch model on multiple CPU nodes (SLURM)?

It should be possible: initialize the process group with the gloo backend (NCCL requires GPUs), instantiate torch.nn.parallel.DistributedDataParallel with device_ids=None, and then simply never call .to() or .cuda() on anything, so the model and tensors stay on the CPU.
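
A minimal sketch of what that looks like, assuming the script is launched with torchrun (which sets RANK, WORLD_SIZE, MASTER_ADDR, etc. in the environment); the toy model and training loop are purely illustrative:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun exports RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT,
    # so the default env:// init method picks them up automatically.
    dist.init_process_group(backend="gloo")  # gloo supports CPU; NCCL is GPU-only

    model = torch.nn.Linear(10, 1)  # toy model, stays on CPU
    # device_ids=None (the default) tells DDP to run on the CPU
    ddp_model = DDP(model, device_ids=None)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 10)            # CPU tensors; no .to()/.cuda() anywhere
        loss = ddp_model(x).pow(2).mean()  # gradients are all-reduced across ranks
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```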
For connecting this with SLURM, you should likely use srun ... torchrun ... within your sbatch script; there are examples of this here: Distributed training on slurm cluster

You can then pass the relevant arguments to torchrun from the sbatch script for each node, e.g. --nnodes, --nproc_per_node, and the rendezvous options (--rdzv_backend, --rdzv_endpoint, --rdzv_id), as sketched below.
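
A sketch of such an sbatch script, assuming the training code above is saved as train.py; the resource requests, port number, and per-node process count are placeholders you would adapt to your cluster:

```bash
#!/bin/bash
#SBATCH --job-name=ddp-cpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8

# Use the first node in the allocation as the rendezvous host.
head_node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

# One srun task per node; torchrun then spawns the worker process(es) on each node.
srun torchrun \
    --nnodes=$SLURM_NNODES \
    --nproc_per_node=1 \
    --rdzv_backend=c10d \
    --rdzv_endpoint=$head_node:29500 \
    --rdzv_id=$SLURM_JOB_ID \
    train.py
```

With --ntasks-per-node=1, srun starts exactly one torchrun per node, and torchrun's c10d rendezvous coordinates the ranks across nodes through the head node.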