Strange number of processes per GPU

I guess that you are using a distributed training setup (via DDP).
If that’s the case, it seems that the setup is incorrect, as apparently each process creates a new CUDA context on each device. I’m not familiar with Lightning, but you could check, if your script only uses the specified GPU and doesn’t allocate tensors on all visible devices.