Can I run the default CUDA 11.3 conda install on a CUDA 11.6 device?

No, you don’t need to downgrade your local CUDA toolkit: it is only used if you build PyTorch from source or compile custom CUDA extensions.
The binaries ship with their own CUDA runtime and will work.
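If you want to confirm which CUDA runtime the installed binary was built with, a quick check could look like the sketch below. The helper name `describe_cuda_build` is made up for illustration; the `torch` attributes it reads are standard.

```python
import torch


def describe_cuda_build():
    """Collect basic facts about the installed PyTorch binary (hypothetical helper)."""
    info = {
        "torch_version": torch.__version__,
        # CUDA runtime version the binary was built with; None for CPU-only builds.
        "bundled_cuda": torch.version.cuda,
        # True only if a usable GPU and a compatible driver were found.
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_cuda_build())
```

The reported `bundled_cuda` value is the runtime shipped inside the binary, independent of the locally installed toolkit.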

Create a new virtual environment and reinstall all packages there.

TF32 for matmuls has been disabled by default since ~3 weeks ago, so your nightly binary would not use it either.
You can re-enable it if you are using the default float32 data type.
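Re-enabling TF32 is a matter of setting the backend flags, roughly as sketched here. The cuDNN flag is usually already on by default, so setting it is only there to be explicit.

```python
import torch

# Allow TF32 in float32 matrix multiplications (cuBLAS).
torch.backends.cuda.matmul.allow_tf32 = True

# Allow TF32 in cuDNN convolutions (typically already enabled).
torch.backends.cudnn.allow_tf32 = True
```

These flags only have an effect on GPUs that support TF32 (Ampere or newer) and only for float32 inputs.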
Also take a look at the performance guide and familiarize yourself with profilers to narrow down bottlenecks. Your training could also be blocked by, e.g., the data loading while the GPU sits idle.
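Before reaching for a full profiler, a rough way to see whether data loading is the bottleneck is to time how long the loop waits for each batch versus how long the training step takes. The sketch below is not a replacement for a profiler; `time_data_vs_compute`, `loader`, and `step_fn` are placeholder names for this example.

```python
import time


def time_data_vs_compute(loader, step_fn):
    """Return (seconds spent waiting for batches, seconds spent in step_fn)."""
    data_time = 0.0
    step_time = 0.0
    batches = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(batches)  # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        # step_fn should end with torch.cuda.synchronize() when it runs on the
        # GPU, since CUDA kernels launch asynchronously and would otherwise
        # appear to finish almost instantly.
        step_fn(batch)
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
    return data_time, step_time
```

If the first number dominates, the GPU is most likely waiting on the input pipeline, and more `DataLoader` workers or cheaper preprocessing would be the first things to try.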