Using both CPU and GPU during training to avoid CUDA out of memory

Training on a CPU is very slow for deep learning workloads, so splitting the work between CPU and GPU usually isn't worth it. Instead, you might want to try approaches that reduce GPU memory usage directly, such as mixed-precision training.
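As a rough sketch of what mixed-precision training looks like in PyTorch (the model, data, and hyperparameters here are made up for illustration): the forward pass runs under `torch.autocast`, which casts eligible ops to float16 on GPU, and a `GradScaler` scales the loss so float16 gradients don't underflow. This roughly halves activation memory on CUDA devices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tiny model and random data, just to show the AMP pattern.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# The scaler only matters for float16 on CUDA; it becomes a no-op on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 1, device=device)

for _ in range(3):
    opt.zero_grad()
    # Forward pass in reduced precision (float16 on GPU, bfloat16 on CPU).
    with torch.autocast(device_type=device,
                        dtype=torch.float16 if device == "cuda" else torch.bfloat16):
        loss = F.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # scale loss to avoid fp16 gradient underflow
    scaler.step(opt)               # unscales gradients, then runs optimizer step
    scaler.update()                # adjusts the scale factor for the next step
```

Master weights and the optimizer state stay in float32, so the savings come mainly from activations and the intermediate tensors of the forward/backward pass.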