Installed CUDA after PyTorch

Hello, a few days ago I decided to follow a tutorial that uses PyTorch. I installed PyTorch with the CUDA command even though I didn't previously have CUDA installed. Today I got to the GPU part, and it ran awfully slowly, much slower than in the video. I have a GTX 1070, and it was a simple MNIST example with 1 hidden layer. I did something similar in Octave before and it ran MUCH faster. So I thought that not having CUDA was probably the problem. I googled and came to this: Is it required to set-up CUDA on PC before installing CUDA enabled pytorch?
I then installed CUDA 11.5 on my machine. I ran the PyTorch code again and it still had the same speed. Am I doing something wrong here?

If you are already pushing both the input tensor and the model to the GPU via .cuda() or .to('cuda'), then your GPU should already be in use. The GPU utilization and memory usage shown in nvidia-smi would verify this. To further isolate your bottleneck, profile your code and try to narrow down which part of it is the slowest.
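
As a rough illustration of the profiling suggestion, here is a minimal sketch of a per-section timer you could wrap around the parts of a training loop (data loading, forward pass, backward pass). The `SectionTimer` class and its `sync` parameter are hypothetical names made up for this example, not part of PyTorch. It uses only the standard library, so it runs without a GPU. Before timing anything, it is also worth confirming that `torch.cuda.is_available()` returns True and that `next(model.parameters()).device` reports a CUDA device.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named section (hypothetical helper)."""

    def __init__(self, sync=None):
        # sync: optional zero-argument callable invoked before each clock read.
        # Assumption: for GPU code you would pass torch.cuda.synchronize here,
        # because CUDA kernels launch asynchronously and an unsynchronized
        # timer would mostly measure launch overhead.
        self.sync = sync
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            # Add the elapsed time to the running total for this section.
            self.totals[name] += time.perf_counter() - start

    def report(self):
        # Return (name, seconds) pairs, slowest section first.
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


def demo():
    # Stand-in workloads; in a real loop these would be the data loading,
    # forward, and backward steps.
    timer = SectionTimer()
    for _ in range(3):
        with timer.section("load batch"):
            time.sleep(0.02)
        with timer.section("forward + backward"):
            time.sleep(0.005)
    for name, seconds in timer.report():
        print(f"{name}: {seconds:.3f} s")
    return timer
```

With PyTorch you would construct it as `SectionTimer(sync=torch.cuda.synchronize)` and wrap each step of the loop in `with timer.section(...)`. If the data loading section dominates, the GPU is probably not the bottleneck, which is plausible for a model as small as a one-hidden-layer MNIST network.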