I’m training an image segmentation network. When I give a single image as input and run the model on the CPU, it uses only one core and takes more than 1 minute, although it scales fine on the GPU. Yet when I simply multiply two tensors on the CPU, the work is spread across multiple cores.