CPU bottlenecking GPU

Assuming the data loading is the bottleneck, you could take a look at this post, which explains common use cases and potential workarounds.
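If you haven't already, use multiple workers in the DataLoader by setting num_workers to a value greater than 0, so that batches are prepared in background processes while the GPU is busy.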
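As a rough sketch of what that could look like: the snippet below times one pass over a DataLoader with different num_workers values. SlowDataset, make_loader, and time_epoch are hypothetical names made up for this illustration, and the sleep call only stands in for whatever decoding or augmentation your real dataset does, so treat the numbers as a way to compare settings on your own machine rather than as a benchmark.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset that simulates CPU-heavy preprocessing per sample."""

    def __init__(self, size=64, delay=0.005):
        self.size = size
        self.delay = delay

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        time.sleep(self.delay)  # stand-in for image decoding / augmentation
        return torch.full((3, 8, 8), float(idx)), idx % 10


def make_loader(dataset, num_workers):
    # num_workers > 0 loads batches in background processes.
    # pin_memory can speed up host-to-GPU copies when CUDA is available.
    return DataLoader(
        dataset,
        batch_size=16,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),
        persistent_workers=num_workers > 0,
    )


def time_epoch(loader):
    # Iterate once over the loader and return the elapsed wall-clock time.
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    dataset = SlowDataset()
    for workers in (0, 2, 4):
        elapsed = time_epoch(make_loader(dataset, workers))
        print(f"num_workers={workers}: {elapsed:.3f}s per epoch")
```

The best value depends on the number of CPU cores and on how expensive each sample is to load, so it is usually worth trying a few settings. If the epoch time stops improving as you add workers, the bottleneck is probably somewhere else (e.g. disk reads or the model itself).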