You could transfer the criterion to the GPU just to avoid possible issues, but it shouldn’t be necessary for nn.BCELoss.
One minor suggestion: I would remove the final sigmoid from your model and use nn.BCEWithLogitsLoss instead, as it is numerically more stable.
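To illustrate why working on the raw logits is more stable, here is a minimal pure-Python sketch (no PyTorch needed). The function names `bce_naive` and `bce_with_logits` are made up for this example, and the second one uses the usual log-sum-exp style rewrite of the loss; I'm not claiming it matches PyTorch's internal implementation line by line.

```python
import math

def bce_naive(logit, target):
    """Sigmoid first, then binary cross entropy on the probability."""
    p = 1.0 / (1.0 + math.exp(-logit))

    def safe_log(v):
        # math.log(0.0) raises, so map it to -inf for this illustration.
        return math.log(v) if v > 0.0 else float("-inf")

    loss = 0.0
    if target > 0.0:
        loss -= target * safe_log(p)
    if target < 1.0:
        loss -= (1.0 - target) * safe_log(1.0 - p)
    return loss

def bce_with_logits(logit, target):
    """Same loss computed directly from the logit:
    max(x, 0) - x * y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits both versions agree.
print(bce_naive(0.5, 1.0), bce_with_logits(0.5, 1.0))

# For a confident but wrong prediction the sigmoid saturates to exactly 1.0,
# so the naive version ends up taking log(0), while the logit version stays finite.
print(bce_naive(40.0, 0.0), bce_with_logits(40.0, 0.0))
```

As far as I know, nn.BCELoss avoids the infinite value by clamping its log outputs to be at least -100, which keeps the loss finite but also flattens it in the saturated region. With nn.BCEWithLogitsLoss the model would return the raw logits and you would only apply torch.sigmoid when you need probabilities for prediction.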
Check out “Deep Learning with PyTorch” by @lantiga, @elistevens, and @tom, which can be downloaded for free on the official website.
(It’s not the full book, if I’m not mistaken, as it’s still a work in progress.)