For example, you can load just the first batch from the dataloader (then break out of the loading loop) and verify that the loss goes down to basically zero after many epochs. You might need to tweak your learning rate schedule, if you use one for this experiment, since each epoch now contains far fewer examples.
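A minimal sketch of this sanity check, in pure Python so it runs without any framework. It assumes a few things that are not in the text above: `make_loader` is a hypothetical stand-in for a real dataloader (it yields random features and random targets), and a linear model with more parameters (16) than examples per batch (4) stands in for the network, because such a model has enough capacity to memorize one batch exactly. The loop breaks after the first batch and then trains on that batch over and over, so the loss should head toward zero.

```python
import random


def make_loader(num_batches, batch_size, num_features, seed=0):
    """Hypothetical stand-in for a real dataloader: yields (xs, ys) batches
    of random features and random regression targets."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        xs = [[rng.uniform(-1.0, 1.0) for _ in range(num_features)]
              for _ in range(batch_size)]
        ys = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        yield xs, ys


def predict(w, x):
    # Linear model standing in for the network (an assumption of this sketch).
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def overfit_single_batch(loader, num_features, lr=0.05, steps=2000):
    # Grab only the first batch, then break out of the loading loop.
    for xs, ys in loader:
        break
    w = [0.0] * num_features
    losses = []
    n = len(xs)
    # Each "epoch" is now a single gradient step on the same batch.
    # A constant learning rate is used here; a schedule tuned for full
    # epochs would behave differently with so few steps per epoch.
    for _ in range(steps):
        residuals = [predict(w, x) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w.
        grad = [2.0 / n * sum(r * x[j] for r, x in zip(residuals, xs))
                for j in range(num_features)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        losses.append(mse(w, xs, ys))
    return losses


if __name__ == "__main__":
    losses = overfit_single_batch(
        make_loader(100, batch_size=4, num_features=16), num_features=16)
    print(f"loss after first step: {losses[0]:.4f}, final loss: {losses[-1]:.2e}")
```

If the final loss stays well above zero in your real setup, that usually points to a bug (labels misaligned with inputs, a broken loss, gradients not flowing) rather than to a lack of data, since a single batch should be easy to memorize.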