Thanks! Very informative.
I'm just trying to figure out what changes when using CrossEntropyLoss().cuda()
versus BCEWithLogitsLoss().cuda().
Just swapping BCE in place of CE throws this error:
ValueError: Target size (torch.Size([64])) must be the same as input size (torch.Size([64, 2]))
This is a snippet of the training step:
for i, (input, target) in enumerate(val_loader):
    target = target.cuda(non_blocking=True)  # `async` is a reserved word on Python 3.7+
    input_var = torch.autograd.Variable(input, volatile=True)
    target_var = torch.autograd.Variable(target, volatile=True)
    # compute output
    output = model(input_var)
    loss = criterion(output, target_var)  # <- error here!
Edit: More Information
Here are the two variables passed into criterion() :
(Pdb) output
tensor([[-0.2657, 0.1728],
[ 0.3407, -0.6961],
[ 0.8020, -0.8201],
[ 0.1457, 0.0311],
[-0.2517, 0.0223],
[-0.1266, -0.3978],
[ 0.4527, -0.6096],
[ 0.2077, -0.1428],
[-0.1205, -0.5252],
[ 0.5462, -0.3988],
[-0.1215, -0.1321],
[ 0.3062, -0.5417],
[ 0.0723, -0.0537],
[-0.5435, -1.1898],
[ 0.0718, -0.0986],
[ 0.0118, -0.0860],
[-0.0998, -0.8494],
[-0.2591, -0.4207],
[ 0.2687, -0.6160],
[-0.2336, -0.4814],
[-0.1896, -0.1463],
[ 0.4623, -0.5179],
[-0.3181, -0.3042],
[-0.2550, -0.1824],
[-0.6250, -0.1293],
[-0.8920, 0.1077],
[ 0.0013, -0.1081],
[-0.2565, -0.0777],
[-0.2360, -0.3112],
[ 0.0615, -0.3419],
[-0.4794, -0.1323],
[-0.0624, 0.1003],
[ 0.1803, -0.2833],
[-0.0859, 0.0516],
[-0.0256, -0.4226],
[-0.6047, -0.3403],
[ 0.2778, -0.6168],
[ 0.0973, -0.3736],
[-0.2165, -0.2941],
[ 0.0252, -0.2497],
[-0.1285, -0.3079],
[-0.3292, -0.5657],
[ 0.1660, -0.5869],
[-0.1829, -0.3313],
[-0.5305, 0.0671],
[ 0.2120, -0.5442],
[-0.1197, -0.0711],
[ 0.2132, -0.5229],
[-0.0977, -0.3243],
[ 0.1694, -0.2342],
[ 0.0137, -0.3607],
[-0.3495, -0.2702],
[ 0.3058, -0.8327],
[ 0.4417, -0.7817],
[-0.7523, -0.5299],
[ 0.0826, -0.3280],
[-0.4834, -0.4926],
[-0.5763, 0.0012],
[ 0.0992, -0.8658],
[-0.1066, 0.4763],
[-0.4472, 0.2544],
[-0.3449, -0.1687],
[-0.1852, 0.1073],
[-0.0782, -0.5123]], device='cuda:0', grad_fn=<AddmmBackward>)
(Pdb) target
tensor([0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1], device='cuda:0')
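For anyone hitting the same error: CrossEntropyLoss takes class indices of shape [N], while BCEWithLogitsLoss needs a float target with the same shape as the input. A minimal sketch of both conventions, using random stand-in tensors with the shapes from this post ([64, 2] logits, [64] integer targets):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in tensors with the shapes from the post (not the real model output).
output = torch.randn(64, 2)                 # [N, C] logits
target = torch.randint(0, 2, (64,))         # [N] class indices

# CrossEntropyLoss: input [N, C], target [N] of dtype long.
ce = nn.CrossEntropyLoss()
loss_ce = ce(output, target)

# BCEWithLogitsLoss: target must be float and match the input shape.
# One option is to one-hot encode the indices to [N, C]:
bce = nn.BCEWithLogitsLoss()
target_onehot = F.one_hot(target, num_classes=2).float()
loss_bce = bce(output, target_onehot)

# Alternative for binary classification: give the model a single output
# logit per sample ([N] or [N, 1]) and use a float target of shape [N].
single_logit = output[:, 1]                 # pretend the model had one output
loss_bce_single = bce(single_logit, target.float())
```

The ValueError above comes from passing the [64] index target straight into BCEWithLogitsLoss with a [64, 2] input; either match the shapes as sketched here, or keep CrossEntropyLoss for index targets.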