My loss function is:
criterion = nn.CrossEntropyLoss(reduction='none')
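For reference, with `reduction='none'`, `nn.CrossEntropyLoss` returns one loss value per sample instead of a scalar. That vector has to be reduced (for example with `.mean()`) before `backward()` can be called without a `gradient` argument. The sketch below reproduces the per-sample computation in plain Python using a numerically stable log-sum-exp. The helper names are hypothetical, and this illustrates the math rather than PyTorch's actual implementation.

```python
import math


def log_sum_exp(row):
    # Shift by the row maximum so exp() cannot overflow on large logits.
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def cross_entropy_per_sample(logits, targets):
    # One value per sample, analogous to CrossEntropyLoss(reduction='none'):
    # loss_i = logsumexp(logits_i) - logits_i[target_i]
    return [log_sum_exp(row) - row[t] for row, t in zip(logits, targets)]


logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 0.1], [1000.0, 0.0, 0.0]]
targets = [0, 2, 0]

per_sample = cross_entropy_per_sample(logits, targets)
# Reduce to a scalar before backpropagating (the analogue of loss.mean()).
mean_loss = sum(per_sample) / len(per_sample)
print(per_sample, mean_loss)
```

The targets in the log range over 0 to 9, so with near-uniform logits each per-sample value would be close to ln(10) ≈ 2.303, which is in the same range as the first logged lossItem (2.63).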
Iteration 0
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
------------------------------------------------------------------------------------------------------------
Iteration num: 0
tensor([6, 6, 3, 0, 5, 2, 8, 2, 8, 0, 1, 1, 1, 8, 6, 6, 2, 9, 8, 8, 6, 8, 1, 3,
7, 1, 1, 0, 3, 3, 9, 2, 4, 2, 6, 5, 9, 7, 4, 0, 3, 2, 7, 3, 5, 9, 0, 6,
6, 0, 7, 4, 0, 4, 1, 7, 1, 1, 8, 5, 4, 4, 1, 7, 2, 5, 6, 8, 8, 6, 9, 2,
4, 6, 1, 8, 1, 8, 9, 0, 9, 1, 4, 4, 0, 7, 4, 1, 2, 2, 7, 5, 1, 6, 0, 3,
9, 4, 2, 5, 8, 7, 8, 1, 2, 1, 0, 2, 6, 9, 5, 5, 3, 7, 0, 3, 5, 9, 4, 5,
3, 2, 8, 7, 7, 4, 6, 6, 9, 0, 6, 1, 6, 4, 1, 2, 1, 8, 9, 0, 6, 0, 8, 2,
2, 5, 0, 7, 8, 3, 8, 1, 2, 4, 2, 2, 5, 0, 9, 6, 0, 9, 4, 7, 1, 3, 1, 0,
6, 9, 2, 0, 9, 7, 1, 4, 3, 3, 9, 4, 9, 8, 7, 1, 9, 2, 3, 8, 7, 6, 3, 4,
8, 6, 7, 4, 8, 0, 0, 9, 4, 1, 9, 6, 7, 9, 0, 7, 6, 4, 3, 6, 3, 0, 4, 7,
1, 4, 0, 3, 6, 4, 6, 2, 0, 1, 9, 5, 4, 9, 4, 3, 1, 1, 0, 4, 4, 1, 3, 9,
8, 4, 7, 9, 0, 3, 3, 5, 4, 6, 5, 1, 0, 6, 3, 8], device='cuda:0')
tensor([[ 0.4823, 0.2964, 1.1482, ..., 0.2172, -0.4138, -1.0365],
[-0.7772, 0.2143, 0.5901, ..., 1.0894, 0.0747, -0.1231],
[-0.6043, -0.5650, 0.5918, ..., 0.3752, -0.4515, -2.1144],
...,
[-1.2358, -1.7264, -0.0189, ..., 1.6880, -1.1934, -1.3710],
[-1.6906, 0.5091, 0.9523, ..., -0.3808, -0.3868, -0.5108],
[-1.3468, 1.0241, -0.0880, ..., 1.0244, 0.3306, -1.9983]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 1.0097033977508545, sdloss: 1.0652413368225098
lossItem: 2.6301562786102295
Epoch: [0][0/160] Time 0.413 (0.413) Data 0.377 (0.377) Speed 620.382 (620.382) Loss 2.6302 (2.6302) Acc@1 8.203 (8.203) Acc@5 46.875 (46.875) Count 256
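The Acc@1 and Acc@5 fields in the epoch line are presumably top-1 and top-5 accuracy over the batch (Count 256): 8.203% and 46.875% correspond to 21 and 120 correct samples out of 256. Below is a minimal plain-Python sketch of top-k accuracy, assuming the training script computes it in the usual way. The script itself is not shown, and `topk_accuracy` is a hypothetical name.

```python
def topk_accuracy(logits, targets, k):
    """Percentage of samples whose target is among the k highest logits."""
    correct = 0
    for row, target in zip(logits, targets):
        # Class indices sorted by logit, highest first; keep the first k.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if target in top:
            correct += 1
    return 100.0 * correct / len(targets)


logits = [[0.1, 0.9, 0.0], [0.8, 0.15, 0.05], [0.2, 0.3, 0.5]]
targets = [1, 2, 1]
print(topk_accuracy(logits, targets, 1))  # one of three correct
print(topk_accuracy(logits, targets, 2))  # two of three correct
```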
------------------------------------------------------------------------------------------------------------
Iteration num: 1
tensor([3, 1, 5, 7, 2, 5, 5, 8, 4, 3, 9, 7, 3, 4, 2, 6, 7, 2, 2, 0, 8, 1, 4, 9,
7, 7, 8, 8, 5, 6, 6, 8, 0, 4, 2, 4, 5, 8, 3, 6, 9, 7, 8, 9, 6, 7, 1, 8,
8, 1, 0, 8, 9, 2, 7, 1, 2, 0, 2, 4, 8, 8, 2, 9, 0, 1, 2, 6, 9, 9, 8, 5,
1, 0, 1, 5, 0, 9, 6, 8, 5, 9, 1, 0, 0, 9, 4, 1, 1, 7, 1, 1, 6, 7, 6, 5,
3, 8, 8, 0, 0, 4, 9, 1, 0, 1, 7, 3, 2, 4, 4, 6, 8, 1, 1, 8, 3, 3, 1, 9,
6, 0, 1, 5, 9, 3, 8, 4, 1, 6, 1, 1, 4, 9, 8, 7, 3, 0, 3, 4, 1, 9, 2, 1,
7, 8, 7, 6, 8, 2, 1, 3, 8, 9, 3, 5, 2, 1, 3, 9, 8, 7, 8, 8, 3, 7, 8, 3,
7, 0, 1, 0, 8, 1, 9, 0, 9, 1, 8, 7, 3, 5, 4, 9, 5, 3, 1, 6, 9, 1, 3, 2,
5, 5, 8, 8, 1, 0, 1, 3, 2, 9, 7, 6, 1, 4, 2, 3, 0, 7, 2, 4, 9, 2, 2, 7,
7, 2, 2, 0, 2, 5, 6, 9, 2, 5, 0, 2, 1, 2, 9, 5, 1, 7, 4, 3, 6, 4, 4, 5,
6, 2, 4, 8, 2, 3, 1, 3, 5, 7, 8, 8, 4, 1, 5, 1], device='cuda:0')
tensor([[ 2.4735, -1.9027, 0.9066, ..., -0.4060, 0.5519, -1.6199],
[-1.7410, -0.5936, 0.3711, ..., -0.0734, -2.2461, -1.0728],
[-0.6628, -1.2386, 0.8018, ..., -0.1567, -1.2360, -1.5239],
...,
[-1.0523, -2.6707, 0.6930, ..., 0.6356, -1.0273, -1.4660],
[ 1.9683, 0.0360, 0.0793, ..., 0.6470, -0.6063, -0.2374],
[ 0.3069, -1.1807, 0.6074, ..., 0.2922, 0.0501, -0.9813]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 1.083749532699585, sdloss: 1.0792979001998901
lossItem: 2.7319676876068115
------------------------------------------------------------------------------------------------------------
Iteration num: 2
tensor([9, 9, 6, 9, 8, 8, 5, 4, 0, 3, 0, 3, 4, 8, 2, 2, 2, 9, 2, 7, 7, 8, 3, 0,
5, 1, 6, 0, 4, 2, 7, 9, 0, 0, 8, 9, 7, 2, 2, 3, 3, 5, 6, 9, 1, 6, 5, 5,
3, 5, 0, 4, 5, 1, 4, 1, 9, 6, 5, 3, 0, 3, 7, 6, 4, 3, 7, 3, 2, 3, 9, 3,
9, 6, 6, 1, 4, 8, 7, 3, 1, 3, 4, 2, 6, 1, 6, 5, 9, 5, 3, 1, 2, 6, 4, 8,
2, 2, 0, 9, 8, 2, 4, 1, 5, 6, 4, 8, 8, 7, 9, 8, 8, 3, 1, 2, 0, 0, 5, 8,
1, 2, 6, 8, 1, 1, 0, 4, 8, 9, 3, 9, 6, 2, 4, 3, 4, 9, 8, 3, 7, 8, 3, 1,
1, 1, 7, 2, 3, 4, 0, 8, 5, 9, 1, 8, 4, 2, 7, 0, 1, 1, 3, 8, 1, 5, 4, 2,
4, 4, 9, 2, 0, 6, 2, 3, 5, 5, 5, 2, 2, 0, 7, 5, 1, 3, 9, 9, 9, 4, 5, 4,
9, 6, 8, 5, 2, 4, 6, 4, 5, 9, 7, 0, 9, 5, 6, 4, 5, 7, 5, 9, 7, 4, 9, 9,
7, 8, 0, 7, 7, 8, 1, 8, 2, 6, 6, 1, 6, 8, 8, 2, 5, 7, 7, 3, 4, 7, 1, 8,
5, 5, 7, 7, 9, 2, 0, 3, 1, 8, 6, 5, 3, 5, 2, 2], device='cuda:0')
tensor([[-2.0396, -0.5301, 0.8984, ..., -0.0925, 0.3624, -2.6404],
[-1.1958, 0.9594, 0.7785, ..., 0.4484, -0.2216, -2.3675],
[-1.0618, -0.5347, 0.1386, ..., -0.2420, -0.6653, -1.2504],
...,
[ 0.3692, -1.9406, -0.1676, ..., 0.6972, 0.2905, -1.9543],
[ 0.1557, -0.1204, -0.0210, ..., 1.1594, -1.2747, -1.6906],
[-3.9985, -3.0479, 2.0096, ..., -0.5583, 1.6112, -2.3361]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 1.2101424932479858, sdloss: 1.3503553867340088
lossItem: 2.141524314880371
------------------------------------------------------------------------------------------------------------
Iteration num: 3
tensor([9, 0, 9, 0, 4, 5, 5, 0, 7, 7, 7, 0, 3, 1, 1, 2, 5, 4, 9, 1, 7, 0, 5, 6,
2, 9, 2, 3, 4, 1, 9, 4, 1, 4, 1, 8, 7, 5, 5, 7, 9, 9, 9, 4, 7, 7, 5, 8,
4, 6, 6, 8, 0, 6, 2, 0, 2, 7, 5, 0, 8, 4, 1, 8, 2, 9, 4, 1, 8, 2, 2, 7,
8, 7, 5, 4, 4, 4, 9, 7, 9, 0, 9, 8, 2, 2, 4, 7, 6, 4, 9, 7, 1, 1, 4, 7,
7, 4, 5, 2, 1, 4, 5, 2, 0, 9, 4, 3, 8, 3, 7, 1, 6, 0, 2, 8, 0, 6, 1, 5,
2, 2, 6, 2, 0, 9, 0, 1, 6, 5, 4, 1, 7, 9, 1, 4, 9, 0, 9, 2, 0, 4, 9, 9,
8, 4, 3, 4, 3, 0, 8, 9, 2, 0, 9, 3, 5, 5, 5, 0, 3, 7, 3, 9, 7, 4, 3, 0,
4, 2, 3, 1, 3, 5, 4, 2, 4, 9, 9, 5, 6, 0, 5, 9, 3, 4, 3, 2, 5, 1, 9, 3,
2, 2, 4, 2, 7, 2, 7, 9, 1, 7, 5, 1, 1, 5, 0, 0, 2, 5, 0, 9, 7, 3, 0, 7,
7, 9, 2, 9, 9, 4, 3, 5, 4, 2, 6, 9, 7, 0, 6, 5, 4, 7, 9, 0, 3, 8, 6, 0,
5, 3, 5, 4, 8, 5, 5, 8, 7, 8, 4, 4, 3, 8, 1, 2], device='cuda:0')
tensor([[ 0.6528, -0.4094, 0.7019, ..., 0.2155, -1.9526, -2.3808],
[-0.5322, -1.1375, -0.5830, ..., 1.2385, -0.1592, -1.0004],
[-1.0041, -0.2019, 1.3646, ..., -0.2524, -0.2368, -2.4263],
...,
[-1.1273, -0.6086, 2.0380, ..., 1.7105, -1.1619, -1.7979],
[-0.3133, 0.6419, 1.6696, ..., 0.2661, 0.4561, -0.3616],
[-0.8983, -0.9374, -0.1871, ..., -0.2925, -0.6359, -3.0607]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 1.4503097534179688, sdloss: 1.783054232597351
lossItem: 1.7175476551055908
------------------------------------------------------------------------------------------------------------
Iteration num: 4
tensor([0, 6, 8, 4, 6, 0, 2, 6, 2, 3, 1, 3, 9, 1, 8, 2, 6, 0, 7, 0, 2, 9, 7, 0,
4, 7, 5, 6, 2, 8, 2, 7, 3, 0, 2, 4, 7, 2, 2, 2, 0, 3, 3, 5, 2, 3, 7, 6,
6, 9, 0, 9, 1, 6, 2, 2, 5, 2, 7, 3, 0, 8, 8, 2, 9, 9, 5, 4, 6, 2, 8, 7,
4, 8, 5, 6, 5, 8, 7, 4, 1, 8, 9, 2, 1, 8, 2, 7, 4, 4, 2, 2, 3, 6, 9, 2,
3, 7, 1, 5, 5, 1, 0, 1, 6, 5, 2, 6, 4, 7, 0, 0, 0, 7, 7, 1, 8, 7, 3, 4,
6, 5, 3, 1, 2, 1, 1, 9, 0, 3, 3, 0, 1, 8, 0, 0, 6, 2, 8, 7, 7, 1, 2, 9,
4, 4, 3, 0, 8, 4, 5, 7, 1, 0, 8, 5, 8, 7, 2, 4, 5, 9, 8, 6, 1, 9, 8, 0,
1, 2, 1, 0, 2, 2, 4, 7, 7, 4, 1, 7, 2, 3, 0, 4, 2, 5, 0, 6, 9, 2, 4, 7,
0, 7, 2, 6, 3, 1, 9, 2, 8, 1, 1, 0, 8, 0, 2, 0, 2, 5, 1, 8, 5, 6, 9, 6,
5, 6, 1, 0, 7, 6, 7, 1, 8, 5, 2, 3, 5, 4, 7, 2, 9, 6, 4, 9, 3, 4, 6, 6,
1, 0, 0, 4, 7, 4, 8, 2, 7, 7, 5, 4, 7, 8, 5, 5], device='cuda:0')
tensor([[-1.0180, -0.8228, 0.5158, ..., 0.3549, 0.0072, -1.9437],
[ 0.1443, -0.7663, 1.4854, ..., -0.7737, -1.0578, -2.4985],
[-1.2494, -0.8026, 0.8820, ..., 1.1737, -0.8699, -1.6476],
...,
[-0.8916, -1.7457, 1.1313, ..., -0.2323, -0.7291, -1.3231],
[-1.6000, -0.0373, 0.0189, ..., -0.5194, -0.4465, -1.7128],
[-1.2338, -1.1997, 1.0362, ..., 0.5314, -0.4950, -2.2386]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 1.7753264904022217, sdloss: 3.022984266281128
lossItem: 1.0905673503875732
------------------------------------------------------------------------------------------------------------
Iteration num: 5
tensor([6, 8, 3, 8, 4, 7, 8, 1, 4, 6, 7, 8, 3, 6, 9, 5, 6, 5, 4, 1, 6, 6, 4, 8,
8, 7, 4, 7, 1, 1, 6, 9, 2, 7, 5, 7, 0, 3, 2, 7, 3, 9, 7, 7, 1, 1, 1, 2,
6, 6, 4, 5, 5, 0, 2, 4, 2, 3, 6, 9, 0, 8, 9, 0, 0, 6, 5, 0, 3, 4, 8, 8,
8, 3, 0, 9, 3, 6, 3, 0, 5, 5, 0, 6, 4, 6, 4, 6, 0, 7, 3, 4, 2, 4, 7, 4,
9, 5, 3, 6, 9, 7, 6, 2, 4, 9, 2, 4, 8, 9, 0, 9, 1, 5, 4, 9, 8, 2, 0, 6,
8, 3, 1, 7, 5, 4, 8, 9, 4, 3, 9, 3, 7, 6, 5, 4, 1, 6, 0, 7, 9, 4, 4, 0,
4, 5, 1, 7, 7, 9, 8, 3, 8, 3, 6, 0, 3, 0, 2, 2, 9, 1, 5, 7, 9, 4, 4, 5,
2, 7, 0, 3, 7, 9, 0, 8, 9, 2, 6, 2, 7, 6, 8, 3, 4, 7, 4, 7, 3, 5, 7, 0,
6, 7, 5, 1, 5, 5, 2, 2, 9, 7, 7, 0, 3, 7, 1, 8, 2, 0, 4, 6, 5, 1, 9, 0,
5, 5, 0, 0, 0, 1, 4, 5, 4, 7, 3, 4, 1, 9, 3, 4, 9, 5, 0, 9, 0, 4, 8, 7,
9, 3, 0, 4, 6, 1, 2, 4, 8, 6, 4, 0, 1, 6, 4, 9], device='cuda:0')
tensor([[-0.3739, -2.0318, 0.8308, ..., -1.5842, -1.1179, -2.8031],
[-0.9913, 0.5701, 2.1547, ..., 1.1520, -0.5210, -2.3975],
[-0.5808, -0.2658, 0.6696, ..., 0.8275, -0.4249, -1.5584],
...,
[-0.8302, -0.4249, 0.8844, ..., -0.0033, -1.1271, -1.9469],
[-1.2628, -1.2556, 1.3446, ..., 0.0410, -0.2970, -1.7601],
[-0.7990, -1.3058, 0.9276, ..., 0.2087, 0.3230, -1.5558]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 2.0537357330322266, sdloss: 3.4425933361053467
lossItem: 1.0173795223236084
------------------------------------------------------------------------------------------------------------
Iteration num: 6
tensor([0, 6, 5, 1, 3, 3, 4, 2, 4, 1, 5, 0, 7, 7, 3, 3, 5, 6, 0, 2, 6, 2, 6, 4,
6, 4, 5, 2, 8, 2, 9, 3, 5, 4, 0, 5, 2, 3, 1, 3, 4, 3, 1, 1, 6, 7, 2, 5,
5, 6, 9, 1, 8, 2, 9, 2, 4, 0, 3, 5, 5, 9, 7, 5, 0, 4, 0, 5, 7, 4, 6, 4,
5, 7, 7, 8, 8, 7, 2, 1, 4, 3, 7, 3, 1, 6, 3, 2, 2, 6, 3, 1, 5, 8, 1, 2,
1, 0, 8, 4, 0, 7, 3, 5, 1, 8, 8, 3, 9, 1, 3, 7, 7, 3, 7, 5, 0, 3, 5, 1,
7, 4, 3, 0, 0, 1, 6, 8, 8, 5, 2, 7, 1, 9, 4, 0, 0, 0, 0, 5, 3, 6, 2, 6,
7, 8, 8, 3, 5, 3, 5, 3, 1, 7, 5, 7, 3, 6, 1, 7, 3, 1, 7, 8, 7, 0, 0, 6,
9, 8, 2, 3, 0, 1, 2, 3, 3, 9, 2, 3, 0, 9, 1, 6, 7, 5, 3, 5, 0, 1, 4, 4,
3, 5, 2, 8, 4, 3, 9, 0, 3, 0, 6, 4, 9, 7, 4, 7, 3, 6, 8, 2, 4, 9, 0, 5,
4, 0, 5, 4, 8, 9, 6, 1, 3, 6, 2, 0, 7, 9, 3, 0, 7, 4, 0, 7, 4, 7, 9, 9,
9, 7, 3, 8, 1, 6, 5, 7, 9, 6, 2, 8, 2, 9, 8, 3], device='cuda:0')
tensor([[-2.3020, -1.6480, 1.7204, ..., -0.2290, -0.7170, -2.3077],
[-1.7064, -0.6016, 1.8866, ..., 0.6248, -0.3797, -2.5742],
[-1.4687, 0.2669, 1.4310, ..., 0.5527, 0.2978, -2.1013],
...,
[-0.8727, -1.0016, 0.6120, ..., -0.7191, 0.1590, -2.0574],
[-0.5944, -0.9997, 0.5103, ..., 0.5657, -1.0921, -2.7526],
[-1.6110, -0.7161, 1.2556, ..., 0.2084, 0.8581, -1.2577]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 2.4133262634277344, sdloss: 3.8655548095703125
lossItem: 0.9866995215415955
------------------------------------------------------------------------------------------------------------
Iteration num: 7
tensor([0, 1, 2, 6, 6, 7, 5, 1, 0, 7, 6, 8, 6, 4, 1, 2, 7, 9, 4, 3, 7, 1, 0, 1,
9, 2, 5, 8, 4, 3, 0, 3, 6, 6, 6, 9, 5, 2, 8, 4, 0, 4, 6, 9, 3, 3, 2, 1,
9, 9, 7, 7, 7, 8, 3, 6, 9, 4, 4, 8, 4, 7, 7, 2, 7, 5, 0, 1, 7, 2, 2, 0,
1, 3, 9, 3, 8, 8, 3, 4, 8, 5, 9, 9, 9, 1, 2, 8, 7, 5, 4, 2, 9, 9, 2, 6,
7, 9, 8, 4, 6, 8, 3, 1, 5, 0, 8, 1, 2, 4, 9, 6, 8, 0, 6, 6, 5, 3, 4, 8,
7, 2, 6, 3, 3, 6, 7, 2, 7, 4, 4, 2, 5, 9, 4, 1, 3, 0, 2, 7, 1, 0, 3, 7,
5, 0, 4, 6, 4, 2, 5, 1, 2, 1, 6, 5, 6, 6, 3, 6, 1, 9, 6, 4, 9, 9, 1, 7,
2, 4, 1, 5, 8, 2, 6, 9, 3, 4, 9, 9, 9, 0, 0, 9, 9, 6, 9, 2, 4, 8, 8, 7,
6, 1, 6, 9, 3, 7, 0, 8, 5, 2, 0, 2, 2, 7, 1, 9, 2, 9, 1, 0, 9, 6, 1, 6,
8, 4, 3, 1, 0, 2, 8, 2, 3, 9, 8, 4, 2, 1, 7, 7, 0, 6, 5, 2, 0, 2, 9, 5,
9, 0, 0, 4, 8, 0, 8, 5, 0, 2, 6, 6, 1, 7, 8, 8], device='cuda:0')
tensor([[ -1.2120, -2.0005, 1.6341, ..., 0.6033, -0.5305, -3.2017],
[ -1.8019, -0.4143, 1.4597, ..., 0.8472, -0.2829, -1.6098],
[ -2.7603, -1.1355, 2.4807, ..., 0.5621, -0.0642, -3.7420],
...,
[ -1.2322, -1.4504, 1.4420, ..., 0.6432, 0.4821, -2.5214],
[-15.2949, -7.3248, 25.2044, ..., 1.4242, 1.4580, -13.7431],
[ -2.4870, -0.2798, 1.9545, ..., -0.7157, -1.0314, -2.4235]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 2.6089565753936768, sdloss: 4.552063465118408
lossItem: 0.8715435266494751
------------------------------------------------------------------------------------------------------------
Iteration num: 8
tensor([6, 6, 0, 8, 0, 0, 7, 6, 2, 2, 9, 7, 6, 6, 4, 1, 0, 0, 5, 6, 5, 7, 3, 6,
9, 9, 5, 5, 3, 4, 3, 3, 9, 9, 8, 7, 6, 2, 7, 0, 1, 6, 7, 1, 3, 2, 6, 0,
7, 8, 3, 7, 5, 8, 9, 1, 7, 6, 1, 9, 8, 9, 7, 3, 4, 7, 9, 6, 8, 9, 5, 9,
9, 9, 2, 8, 2, 4, 9, 7, 2, 6, 1, 5, 7, 3, 5, 1, 3, 1, 4, 8, 9, 4, 6, 3,
8, 4, 1, 8, 8, 9, 8, 0, 1, 2, 3, 5, 0, 4, 9, 4, 0, 2, 9, 2, 8, 5, 0, 8,
9, 6, 2, 1, 6, 0, 7, 4, 7, 2, 5, 5, 3, 8, 5, 0, 5, 0, 9, 1, 0, 8, 6, 4,
0, 0, 1, 0, 2, 1, 8, 4, 2, 7, 0, 4, 2, 4, 2, 4, 9, 7, 6, 9, 2, 7, 7, 9,
6, 4, 4, 0, 9, 7, 0, 5, 8, 1, 5, 5, 3, 8, 1, 7, 6, 9, 4, 1, 1, 6, 4, 2,
9, 2, 4, 5, 7, 5, 5, 5, 0, 6, 4, 7, 2, 1, 7, 7, 2, 8, 2, 4, 9, 5, 6, 4,
3, 2, 5, 0, 2, 3, 7, 5, 8, 7, 3, 3, 2, 1, 7, 0, 0, 9, 3, 7, 8, 8, 7, 8,
9, 7, 3, 5, 4, 5, 8, 7, 2, 7, 8, 7, 4, 7, 6, 6], device='cuda:0')
tensor([[ -0.7124, -0.7721, 0.6896, ..., 0.2432, -0.4004, -1.7503],
[ -0.9975, -1.2593, 1.7826, ..., 0.7399, -0.7758, -3.1522],
[-10.9876, -7.0715, 17.1001, ..., 0.4595, -2.0494, -11.1477],
...,
[ -0.7248, -0.7171, 0.8077, ..., 0.2410, -0.2197, -1.9328],
[ -0.7845, -1.0941, 0.5888, ..., 0.4366, -0.3011, -1.8252],
[ -0.7055, -0.7121, 0.6080, ..., 0.2347, -0.2328, -1.7705]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 2.9365410804748535, sdloss: 5.954787254333496
lossItem: 0.7344008088111877
------------------------------------------------------------------------------------------------------------
Iteration num: 9
tensor([3, 2, 2, 5, 5, 9, 7, 8, 9, 5, 7, 4, 9, 9, 1, 6, 4, 2, 8, 7, 1, 2, 0, 8,
3, 4, 9, 3, 7, 3, 8, 0, 1, 6, 4, 2, 8, 9, 4, 6, 6, 7, 5, 6, 6, 8, 9, 6,
1, 8, 3, 8, 6, 3, 1, 5, 3, 3, 5, 9, 3, 4, 2, 3, 0, 3, 8, 0, 9, 1, 5, 0,
4, 8, 6, 0, 3, 9, 3, 9, 7, 7, 7, 8, 1, 3, 3, 7, 2, 6, 1, 7, 4, 0, 2, 6,
7, 1, 2, 6, 1, 8, 1, 8, 5, 5, 5, 3, 6, 5, 5, 9, 5, 0, 5, 7, 8, 6, 3, 3,
8, 4, 5, 9, 2, 1, 3, 6, 1, 2, 0, 2, 3, 2, 6, 0, 9, 5, 5, 0, 4, 9, 5, 4,
9, 4, 3, 1, 8, 1, 0, 2, 4, 9, 1, 1, 1, 1, 3, 1, 4, 9, 9, 7, 0, 3, 1, 9,
4, 2, 6, 6, 7, 5, 3, 0, 5, 3, 1, 1, 9, 9, 5, 4, 7, 9, 6, 7, 1, 5, 6, 9,
1, 6, 3, 4, 7, 5, 3, 9, 9, 3, 5, 7, 8, 6, 3, 2, 1, 3, 0, 0, 9, 4, 4, 0,
6, 5, 9, 7, 3, 2, 7, 6, 4, 1, 9, 1, 5, 6, 8, 1, 7, 6, 9, 0, 5, 2, 1, 8,
2, 7, 1, 6, 9, 4, 1, 7, 0, 0, 6, 2, 6, 2, 7, 2], device='cuda:0')
tensor([[-3.3316, -0.7074, 3.9191, ..., -0.1669, -1.0646, -6.2732],
[-1.6146, -1.7145, 1.6094, ..., 1.8238, -0.8163, -3.8408],
[-1.2954, -1.1230, 1.4273, ..., 1.1358, -0.4365, -3.4556],
...,
[-1.2492, -0.4321, 2.2156, ..., -0.2905, -0.5351, -2.7494],
[-2.2683, -0.8141, 1.8049, ..., -0.1424, -0.7121, -3.8456],
[-3.1600, -0.8897, 2.0659, ..., 0.1129, -1.1949, -3.0344]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 3.325645923614502, sdloss: 6.283945083618164
lossItem: 0.7559524774551392
------------------------------------------------------------------------------------------------------------
Iteration num: 10
tensor([2, 4, 4, 3, 9, 5, 7, 9, 2, 8, 0, 7, 7, 8, 0, 9, 7, 7, 3, 0, 3, 4, 4, 2,
0, 3, 5, 4, 0, 3, 2, 2, 5, 3, 3, 8, 2, 8, 1, 4, 3, 1, 1, 7, 7, 1, 3, 2,
6, 8, 2, 9, 0, 4, 1, 7, 2, 0, 0, 1, 6, 2, 8, 4, 5, 9, 0, 4, 7, 4, 5, 6,
3, 0, 3, 2, 4, 1, 6, 3, 7, 4, 2, 5, 1, 2, 5, 9, 4, 5, 1, 9, 8, 2, 5, 9,
2, 1, 7, 1, 0, 0, 8, 4, 2, 3, 0, 1, 0, 3, 5, 1, 0, 2, 6, 1, 3, 5, 6, 8,
8, 3, 1, 3, 7, 8, 2, 5, 3, 7, 4, 9, 3, 2, 8, 1, 9, 6, 9, 7, 2, 4, 2, 4,
0, 1, 6, 0, 1, 6, 3, 0, 2, 1, 3, 7, 0, 4, 2, 2, 2, 9, 2, 8, 0, 1, 9, 2,
3, 7, 8, 6, 8, 8, 0, 9, 8, 6, 4, 5, 8, 1, 0, 3, 7, 2, 9, 8, 7, 0, 1, 1,
1, 3, 3, 9, 4, 6, 2, 1, 0, 6, 1, 1, 0, 4, 2, 1, 8, 8, 3, 3, 1, 7, 7, 5,
0, 9, 2, 3, 6, 3, 7, 6, 3, 4, 1, 6, 1, 0, 0, 4, 7, 7, 8, 1, 3, 5, 3, 3,
3, 4, 8, 4, 9, 2, 8, 4, 0, 7, 2, 1, 0, 1, 0, 0], device='cuda:0')
tensor([[ -5.6106, -1.6667, 5.2138, ..., 0.4415, -0.4450, -6.3135],
[ -8.9125, -3.7542, 19.5806, ..., -0.4064, 0.3483, -11.4086],
[ -1.0877, -1.1826, 0.8344, ..., -0.1360, -0.6348, -2.6532],
...,
[ -0.7358, -0.7670, 0.9107, ..., -0.0585, -0.3440, -1.8350],
[ -4.5926, -2.8046, 5.5985, ..., 0.7487, -1.1452, -4.6941],
[ -2.8975, -1.7148, 2.8584, ..., 0.5823, -0.2585, -4.7432]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 3.765282392501831, sdloss: 7.975272178649902
lossItem: 0.5635203123092651
------------------------------------------------------------------------------------------------------------
Iteration num: 11
tensor([8, 9, 9, 8, 2, 7, 4, 2, 3, 7, 1, 1, 2, 2, 6, 7, 9, 4, 3, 1, 3, 3, 9, 8,
5, 8, 9, 2, 9, 8, 0, 1, 4, 9, 8, 2, 6, 5, 1, 1, 4, 8, 5, 6, 8, 9, 7, 9,
7, 6, 5, 5, 9, 3, 3, 6, 6, 4, 8, 0, 7, 3, 8, 6, 8, 1, 0, 4, 8, 1, 6, 7,
4, 8, 5, 4, 6, 0, 7, 2, 1, 2, 1, 9, 8, 1, 0, 3, 5, 6, 5, 9, 2, 6, 1, 2,
5, 9, 6, 7, 3, 0, 1, 1, 2, 9, 7, 5, 4, 3, 5, 1, 3, 7, 2, 2, 6, 3, 9, 2,
0, 6, 4, 3, 4, 0, 7, 3, 4, 3, 7, 1, 4, 9, 5, 9, 7, 9, 1, 4, 2, 3, 1, 6,
9, 4, 7, 7, 0, 6, 9, 0, 8, 3, 2, 0, 3, 8, 8, 9, 0, 6, 8, 3, 3, 7, 0, 5,
4, 4, 9, 1, 2, 4, 4, 5, 0, 7, 2, 4, 3, 2, 6, 2, 7, 3, 8, 3, 2, 9, 3, 6,
4, 6, 2, 1, 3, 5, 7, 3, 9, 4, 0, 6, 6, 0, 2, 5, 3, 6, 4, 5, 2, 2, 2, 6,
4, 8, 5, 5, 4, 8, 3, 1, 4, 3, 2, 4, 2, 6, 8, 0, 7, 9, 4, 9, 4, 6, 0, 2,
9, 2, 2, 4, 3, 3, 4, 9, 1, 1, 2, 6, 1, 9, 9, 4], device='cuda:0')
tensor([[-0.7936, -0.6890, 0.8799, ..., 0.4396, -0.1954, -1.9341],
[-0.8742, -0.8172, 0.9014, ..., 0.4542, -0.2890, -1.4936],
[-0.8265, -0.9503, 0.8600, ..., 0.8367, -0.4419, -1.8943],
...,
[-1.2959, -0.6144, 1.8038, ..., -0.4317, -0.0171, -2.9657],
[-0.9658, -0.5634, 1.6531, ..., 0.4865, -0.3951, -1.8265],
[-1.1988, -1.1405, 0.9006, ..., 0.7609, -0.4783, -2.2097]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 4.061051368713379, sdloss: 8.536210060119629
lossItem: 0.564131498336792
------------------------------------------------------------------------------------------------------------
Iteration num: 12
tensor([2, 4, 5, 7, 3, 7, 4, 2, 6, 2, 4, 7, 4, 3, 6, 6, 6, 0, 4, 0, 8, 7, 7, 3,
5, 8, 9, 1, 8, 0, 9, 8, 6, 1, 4, 9, 4, 4, 9, 1, 6, 8, 1, 9, 2, 5, 2, 0,
2, 3, 0, 2, 7, 4, 1, 8, 4, 2, 7, 5, 7, 6, 5, 8, 8, 1, 0, 9, 6, 9, 1, 0,
2, 5, 3, 2, 9, 3, 2, 6, 7, 4, 4, 8, 5, 1, 0, 2, 4, 2, 8, 5, 2, 6, 7, 1,
3, 3, 9, 7, 0, 4, 8, 5, 6, 0, 6, 9, 2, 4, 1, 7, 8, 3, 0, 6, 6, 0, 1, 6,
5, 1, 1, 9, 8, 2, 3, 6, 6, 9, 2, 0, 1, 7, 9, 2, 3, 3, 6, 6, 9, 2, 5, 3,
1, 9, 4, 9, 5, 9, 5, 7, 3, 1, 5, 5, 9, 8, 6, 8, 7, 1, 5, 3, 4, 6, 2, 7,
3, 7, 0, 7, 1, 6, 7, 4, 2, 8, 0, 6, 7, 8, 0, 0, 5, 5, 5, 9, 9, 1, 4, 1,
6, 1, 2, 6, 5, 1, 2, 1, 8, 1, 2, 1, 1, 8, 8, 9, 0, 2, 4, 4, 1, 3, 7, 8,
7, 6, 0, 3, 4, 0, 4, 2, 3, 5, 1, 2, 4, 2, 9, 5, 6, 4, 6, 3, 8, 8, 0, 8,
7, 6, 8, 1, 6, 5, 7, 0, 8, 7, 5, 3, 4, 3, 6, 2], device='cuda:0')
tensor([[-3.4676, -1.7467, 3.2783, ..., -0.1886, -1.3624, -4.7153],
[-0.5332, -0.7577, 0.7168, ..., 0.5388, -0.2941, -1.6675],
[-0.5332, -0.7577, 0.7168, ..., 0.5388, -0.2941, -1.6675],
...,
[-0.5332, -0.7577, 0.7168, ..., 0.5388, -0.2941, -1.6675],
[-0.5332, -0.7577, 0.7168, ..., 0.5388, -0.2941, -1.6675],
[-0.9877, -0.9210, 1.6490, ..., 0.4779, -0.2820, -2.3817]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 4.560257911682129, sdloss: 10.405665397644043
lossItem: 0.49603745341300964
------------------------------------------------------------------------------------------------------------
Iteration num: 13
tensor([0, 4, 7, 7, 7, 7, 7, 9, 6, 3, 5, 9, 2, 2, 2, 2, 5, 0, 7, 8, 9, 9, 7, 8,
0, 9, 4, 6, 7, 2, 0, 6, 9, 0, 9, 6, 4, 5, 1, 6, 9, 2, 1, 9, 1, 8, 1, 8,
0, 4, 7, 6, 1, 2, 0, 1, 2, 5, 7, 2, 0, 4, 3, 7, 8, 0, 6, 4, 5, 3, 1, 7,
5, 5, 4, 8, 1, 2, 7, 4, 3, 5, 4, 3, 3, 0, 7, 1, 4, 5, 3, 5, 6, 6, 8, 1,
8, 1, 5, 2, 9, 5, 7, 1, 2, 9, 5, 4, 1, 1, 2, 7, 6, 5, 0, 5, 1, 3, 5, 2,
0, 9, 6, 4, 1, 5, 3, 0, 9, 1, 6, 7, 4, 9, 6, 5, 4, 5, 6, 8, 6, 1, 6, 3,
1, 7, 8, 2, 2, 1, 0, 7, 3, 2, 1, 0, 6, 2, 0, 7, 5, 6, 7, 3, 0, 1, 3, 2,
3, 7, 0, 5, 3, 1, 4, 8, 2, 1, 6, 0, 6, 6, 8, 9, 0, 9, 0, 2, 6, 4, 5, 9,
3, 9, 3, 7, 4, 9, 2, 0, 5, 8, 1, 7, 9, 3, 4, 0, 8, 9, 7, 3, 3, 8, 4, 6,
2, 5, 5, 1, 9, 5, 8, 9, 5, 6, 0, 4, 0, 6, 8, 3, 1, 5, 8, 1, 7, 1, 3, 4,
0, 0, 7, 3, 9, 1, 7, 1, 8, 2, 4, 3, 1, 1, 4, 8], device='cuda:0')
tensor([[-4.7163, -2.5053, 8.4166, ..., -0.2740, -1.3736, -7.8538],
[-4.7163, -2.5053, 8.4166, ..., -0.2740, -1.3736, -7.8538],
[-4.7163, -2.5053, 8.4166, ..., -0.2740, -1.3736, -7.8538],
...,
[-4.7163, -2.5053, 8.4166, ..., -0.2740, -1.3736, -7.8538],
[-4.7163, -2.5053, 8.4166, ..., -0.2740, -1.3736, -7.8538],
[-4.7163, -2.5053, 8.4166, ..., -0.2740, -1.3736, -7.8538]],
device='cuda:0', grad_fn=<MmBackward>)
sdactivations: 4.476424217224121, sdloss: 4.369915008544922
lossItem: 2.0310254096984863
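At iteration 12, most of the visible output rows are already identical. At iteration 13, every visible row is the same (-4.7163, -2.5053, 8.4166, ...), so the network is producing the same logits regardless of the input, one step before the output turns into nan. One inexpensive way to flag this during training is to measure how much the rows differ across the batch. Here is a minimal plain-Python sketch; the function names and the tolerance are hypothetical choices.

```python
def max_row_spread(logits):
    """Largest per-class range (max - min) across the batch.

    A value near zero means every sample received (almost) the same logits.
    """
    n_classes = len(logits[0])
    return max(
        max(row[c] for row in logits) - min(row[c] for row in logits)
        for c in range(n_classes)
    )


def outputs_collapsed(logits, tol=1e-6):
    # True when all rows in the batch are identical up to tol.
    return max_row_spread(logits) <= tol


# First three logged columns of the iteration-13 output, repeated.
collapsed = [[-4.7163, -2.5053, 8.4166]] * 4
# First three logged columns of the first two rows at iteration 0.
healthy = [[0.4823, 0.2964, 1.1482], [-0.7772, 0.2143, 0.5901]]
print(outputs_collapsed(collapsed), outputs_collapsed(healthy))
```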
------------------------------------------------------------------------------------------------------------
Iteration num: 14
tensor([5, 3, 4, 0, 6, 3, 4, 8, 4, 6, 8, 5, 1, 9, 2, 3, 4, 6, 8, 2, 3, 1, 3, 6,
7, 1, 3, 2, 2, 4, 5, 1, 3, 1, 9, 1, 2, 8, 9, 4, 4, 1, 1, 6, 1, 8, 5, 1,
7, 0, 1, 4, 7, 6, 1, 7, 4, 3, 2, 8, 6, 0, 5, 5, 1, 1, 8, 1, 5, 3, 9, 2,
0, 4, 8, 2, 7, 3, 6, 6, 0, 4, 2, 9, 6, 8, 4, 4, 7, 8, 9, 1, 6, 1, 2, 5,
9, 1, 8, 4, 6, 4, 8, 5, 5, 8, 4, 7, 4, 0, 7, 3, 7, 7, 7, 3, 0, 6, 5, 7,
7, 3, 9, 9, 8, 5, 5, 5, 9, 2, 0, 5, 1, 8, 4, 9, 5, 9, 5, 9, 7, 1, 4, 9,
2, 2, 8, 2, 2, 2, 0, 0, 5, 0, 3, 0, 1, 1, 0, 5, 7, 7, 4, 8, 9, 5, 5, 0,
8, 1, 0, 1, 4, 5, 0, 1, 4, 8, 6, 0, 7, 3, 6, 2, 5, 9, 6, 8, 8, 2, 3, 5,
7, 8, 2, 1, 6, 0, 5, 0, 3, 7, 1, 9, 5, 0, 1, 4, 9, 3, 7, 5, 6, 4, 5, 6,
3, 8, 3, 6, 9, 1, 7, 0, 6, 7, 8, 9, 3, 6, 0, 7, 3, 2, 2, 0, 6, 5, 6, 1,
4, 2, 9, 1, 2, 1, 5, 2, 7, 8, 2, 2, 0, 8, 6, 9], device='cuda:0')
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=<MmBackward>)
sdactivations: nan, sdloss: nan
lossItem: nan
------------------------------------------------------------------------------------------------------------
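I have run this 5 times, and each time the loss becomes nan at around the 13th iteration. In each iteration block above, the first tensor is the target and the second is the model output. In the run shown here, every visible row of the output is already identical at iteration 13. At iteration 14, the output, sdactivations, sdloss, and lossItem are all nan.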
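Because the loss turns into nan without warning, it can help to stop at the first non-finite value so that the last good weights and inputs can be inspected. In PyTorch, this check is usually written with `torch.isfinite(loss).all()`, and `torch.autograd.set_detect_anomaly(True)` can report which operation's backward pass first produced a nan. The sketch below shows the same idea without any framework. `first_non_finite` is a hypothetical helper, applied here to the lossItem values copied from the log above.

```python
import math


def first_non_finite(values):
    """Index of the first nan/inf value, or None if every value is finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# lossItem values from iterations 0-14 of the log above.
loss_items = [
    2.6301562786102295, 2.7319676876068115, 2.141524314880371,
    1.7175476551055908, 1.0905673503875732, 1.0173795223236084,
    0.9866995215415955, 0.8715435266494751, 0.7344008088111877,
    0.7559524774551392, 0.5635203123092651, 0.564131498336792,
    0.49603745341300964, 2.0310254096984863, float("nan"),
]

bad = first_non_finite(loss_items)
if bad is not None:
    # In a real training loop: stop here, before optimizer.step(),
    # and save the current batch and weights for inspection.
    print(f"loss became non-finite at iteration {bad}")
```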