I am wondering what causes the difference between the following two ways of writing the same operation. Here "ss" is a matrix with shape (batch_size x 4), and "model" is a DNN.
out_1 = model(ss[0:1])
out_2 = model(ss)[0:1]
tensor([[-3.5763, -2.3842, 0.0000, 4.7684]], device='cuda:1')
The difference looks like it is due to floating-point precision. Mathematically, both results should be equal.
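To make the floating-point point concrete, here is a minimal sketch in plain Python (ordinary 64-bit floats, not the poster's model or GPU). It only shows that floating-point addition is not associative, so the same numbers summed in a different order can give slightly different results. The variable names are made up for illustration.

```python
# Floating-point addition is not associative: grouping changes the rounding.
left_grouped = (0.1 + 0.2) + 0.3   # round after 0.1 + 0.2, then add 0.3
right_grouped = 0.1 + (0.2 + 0.3)  # round after 0.2 + 0.3, then add 0.1

# Mathematically both are 0.6, but the rounded results differ in the last bit.
difference = left_grouped - right_grouped

print(left_grouped, right_grouped, difference)
```

The difference is on the order of the rounding error of the data type, which is why results like this are usually compared with a tolerance rather than with exact equality.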
The batch size is used by some layers (e.g. batch norm). If your network is complicated, it is not surprising that different batch sizes give different results.
No, my network is a shallow fully-connected network without any batch normalization.
This was my initial guess as well, but it is not consistent with the following experiment. For example, let's say all rows of ss are identical. Then the following pair of operations results in an exactly zero difference.
out_1 = model(ss)[0:1]
out_2 = model(ss)[1:2]
tensor([[0.0, 0.0, 0.0, 0.0]], device='cuda:1')
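One possible way to reconcile the two experiments, sketched in plain Python under a loudly stated assumption: that the backend may pick a different reduction order (a different "kernel") depending on the batch size, while every row inside one batch goes through the same order. The functions `dot_sequential` and `dot_pairwise` are hypothetical stand-ins for two such kernels, not anything from PyTorch or CUDA.

```python
def dot_sequential(row, weights):
    """Hypothetical kernel A: accumulate products left to right."""
    total = 0.0
    for x, w in zip(row, weights):
        total += x * w
    return total


def dot_pairwise(row, weights):
    """Hypothetical kernel B: accumulate products by recursive halving."""
    products = [x * w for x, w in zip(row, weights)]

    def reduce(values):
        if len(values) == 1:
            return values[0]
        mid = len(values) // 2
        return reduce(values[:mid]) + reduce(values[mid:])

    return reduce(products)


weights = [0.1, 0.2, 0.3]
batch = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # all rows identical

# Same kernel, identical rows: identical operations, so exactly zero difference.
same_kernel_diff = dot_pairwise(batch[0], weights) - dot_pairwise(batch[1], weights)

# Same row, different kernels: same math, different rounding order.
cross_kernel_diff = dot_sequential(batch[0], weights) - dot_pairwise(batch[0], weights)

print(same_kernel_diff, cross_kernel_diff)
```

Under this assumption, model(ss)[0:1] and model(ss)[1:2] agree exactly because both rows are computed by the same operations in the same order, whereas model(ss[0:1]) and model(ss)[0:1] may go through different reduction orders and differ by rounding error. Whether this is what actually happens for a given model and GPU would need to be checked on that setup.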