- If `loss = error.pow(2).sum() / 2.0`, then `dloss/derror = error`. If `loss = error.pow(2).mean()`, then `dloss/derror = 2 * error / batch_size` (your batch_size is 64 here). Because the numpy implementation uses `outerror = error`, you should use the first form of the loss.
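As a quick sanity check of the two gradient formulas above, here is a standalone numpy sketch (not from the original post; the `numerical_grad` helper and the batch size of 4 are mine, chosen for illustration):

```python
import numpy as np

np.random.seed(0)
batch_size = 4
error = np.random.randn(batch_size)

# Form 1: loss = sum(error**2) / 2   ->  dloss/derror = error
grad_sum_form = error

# Form 2: loss = mean(error**2)      ->  dloss/derror = 2 * error / batch_size
grad_mean_form = 2 * error / batch_size

def numerical_grad(loss_fn, x, eps=1e-6):
    """Central finite-difference gradient of a scalar loss_fn at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (loss_fn(x + d) - loss_fn(x - d)) / (2 * eps)
    return g

# Both analytic gradients match the finite-difference check.
assert np.allclose(numerical_grad(lambda e: (e ** 2).sum() / 2.0, error),
                   grad_sum_form)
assert np.allclose(numerical_grad(lambda e: (e ** 2).mean(), error),
                   grad_mean_form)
```

This makes the difference concrete: the two losses differ only by a constant factor `2 / batch_size` in their gradients, which is why the `sum()/2` form lines up with a manual backward pass that sets `outerror = error`.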
- I print `(error ** 2).mean().data[0]` because you are doing the same thing in numpy:

      loss = (error ** 2).mean()
      ...
      print(epoch, loss)
- They are the same, but the PyTorch version can backpropagate and compute the gradients automatically.
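To see autograd reproduce the manual gradient rule, here is a minimal sketch (my own example, not the original code; `pred` and `target` are placeholder tensors, with the batch size of 64 taken from the discussion above):

```python
import torch

torch.manual_seed(0)
batch_size = 64
pred = torch.randn(batch_size, requires_grad=True)
target = torch.randn(batch_size)

error = pred - target
loss = error.pow(2).sum() / 2.0  # the first form of the loss
loss.backward()

# Autograd recovers the manual rule dloss/dpred = error,
# so no hand-written backward pass (outerror = error) is needed.
assert torch.allclose(pred.grad, error.detach())
```

With the `mean()` form instead, `pred.grad` would come out as `2 * error / batch_size`, matching the scaling discussed above.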