Standard way to go through 1 epoch in SGD with a Model (especially NN)

Ok, it seems there was some confusion. When I said SGD, I meant it in the usual (informal) sense: mini-batch stochastic gradient descent. I didn't mean processing one data point at a time and seeing each point only once (true SGD).
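For reference, the mini-batch epoch structure I have in mind can be sketched as below. This is just an illustration in NumPy; `run_epoch` and `train_step` are hypothetical names, not anything from the actual code:

```python
import numpy as np

def run_epoch(trX, trY, batch_size, train_step):
    # Shuffle once per epoch so each mini-batch is a fresh random sample.
    perm = np.random.permutation(len(trX))
    trX, trY = trX[perm], trY[perm]  # fancy indexing: makes shuffled copies
    total_cost = 0.0
    # Walk through the shuffled data in contiguous mini-batches.
    for start in range(0, len(trX), batch_size):
        end = start + batch_size
        total_cost += train_step(trX[start:end], trY[start:end])
    return total_cost
```

Here `train_step` would do one forward/backward pass and return the batch loss; the slices `trX[start:end]` are exactly the kind of indexing in question below.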

Why does the additional stochasticity mean we copy the data? Can't we just do it in place? I don't understand what part of the code would copy the data without the coder's permission. Is this the line you are referring to:

    cost += train(model, loss, optimizer,
                  trX[start:end], trY[start:end])

Does the indexing of the FloatTensor copy the data unnecessarily?
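My understanding (and the reason I'm asking) is that a basic slice like `trX[start:end]` is a view into the same storage, while indexing with an array of indices (e.g. a shuffled permutation) makes a copy. A quick check of that distinction, shown here with NumPy since it follows the same slice-is-a-view convention as torch tensors:

```python
import numpy as np

x = np.random.randn(100, 3).astype(np.float32)

# Basic slicing: returns a view that shares memory with x, no copy.
view = x[10:20]
print(np.shares_memory(x, view))   # True

# Fancy indexing with an index array (e.g. a shuffled permutation):
# this allocates and fills a new array, i.e. it copies.
idx = np.random.permutation(100)[:10]
fancy = x[idx]
print(np.shares_memory(x, fancy))  # False
```

So if the slicing itself doesn't copy, the copy would have to come from the shuffling step, which is what I'd like to confirm.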