Confused about how full SGD (with momentum and weight decay) is implemented in PyTorch

I am confused by the bits and pieces of tutorials I have found on various websites. How is SGD actually implemented in PyTorch? Notation:
lr: learning rate
w: weights
dw: the gradient of the loss with respect to the weights

Is the update rule w' = momentum * w - lr * (dw + weight_decay * w)?
Or is it v(t) = momentum * v(t-1) + (dw + weight_decay * w), followed by w' = w - lr * v(t), where v(t-1) is the value of v from the previous step?
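
To make the two candidates concrete, here is a minimal pure-Python sketch of both update rules for a single scalar weight. The function names and the toy loss f(w) = 0.5 * w**2 (so dw = w) are my own assumptions for illustration and are not taken from PyTorch. As far as I can tell from the torch.optim.SGD documentation, the second form (velocity buffer, starting from zero) is the one described there when dampening=0 and nesterov=False, but the sketch below only shows that the two rules give different results; it does not call PyTorch.

```python
def step_candidate_1(w, dw, lr, momentum, weight_decay):
    """First candidate: momentum multiplies the weight itself."""
    return momentum * w - lr * (dw + weight_decay * w)


def step_candidate_2(w, v, dw, lr, momentum, weight_decay):
    """Second candidate: momentum multiplies a velocity buffer v."""
    g = dw + weight_decay * w   # gradient with weight decay folded in
    v = momentum * v + g        # v(t) = momentum * v(t-1) + g
    w = w - lr * v              # the weight update uses the velocity
    return w, v


# Hypothetical toy problem: minimise f(w) = 0.5 * w**2, so dw = w.
lr, momentum, weight_decay = 0.1, 0.9, 0.01

w1 = 1.0
w2, v2 = 1.0, 0.0  # velocity buffer starts at zero
for _ in range(3):
    w1 = step_candidate_1(w1, w1, lr, momentum, weight_decay)
    w2, v2 = step_candidate_2(w2, v2, w2, lr, momentum, weight_decay)

print("candidate 1:", w1)
print("candidate 2:", w2)
```

The two rules diverge after the very first step, so comparing either one against a few steps of torch.optim.SGD on the same scalar should settle which form is in use.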