Understanding the Vector-Jacobian Product (VJP)

I’ve been reading a bit about computation graphs and autograd and came across the following:

PyTorch doesn’t compute the full Jacobian; instead it uses the VJP to calculate the derivatives directly.

So I have a few questions:

  1. What exactly does the VJP do? I understand it gives us the product between a vector and the Jacobian, but what does that represent?
  2. What is `v`? I know it’s called the cotangent, but what role does it play? By default it’s a tensor of ones, which people keep saying means “differentiating with respect to itself”. But multiplying by a vector of ones just adds up the columns of the Jacobian; how does adding columns give you a derivative?
  3. If possible, can someone explain how the VJP is computed directly from the graph?
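To make question 2 concrete, here is a small NumPy sketch of what I understand so far. The function `f`, its Jacobian, and the input values are all made up for illustration; this is just me spelling out what “`v` times the Jacobian” seems to mean, not how PyTorch actually implements it:

```python
import numpy as np

# Made-up example: f maps R^3 -> R^2, f(x) = [x0 * x1, x1 + x2]
def f(x):
    return np.array([x[0] * x[1], x[1] + x[2]])

def jacobian(x):
    # Analytic Jacobian of f at x, shape (2, 3): J[i, j] = df_i / dx_j
    return np.array([
        [x[1], x[0], 0.0],
        [0.0,  1.0,  1.0],
    ])

x = np.array([2.0, 3.0, 4.0])
J = jacobian(x)

v = np.ones(2)   # the default cotangent, as I understand it
vjp = v @ J      # v^T J, shape (3,): one entry per input

# With v = ones, entry j of the result is sum_i df_i/dx_j,
# i.e. the gradient of the summed output s(x) = f_0(x) + f_1(x).
print(vjp)       # [3. 3. 1.]
```

So with `v` all ones the result does look like the gradient of the sum of the outputs, which is presumably what the “differentiating with respect to itself” phrasing refers to, but I’d like someone to confirm that reading.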

I’ve been looking everywhere, and with each article I read it seems no one directly addresses this topic. Any help is appreciated :smiling_face_with_tear: