Proper way to generate the gradient of log_prob(random_variable) when the random variable is not sampled from the distribution

Hi,
The gradients of the network parameters are all zero even though the loss value is non-zero. I don't want to sample the action from the distribution and then train on it (i.e., I am not implementing vanilla REINFORCE):

probs = policy(state)
m = Categorical(probs)
log_prob = m.log_prob(action)  # **action** comes from another source (not a neural network)
loss = -log_prob * R
loss.backward()
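For reference, here is a minimal self-contained version of the setup described above, with an assumed toy policy (a single linear layer plus softmax over a 4-dim state with 3 actions) and a hypothetical constant return R. In this form, `m.log_prob(action)` does propagate gradients back into the policy parameters even when `action` is supplied externally, since the log-probability still depends on `probs`:

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical toy policy: linear layer + softmax (state dim 4, 3 actions).
policy = torch.nn.Sequential(torch.nn.Linear(4, 3), torch.nn.Softmax(dim=-1))

state = torch.randn(1, 4)
probs = policy(state)                  # shape (1, 3), requires grad
m = Categorical(probs)

# Action supplied by an external source, NOT sampled from m.
action = torch.tensor([1])
R = 2.0                                # assumed constant return

loss = (-m.log_prob(action) * R).sum()
loss.backward()

# The gradient w.r.t. the logits of log_prob(action) is (onehot - probs),
# which is non-zero unless probs is exactly one-hot, so the policy
# parameters should receive non-zero gradients here.
print(policy[0].weight.grad.abs().sum().item() > 0)
```

If the gradients come out all zero in a setup like this, a common cause is that `probs` was detached from the graph somewhere upstream (e.g. built from a tensor created with `torch.no_grad()` or `.detach()`), rather than a problem with `log_prob` itself.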