Fisher Matrix Calculation: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior

I am trying to update the Fisher matrix using this code:

def _update_fisher_params(self, current_ds, batch_size, num_batch):
    log_likelihoods = []
    i = 1
    for inputs, target in tqdm(current_ds):
        inputs = inputs.cuda()
        target = target.cuda()
        if i >= num_batch:
            break

        output = F.log_softmax(self.model(inputs), dim=1).detach().requires_grad_(True)
        i += 1
        log_likelihoods.append(output[:, target])

    log_likelihood = torch.cat(log_likelihoods).mean()
    grad_log_likelihood = autograd.grad(log_likelihood, self.model.parameters(), create_graph=True, retain_graph=True)

I have already tried create_graph=True and retain_graph=True, but it still gives the error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Also, if I remove the detach() from my code, I get a CUDA out of memory error at the line where I append my tensors to the list (log_likelihoods).
Can someone please tell me where I am going wrong? I cannot figure it out anymore.

Calling detach() on the output of F.log_softmax cuts the computation graph at that point, so Autograd cannot compute gradients for the earlier part of the graph, including the model parameters. That is why autograd.grad complains that the differentiated tensors were not used in the graph.
You might need to reduce num_batch if you are running out of memory.
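
Here is a minimal standalone reproduction of that mechanism, with hypothetical tensors rather than the model from the question:

import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).detach().requires_grad_(True)  # the graph back to x is cut here
loss = y.sum()
# RuntimeError: One of the differentiated Tensors appears to not have been
# used in the graph. Set allow_unused=True if this is the desired behavior.
torch.autograd.grad(loss, x)

From the point of view of loss, y is a fresh leaf tensor, so x (like your model parameters) never appears in its graph.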

Is there any other way, apart from detach(), that avoids running out of memory?
Also, my batch size is 32, and this loop barely runs for 5 batches before it says CUDA out of memory if I don't use detach(). I want to calculate the Fisher matrix with more samples from Dataset A; how can I still run this?

Or is there a more efficient way of calculating the Fisher matrix?
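
One way around both problems is to avoid storing the log-likelihoods altogether and instead accumulate squared gradients batch by batch. Each backward pass then only needs the graph of a single batch, which is freed immediately afterwards, so memory use no longer grows with the number of batches. Below is a minimal sketch of that idea, assuming a CUDA classification model and a loader yielding (inputs, target) batches; estimate_diag_fisher and the gather-based indexing are illustrative choices, not code from this thread:

import torch
import torch.nn.functional as F
from torch import autograd

def estimate_diag_fisher(model, data_loader, num_batches):
    # One running sum of squared gradients per parameter, on the parameter's device
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for i, (inputs, target) in enumerate(data_loader):
        if i >= num_batches:
            break
        inputs, target = inputs.cuda(), target.cuda()
        log_probs = F.log_softmax(model(inputs), dim=1)
        # Log-probability of the true class for each sample in the batch
        log_likelihood = log_probs.gather(1, target.unsqueeze(1)).mean()
        # No create_graph/retain_graph: the graph for this batch is freed
        # as soon as the gradients have been computed
        grads = autograd.grad(log_likelihood, model.parameters())
        for (n, _), g in zip(model.named_parameters(), grads):
            fisher[n] += g ** 2
    return {n: f / num_batches for n, f in fisher.items()}

Note that squaring the batch-averaged gradient is only an approximation of the empirical Fisher; the textbook estimate averages per-sample squared gradients, which you would get by looping over individual samples. The point here is the memory behavior: no graph is retained across batches, so you can run over as many batches as you like.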