Incorrect gradient for combined network

Hi there. I am using a combined network made of the pre-trained ESM transformer by FAIR and my own classifier. I should mention that I replaced the first layer of the transformer with an identity layer, because I want to compute the embeddings separately with this transformer's embedding layer so that I can manipulate them later.
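A minimal sketch of the embedding-split idea, using a hypothetical two-layer `ToyEncoder` in place of ESM (its `embed_tokens` attribute mirrors the name used in this post, but the model itself is an assumption). After swapping the embedding for `nn.Identity`, feeding the precomputed embeddings should reproduce the original output exactly, which rules out the split itself as a source of error. With a real model it is worth checking that `forward` does not also use the raw token ids elsewhere (for example to build a padding mask).

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a transformer whose first step is a token embedding."""

    def __init__(self, vocab_size=10, embed_dim=4):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, embed_dim)
        self.layer = nn.Linear(embed_dim, embed_dim)

    def forward(self, tokens):
        x = self.embed_tokens(tokens)     # (B, L) ids -> (B, L, E); identity after the swap
        return torch.tanh(self.layer(x))  # (B, L, E)


torch.manual_seed(0)
model = ToyEncoder().eval()
tokens = torch.tensor([[0, 3, 5, 7]])  # (B=1, L=4) token ids

with torch.no_grad():
    reference = model(tokens)             # output of the unmodified model
    embedding_layer = model.embed_tokens  # keep a handle to the original layer
    embeddings = embedding_layer(tokens)  # (1, 4, 4) float embeddings
    model.embed_tokens = nn.Identity()    # forward now expects embeddings
    split_output = model(embeddings)

print(torch.allclose(reference, split_output))  # True
```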
I want to calculate the gradient with respect to the input; however, when I use torch autograd I get a wrong one. I checked it by manually calculating the gradient for one position using (f(x+e) - f(x-e)) / 2e, where f is my combined network, x is a specific position of the input, and e is a small increment. I am not sure what exactly is wrong with my automatic differentiation.
Here is my code:
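The central-difference check described above can be sketched as follows, on a hypothetical small `nn.Sequential` standing in for the combined network (the model, `class_idx` and `eps` are assumptions for illustration). Two common reasons such a check disagrees with autograd are float32 rounding (subtracting two nearly equal outputs loses most significant digits when e is small) and active dropout (each forward pass uses a different random mask), so the sketch uses float64 and `eval()` mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the combined network, in float64 so that the
# finite-difference estimate is not dominated by float32 rounding error.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3)).double()
model.eval()  # disables dropout (if any) so repeated forward passes agree

x = torch.randn(4, dtype=torch.float64, requires_grad=True)
class_idx = 0  # output class whose score is differentiated

# Autograd: gradient of the selected class score with respect to the input.
(autograd_grad,) = torch.autograd.grad(model(x)[class_idx], x)

# Central differences (f(x + e) - f(x - e)) / 2e, one input position at a time.
eps = 1e-6
fd_grad = torch.zeros_like(x)
with torch.no_grad():
    for i in range(x.numel()):
        step = torch.zeros_like(x)
        step[i] = eps
        f_plus = model(x + step)[class_idx]
        f_minus = model(x - step)[class_idx]
        fd_grad[i] = (f_plus - f_minus) / (2 * eps)

max_error = (autograd_grad - fd_grad).abs().max().item()
print(f"max |autograd - finite difference| = {max_error:.2e}")
```

Repeating the comparison in float32 or in `train()` mode is a quick way to see how much of a discrepancy those two factors alone can produce.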

import torch
import torch.nn as nn

# ======== split the transformer model in two, getting access to the embedding layer =====
splitted_model = []
for name, module in transformer.named_children():
    splitted_model.append(module)

# take the first layer
embedding_layer = splitted_model[0]
# replace the embedding layer with an Identity layer
identity_layer = torch.nn.Identity()
transformer.embed_tokens = identity_layer
# set token dropout to False

# this will be the input to my combined network
input_embedding = embedding_layer(input_tokens)

# ======== combine transformer and my classifier ==============
# create class
class FullModel(nn.Module):
    def __init__(self, transformer, classifier_nn):
        super(FullModel, self).__init__()
        self.transformer = transformer
        self.classifier_nn = classifier_nn

    def forward(self, x):
        # calculate the representation matrix of shape (L, E): L is the length of the input sequence,
        # E is the length of the feature vector. Token 0 is a start-of-sequence token,
        # so the first symbol of the input is token 1
        x1 = self.transformer(x, repr_layers=[34])["representations"][34][0, 1 : len(x[0]) + 1]
        # average over L to get a representation vector of length E
        x2 = torch.mean(x1, dim=0)
        # use it for the classification
        x3 = self.classifier_nn(x2)
        return x3
# combine
transformer_model = transformer
classifier_nn = my_classifier
final_model = FullModel(transformer_model, classifier_nn)

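# ======== compute gradient =========
# define for which output class I want to get the gradient
external = torch.tensor([1.0, 0.0, 0.0])  # 1st out of 3 possible classes (float, same shape as pred)

# detach so that x is a leaf tensor; by default autograd only populates .grad for leaf tensors
x = input_embedding.detach().requires_grad_(True)
pred = final_model(x)
# backpropagate the selected class score; without this call x.grad stays None
pred.backward(gradient=external)
input_gradient = x.grad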
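Seeding backpropagation with a one-hot vector such as `external` (via `pred.backward(gradient=external)`) gives the same input gradient as differentiating the selected class score directly with `torch.autograd.grad`. A minimal sketch of that equivalence, using hypothetical `nn.Linear` stand-ins for the encoder and classifier, with a mean over positions loosely mimicking `FullModel.forward` (the toy model is an assumption, not ESM):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, E, n_classes = 5, 4, 3

# Hypothetical stand-ins: a per-position encoder and a classifier head.
encoder = nn.Linear(E, E)
classifier = nn.Linear(E, n_classes)


def full_model(x):
    """x: (1, L, E) embeddings -> (n_classes,) scores, averaging over positions."""
    h = torch.tanh(encoder(x))[0]     # (L, E)
    return classifier(h.mean(dim=0))  # (n_classes,)


# A leaf tensor, so autograd populates x.grad after backward().
x = torch.randn(1, L, E).requires_grad_(True)

# 1) Seed backpropagation with a one-hot vector selecting class 0.
external = torch.tensor([1.0, 0.0, 0.0])
full_model(x).backward(gradient=external)
grad_via_backward = x.grad.clone()

# 2) Differentiate the class-0 score directly.
(grad_via_autograd,) = torch.autograd.grad(full_model(x)[0], x)

print(torch.allclose(grad_via_backward, grad_via_autograd))  # True
```

The `torch.autograd.grad` form returns the gradient instead of accumulating it into `x.grad`, so it avoids having to zero `x.grad` between repeated checks.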

Have you checked, using your finite-difference method, that the gradient is correct for the ESM transformer on its own (without your classifier)? It's possible that for large models the output is numerically unstable around the point you sampled.
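PyTorch's `torch.autograd.gradcheck` automates that finite-difference comparison for every input element. A minimal sketch, running it on a hypothetical one-layer self-attention block standing in for the transformer alone (the block is an assumption, not ESM; on the real model the element-by-element check would be very slow, so a short input sequence is likely more practical):

```python
import torch
import torch.nn as nn
from torch.autograd import gradcheck

torch.manual_seed(0)


class TinySelfAttention(nn.Module):
    """Hypothetical one-layer self-attention block standing in for the transformer."""

    def __init__(self, dim=8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.norm = nn.LayerNorm(dim)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):  # x: (B, L, E) embeddings
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=-1)
        return self.norm(x + self.dropout(attn @ v))


# float64 keeps the finite-difference estimate accurate; eval() turns dropout off,
# otherwise every forward pass would use a different random mask.
block = TinySelfAttention().double().eval()
x = torch.randn(1, 5, 8, dtype=torch.float64, requires_grad=True)

# Compares the autograd Jacobian with a finite-difference estimate for every input
# element; raises an error describing the mismatch if they disagree.
ok = gradcheck(block, (x,), eps=1e-6, atol=1e-5, rtol=1e-3)
print(ok)  # True
```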