I’m having difficulty training a deep learning model in which some of the data comes from a library that uses NumPy. Although the gradient computations needed for training appear to run, the model does not learn. Is it possible that the autograd graph is being lost? My example uses the EPANET library (a hydraulic network simulation library), and several functions are written in NumPy. These functions are called within the grad_steps function.
(…)
solver_net = NNSolver(data, args)
solver_net.to(DEVICE)
solver_opt = optim.Adam(solver_net.parameters(), lr=solver_step)
stats = {}
for i in range(nepochs):
    solver_net.train()
    for Xtrain in train_loader:
        Xtrain = Xtrain[0].to(DEVICE)
        solver_opt.zero_grad()
        Yhat_train = solver_net(Xtrain)
        # grad_steps is where the NumPy-based EPANET functions are called
        Ynew_train = grad_steps(data, Xtrain, Yhat_train, args)
        train_loss = total_loss(data, Xtrain, Ynew_train, args)
        train_loss.sum().backward()
        solver_opt.step()
(…)
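For reference, here is a minimal, self-contained sketch of the pattern I suspect is happening inside grad_steps (epanet_like_step is a hypothetical stand-in for the NumPy/EPANET calls, not the real API): converting a tensor to a NumPy array and back creates a new tensor with no grad_fn, so gradients can no longer flow back to the network.

import numpy as np
import torch

def epanet_like_step(y_np: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the NumPy-based EPANET computation
    return y_np * 2.0 + 1.0

x = torch.randn(4, 3, requires_grad=True)
y = x.sum(dim=1)                                  # still attached to the autograd graph
y_np = y.detach().cpu().numpy()                   # leaves the graph here
y_new = torch.as_tensor(epanet_like_step(y_np))   # new leaf tensor, no grad history

print(y.grad_fn)      # a SumBackward node -> connected to the graph
print(y_new.grad_fn)  # None -> gradients cannot reach x through y_new

In my training loop, the equivalent check would be printing Ynew_train.grad_fn right after the grad_steps call; if it is None, the gradients from the loss cannot flow back to solver_net's parameters.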
Thank you for any help.