RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of AsStridedBackward0, is at version 180; expected version 177 instead. Hint: enable anomaly detection

I tried converting the hydrological model (gr4j) to the pytorch version and got the same results as the previous version. Now, I want to apply autograd to parameter optimization, just like LSTM and other models, but I don’t know where to write there is a problem, I get this error. Here is my code.

model code

import torch
from torch import nn as nn
from torch import tensor
import pandas as pd

def s_curves1(t, x4):
        Unit hydrograph ordinates for UH1 derived from S-curves.

    if t <= get_tensor(0.0):
        return get_tensor(0.0)
    elif t < x4:
        return (t / x4) ** get_tensor(2.5)
    else:  # t >= x4
        return get_tensor(1.0)

def s_curves2(t, x4):
        Unit hydrograph ordinates for UH2 derived from S-curves.

    if t <= get_tensor(0.0):
        return get_tensor(0.0)
    elif t < x4:
        return get_tensor(0.5) * (t / x4) ** get_tensor(2.5)
    elif t < get_tensor(2.0) * x4:
        return get_tensor(1.0) - get_tensor(0.5) * (get_tensor(2.0) - t / x4) ** get_tensor(2.5)
    else:  # t >= x4
        return get_tensor(1.0)

def gr4j(precip, potential_evap, x1, x2, x3, x4, states=None, return_state=False):
        Generated simulated streamflow for given rainfall and potential evaporation.

        :param precip: Catchment average rainfall.
        :type precip: array(float)
        :param potential_evap: Catchment average potential evapotranspiration.
        :type potential_evap: array(float)
        :param params: X parameters for the model.
        :type params: dictionary with keys X1, X2, X3, X4
        :param states: Optional initial state values.
        :type states: Dictionary with optional keys 'production_store', 'routing_store'.
        :param return_state: If true returns a dictionary containing 'production_store' and 'routing_store'. Default: False.
        :type return_state: boolean

        :return: Array of simulated streamflow.
    if states is None:
        states = {}

    X1 = x1
    X2 = x2
    X3 = x3
    X4 = x4
    P = precip
    E = potential_evap

    nUH1 = int(torch.ceil(X4))
    nUH2 = int(torch.ceil(get_tensor(2.0) * X4))

    uh1_ordinates = torch.zeros(nUH1)
    uh2_ordinates = torch.zeros(nUH2)

    UH1 = states.get('UH1', torch.zeros(nUH1))
    UH2 = states.get('UH2', torch.zeros(nUH2))

    for t in range(1, nUH1 + 1):
        uh1_ordinates[t - 1] = s_curves1(t, X4) - s_curves1(t - get_tensor(1.0), X4)

    for t in range(1, nUH2 + 1):
        uh2_ordinates[t - 1] = s_curves2(t, X4) - s_curves2(t - get_tensor(1.0), X4)

    production_store = states.get('production_store', get_tensor(0.0))  # S
    routing_store = states.get('routing_store', get_tensor(0.0))  # R

    if P > E:
        net_evap = get_tensor(0.0)
        scaled_net_precip = (P - E) / X1
        if scaled_net_precip > get_tensor(13.0):
            scaled_net_precip = get_tensor(13.0)
        tanh_scaled_net_precip = torch.tanh(scaled_net_precip)
        reservoir_production = (X1 * (get_tensor(1.0) - (production_store / X1) ** get_tensor(2.0)) * tanh_scaled_net_precip) \
                               / (get_tensor(1.0) + production_store / X1 * tanh_scaled_net_precip)

        routing_pattern = P - E - reservoir_production
        scaled_net_evap = (E - P) / X1
        if scaled_net_evap > get_tensor(13.0):
            scaled_net_evap = get_tensor(13.0)
        tanh_scaled_net_evap = torch.tanh(scaled_net_evap)

        ps_div_x1 = (get_tensor(2.0) - production_store / X1) * tanh_scaled_net_evap
        net_evap = production_store * ps_div_x1 / (get_tensor(1.0) + (get_tensor(1.0) - production_store / X1) * tanh_scaled_net_evap)

        reservoir_production = get_tensor(0.0)
        routing_pattern = get_tensor(0.0)

    production_store = production_store - net_evap + reservoir_production

    percolation = production_store / (get_tensor(1.0) + (production_store / get_tensor(2.25) / X1) ** get_tensor(4.0)) ** get_tensor(0.25)

    routing_pattern = routing_pattern + (production_store - percolation)
    production_store = percolation

    UH1_copy = UH1
    for i in range(0, len(UH1) - 1):
        UH1_copy[i] = UH1[i + 1] + uh1_ordinates[i] * routing_pattern
    UH1_copy[-1] = uh1_ordinates[-1] * routing_pattern

    UH2_copy = UH2
    for j in range(0, len(UH2) - 1):
        UH2_copy[j] = UH2[j + 1] + uh2_ordinates[j] * routing_pattern
    UH2_copy[-1] = uh2_ordinates[-1] * routing_pattern

    groundwater_exchange = X2 * (routing_store / X3) ** get_tensor(3.5)
    routing_store = torch.maximum(get_tensor(0.0), routing_store + UH1[0] * get_tensor(0.9) + groundwater_exchange)

    R2 = routing_store / (get_tensor(1.0) + (routing_store / X3) ** get_tensor(4.0)) ** get_tensor(0.25)
    QR = routing_store - R2
    routing_store = R2
    QD = torch.maximum(get_tensor(0.0), UH2[0] * get_tensor(0.1) + groundwater_exchange)
    Q = QR + QD

    if return_state:
        return Q, {
            'production_store': production_store,
            'routing_store': routing_store,
            'UH1': UH1_copy,
            'UH2': UH2_copy,
        return Q

def get_tensor(v):
    return tensor(v, requires_grad=True).to(torch.float32)

class NeuralGR4J(nn.Module):
    def __init__(self, x1, x2, x3, x4, S0, R0):
        super(NeuralGR4J, self).__init__()
        self.x1 = nn.Parameter(tensor(x1))
        self.x2 = nn.Parameter(tensor(x2))
        self.x3 = nn.Parameter(tensor(x3))
        self.x4 = nn.Parameter(tensor(x4))

        self.S0 = nn.Parameter(tensor(S0))
        self.R0 = nn.Parameter(tensor(R0))

    def forward(self, prec, evp):
        prec [batch_size, series_len, 1]
        :param prec:
        :param evp:

        series_len = prec.shape[0]
        temp_states = {'production_store': self.S0, 'routing_store': self.R0}
        temp_Q = []
        for i in range(series_len):
            Q, states = gr4j(prec[i], evp[i], self.x1, self.x2, self.x3, self.x4, states=temp_states, return_state=True)
            temp_states['production_store'] = states['production_store']
            temp_states['routing_store'] = states['routing_store']
            temp_states['UH1'] = states['UH1']
            temp_states['UH2'] = states['UH2']
        return temp_Q

if __name__ == '__main__':
    model = NeuralGR4J(x1=320.11, x2=2.42, x3=69.63, x4=1.39, S0=0.6 * 320.11, R0=0.70 * 69.63)
    obs_series = pd.read_csv(r'data/sample.csv')
    P = torch.from_numpy(obs_series['P'].values)
    PE = torch.from_numpy(obs_series['ET'].values)
    model_Q = torch.from_numpy(obs_series['modeled_Q'].values)
    obs_Q = torch.from_numpy(obs_series['obs_Q'].values)
    q_sim = model.forward(P, PE)

main code

import pandas as pd
from gr4j_loader import GR4JDataset
from neural_gr4j import NeuralGR4J
from import DataLoader
import torch
import torch.nn.functional as F
from numpy import array
import numpy as np

model_file = pd.read_csv('data/sample.csv')
train_df = model_file.iloc[:600]
test_df = model_file.iloc[600:]

train_loader = DataLoader(GR4JDataset(train_df, 60), batch_size=32)
test_loader = DataLoader(GR4JDataset(test_df, 60), batch_size=32)

model = NeuralGR4J(S0=array(0.6 * 320.11).astype(np.float32), R0=array(0.7 * 69.63).astype(np.float32),
                   x1=array(320.11).astype(np.float32), x2=array(2.42).astype(np.float32), x3=array(69.63).astype(np.float32), x4=array(1.39).astype(np.float32))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(10):
    for batch in train_loader:
        P, E, Q_obs = batch
        avg_loss = 0
        for i in range(P.shape[0]):
            P_i, E_i, Q_obs_i = P[i], E[i], Q_obs[i]
            Q_hat = model.forward(P_i, E_i)
            loss = 0
            for j in range(len(Q_hat)):
                loss = loss+F.mse_loss(Q_obs_i[j], Q_hat[j])
            avg_loss = avg_loss + loss
        avg_loss = avg_loss / P.shape[0]
        print("epoch_{}-----train_mse_loss: {}---".format(epoch, avg_loss))
        print("param_grad: {}".format([x.grad for x in optimizer.param_groups[0]['params']]))
    for batch in test_loader:
        with torch.no_grad():
            P, E, Q_obs = batch
            for i in range(P.shape[0]):
                P_i, E_i, Q_obs_i = P[i], E[i], Q_obs[i]
                Q_hat = model.forward(P_i, E_i)
                loss = 0
                for j in range(len(Q_hat)):
                    loss = loss + F.mse_loss(Q_obs_i[j], Q_hat[j])
                avg_loss = avg_loss + loss
            avg_loss = avg_loss / P.shape[0]
            print("epoch_{}-----test_mse_loss: {}---".format(epoch, avg_loss))
    print('model_pram: {}'.format([param for param in model.parameters()]))


import pandas as pd
import numpy as np
from import Dataset

class GR4JDataset(Dataset):
    def __init__(self, df: pd.DataFrame, time_len):
        self.P = df.P.values
        self.E = df.ET.values
        self.obs_Q = df.obs_Q.values
        self.time_len = time_len

    def __len__(self):
        return len(self.P) - self.time_len

    def __getitem__(self, idx):
        return self.P[idx:idx + self.time_len].astype(np.float32), self.E[idx:idx + self.time_len].astype(np.float32), self.obs_Q[idx:idx + self.time_len].astype(np.float32)

the sample.csv can be found in the GitHub - ckrapu/gr4j_theano: Autodiff-enabled hydrology modeling via Theano

Hi Jingx!

This is almost certainly your problem (and the same for UH2_copy). The
line UH1_copy = UH1 does not create a new, independent copy of UH1.
Instead, it creates a new python reference that refers to the same tensor
object in memory to which UH1 refers. Therefore your loop modifies (the
tensor referred to by) UH1 inplace.

As I read your code, you never actually use (what you thought were) the
copied values in UH1_copy – you just overwrite them with assignment
statements. So maybe creating an “empty” tensor with torch.empty_like()
would make more sense.

Also, your loop repeatedly modifies UH1_copy inplace by assigning into
it with indexing. So even if you create a new, independent tensor for
UH1_copy, these repeated inplace modification might cause problems.

I think trying something like (if I understand correctly what your code is

UH1_new = ((UH1[1:], torch.zeros (1))) + routing patterns * uh1_ordinates

would be the most natural approach.

Note that this tensor expression creates a new tensor and that nothing gets
modified inplace (and this operation will become part of the “computation
graph” and can participate in autograd / backpropagation).

As an aside, you do have a lot of layered code. If further “inplace” errors
crop up, do consider using autograd’s anomaly detection to help with


K. Frank