How do I use the output of a feed forward neural network as a new value for training with pytorch?

eriks · November 24, 2021, 5:35pm

Hello everyone,
I am quite a beginner with neural networks and am currently struggling to implement a feed-forward neural network for time series prediction. The network receives the state at time step t as input and should predict the state at time step t+1, and so on. I want the network to take its prediction for t+1 as the new input when predicting t+2, but unfortunately I get the following runtime error at the loss.backward() call. Full traceback:

/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py:154: UserWarning: Error detected in MmBackward0. Traceback of forward call that caused the error:
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 296, in <module>
    main()
  File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 277, in main
    train_loss, val_loss = optimize_model(model, train_data_features, train_data_labels,
  File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 57, in optimize_model
    prediction = model(input).squeeze()
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/model_network_MLP.py", line 51, in forward
    x = self.activation(self.layers[i_layer](x))
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
 (Triggered internally at  ../torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(
Backend Qt5Agg is interactive backend. Turning interactive mode on.
^CTraceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 296, in <module>
    main()
  File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 277, in main
    train_loss, val_loss = optimize_model(model, train_data_features, train_data_labels,
  File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 61, in optimize_model
    loss.backward()
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 401]], which is output 0 of AsStridedBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

My implementation for the training loop looks as follows:

from typing import List, Tuple

import torch as pt


def optimize_model(model: pt.nn.Module, x_train: pt.Tensor, y_train: pt.Tensor,
                   x_val: pt.Tensor, y_val: pt.Tensor, epochs: int = 1000,
                   lr: float = 0.001, save_best: str = "") -> Tuple[List[float], List[float]]:
    """Optimize network weights based on training and validation data.

    :param model: neural network model
    :type model: pt.nn.Module
    :param x_train: features for training
    :type x_train: pt.Tensor
    :param y_train: labels for training
    :type y_train: pt.Tensor
    :param x_val: features for validation
    :type x_val: pt.Tensor
    :param y_val: labels for validation
    :type y_val: pt.Tensor
    :param epochs: number of optimization loops, defaults to 1000
    :type epochs: int, optional
    :param lr: learning rate, defaults to 0.001
    :type lr: float, optional
    :param save_best: path where to save best model; no snapshots are saved
        if empty string; defaults to ""
    :type save_best: str, optional
    :return: lists with training and validation losses for all epochs
    :rtype: Tuple[List[float], List[float]]
    """    
    criterion = pt.nn.MSELoss()
    optimizer = pt.optim.Adam(params=model.parameters(), lr=lr)
    best_val_loss, best_train_loss = 1.0e5, 1.0e5
    train_loss, val_loss = [], []
    
    # report the forward operation responsible for a failing backward pass
    pt.autograd.set_detect_anomaly(True)
    for e in range(1, epochs+1):
        optimizer.zero_grad()
        #prediction = model(x_train).squeeze()
        for i in range(len(x_train)-1):
            input = x_train[i]
            prediction = model(input).squeeze()
            x_train[i+1, :-1] = prediction[:-2]
            
            loss = criterion(prediction, y_train[i, :])
            loss.backward()
            optimizer.step()
            train_loss.append(loss.item())

            with pt.no_grad():
                prediction = model(x_val[i, :]).squeeze()
                loss = criterion(prediction, y_val[i, :])
                val_loss.append(loss.item())
                
    return train_loss, val_loss

My idea was to use just one state of the training data to input into the model and then iterate over the training data in order to manipulate the next time step and so on but it does not seem to work.

Does anyone know what the problem is or if my idea is even close to what I want to do?

yuanyin · November 25, 2021, 3:37am

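You can replace "input = x_train[i]" with "input = x_train[i].clone().detach()". The error is caused by the circular dependency "x_train ← input ← prediction ← x_train": input is a view of x_train, so it shares memory with it. The layer saves input for the backward pass, and the in-place write of prediction into x_train then marks that saved tensor as modified before loss.backward() runs. clone() gives input its own memory, and detach() keeps it out of the graph of earlier predictions.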
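A minimal sketch of that change, assuming a small stand-in model and random data (the sizes, the Sequential model, and the variable name state are hypothetical and not taken from the original script). For simplicity it writes the whole prediction into the next row instead of the partial slice used in the question:

```python
import torch as pt

pt.manual_seed(0)

# Hypothetical stand-ins for the real model and data.
n_steps, n_features = 20, 4
model = pt.nn.Sequential(
    pt.nn.Linear(n_features, 16), pt.nn.ReLU(), pt.nn.Linear(16, n_features)
)
x_train = pt.rand(n_steps, n_features)
y_train = pt.rand(n_steps, n_features)

criterion = pt.nn.MSELoss()
optimizer = pt.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for i in range(n_steps - 1):
    # Independent copy of the current state: it no longer shares memory
    # with x_train and carries no graph from earlier predictions.
    state = x_train[i].clone().detach()
    prediction = model(state)

    # In-place write into x_train before backward(), as in the question.
    # It is harmless now because `state` is a copy, not a view.
    x_train[i + 1] = prediction.detach()

    loss = criterion(prediction, y_train[i])
    optimizer.zero_grad()  # weights are updated every step, so reset every step
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Because every input is detached, each backward pass only covers a single time step; the gradient does not flow back through earlier predictions.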

eriks · November 25, 2021, 9:38am

Thank you! That worked well, but my training is really slow now because of the additional loop, which predicts every value of x_train one after another.
Another question regarding this issue is where exactly to put the loss.backward() call. Should it be inside the additional loop over x_train or outside? I am not really sure what difference it makes.

EDIT: I think I figured it out myself. I put it into the top-level loop over the epochs and added another tensor that stores the predictions, so that the loss is calculated correctly over all time steps.
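The approach described in the EDIT could look roughly like the sketch below. This is only a guess at the structure: it assumes the predictions stay attached to the graph, so that the gradient flows through the whole rollout, and the model, sizes, and random data are hypothetical stand-ins.

```python
import torch as pt

pt.manual_seed(0)

# Hypothetical stand-ins for the real model and data.
n_steps, n_features = 10, 4
model = pt.nn.Sequential(
    pt.nn.Linear(n_features, 16), pt.nn.Tanh(), pt.nn.Linear(16, n_features)
)
x0 = pt.rand(n_features)                # initial state at t = 0
y_train = pt.rand(n_steps, n_features)  # target states for t = 1 .. n_steps

criterion = pt.nn.MSELoss()
optimizer = pt.optim.Adam(model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(5):
    optimizer.zero_grad()
    state = x0
    predictions = []
    for i in range(n_steps):
        # The prediction becomes the next input. Nothing is written in
        # place, so autograd can track the whole rollout.
        state = model(state)
        predictions.append(state)

    # One loss over all time steps, one backward pass, one update per epoch.
    loss = criterion(pt.stack(predictions), y_train)
    loss.backward()
    optimizer.step()
    epoch_losses.append(loss.item())
```

Collecting the predictions in a list and stacking them avoids writing into an existing tensor, which is what triggered the original error.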

yuanyin · November 25, 2021, 11:47am

Yes, you can move it out of the inner loop and compute the loss from a separate tensor. But that is not necessary: it makes the backward pass somewhat more complex and brings no performance improvement. You could instead just move optimizer.step() out of the inner loop.

In addition, this gives a different result from the original version of the code, in which optimizer.step() is executed at every step of the inner loop. Which variant is best depends on the underlying problem.
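A minimal sketch of the simpler variant, assuming the detached inputs from the earlier fix: loss.backward() stays in the inner loop, where the gradients accumulate in .grad, and optimizer.step() runs once per epoch. The model, sizes, and random data are again hypothetical stand-ins.

```python
import copy

import torch as pt

pt.manual_seed(0)

# Hypothetical stand-ins for the real model and data.
n_steps, n_features = 20, 4
model = pt.nn.Sequential(
    pt.nn.Linear(n_features, 16), pt.nn.Tanh(), pt.nn.Linear(16, n_features)
)
x_train = pt.rand(n_steps, n_features)
y_train = pt.rand(n_steps, n_features)

criterion = pt.nn.MSELoss()
optimizer = pt.optim.Adam(model.parameters(), lr=1e-3)
initial_state = copy.deepcopy(model.state_dict())

n_updates = 0
for epoch in range(3):
    optimizer.zero_grad()  # reset once per epoch
    for i in range(n_steps - 1):
        state = x_train[i].clone().detach()
        prediction = model(state)
        loss = criterion(prediction, y_train[i])
        loss.backward()  # adds this step's gradient to .grad
        x_train[i + 1] = prediction.detach()
    optimizer.step()  # single update from the accumulated gradients
    n_updates += 1
```

Here all predictions within one epoch are made with the same weights, whereas stepping inside the inner loop changes the weights between time steps, which is why the two variants do not give the same result.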

Sorry for the late answer.