Hello everyone,
I am quite a beginner with neural networks and am currently struggling to implement a feed-forward neural network for time series prediction. The network receives the state at time step t as input and should predict the state at time step t+1, and so on. I want the network to use its own prediction for t+1 as the new input when predicting t+2, but unfortunately I get the following runtime exception at the loss.backward() call. Full traceback:
/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py:154: UserWarning: Error detected in MmBackward0. Traceback of forward call that caused the error:
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
run()
File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 296, in <module>
main()
File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 277, in main
train_loss, val_loss = optimize_model(model, train_data_features, train_data_labels,
File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 57, in optimize_model
prediction = model(input).squeeze()
File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/model_network_MLP.py", line 51, in forward
x = self.activation(self.layers[i_layer](x))
File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:104.)
Variable._execution_engine.run_backward(
Backend Qt5Agg is interactive backend. Turning interactive mode on.
^CTraceback (most recent call last):
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
run()
File "/home/user/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/user/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 296, in <module>
main()
File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 277, in main
train_loss, val_loss = optimize_model(model, train_data_features, train_data_labels,
File "/home/user/coding/Studienarbeit_git/Active_flow_control_past_cylinder_using_DRL/DRL_py_beta/train_pressure_model.py", line 61, in optimize_model
loss.backward()
File "/home/user/anaconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 401]], which is output 0 of AsStridedBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
My implementation for the training loop looks as follows:
def optimize_model(model: pt.nn.Module, x_train: pt.Tensor, y_train: pt.Tensor,
                   x_val: pt.Tensor, y_val: pt.Tensor, epochs: int = 1000,
                   lr: float = 0.001, save_best: str = "") -> Tuple[List[float], List[float]]:
    """Optimize network weights based on training and validation data.

    :param model: neural network model
    :type model: pt.nn.Module
    :param x_train: features for training
    :type x_train: pt.Tensor
    :param y_train: labels for training
    :type y_train: pt.Tensor
    :param x_val: features for validation
    :type x_val: pt.Tensor
    :param y_val: labels for validation
    :type y_val: pt.Tensor
    :param epochs: number of optimization loops, defaults to 1000
    :type epochs: int, optional
    :param lr: learning rate, defaults to 0.001
    :type lr: float, optional
    :param save_best: path where to save best model; no snapshots are saved
        if empty string; defaults to ""
    :type save_best: str, optional
    :return: lists with training and validation losses for all epochs
    :rtype: Tuple[List[float], List[float]]
    """
    criterion = pt.nn.MSELoss()
    optimizer = pt.optim.Adam(params=model.parameters(), lr=lr)
    best_val_loss, best_train_loss = 1.0e5, 1.0e5
    train_loss, val_loss = [], []
    torch.autograd.set_detect_anomaly(True)
    for e in range(1, epochs+1):
        optimizer.zero_grad()
        #prediction = model(x_train).squeeze()
        for i in range(len(x_train)-1):
            input = x_train[i]
            prediction = model(input).squeeze()
            x_train[i+1, :-1] = prediction[:-2]
            loss = criterion(prediction, y_train[i, :])
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item())
        with pt.no_grad():
            prediction = model(x_val[i, :]).squeeze()
            loss = criterion(prediction, y_val[i, :])
            val_loss.append(loss.item())
    return train_loss, val_loss
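A stripped-down sketch of the pattern in the loop above seems to trigger the same kind of error. Everything in it is a stand-in chosen for illustration and not taken from my project: a single `pt.nn.Linear(4, 4)` layer replaces the model, and a random 3x4 tensor `x` replaces `x_train`. The slice `x[0]` is a view of `x`, and the later assignment writes into `x` in place before `backward()` is called.

```python
import torch as pt

pt.manual_seed(0)
layer = pt.nn.Linear(4, 4)   # hypothetical stand-in for the model
x = pt.rand(3, 4)            # hypothetical stand-in for x_train

inp = x[0]                   # a view of x (shares storage with x)
out = layer(inp)             # autograd keeps the input for the weight gradient
x[1, :-1] = out[:-1]         # in-place write into x, similar to the training loop
loss = out.sum()

error_message = ""
try:
    loss.backward()
except RuntimeError as err:
    # Store the message so it can be inspected after the failed backward pass.
    error_message = str(err)
print(error_message)
```

If this sketch reflects the real situation, the write into `x_train` between the forward and the backward pass would be the in-place operation the error message refers to.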
My idea was to feed only a single state from the training data into the model and then iterate over the training data, overwriting the next time step with the prediction, and so on, but it does not seem to work.
Does anyone know what the problem is, or whether my idea is even close to what I want to do?
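One possible direction, as a minimal sketch and not a confirmed fix: keep the prediction in a local variable and build the next input as a new tensor with `pt.cat`, instead of writing it into `x_train`. The sizes (5 input features, 6 outputs), the small `pt.nn.Sequential` model, and the random data are assumptions made only for this example. Averaging the loss over all rollout steps is also a choice made for the sketch, not what the loop above does.

```python
import torch as pt

pt.manual_seed(0)

# Hypothetical sizes for illustration only: 5 input features, 6 outputs.
n_steps, n_in, n_out = 8, 5, 6
model = pt.nn.Sequential(
    pt.nn.Linear(n_in, 16), pt.nn.ReLU(), pt.nn.Linear(16, n_out)
)
x_train = pt.rand(n_steps, n_in)   # random stand-in data
y_train = pt.rand(n_steps, n_out)
x_before = x_train.clone()         # copy to check that x_train stays untouched

criterion = pt.nn.MSELoss()
optimizer = pt.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
state = x_train[0]                 # only the first input comes from the data
losses = []
for i in range(n_steps - 1):
    prediction = model(state)
    losses.append(criterion(prediction, y_train[i]))
    # Next input: predicted part plus the last feature taken from the data.
    # pt.cat creates a new tensor, so nothing is modified in place.
    state = pt.cat((prediction[:-2], x_train[i + 1, -1:]))

loss = pt.stack(losses).mean()     # average loss over the rollout (sketch choice)
loss.backward()                    # gradients flow back through all rollout steps
optimizer.step()
print(loss.item())
```

Because `state` is rebuilt in every step, the tensors saved by autograd during the forward pass are never overwritten, and `x_train` keeps its original values for the next epoch.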