TensorBoard issue with self-defined forward function

How can I get around the following TensorBoard issue with a self-defined forward function?

/home/phung/miniconda3/envs/py39/bin/python3.9 /home/phung/PycharmProjects/beginner_tutorial/gdas.py
Files already downloaded and verified
Files already downloaded and verified
run_num =  0

Error occurs, No graph saved
Traceback (most recent call last):
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 770, in <module>
    ltrain = train_NN(forward_pass_only=0)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 357, in train_NN
    writer.add_graph(graph, train_inputs)
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 736, in add_graph
    self._get_file_writer().add_graph(graph(model, input_to_model, verbose, use_strict_trace))
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 297, in graph
    raise e
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 291, in graph
    trace = torch.jit.trace(model, args, strict=use_strict_trace)
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/jit/_trace.py", line 741, in trace
    return trace_module(
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/jit/_trace.py", line 958, in trace_module
    module._c._create_method_from_trace(
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/phung/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 201, in _forward_unimplemented
    raise NotImplementedError
NotImplementedError

Process finished with exit code 1

Based on the error message, this doesn’t look like a TensorBoard issue. It seems the forward method is undefined in the model you are using.
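To illustrate that diagnosis, here is a minimal sketch. The class names CellWithoutForward and CellWithForward are hypothetical and not taken from gdas.py. An nn.Module subclass that never defines forward falls back to the base-class stub, which raises NotImplementedError as soon as the module is called, and calling the module is what torch.jit.trace does inside add_graph.

```python
import torch
import torch.nn as nn


class CellWithoutForward(nn.Module):
    """Hypothetical module that registers a layer but never defines forward()."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)


class CellWithForward(CellWithoutForward):
    """Same module, with forward() defined so it can be called and traced."""

    def forward(self, x):
        return self.conv(x)


x = torch.randn(1, 3, 8, 8)

# Calling a module without forward() hits the base-class stub.
try:
    CellWithoutForward()(x)
    missing_forward_error = None
except NotImplementedError as exc:
    missing_forward_error = exc

# Once forward() exists, the call goes through.
out = CellWithForward()(x)
print(type(missing_forward_error).__name__, tuple(out.shape))
```

A forward method that is misspelled or indented outside the class body has the same effect, so those are worth checking first.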

Yes, I added a dummy forward function to every class in the hierarchy in my code, which helps.

Now I am facing the following TensorBoard error:

Error occurs, No graph saved
Traceback (most recent call last):
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 789, in <module>
    ltrain = train_NN(forward_pass_only=0)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 538, in train_NN
    writer.add_graph(graph, train_inputs)
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 736, in add_graph
    self._get_file_writer().add_graph(graph(model, input_to_model, verbose, use_strict_trace))
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 295, in graph
    raise e
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 289, in graph
    trace = torch.jit.trace(model, args, strict=use_strict_trace)
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/jit/_trace.py", line 741, in trace
    return trace_module(
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/jit/_trace.py", line 958, in trace_module
    module._c._create_method_from_trace(
RuntimeError: Only tensors, lists, tuples of tensors, or dictionary of tensors can be output from traced functions

Process finished with exit code 1

The error is raised because tracing the model fails. Check the error message and make sure the forward function returns one of the expected types.
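As a rough illustration of what "expected types" means here, the sketch below uses two hypothetical modules (ReturnsFloat and ReturnsTensor, not from gdas.py). A forward that returns a plain Python number, for example via .item(), cannot be traced, while returning the tensor itself can. The exact failure text may differ between PyTorch versions, so the sketch only captures the exception instead of matching its message.

```python
import torch
import torch.nn as nn


class ReturnsFloat(nn.Module):
    """Hypothetical module whose forward() returns a Python float."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # .item() converts the result to a Python float, which is not a traceable output.
        return self.fc(x).sum().item()


class ReturnsTensor(nn.Module):
    """Same computation, but the output stays a tensor."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x).sum()


x = torch.randn(3, 4)

# Tracing the float-returning module is expected to fail.
try:
    torch.jit.trace(ReturnsFloat(), x)
    trace_error = None
except RuntimeError as exc:
    trace_error = exc

# Tracing the tensor-returning module works.
traced = torch.jit.trace(ReturnsTensor(), x)
traced_output = traced(x)
print(trace_error)
print(type(traced_output).__name__)
```

A dummy forward that returns None or some custom object would fall into the same category as the float case.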

This new code seems to have gotten around the self-defined forward function issue, but it raises the following new error. Should I create a new question for this new issue?

Should I use add_module() instead of nn.ModuleList() in this case?

Error occurs, No graph saved
Traceback (most recent call last):
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 794, in <module>
    ltrain = train_NN(forward_pass_only=0)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 551, in train_NN
    writer.add_graph(graph, NN_input)
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 736, in add_graph
    self._get_file_writer().add_graph(graph(model, input_to_model, verbose, use_strict_trace))
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 295, in graph
    raise e
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 289, in graph
    trace = torch.jit.trace(model, args, strict=use_strict_trace)
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/jit/_trace.py", line 741, in trace
    return trace_module(
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/jit/_trace.py", line 958, in trace_module
    module._c._create_method_from_trace(
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 322, in forward
    self.cells[c].nodes[n].connections[
RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient

Hi again,
I think ModuleList should be fine, but combined_feature_map, output, and so on are the problem.

@Unity05 How shall I fix combined_feature_map?

I think the real problem here is that this tensor requires gradients but is not a parameter. If you try it with requires_grad set to False, there should be no problems with combined_feature_map and graph tracing. However, I remember you set requires_grad to True a while ago to solve another issue.
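A small sketch of that situation, using hypothetical modules (PlainTensorScale and ParameterScale, not from gdas.py): a tensor stored as a plain attribute with requires_grad=True is not registered with the module, so the tracer tries to bake it in as a constant and refuses. Wrapping it in nn.Parameter is one of the options the error message itself suggests.

```python
import torch
import torch.nn as nn


class PlainTensorScale(nn.Module):
    """Hypothetical module holding a grad-requiring tensor as a plain attribute."""

    def __init__(self):
        super().__init__()
        # Not a Parameter and not a buffer, so the tracer sees it as a constant.
        self.scale = torch.ones(4, requires_grad=True)

    def forward(self, x):
        return x * self.scale


class ParameterScale(nn.Module):
    """Same computation with the tensor registered as a Parameter."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(4))

    def forward(self, x):
        return x * self.scale


x = torch.randn(2, 4)

# Tracing fails for the plain attribute that requires grad.
try:
    torch.jit.trace(PlainTensorScale(), x)
    constant_error = None
except RuntimeError as exc:
    constant_error = exc

# Tracing succeeds once the tensor is a registered Parameter.
traced = torch.jit.trace(ParameterScale(), x)
same = torch.allclose(traced(x), x)  # scale is all ones, so the output equals x
print(constant_error)
print(same)
```

Whether a Parameter is appropriate depends on whether the tensor should be trained. For an intermediate result such as a feature map, keeping requires_grad False at trace time is the other route.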

@Unity05 I have already set requires_grad=False for combined_feature_map, but I still get the same runtime error.

Yes, but that should still be a problem because it calls the graph a second time.

@Unity05 What exactly do you mean by the sentence quoted above?

Sorry, I meant calls instead of class.

@Unity05 Wait, why the second time?

That’s why, even if you set requires_grad=False at initialization, it is going to be True on the next calls.
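One way this can happen, shown as a sketch with a hypothetical Accumulator module: if forward reassigns the attribute to the result of an operation that involves a parameter, the stored tensor picks up requires_grad=True after the first call, regardless of how it was initialized. That gdas.py updates combined_feature_map in this way is an assumption here, since the code is not shown in the thread.

```python
import torch
import torch.nn as nn


class Accumulator(nn.Module):
    """Hypothetical module that keeps a running tensor as a plain attribute."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        self.state = torch.zeros(2, 4, requires_grad=False)

    def forward(self, x):
        # The new tensor depends on fc's parameters, so it requires grad.
        self.state = self.state + self.fc(x)
        return self.state


model = Accumulator()
before = model.state.requires_grad  # False, as set at initialization

model(torch.randn(2, 4))
after = model.state.requires_grad   # True, inherited from the parameters of fc

print(before, after)
```

Any later call, including the one made by the tracer, then sees a stored tensor that requires grad.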

@Unity05 How do I get around it being True on the next calls?

You could for instance just reset the combined_feature_map after an epoch. But spoiler: combined_feature_map is not the only thing with that problem.


@Unity05 Thanks for your suggestion. However, if you look at the value of the NUM_EPOCHS variable that I set for code development and debugging purposes, I am afraid this is not really the solution to the runtime error I am facing now.

In other words, the runtime error happens within a single epoch, not between two epochs.

I’ve just tried resetting it in line 472, which worked.

@Unity05 Would you be able to share the code for your reset modification?

If you just want to test it out quickly, try:

    def reset(self):
        # Re-create the tensor so it does not carry over requires_grad=True from earlier calls.
        self.combined_feature_map = torch.zeros([BATCH_SIZE, NUM_OF_IMAGE_CHANNELS, IMAGE_HEIGHT, IMAGE_WIDTH],
                                                requires_grad=False)
        if USE_CUDA:
            self.combined_feature_map = self.combined_feature_map.cuda()

in line 211 and

            for c in range(NUM_OF_CELLS):
                for n in range(NUM_OF_NODES_IN_EACH_CELL):
                    # Not all nodes have the same number of Type-1 output connections
                    for cc in range(MAX_NUM_OF_CONNECTIONS_PER_NODE - n - 1):
                        self.cells[c].nodes[n].connections[cc].reset()

in line 472. The line numbers refer to your gist. It’s not too pretty, but it should do its job for that part.
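If the index arithmetic in the nested loop feels fragile, a possible alternative is to walk the module tree with modules() and reset every instance of the connection class. The sketch below assumes a hypothetical Connection class with a reset method like the one above and a toy nested ModuleList. The shapes and counts are made up for illustration and do not match gdas.py.

```python
import torch
import torch.nn as nn


class Connection(nn.Module):
    """Hypothetical stand-in for the connection class that owns combined_feature_map."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.reset()

    def reset(self):
        # Fresh tensor with no gradient history.
        self.combined_feature_map = torch.zeros(1, 3, 8, 8, requires_grad=False)

    def forward(self, x):
        self.combined_feature_map = self.combined_feature_map + self.conv(x)
        return self.combined_feature_map


def reset_all(model):
    """Call reset() on every Connection found anywhere in the module tree."""
    for module in model.modules():
        if isinstance(module, Connection):
            module.reset()


def grad_flags(model):
    """Collect requires_grad of every stored feature map."""
    return [m.combined_feature_map.requires_grad
            for m in model.modules() if isinstance(m, Connection)]


# Toy hierarchy: 3 nodes with 2 connections each.
model = nn.ModuleList([nn.ModuleList([Connection() for _ in range(2)]) for _ in range(3)])

x = torch.randn(1, 3, 8, 8)
for node in model:
    for connection in node:
        connection(x)

dirty = grad_flags(model)  # flags after one forward pass
reset_all(model)
clean = grad_flags(model)  # flags after the reset
print(dirty, clean)
```

Because modules() recurses through nested ModuleList containers, this does not depend on how many connections each node has.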
