Difference in PyTorch computational graphs


I am trying to compare two computational graphs (graph_1 and graph_2). Both are built from exactly the same architecture and perform the same operations. However, in graph_1 all parameters are initialized, whereas in graph_2 none of them are. When I compare the two graphs I see some differences, but I cannot work out what they mean or why they arise.
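
For concreteness, here is a minimal sketch of one way to list the nodes of an autograd graph so that two graphs can be compared side by side. The small `nn.Sequential` model, the helper `graph_node_names`, and the constant initialization are illustrative assumptions only, not the actual graph_1 and graph_2 from this question.

```python
from collections import Counter

import torch
import torch.nn as nn


def graph_node_names(output):
    """Walk the autograd graph from output.grad_fn and return node type names."""
    names = []
    seen = set()
    stack = [output.grad_fn]
    while stack:
        node = stack.pop()
        # Inputs that do not require grad show up as None in next_functions.
        if node is None or node in seen:
            continue
        seen.add(node)
        names.append(type(node).__name__)
        stack.extend(fn for fn, _ in node.next_functions)
    return names


def make_model():
    # Hypothetical architecture, used only for illustration.
    return nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 1))


x = torch.randn(4, 3)

# Model 1: parameters set explicitly (assumed stand-in for "initialized").
model_1 = make_model()
for p in model_1.parameters():
    nn.init.constant_(p, 0.1)

# Model 2: parameters left at PyTorch's default initialization.
model_2 = make_model()

names_1 = graph_node_names(model_1(x).sum())
names_2 = graph_node_names(model_2(x).sum())

# Compare how often each node type appears in each graph.
count_1, count_2 = Counter(names_1), Counter(names_2)
print("only in graph 1:", count_1 - count_2)
print("only in graph 2:", count_2 - count_1)
```

In this sketch the leaf parameters appear as `AccumulateGrad` nodes, so any node type that shows up in only one of the two counters points to where the graphs diverge.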

Any help in understanding these two graphs and their differences is highly appreciated!


Thank you!