Instead of iterating each row in `out`, you could split `out` beforehand into both inputs, call `fc_1` and `fc_2`, and create the output tensor using the results.
Depending on the batch size, the performance difference might be insignificant.
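Here is a minimal sketch of the idea. It assumes that `fc_1` and `fc_2` are `nn.Linear` layers with the same output size, and that each row of `out` is routed to one of them by a boolean mask. The mask `use_fc1`, the sizes, and the helper names `route_loop`/`route_split` are hypothetical and only for illustration. Boolean indexing splits the batch, each layer is called once on its whole sub-batch, and the results are written back into their original row positions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: both layers map in_features -> out_features.
in_features, out_features, batch_size = 8, 4, 6
fc_1 = nn.Linear(in_features, out_features)
fc_2 = nn.Linear(in_features, out_features)

out = torch.randn(batch_size, in_features)
# Hypothetical routing rule: True -> fc_1, False -> fc_2 (one flag per row).
use_fc1 = torch.tensor([True, False, True, True, False, False])


def route_loop(out, use_fc1):
    # Row-by-row approach: one layer call per sample.
    rows = []
    for row, flag in zip(out, use_fc1):
        layer = fc_1 if flag else fc_2
        rows.append(layer(row))
    return torch.stack(rows)


def route_split(out, use_fc1):
    # Split the batch into the two inputs via boolean indexing.
    x1 = out[use_fc1]
    x2 = out[~use_fc1]
    # One call per layer on its whole sub-batch.
    y1 = fc_1(x1)
    y2 = fc_2(x2)
    # Write the results back into the original row order.
    result = out.new_empty(out.size(0), out_features)
    result[use_fc1] = y1
    result[~use_fc1] = y2
    return result


a = route_loop(out, use_fc1)
b = route_split(out, use_fc1)
print(torch.allclose(a, b, atol=1e-5))
```

If the split is a fixed slice instead of a mask (e.g. the first `k` rows go to `fc_1`), `out[:k]` / `out[k:]` together with `torch.cat` do the same job without the scatter step.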