Model(
  (multi_tdnn): MultiTDNN(
    (multi_tdnn): Sequential(
      (0): TDNN(
        (conv): Conv1d(60, 512, kernel_size=(5,), stride=(1,))
        (nonlinearity): ReLU()
        (bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): TDNN(
        (conv): Conv1d(512, 512, kernel_size=(3,), stride=(1,), dilation=(2,))
        (nonlinearity): ReLU()
        (bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): TDNN(
        (conv): Conv1d(512, 512, kernel_size=(3,), stride=(1,), dilation=(3,))
        (nonlinearity): ReLU()
        (bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): TDNN(
        (conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (nonlinearity): ReLU()
        (bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (4): TDNN(
        (conv): Conv1d(512, 1500, kernel_size=(1,), stride=(1,))
        (nonlinearity): ReLU()
        (bn): BatchNorm1d(1500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stats_pool): StatsPool()
  (linear1): Linear(in_features=3000, out_features=512, bias=True)
  (linear2): Linear(in_features=512, out_features=1739, bias=True)
  (nonlinearity): ReLU()
)
I am using the above architecture for my experiments, but it produces NaN values during training. From my analysis, I suspect the combination of ReLU and BatchNorm1d inside the TDNN layers is causing the problem. Here is what I have tried:
Conv1d --> ReLU --> BatchNorm1d <== Gives NaN
Conv1d --> Tanh --> BatchNorm1d <== Working Perfectly
Conv1d --> ReLU --> Dropout <== Gives NaN
Conv1d --> Tanh --> Dropout <== Working Perfectly
Conv1d --> ReLU --> BatchNorm1d --> Dropout <== Working Perfectly
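To localize where the NaNs first appear, I have been attaching forward hooks to every submodule and checking the activations. This is a minimal sketch of my debugging setup; `make_nan_hook` is my own helper name, and the single block below just mirrors the first TDNN layer of the architecture above:

```python
import torch
import torch.nn as nn

def make_nan_hook(name):
    # Hypothetical helper: reports the first module whose output
    # contains NaN or Inf values during a forward pass.
    def hook(module, inputs, output):
        if torch.is_tensor(output) and not torch.isfinite(output).all():
            print(f"Non-finite output after: {name} ({module.__class__.__name__})")
    return hook

# A single TDNN-style block matching layer (0) of the model above,
# in the Conv1d -> ReLU -> BatchNorm1d order that gives me NaNs.
block = nn.Sequential(
    nn.Conv1d(60, 512, kernel_size=5),
    nn.ReLU(),
    nn.BatchNorm1d(512),
)

for name, module in block.named_modules():
    if name:  # skip the Sequential container itself
        module.register_forward_hook(make_nan_hook(name))

x = torch.randn(8, 60, 200)  # (batch, features, frames)
_ = block(x)  # nothing is printed while all activations stay finite
```

With a freshly initialized block and random input nothing is flagged; the NaNs only show up after some training steps, which is why I suspect the interaction between the layers rather than the initialization.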
Can someone please guide me on this issue?