https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html
Can someone give me the derivation for L_out in the above documentation?
Specifically, how the output length of temporal data gets downsampled / affected by the kernel, dilation, etc.
There is already a formula for L_out in the documentation, but if you are asking about the intuition behind why it takes that form:
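For reference, the documented formula is

L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)

and each argument's contribution can be read off from it: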
For “kernel”, consider an example where kernel=3, stride=1, dilation=1, padding=0, and L_in=10.
Then L_out = L_in - kernel + 1 = 8, since there are exactly 8 positions at which a size-3 window can be placed along an input of length 10.
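If you want to confirm that shape empirically, here is a minimal check (the single input/output channel counts are arbitrary choices for the example):

```python
import torch
import torch.nn as nn

# kernel_size=3, stride=1, dilation=1, padding=0, L_in=10 -> L_out=8
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3)
x = torch.randn(1, 1, 10)  # (batch, channels, L_in)
print(conv(x).shape)       # torch.Size([1, 1, 8])
```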
To see how “dilation” spreads the kernel, note that the taps of a dilated kernel span an effective size of dilation*(kernel - 1) + 1 input positions. You can check that, all else being equal, kernel=2, dilation=2 produces the same L_out as kernel=3, dilation=1: both span 3 input positions.
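A quick sanity check of that equivalence (this only compares output shapes; the two layers compute different things, since one has 2 weights and the other 3):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 10)

# dilation=2 spreads the 2 kernel taps across 3 input positions,
# matching the effective span of a dense kernel_size=3.
dense = nn.Conv1d(1, 1, kernel_size=3, dilation=1)
dilated = nn.Conv1d(1, 1, kernel_size=2, dilation=2)
print(dense(x).shape, dilated(x).shape)  # both torch.Size([1, 1, 8])
```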
For the rest (stride and padding), you can get a feel for them easily by playing around with some examples, e.g. with the helper sketched below.
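Here is a small helper (the function name is mine, not part of torch) that just evaluates the documented formula, so you can experiment without building a layer each time:

```python
import math

def conv1d_l_out(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Evaluate the L_out formula from the Conv1d docs."""
    return math.floor(
        (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

print(conv1d_l_out(10, kernel_size=3))              # 8
print(conv1d_l_out(10, kernel_size=2, dilation=2))  # 8  (same as above)
print(conv1d_l_out(10, kernel_size=3, stride=2))    # 4  (stride skips positions)
print(conv1d_l_out(10, kernel_size=3, padding=1))   # 10 (padding preserves length)
```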
In addition, tools such as @ezyang’s conv visualizer might also be helpful for understanding the conv arguments: Convolution Visualizer