ResNet, time series classification and input tensor dimension mapping

I’m trying to reimplement the ResNet from this paper for use in time series classification. Since it’s my first time working with convolutional layers, I’m a bit confused about how to arrange the input tensor for the convolution.

The original implementation in Keras uses 2d layers. I'm given to understand that the convention in torch for a 2d layer is that the input tensor should look like [batch_size, channels, height, width]. For image input this makes sense to me, but I'm not sure how multivariate time series data maps onto this schema; in my mind, a 1d layer would make much more sense. Could somebody point out how [sample_axis, time_axis, feature_axis] can be mapped to this schema, and why the paper uses a 2d layer?
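To make the question concrete, here's a minimal sketch of the two arrangements I'm considering. All shapes, layer parameters, and variable names here are placeholders I made up for illustration, not anything from the paper:

```python
import torch
import torch.nn as nn

# Toy multivariate time series batch (numbers made up):
# 32 samples, 128 time steps, 6 features -> [sample_axis, time_axis, feature_axis]
x = torch.randn(32, 128, 6)

# Conv1d expects [batch_size, channels, length], so treating each feature
# as a channel is just a matter of swapping the last two axes.
x_1d = x.permute(0, 2, 1)                 # -> [32, 6, 128]
conv1d = nn.Conv1d(in_channels=6, out_channels=64, kernel_size=8, padding="same")
print(conv1d(x_1d).shape)                 # torch.Size([32, 64, 128])

# Conv2d expects [batch_size, channels, height, width]; one guess would be a
# single channel with the features stacked along the "height" axis.
x_2d = x.permute(0, 2, 1).unsqueeze(1)    # -> [32, 1, 6, 128]
conv2d = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=(1, 8), padding="same")
print(conv2d(x_2d).shape)                 # torch.Size([32, 64, 6, 128])
```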

Am I correct in assuming that, in my case, channels would be 1? Is this the same as using Conv1d, or am I overlooking something?
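To illustrate what I mean by "the same as using Conv1d", here is the comparison I have in mind, again with placeholder shapes and parameters of my own choosing. As far as I can tell both variants produce the same output shape (the actual outputs differ because the weights are initialised independently), but I don't know whether the paper intends something different:

```python
import torch
import torch.nn as nn

x = torch.randn(32, 6, 128)  # [batch, features treated as channels, time]

# The 1d layer I would naively reach for:
conv1d = nn.Conv1d(in_channels=6, out_channels=64, kernel_size=8, padding="same")
out_1d = conv1d(x)                          # -> [32, 64, 128]

# A 2d layer where the extra spatial dimension has size 1:
conv2d = nn.Conv2d(in_channels=6, out_channels=64, kernel_size=(1, 8), padding="same")
out_2d = conv2d(x.unsqueeze(2)).squeeze(2)  # [32, 6, 1, 128] -> [32, 64, 128]

print(out_1d.shape, out_2d.shape)           # both torch.Size([32, 64, 128])
```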

Update: it turns out I had thoroughly misunderstood the original paper. All of its datasets are image series, so it makes perfect sense that they used 2d layers.