CNN for stacked frames?

I’m not very experienced with CNNs, but I want to use one in a reinforcement learning environment where I have 4 stacked grayscale images. How do I begin to write the parameters for the conv layer? I don’t know how to find out the dimensions of the images (which are just arrays) or how to make a conv layer for stacked frames.

Once you’ve created the tensor, you can check its shape via print(tensor.shape). Depending on how you’ve created this particular tensor (from a Dataset or a DataLoader), it would have 3 or 4 dimensions. A single sample from a Dataset typically has 3 dimensions, [channels, height, width]. A DataLoader will create a batch of samples, so it would have 4 dimensions in the shape [batch_size, channels, height, width].
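
Here is a minimal sketch of that shape check, assuming the frames arrive as NumPy arrays. The 84x84 frame size and the variable names are made up for illustration, so substitute whatever your environment returns.

```python
import numpy as np
import torch

# Hypothetical input: four 84x84 grayscale frames as NumPy arrays.
frames = [np.zeros((84, 84), dtype=np.float32) for _ in range(4)]

# Stack along a new leading axis so each frame becomes one channel.
stacked = torch.from_numpy(np.stack(frames, axis=0))
print(stacked.shape)  # torch.Size([4, 84, 84]) -> [channels, height, width]

# Add a batch dimension, which is what a conv layer expects.
batch = stacked.unsqueeze(0)
print(batch.shape)  # torch.Size([1, 4, 84, 84]) -> [batch_size, channels, height, width]
```

If you print the shape of a single raw frame first (frames[0].shape), you get the height and width directly.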

Based on your description, it seems you are dealing with images with 4 channels (one per stacked frame), so the first conv layer should be created with in_channels=4.
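
As a rough sketch, the first layer could look like the following. Only in_channels=4 comes from your description. The out_channels, kernel_size, and stride values and the 84x84 input size are arbitrary choices for illustration, not a recommendation.

```python
import torch
import torch.nn as nn

# in_channels=4 matches the four stacked grayscale frames.
# out_channels, kernel_size and stride are illustrative assumptions.
conv = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4)

# Dummy batch: [batch_size, channels, height, width], assuming 84x84 frames.
x = torch.randn(1, 4, 84, 84)
out = conv(x)
print(out.shape)  # torch.Size([1, 32, 20, 20])
```

The output height and width follow from (84 - 8) / 4 + 1 = 20, so they will change if your frames have a different size.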