Handle video tensor

I’m struggling with a video tensor in PyTorch. I want to apply some filtering to it as preparation, but I’m uncertain which shape I should use for the video. Currently I use the shape [frames, channels, height, width],
e.g. RGB: [50, 3, 1080, 1920], or for grayscale: [50, 1, 1080, 1920].

To prepare the video I would like to do some filtering/convolution.
Specifically, I would like to apply a 2D convolution separately to each frame and color channel.
I tried torch.nn.functional.conv3d and conv2d with different weights.
If I try conv2d(video, filter, stride=1, bias=None, padding=1)
with the video having the grayscale shape [50, 1, 1080, 1920] and
filter: [[ 0.0000, 0.5000, 0.0000],
[ 0.0000, -1.0000, 0.0000],
[ 0.0000, 0.5000, 0.0000]]
I get: RuntimeError: weight should have at least three dimensions
If I unsqueeze(filter, 0) I get:
RuntimeError: expected stride to be a single integer value or a list of 1 values to match the convolution dimensions, but got stride=[1, 1]
Could anyone show me how I should do it?
Best regards

I used this as a solution:
import torch
import torch.nn.functional as F

x = torch.randn([10, 3, 4, 4], dtype=torch.float32)  # [frames, channels, H, W]
xsize = x.shape

# Fold frames and channels into the batch dimension so each
# frame/channel pair is convolved independently with one filter.
xx = x.view(xsize[0] * xsize[1], 1, xsize[2], xsize[3])

f = torch.tensor([[0, 0.5, 0], [0, -1, 0], [0, 0.5, 0]], dtype=torch.float32)
# conv2d expects a 4D weight of shape [out_channels, in_channels, kH, kW],
# which is what caused the "weight should have at least three dimensions" error.
f = f.view(1, 1, 3, 3)

y = F.conv2d(xx, f, bias=None, padding=1, stride=1)
# Restore the original [frames, channels, H, W] layout.
y = y.view(xsize)
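An alternative that avoids the reshape is conv2d's groups parameter: with groups equal to the number of channels and a weight of shape [channels, 1, kH, kW], each channel is convolved with its own copy of the kernel. A minimal sketch (same toy shapes as above; the repeated-kernel trick is my own choice, not from the original post):

```python
import torch
import torch.nn.functional as F

x = torch.randn([10, 3, 4, 4], dtype=torch.float32)  # [frames, channels, H, W]

f = torch.tensor([[0, 0.5, 0], [0, -1, 0], [0, 0.5, 0]], dtype=torch.float32)
# Grouped conv weight: [channels, 1, kH, kW], one kernel copy per channel.
k = f.view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)

# groups=channels applies each kernel to its own input channel only,
# so no folding into the batch dimension is needed.
y = F.conv2d(x, k, bias=None, padding=1, stride=1, groups=x.shape[1])
```

The output keeps the [frames, channels, H, W] layout directly, and the result should match the fold-into-batch approach.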