I need to use torch.einsum to do the convolution instead of using torch.nn.functional.conv2d.
In short, I have 16 images of size (3, 4, 4) and 16 kernels of size (3, 4, 4). I need to apply each kernel to its corresponding image to get one output (one data point) per pair.
import torch
import torch.nn.functional as F

torch.manual_seed(1)
# 16 kernels of shape (3, 4, 4) and 16 biases, all float32
weight = torch.normal(10**-5, 10**-6, (16, 3, 4, 4))
bias = torch.normal(10**-5, 10**-6, (16, 1))
# 16 images of shape (3, 4, 4)
img = torch.randn(16, 3, 4, 4)
I compared the convolution operation with einsum and conv2d:
With F.conv2d(img[0], weight[0].unsqueeze(0), bias[0]).item() I got 4.617268132278696e-05.
But with (torch.einsum('chw,chw->', [img[0], weight[0]]) + bias[0]).item() I got 4.617268859874457e-05.
All data have dtype torch.float32.
I would like to know whether einsum and conv2d are equivalent in my scenario.
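For reference, the two values above differ by about 7e-12, which is only a couple of float32 ulps at this magnitude, so I compared them with a tolerance instead of exact equality. A quick sketch (the reshape/squeeze calls are just my own shape alignment):

# Both results reproduce the scalars quoted above; compare with a tolerance.
out_conv = F.conv2d(img[0], weight[0].unsqueeze(0), bias[0]).reshape(())
out_einsum = torch.einsum('chw,chw->', img[0], weight[0]) + bias[0].squeeze()
print(torch.allclose(out_conv, out_einsum))  # expected: True with default tolerances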
The reason for implementing this with torch.einsum:
I have 16 images and 16 kernels and need to apply one kernel to one image to get one output. It is easy to directly get all outputs (for 16 images) with
torch.einsum('bchw,bchw->b', [img, weight])+bias.squeeze(-1)
The output:
tensor([ 4.6173e-05, -9.3411e-06, -8.0316e-05, -6.5993e-05, 1.3381e-04,
-2.3025e-05, -1.3640e-06, 9.6504e-05, 2.1309e-06, -4.2717e-05,
3.5023e-06, 3.2773e-05, 2.0304e-04, -2.4030e-05, 1.0894e-04,
6.0090e-05])
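For reference, I can also reproduce this batched einsum result with a single grouped conv2d call by folding the batch into the channel dimension; the reshape and groups=16 below are just a sketch for comparison, not my actual code:

# Fold the batch into channels so each of the 16 groups sees exactly one image;
# with groups=16, filter i convolves only channels [3*i : 3*i+3], i.e. img[i].
out_conv = F.conv2d(
    img.reshape(1, 16 * 3, 4, 4),  # (1, 48, 4, 4)
    weight,                        # (16, 3, 4, 4): one 3-channel filter per group
    bias.squeeze(-1),              # (16,)
    groups=16,
).reshape(16)
out_einsum = torch.einsum('bchw,bchw->b', img, weight) + bias.squeeze(-1)
print(torch.allclose(out_conv, out_einsum))  # expected: True up to float32 rounding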