I want to implement the Gram matrix of a tensor with shape
(batch_size, channel_size, patch_size, height, width).
In NumPy we could compute it simply as
A @ A.T.
My question is: why, in the neural style transfer tutorial, do we first reshape the tensor into a 2-D matrix and then use the
.mm function to multiply it by its transpose?
a, b, c, d = input.size()  # a=batch size(=1)
# b=number of feature maps
# (c,d)=dimensions of a f. map (N=c*d)
features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL
G = torch.mm(features, features.t())  # compute the gram product
# we 'normalize' the values of the gram matrix
# by dividing by the number of elements in each feature map.
return G.div(a * b * c * d)
Is there an optimization consideration behind this, or something else?
The tutorial explains it as:
F_XL is reshaped to form F̂_XL, a K×N matrix, where K is the number of feature maps at layer L and N is the length of any vectorized feature map F^k_XL.
It doesn't seem to be working, and I'm not sure what this operation would calculate in your case:
a, b, c, d = 2, 3, 4, 5
x = torch.randn(a, b, c, d)
x_np = x.numpy()
g_np = x_np @ x_np.T
> ValueError: shapes (2,3,4,5) and (5,4,3,2) not aligned: 5 (dim 3) != 3 (dim 2)
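For comparison, a minimal NumPy sketch (same shapes as above, not from the tutorial) showing that flattening to 2-D first makes the product well-defined:

```python
import numpy as np

a, b, c, d = 2, 3, 4, 5
x = np.random.randn(a, b, c, d)

# Flatten each feature map: a*b rows, each of length c*d
features = x.reshape(a * b, c * d)

# Gram matrix: (a*b) x (a*b), symmetric by construction
gram = features @ features.T
```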
Actually, it seems I asked my question ambiguously.
By A @ A.T I meant the definition of the Gram matrix, i.e. multiplying a matrix by its transpose, not literal NumPy code. My question was why we have to reshape the tensors. Now I understand that to form this product from a 4-D tensor we first reshape it into a 2-D matrix and then multiply (the
.t() function only works on tensors of at most two dimensions, obviously).
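As an illustrative aside (not from the tutorial), PyTorch can also transpose just the last two dimensions of a higher-dimensional tensor, but that yields per-slice products rather than the single K×N Gram matrix the tutorial computes:

```python
import torch

x = torch.randn(2, 3, 4, 5)

# Batched product over the last two dims: shape (2, 3, 4, 4),
# one small Gram-like matrix per (batch, channel) slice.
per_slice = torch.matmul(x, x.transpose(-2, -1))

# The tutorial instead flattens to 2-D so all feature maps
# mix in one (a*b) x (a*b) Gram matrix.
features = x.view(2 * 3, 4 * 5)
G = torch.mm(features, features.t())  # shape (6, 6)
```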
Based on the tutorial,
channel_size is treated as the number of feature maps, and
the number of pixels per feature map (height × width, i.e. N = c*d) as the length of each vectorized feature map.
Now, in my problem I have a 5-D tensor with an extra dimension called
patch_size, and I folded it into the number of feature maps. That only changes the size of the output matrix, not the nature of its values, so I don't think it will cause any problem when computing the loss.
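A hedged sketch of that idea (variable names and sizes are illustrative), folding patch_size into the feature-map dimension exactly as the tutorial folds batch into channels:

```python
import torch

batch_size, channel_size, patch_size, height, width = 1, 3, 2, 4, 5
x = torch.randn(batch_size, channel_size, patch_size, height, width)

# Treat (batch * channel * patch) as the number of feature maps K,
# and (height * width) as the vectorized feature-map length N.
features = x.view(batch_size * channel_size * patch_size, height * width)

# Gram matrix is K x K; its size grows with patch_size, but each
# entry is still an inner product of two vectorized feature maps.
G = torch.mm(features, features.t())
G = G.div(features.numel())  # normalize by total element count, as in the tutorial
```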
Thanks for the help!