Hi, let’s say I have an image tensor (not a minibatch), so its dimensions are (3, X, Y).
I want to convert it to numpy, for applying an opencv manipulation on it (writing text on it).
Calling .numpy() works fine, but then how do I rearrange the dimensions, for them to be in numpy convention (X, Y, 3)?
I guess I can use img.transpose(0, 1).transpose(1, 2) but just wondering if there’s any smoother bridge to numpy.
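For comparison, once the data is on the numpy side, a single np.transpose with an explicit axis order does the whole rearrangement in one call (the shapes below are just placeholders for illustration):

```python
import numpy as np

# What .numpy() gives you: a (C, H, W) array; values are placeholders
chw = np.zeros((3, 224, 224), dtype=np.float32)

# (C, H, W) -> (H, W, C) in a single call
hwc = np.transpose(chw, (1, 2, 0))
print(hwc.shape)  # (224, 224, 3)
```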
I have an implementation below that can flatten and unflatten the net itself. Hopefully you can use some of that.
import numpy as np

#############################################################################
# Flattening the NET
#############################################################################
def flattenNetwork(net):
    flatNet = []
    shapes = []
    for param in net.parameters():
        curr_shape = param.cpu().data.numpy().shape
        shapes.append(curr_shape)
        if len(curr_shape) == 2:
            # linear-layer weight matrix
            param = param.cpu().data.numpy().reshape(curr_shape[0] * curr_shape[1])
            flatNet.append(param)
        elif len(curr_shape) == 4:
            # conv-layer weight tensor
            param = param.cpu().data.numpy().reshape(
                curr_shape[0] * curr_shape[1] * curr_shape[2] * curr_shape[3])
            flatNet.append(param)
        else:
            # bias vector
            param = param.cpu().data.numpy().reshape(curr_shape[0])
            flatNet.append(param)
    finalNet = []
    for obj in flatNet:
        for x in obj:
            finalNet.append(x)
    finalNet = np.array(finalNet)
    return finalNet, shapes

#############################################################################
# UN-Flattening the NET
#############################################################################
def unFlattenNetwork(weights, shapes):
    # begin_slice/end_slice track how to slice the flat weight vector
    begin_slice = 0
    end_slice = 0
    finalParams = []
    for idx, shape in enumerate(shapes):
        if len(shape) == 2:
            end_slice = end_slice + (shape[0] * shape[1])
            curr_slice = weights[begin_slice:end_slice]
            param = np.array(curr_slice).reshape(shape[0], shape[1])
            finalParams.append(param)
            begin_slice = end_slice
        elif len(shape) == 4:
            end_slice = end_slice + (shape[0] * shape[1] * shape[2] * shape[3])
            curr_slice = weights[begin_slice:end_slice]
            param = np.array(curr_slice).reshape(shape[0], shape[1], shape[2], shape[3])
            finalParams.append(param)
            begin_slice = end_slice
        else:
            end_slice = end_slice + shape[0]
            curr_slice = weights[begin_slice:end_slice]
            param = np.array(curr_slice).reshape(shape[0],)
            finalParams.append(param)
            begin_slice = end_slice
    return np.array(finalParams)
flat_weights, shapes = flattenNetwork(model)
unFlattenNetwork(flat_weights, shapes)
This gives you a NumPy n-dimensional array (in your case, the image), which you can assign directly to a variable.
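The flatten/unflatten logic above boils down to concatenating raveled arrays and cutting the flat vector back up by the recorded shapes. Here is that core idea sketched with plain numpy arrays instead of a network (the arrays are made up for illustration):

```python
import numpy as np

# Stand-ins for a network's parameters
arrays = [np.arange(6, dtype=np.float64).reshape(2, 3),
          np.arange(4, dtype=np.float64)]
shapes = [a.shape for a in arrays]

# Flatten: ravel each array and concatenate into one long vector
flat = np.concatenate([a.ravel() for a in arrays])

# Unflatten: slice the vector back up using the recorded shapes
restored, start = [], 0
for shape in shapes:
    size = int(np.prod(shape))
    restored.append(flat[start:start + size].reshape(shape))
    start += size

# Round trip recovers the original arrays exactly
assert all(np.array_equal(a, r) for a, r in zip(arrays, restored))
```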
You have to permute the axes at some point. Usually I do x.permute(1, 2, 0).numpy() to get the numpy array. If I recall correctly, np.transpose should also take multiple axis indices. As an alternative, you could use a transform from torchvision, e.g. torchvision.transforms.ToPILImage()(x), and maybe use a PIL function to draw on your image.
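Building on the ToPILImage suggestion, drawing text with PIL might look like the sketch below. The image size, text, position, and colors here are arbitrary, and a blank image stands in for the converted tensor; ImageDraw's default font is used:

```python
from PIL import Image, ImageDraw

# Stand-in for ToPILImage()(x); a real tensor-to-PIL conversion would go here
img = Image.new("RGB", (224, 224), color=(0, 0, 0))

# Draw white text onto the image in place
draw = ImageDraw.Draw(img)
draw.text((10, 10), "hello", fill=(255, 255, 255))

print(img.size)  # (224, 224)
```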
Some excellent ideas here, thanks
How to reverse the action of .unsqueeze(0)
I have tried to use
torchvision.transforms.ToPILImage()(x)
to convert a tensor to a PIL image.
With a 3-dimensional tensor it works correctly, giving the right output:
torch.Size([3, 224, 224])
(224, 224, 3)
but with a 4-dimensional tensor it gives an error:
torch.Size([1, 3, 224, 224])
ValueError: pic should be 2/3 dimensional. Got 4 dimensions.
How can I reverse a tensor of shape [1, 3, 224, 224] back to [3, 224, 224]?
Thank you for helping
tensor = tensor.squeeze(0)
should work.
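As a quick sanity check on the shapes, here is the round trip shown with the numpy analogues np.expand_dims and np.squeeze, which mirror unsqueeze(0) and squeeze(0):

```python
import numpy as np

x = np.zeros((3, 224, 224))

batched = np.expand_dims(x, 0)   # like x.unsqueeze(0)
single = np.squeeze(batched, 0)  # like batched.squeeze(0)

print(batched.shape)  # (1, 3, 224, 224)
print(single.shape)   # (3, 224, 224)
```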
Oh thank you, I hadn’t seen squeeze() before. Now squeeze() is working.
If x is a tensor image, you can simply do this using x[0], which will give you a [3, 224, 224] tensor.
It seems that you have to use np.swapaxes (instead of transpose). If you have a tensor image ten [3, 32, 32], then:
img=ten.numpy()
img=np.swapaxes(img,0,1)
img=np.swapaxes(img,1,2)
will convert it to a numpy image img of shape [32, 32, 3].
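The two swapaxes calls are equivalent to a single transpose with an explicit axis order, which can be verified directly (random data, for illustration):

```python
import numpy as np

img0 = np.random.rand(3, 32, 32).astype(np.float32)

# Two pairwise swaps: (3, 32, 32) -> (32, 3, 32) -> (32, 32, 3)
a = np.swapaxes(np.swapaxes(img0, 0, 1), 1, 2)

# One transpose with the full axis order
b = np.transpose(img0, (1, 2, 0))

assert np.array_equal(a, b)
print(a.shape)  # (32, 32, 3)
```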
Very, very useful tips! I love your comment!
Thanks!!
Usually, tensor images are floats with values between 0 and 1, but numpy images are uint8 with pixel values between 0 and 255. So you need to do an extra step:
np.array(x.permute(1, 2, 0)*255, dtype=np.uint8)
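Clipping before the cast is a safe habit, since float values slightly outside [0, 1] would otherwise wrap around when converted to uint8 (the array here is random, standing in for the permuted tensor):

```python
import numpy as np

# Float HWC image in [0, 1], e.g. the result of x.permute(1, 2, 0).numpy()
f = np.random.rand(4, 4, 3).astype(np.float32)

# Clip, scale to [0, 255], then cast to uint8
u8 = (np.clip(f, 0.0, 1.0) * 255).astype(np.uint8)
print(u8.dtype)  # uint8
```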