Hello there. I wanted to know whether what I’m doing seems sensible or correct. I’ve been using TensorFlow for some time and I’m looking at PyTorch as an alternative. In a nutshell, I have an image of a known solid object (a torus) represented as a point set. The points are rotated around two axes (X and Z) and rendered to an image with a Gaussian blur.
*I’d post the image here but this forum only allows one image per post (which seems a little low)*
The idea now is: given this set of points, can a neural net find the angles Rx and Rz that created this image?
My initial plan was to render an image from the transformed points, compare that generated image to the baseline, and derive a loss that way. However, this didn’t work. One can transform a point from 3D to 2D pixel coordinates easily enough, but assigning that point to an image and building a loss on top of it won’t work, because you can’t get a derivative through an index-and-assign operation. Makes sense.
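To make that concrete, here’s a tiny made-up illustration of the dead end (a single point and a 1-D “image”, nothing to do with my real code):

import torch

# Toy version of "rotate a point, then write it into an image by index"
angle = torch.tensor(0.3, requires_grad=True)
x = torch.cos(angle) * 7.0            # a differentiable pixel coordinate (~6.7)
col = int(x.detach().round())         # turning it into an integer index drops the graph
img = torch.zeros(16)
img[col] = 1.0                        # index-and-assign writes a constant at a fixed slot
loss = ((img - torch.ones(16)) ** 2).mean()
print(loss.requires_grad)             # False - no path from the loss back to `angle`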
Instead, I found the function grid_sample and decided that I could generate pixel coordinates and check whether or not each generated pixel position has ‘hit’ a valid area (i.e. is the sample white (1.0) or not).
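As I understand it, grid_sample samples the base image at coordinates normalised to [-1, 1], so a sampled value near 1.0 means the point landed on a white pixel. A quick toy check of that understanding (a made-up 4×4 image, not my real data; I pass align_corners explicitly just to pin down the convention):

import torch
import torch.nn.functional as F

# A 1 x 1 x 4 x 4 image: left half black (0.0), right half white (1.0)
img = torch.zeros(1, 1, 4, 4)
img[..., 2:] = 1.0

# Two sample positions in grid_sample's normalised (x, y) coordinates
coords = torch.tensor([[[[ 0.75, 0.0],     # right of centre -> lands on white
                         [-0.75, 0.0]]]])  # left of centre  -> lands on black
samples = F.grid_sample(img, coords, align_corners=False)
print(samples)                             # roughly [[[[1.0, 0.0]]]]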
I wrote some basic code to see how good this loss function is:
import math
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

def test(points, rot_mat, mask):
    size = (128, 128)  # images end up as N x C x H x W apparently
    base = test_compare()
    base = base.expand((1, 1, size[0], size[1]))
    learning_rate = 0.1
    #x_rot = torch.tensor([0.0], dtype=torch.float32, requires_grad=True)
    #y_rot = torch.tensor([0.0], dtype=torch.float32, requires_grad=True)
    #z_rot = torch.tensor([0.0], dtype=torch.float32, requires_grad=True)
    ndc_mat = gen_ndc(size)
    #rot_mat = gen_rot(x_rot, y_rot, z_rot)
    trans_mat = gen_trans(0.0, 0.0, 2.0)
    proj_mat = gen_perspective(math.radians(90), 1.0, 1.0, 10.0)
    rot_mat.retain_grad()  # Need this because it's not a leaf node?!?!

    criterion = torch.nn.MSELoss()
    # Target: every sampled point should land on a white (1.0) pixel
    gauss_point = torch.tensor([[[[1.0]]]], dtype=torch.float32)

    for i in range(200):
        model_mat = torch.matmul(trans_mat, rot_mat)
        o = torch.matmul(model_mat, points)    # model space -> camera space
        q = torch.matmul(proj_mat, o)          # camera space -> clip space
        # Divide through by W - a handy mask and sum pull the W row out of q
        w = q * mask
        w = torch.sum(w, 1, keepdim=True)
        r = q / w
        # Keep just (x, y) and reshape to the N x H x W x 2 layout grid_sample expects
        s = r.narrow(1, 0, 2).reshape(1, 1, -1, 2)
        output = F.grid_sample(base, s)

        loss = criterion(output, gauss_point)
        loss.backward(retain_graph=True)
        with torch.no_grad():
            rot_mat -= learning_rate * rot_mat.grad
        rot_mat.grad.zero_()

        # Write out the current splat so I can watch the optimisation frame by frame
        splatted = splat(points, model_mat, proj_mat, ndc_mat, size)
        img = Image.fromarray(np.uint8(splatted.detach().numpy() * 255))
        img.save("torch" + str(i).zfill(3) + ".jpg", "JPEG")

    return rot_mat
For brevity I haven’t included the other functions (but maybe I’ll post the entire thing somewhere else).
The final result looks a little like this:
Ultimately, this hasn’t really worked. One clue is that the final rotation matrix looks like this:
tensor([[ 0.4880, -0.1858, -0.0320, -1.2093],
[ 0.1985, 1.2921, -0.0137, -0.1724],
[ 0.0349, -0.0112, 0.8019, 0.1691],
[ 0.0697, -0.0224, -0.3963, 1.3382]], grad_fn=<MmBackward>)
That’s not correct: in the ideal case the last row should be 0, 0, 0, 1, so I’ve clearly messed up the maths somewhere. I realise this is quite a tricky problem and I’ve barely scratched the surface. I suppose my first question would be:
loss.backward(retain_graph=True)
Why do I need retain_graph=True here? Without it, I get no .grad attached to my rot_mat. I thought that rot_mat would be a leaf node and would therefore have its gradients worked out and ready for me to apply?
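Here’s a stripped-down illustration of the leaf-node situation as I understand it (made-up tensors, nothing to do with my actual gen_rot):

import torch

angle = torch.tensor(0.1, requires_grad=True)   # a leaf: I created it directly
rot = torch.eye(4) * torch.cos(angle)           # not a leaf: the result of an op

print(angle.is_leaf, rot.is_leaf)               # True False
rot.retain_grad()                               # without this, rot.grad stays None
loss = rot.sum()
loss.backward()
print(angle.grad, rot.grad.shape)               # gradients populated on both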
Cheers
Ben