Hello there. I wanted to know if what I’m doing seems sensible or correct. I’ve been using TensorFlow for some time and I’m looking at PyTorch as an alternative. In a nutshell, I have an image of a known solid object (a torus) represented as a point set. This is rotated around two axes (X and Z) and rendered to an image with a Gaussian blur.
*I’d post the image here but this forum only allows one image per post (which seems a little low)*
The idea now is: given this set of points, can a neural net find the angles Rx and Rz that created this image?
My plan was initially to render an image, compare it to the baseline, and generate a loss that way. However, this didn’t work. I realised that one can transform a point from 3D to 2D pixel coordinates easily enough, but assigning that point to an image and computing a loss won’t work, as you can’t get a derivative through an index-and-assign operation. Makes sense.
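To double-check my own reasoning, here’s a tiny standalone repro of the problem (the names are just for illustration): converting continuous coordinates to integer indices cuts the autograd graph, and even `round()` on its own has zero gradient almost everywhere.

```python
import torch

# The 2D projection of one point, as continuous pixel coordinates.
xy = torch.tensor([3.7, 5.2], requires_grad=True)

# Converting to integer indices cuts the autograd graph:
idx = xy.round().long()
print(idx.requires_grad)  # False -- nothing can flow back through the cast

# Even without the cast, round() has zero gradient almost everywhere:
r = xy.round().sum()
r.backward()
print(xy.grad)  # tensor([0., 0.])
```

So even if I could index with these values, the loss surface would be flat with respect to the coordinates.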
Instead, I found the function torch.nn.functional.grid_sample and decided that I could generate pixel coordinates and check whether or not each generated pixel position has ‘hit’ a valid area (i.e. is the sample white (1.0) or not).
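A minimal sketch of why I think grid_sample can work here (a toy 4×4 image standing in for my rendered torus): with bilinear interpolation, grid_sample is differentiable with respect to the sampling grid, whose coordinates are normalised to [-1, 1] in (x, y) order, with shape N × H_out × W_out × 2.

```python
import torch
import torch.nn.functional as F

# A 1x1x4x4 "image": white (1.0) in the top-left quadrant, black elsewhere.
base = torch.zeros(1, 1, 4, 4)
base[0, 0, :2, :2] = 1.0

# Sampling grid of shape N x H_out x W_out x 2, (x, y) in [-1, 1],
# chosen here to straddle the white/black boundary.
grid = torch.tensor([[[[0.0, -0.5]]]], requires_grad=True)

sample = F.grid_sample(base, grid, align_corners=False)
print(sample.item())  # 0.5 -- halfway between white and black

loss = (sample - 1.0).pow(2).mean()
loss.backward()

# Bilinear interpolation gives the grid a gradient: the positive x
# component says stepping x downhill moves the sample toward white.
print(grid.grad)
```

So the gradient flows back into the coordinates themselves, which is exactly what the index-and-assign approach couldn’t give me.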
I wrote some basic code to see how good this loss function is:
```python
import math

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image


def test(points, rot_mat, mask):
    size = (128, 128)  # N x C x H x W apparently
    base = test_compare()
    base = base.expand((1, 1, *size))
    learning_rate = 0.1

    #x_rot = torch.tensor([0.0], dtype=torch.float32, requires_grad=True)
    #y_rot = torch.tensor([0.0], dtype=torch.float32, requires_grad=True)
    #z_rot = torch.tensor([0.0], dtype=torch.float32, requires_grad=True)

    ndc_mat = gen_ndc(size)
    #rot_mat = gen_rot(x_rot, y_rot, z_rot)
    trans_mat = gen_trans(0.0, 0.0, 2.0)
    proj_mat = gen_perspective(math.radians(90), 1.0, 1.0, 10.0)

    rot_mat.retain_grad()  # Need this because it's not a leaf node?!?!

    for i in range(200):
        model_mat = torch.matmul(trans_mat, rot_mat)
        o = torch.matmul(model_mat, points)
        q = torch.matmul(proj_mat, o)

        # Divide through by W seems to work with a handy mask and sum
        w = q * mask
        w = torch.sum(w, 1, keepdim=True)
        r = q / w
        s = r.narrow(1, 0, 2).reshape(1, 1, -1, 2)

        output = F.grid_sample(base, s)
        gauss_point = torch.tensor([[[[1.0]]]], dtype=torch.float32,
                                   requires_grad=True)

        criterion = torch.nn.MSELoss()
        loss = criterion(output, gauss_point)
        loss.backward(retain_graph=True)

        with torch.no_grad():
            rot_mat -= learning_rate * rot_mat.grad
            rot_mat.grad.zero_()

        splatted = splat(points, model_mat, proj_mat, ndc_mat, size)
        img = Image.fromarray(np.uint8(splatted.detach().numpy() * 255))
        img.save("torch" + str(i).zfill(3) + ".jpg", "JPEG")

    return rot_mat
```
For brevity I haven’t included the other functions (but maybe I’ll post the entire thing somewhere else).
The final result looks a little like this:
Ultimately, this hasn’t really worked. One clue is the final rotation matrix looks like this:
```
tensor([[ 0.4880, -0.1858, -0.0320, -1.2093],
        [ 0.1985,  1.2921, -0.0137, -0.1724],
        [ 0.0349, -0.0112,  0.8019,  0.1691],
        [ 0.0697, -0.0224, -0.3963,  1.3382]], grad_fn=<MmBackward>)
```
Clearly that’s not correct, as the last row should be 0, 0, 0, 1 in the ideal case. I’ve messed up the maths somewhere. I realise this is quite a tricky problem and I’ve barely scratched the surface. I suppose my first question would be:
Why do I need retain_graph=True here? Without it, I get no .grad attached to my rot_mat. I thought that rot_mat would be a leaf node and would therefore have its gradients worked out and ready for me to apply?
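For what it’s worth, here’s a minimal illustration of the leaf/non-leaf distinction I think I’m running into; x_rot/z_rot stand in for my angle tensors, and rot for the matrix that gen_rot would build from them (note that retain_grad() on a tensor is a separate switch from retain_graph=True in backward()).

```python
import torch

# Leaf tensors: created directly by the user with requires_grad=True.
x_rot = torch.tensor([0.3], requires_grad=True)
z_rot = torch.tensor([0.7], requires_grad=True)

# Anything computed FROM leaves is a non-leaf; this is a stand-in for
# a gen_rot()-style matrix built out of the angle tensors.
rot = torch.stack([x_rot.cos(), z_rot.sin()])
print(x_rot.is_leaf, rot.is_leaf)  # True False

rot.retain_grad()  # ask autograd to keep the non-leaf's .grad
loss = rot.sum()
loss.backward()

print(x_rot.grad)  # leaves always get .grad after backward()
print(rot.grad)    # available only because of retain_grad()

# Stepping the leaf angles and rebuilding the matrix each iteration
# would also keep the last row pinned at (0, 0, 0, 1) by construction.
with torch.no_grad():
    x_rot -= 0.1 * x_rot.grad
    z_rot -= 0.1 * z_rot.grad
```

If that’s right, then updating rot_mat directly is what lets the matrix drift away from a valid rotation, and I should be descending on the angles instead.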