My cost function: Minimize distance between two sets of pixels using only affine transformations. Custom cost function needed?

Hi everyone!

I’m trying to decide:

Do I need to make a custom cost function? (I’m thinking I probably do)
---- If so, would I have to implement backward() as well? (even if everything happens in/with Variables?)

Long story short, I have two images: a target image and an attempt to mimic it (i.e. a distorted or perturbed version). I have two sets of pixel coordinates that are common to both images. Imagine something like Scale-Invariant Feature Transform (SIFT) keypoints, but without a 1-1 mapping from each pixel in one image to a pixel in the other.

Imagine overlaying the attempt image over the target image such that the centers of each image are on top of each other. I want to transform the attempt image using only translation (y, x), rotation (theta), and scaling (s) in order to minimize the distance between every “important” point in the target image (tar) and every “important” point in the attempt (att) image.

Here’s the cost function more formally.

[image: cost_fn — the cost function, written formally]

There isn’t a ground-truth value; I just want to find the y, x, theta, and s values that make that function as close as possible to 0.

EDIT: I also realize that there are an infinite number of solutions thanks to the cyclic nature of theta. Assume it’s restricted to something like -30 degrees <= theta <= 30 degrees.
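For concreteness, here is roughly what I mean, sketched in current PyTorch. The point sets, the parameter layout, and the sum-over-all-pairs form of the cost are just illustrative placeholders, not my actual formula:

```python
import torch

# Hypothetical "important" points, rows are (y, x); no 1-1 mapping is assumed,
# so this sketch simply sums squared distances over every (att, tar) pair.
tar = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
att = torch.tensor([[0.5, 1.5], [2.5, 2.5], [1.0, 1.0]])

# One parameter vector: [y, x, theta, s], with the scale initialized to 1
params = torch.tensor([0.0, 0.0, 0.0, 1.0], requires_grad=True)

def cost(params):
    y, x, theta, s = params[0], params[1], params[2], params[3]
    # 2x2 rotation matrix built from theta so gradients flow through it
    rot = torch.stack([
        torch.stack([torch.cos(theta), -torch.sin(theta)]),
        torch.stack([torch.sin(theta), torch.cos(theta)]),
    ])
    # scale, rotate, then translate the attempt points
    moved = s * att @ rot.t() + torch.stack([y, x])
    diffs = moved[:, None, :] - tar[None, :, :]   # shape (n_att, n_tar, 2)
    return (diffs ** 2).sum()

loss = cost(params)
loss.backward()
# params.grad now holds d(cost)/d[y, x, theta, s]
```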

If you can translate that loss function into a Python expression using torch-compatible math ops, then you can do

loss = that_expression
# and then you can get the gradients and update as usual
loss.backward()
optimizer.step()

No need to implement the backward pass explicitly.
Some torch-compatible math ops include…

torch.sin(var_theta)
var_a * var_b
var_a + var_b
var_a - var_b
torch.sum(variable, dim=dimension_to_sum_over)

I do not think that having an infinite number of possible solutions is a problem. I would not constrain theta at all, just let it find the nearest solution. That is, unless you have a good reason for wanting to constrain the search to only small rotations of the input.

Thanks for your reply! It was really helpful. I have a prototype of the function up and running, but I’m running into some difficulty.

Let’s say I’m trying to differentiate my cost function with respect to variables y, x, theta, and s (as above) - so I have a FloatTensor of length 4.

My cost function involves the variables themselves in non-uniform ways (i.e. I can’t just do a matrix multiplication; I have to index each variable out of the tensor individually), and I’m getting an error I don’t know how to fix.

Minimal example of what I’m trying to do:

x = [x0, x1]
y = 2x0 + 3x1^2

then get dy / dx0 and dy / dx1 (where dy / dx0 = 2 and dy / dx1 = 6x1)

In code:

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)
y = 2 * x[0] + 3 * x[1]^2
y.backward()
y.grad

The last part gives me this error:
RuntimeError: bitxor is only supported for integer type tensors at d:\pytorch\pytorch\torch\lib\th\generic/THTensorMath.c:961

Does what I’m saying make sense?

That error comes from x[1]^2: in Python, ** is the exponentiation operator and ^ is bitwise XOR, which PyTorch only supports for integer tensors (exactly what the error message says).

I see another error in your code. By calling y.backward() you correctly request the gradients of y, but y.grad asks for the gradients with respect to y, which makes little sense here. If you want the gradient of y with respect to x, you need x.grad, not y.grad.

Basically after calling y.backward() you should be able to get dy / dx0 using x.grad[0].

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)
y = 2 * x[0] + 3 * x[1]**2
y.backward()
print(x.grad[0], x.grad[1])
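For reference, the same snippet in current PyTorch (where Variable has been merged into Tensor) gives the expected gradients, dy/dx0 = 2 and dy/dx1 = 6·x1 = 12:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = 2 * x[0] + 3 * x[1] ** 2
y.backward()
print(x.grad)  # the gradient [2., 12.] lands on x, not on y
```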

That actually works beautifully - I was scared that PyTorch didn’t support that kind of thing. I’m really impressed that you can pluck individual parts of your input vector and put them in your cost function and it still works. Man I love PyTorch.

I feel like I’m really close, I’m just getting another pesky error:


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)
total = autograd.Variable(torch.FloatTensor([0]))

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total += helper(x)
    total.backward()
    x = x - 0.1 * x.grad # a gradient step with a learning rate of 0.1
    print(x.grad) # when the math in the line above is done, the gradient vanishes…
    # shouldn’t I have to clear it manually?

print(x.grad[0], x.grad[1]) # never gets here

---------------------- Output --------------------
iteration: 0
None
iteration: 1

Unfortunately, on the second iteration of total.backward(), I get:

RuntimeError: element 0 of variables tuple is volatile

Again, thank you so much for your help. It’s been invaluable.

EDIT 1:

Another thing I tried is:


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)
total = autograd.Variable(torch.FloatTensor([0]))

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total += helper(x)
    total.backward()
    x = x - 0.1 * x.grad # a gradient step with a learning rate of 0.1
    x.grad.data.zero_() # ADDITION: zeroing out the gradient

print(x.grad[0], x.grad[1]) # never gets here


Per this SO thread:

But it doesn’t seem like I need to, because this is the error I get when it hits x.grad.data.zero_():

AttributeError: 'NoneType' object has no attribute 'data'

It’s already been cleared. Will continue to research.

EDIT 2:

I think I’m getting closer. I didn’t look as closely at the SO post as I should have, apparently I needed to put .data on everything (for reasons I’m not sure of):


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)
total = autograd.Variable(torch.FloatTensor([0]))

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total += helper(x)
    total.backward()
    x.data = x.data - 0.1 * x.grad.data # ADDITION: added .data to everything per the SO post
    x.grad.data.zero_()

print(x.grad[0], x.grad[1]) # never gets here


Now I get the error about retaining the graph:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Which makes sense, since total is carried over from the previous iteration, but its graph buffers were freed by the first backward(). I’m going to try moving total into the loop.
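(For what it’s worth, in current PyTorch you can see this behavior directly: calling backward() a second time on the same graph fails unless the first call passes retain_graph=True, in which case gradients simply accumulate.)

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = 2 * x[0] + 3 * x[1] ** 2

y.backward(retain_graph=True)  # keep the graph buffers so we can backprop again
y.backward()                   # second pass works; gradients accumulate

print(x.grad)  # each backward() adds [2., 12.], so x.grad is now [4., 24.]
```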

EDIT 3:

Here is the code and the results of moving it into the for loop:


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total = autograd.Variable(torch.FloatTensor([0]))
    total += helper(x)
    total.backward()
    x.data = x.data - 0.1 * x.grad.data
    x.grad.data.zero_()

print(x.grad[0], x.grad[1]) # never gets here


iteration: 0
iteration: 1
iteration: 2
Variable containing:
0
[torch.FloatTensor of size 1]
Variable containing:
0
[torch.FloatTensor of size 1]


Which is unexpected. I was expecting those both to be non-zero.

I see your post; I’ll test it out and reply with my results. Thanks again!

The following line

x = x - 0.1 * x.grad

doesn’t modify the tensor originally referenced by the variable x; it creates a new tensor and rebinds the name x to point at it. That is why x.grad disappears.

Try replacing the above line with x -= 0.1 * x.grad in order to modify the contents of x instead of redefining x.

The usual way of zeroing the gradients is by doing one of optimizer.zero_grad() or model.zero_grad().

Redefining x is probably also the cause of the error complaining about the tuple being volatile. I think the reason is that x no longer references the Variable declared with requires_grad=True, but a Variable calculated from the original value. Variables that are the result of calculations are not considered leaf variables; you can’t set requires_grad on them, and autograd does not populate a .grad attribute on them.
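The leaf distinction is easy to check directly. In current PyTorch (plain tensors instead of Variables) the same behavior looks like this:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
print(x.is_leaf)   # True: declared directly with requires_grad=True

x2 = x - 0.1       # result of a computation
print(x2.is_leaf)  # False: not a leaf, so autograd will not populate x2.grad

x2.sum().backward()
print(x.grad)      # gradients land on the leaf: tensor([1., 1.])
```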

(EDIT 4)

That makes sense, actually. What do you think about my last edit(s)? (i.e. replacing x and x.grad with x.data and x.grad.data, respectively?)

CODE:


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)
total = autograd.Variable(torch.FloatTensor([0]))

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total += helper(x)
    total.backward()
    x -= 0.1 * x.grad

print(x.grad[0], x.grad[1]) # never gets here


OUTPUT:

at x -= 0.1 * x.grad:

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

Take a look at the PyTorch SGD source (or just use PyTorch’s SGD); currently it updates x.data. Then you need to use zero_grad or similar.

Best regards

Thomas

In that case

x.data -= 0.1 * x.grad

should work.
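(As an aside: in current PyTorch, the recommended way to write a manual update like this is to wrap it in torch.no_grad(), which sidesteps both the in-place-on-leaf error and the .data idiom. A minimal sketch using the same toy function:)

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

for i in range(3):
    total = 2 * x[0] + 3 * x[1] ** 2
    total.backward()
    with torch.no_grad():   # updates inside this block are not tracked by autograd
        x -= 0.1 * x.grad
    x.grad.zero_()          # clear the accumulated gradients for the next pass

print(x)  # after three steps: approximately [0.4, 0.128]
```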

That’s a good idea Tom, thanks!

I actually wanted to use PyTorch’s SGD, but all the cost functions (that I know of) take two inputs: a y-hat/estimate and a y target. I don’t have a y target and have no idea what it would be. I just want to minimize the function.

EDIT: I’ll be investigating some of the answers in this thread more and post an update.

Thank you both very much!

I don’t think it does :confused:

Error on x.data -= 0.1 * x.grad:


TypeError: sub_ received an invalid combination of arguments - got (Variable), but expected one of:

  • (float value)
    didn’t match because some of the arguments have invalid types: (Variable)
  • (torch.FloatTensor other)
    didn’t match because some of the arguments have invalid types: (Variable)
  • (float value, torch.FloatTensor other)

CODE:

def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)
total = autograd.Variable(torch.FloatTensor([0]))

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total += helper(x)
    total.backward()
    x.data -= 0.1 * x.grad

print(x.grad[0], x.grad[1]) # never gets here

I’m pretty sure I got it working.

EDIT: Just kidding, not quite yet. I’m not actually clearing the gradients here; I’m just accumulating them and printing them at the end.

CODE:


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total = autograd.Variable(torch.FloatTensor([0]))
    total += helper(x)
    total.backward()
    x.data -= 0.1 * x.grad.data

print(x.grad[0], x.grad[1])


OUTPUT:

iteration: 0
iteration: 1
iteration: 2
Variable containing:
6
[torch.FloatTensor of size 1]
Variable containing:
11.5200
[torch.FloatTensor of size 1]


My bad, I replied too quickly… As you found out, the following works

x.data -= 0.1 * x.grad.data

and as you aren’t using an optimizer, nor a model in your example code, you can zero gradients using

x.grad.data.fill_(0)

You will also get the following error on the second iteration:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.

To avoid that you can add total.detach_() at the end of the for loop in order to tell PyTorch where to stop backpropagating.

No worries! You’re golden.

I actually didn’t get an error, but that sounds like solid advice, so I’ll do that as well and post the output:

CODE without total.detach_()


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total = autograd.Variable(torch.FloatTensor([0]))
    total += helper(x)
    total.backward()
    x.data -= 0.1 * x.grad.data
    x.grad.data.fill_(0)

print(x[0], x[1])

OUTPUT without total.detach_()

iteration: 0
iteration: 1
iteration: 2
Variable containing:
0.4000
[torch.FloatTensor of size 1]
Variable containing:
0.1280
[torch.FloatTensor of size 1]


CODE with total.detach_()

def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)

for i in range(3): # number of epochs
    print("iteration: " + str(i))
    total = autograd.Variable(torch.FloatTensor([0]))
    total += helper(x)
    total.backward()
    x.data -= 0.1 * x.grad.data
    x.grad.data.fill_(0)
    total.detach_()

print(x[0], x[1])


OUTPUT with total.detach_()

Identical to the first output.


They’re both the same, which makes sense since I initialize total as a new variable at the top of the loop anyway. Do you think detaching it at the end of the loop saves memory? I’ll do it anyway since it doesn’t appear to do any harm and might make the code more readable.

Again, thank you both for all your help!

My mistake… you redefine total on each pass through the loop, so detaching it does nothing.

If you were to keep the value of total from one iteration to the next, modifying it in each iteration then you would get the error I was talking about.
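To make that concrete, here is a sketch (in current PyTorch) of a loop that does carry total across iterations. Without the detach at the end of each pass, the second backward() would raise the buffers-freed error, because it would try to backpropagate into the previous iteration’s already-freed graph:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
total = torch.zeros(1)

for i in range(3):
    total = total + 2 * x[0] + 3 * x[1] ** 2  # total carries over between iterations
    total.backward()
    total = total.detach()  # cut the graph here so the next backward() stops at this point

# x was never updated, so each pass adds the same [2., 12.] to x.grad
print(x.grad)  # tensor([ 6., 36.])
```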

Gotcha. It all makes sense now. Thanks!

Follow up question, now that I’ve been thinking about the optimizers:

CODE:


def helper(x):
    return 2 * x[0] + 3 * x[1]**2

x = autograd.Variable(torch.FloatTensor([1, 2]), requires_grad=True)

optimizer = optim.SGD(x, lr = 0.1)

for i in range(3): # number of epochs
    optimizer.zero_grad()
    print("iteration: " + str(i))
    total = autograd.Variable(torch.FloatTensor([0]))
    total += helper(x)
    total.backward()
    optimizer.step()

print(x[0], x[1])


OUTPUT:

Error at: optimizer = optim.SGD(x, lr = 0.1)

TypeError: params argument given to the optimizer should be an iterable of Variables or dicts, but got torch.autograd.variable.Variable


I’m guessing PyTorch wants the input to optim.SGD as a list of Variables, huh? I suppose my question is: should I refactor my code to accommodate this, or is there a better practice?

optimizer = optim.SGD([x], lr = 0.1)

Though this might be a good time to investigate creating a model by subclassing nn.Module as shown in the tutorials.
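A minimal sketch of that nn.Module approach, using the same toy function (the module and parameter names here are just for illustration):

```python
import torch
import torch.nn as nn
import torch.optim as optim

class QuadModel(nn.Module):
    def __init__(self):
        super().__init__()
        # the vector being optimized, registered so .parameters() finds it
        self.x = nn.Parameter(torch.tensor([1.0, 2.0]))

    def forward(self):
        return 2 * self.x[0] + 3 * self.x[1] ** 2

model = QuadModel()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for i in range(3):
    optimizer.zero_grad()  # clear gradients from the previous step
    loss = model()
    loss.backward()
    optimizer.step()       # updates model.x in place

print(model.x)  # after three steps: approximately [0.4, 0.128]
```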

That worked!

Thank you so much. I’ll be looking into that.

Have a good day!