Calculating a loss from a secondary, externally generated tensor

Nolan241 · October 30, 2022, 12:38pm

hello,

for some time now i've been struggling with a kind of snake-bites-its-own-tail problem, and i can't wrap my head around it to find a solution.

let's say i want a network which takes a batch of images [b, c, h, w] and should output a scale, position, rotation tensor [b, scale, pos_x, pos_y, pos_z, rot_x, rot_y, rot_z] of shape [1, 7]

from this output i generate a new image with external software / a python method (like open3d, PIL, etc.)

now i want to calculate the loss between the input images and the generated ones and update the network. but since the generated image has no differentiable connection to the network output, i cannot compute the gradients.

what i tried so far:

o custom loss functions
(somehow mapping the output to the generated image, or a custom autograd function)

o multiple networks to map output coords to generated images and vice versa
(encoder, decoder, discriminator)

o reinforcement learning
but the action space is kind of a problem: either all possible combinations, or each of the 7 values as separate actions

i hope there is a simple solution… why isn't there an RL method which works with values instead of actions?

would it be possible to store the reward or the loss and somehow pick the best one (argmax)?

greetings
nolan

KFrank · October 30, 2022, 7:21pm

Hi Nolan!

This is indeed the core of your problem and there is no magic way around it.

You want to train the parameters of your network that produces the
scale-position-rotation tensor with respect to the loss function that involves
the output of your “external” generator. So you need to backpropagate
through your generator somehow.

I strongly advise rewriting your external generator in pytorch so that you
get autograd and backpropagation “for free.”
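
As a rough illustration of what a generator written in pytorch could look like, here is a minimal 2D sketch. It is an assumption made for illustration only: the renderer in this thread is 3D and not shown, and the function name render_2d, the parameter layout and the random template image are all hypothetical. The sketch warps a template image by scale, in-plane rotation and translation using torch.nn.functional.affine_grid and grid_sample, so the image loss can be backpropagated to the pose parameters.

```python
import torch
import torch.nn.functional as F


def render_2d(template, params):
    # params: [b, 4] = (scale, angle in radians, shift_x, shift_y); hypothetical layout
    scale, angle, tx, ty = params.unbind(dim=1)
    cos, sin = torch.cos(angle), torch.sin(angle)
    # one 2x3 affine matrix per batch element, built with torch ops only
    # so the autograd graph back to params is kept
    row0 = torch.stack([scale * cos, -scale * sin, tx], dim=1)
    row1 = torch.stack([scale * sin, scale * cos, ty], dim=1)
    theta = torch.stack([row0, row1], dim=1)  # [b, 2, 3]
    # theta maps output pixel coordinates to sampling locations in the template
    grid = F.affine_grid(theta, list(template.shape), align_corners=False)
    return F.grid_sample(template, grid, align_corners=False)


torch.manual_seed(0)
template = torch.rand(1, 3, 64, 64)  # stand-in for the object to draw
target = torch.rand(1, 3, 64, 64)    # stand-in for the input image
params = torch.tensor([[1.0, 0.1, 0.0, 0.0]], requires_grad=True)

loss = F.mse_loss(render_2d(template, params), target)
loss.backward()
print(params.grad)  # gradient of the image loss w.r.t. the pose parameters
```

In a real setup params would be the output of the network rather than a hand-made tensor, and the same backward() call would then also fill the gradients of the network weights.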

If you can’t do that, you will have to write a function that computes the
jacobian of your generator (the partial derivatives of its outputs with
respect to its inputs). Unless your generator is very simple, this is likely
to be somewhat difficult.

Given that the number of inputs to your generator is really quite small
(seven – I’m not sure what b is), you may be able to numerically
differentiate your generator’s outputs with respect to its inputs (but be
advised that numerical differentiation can be nuanced).
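
A minimal sketch of the numerical-differentiation route, under stated assumptions: external_render below is a hypothetical smooth stand-in for the real external generator, and the class name NumericalRender and the step size eps are made up for illustration. The idea is to wrap the external call in a torch.autograd.Function whose backward estimates the jacobian by central differences, one input at a time.

```python
import torch


def external_render(params):
    # hypothetical stand-in for the non-differentiable external renderer:
    # takes a 1D tensor of 7 values and returns a flat "image" of 16 pixels
    x = torch.linspace(-1.0, 1.0, 16, dtype=params.dtype)
    return torch.stack([torch.sin(p * x) for p in params.detach()]).sum(dim=0)


class NumericalRender(torch.autograd.Function):
    @staticmethod
    def forward(ctx, params, eps):
        ctx.save_for_backward(params)
        ctx.eps = eps
        return external_render(params)

    @staticmethod
    def backward(ctx, grad_output):
        (params,) = ctx.saved_tensors
        base = params.detach()
        grad = torch.zeros_like(base)
        for i in range(base.numel()):
            step = torch.zeros_like(base)
            step[i] = ctx.eps
            # central difference: d(image)/d(params[i]), one jacobian column
            plus = external_render(base + step)
            minus = external_render(base - step)
            column = (plus - minus) / (2 * ctx.eps)
            # chain rule: contract the column with the incoming gradient
            grad[i] = (grad_output * column).sum()
        return grad, None  # eps gets no gradient


params = torch.tensor([0.3, -0.2, 0.5, 0.1, 0.0, 0.7, -0.4],
                      dtype=torch.float64, requires_grad=True)
image = NumericalRender.apply(params, 1e-4)
loss = (image ** 2).mean()
loss.backward()
print(params.grad)
```

Each backward pass costs two renderer calls per input, so 14 calls for seven inputs. With a real renderer the output may change in jumps (for example when an edge crosses a pixel), so the choice of eps can matter a lot more than in this smooth toy function.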

Good luck!

K. Frank

Nolan241 · November 1, 2022, 1:07pm

hm, thanks, that is what i expected / feared.

yesterday i did 2 tests with the GAN approach which seems promising.

the first optimized the output tensor directly and the second used a generator network.

import torch

from torch import nn, optim

device = 'cpu' if not torch.cuda.is_available() else 'cuda'


# discriminator
disc_model = nn.Sequential(
    nn.Linear(7, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
).to(device)

disc_optimizer = optim.Adam(disc_model.parameters(), lr=0.002)

gen_model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 7),
).to(device)

gen_optimizer = optim.Adam(gen_model.parameters(), lr=0.002)

criterion = nn.MSELoss()

zero_target = torch.zeros([1, 1]).to(device)
invalid_loss = 1000000.0

target_image = torch.rand([1, 3, 64, 64]).to(device)
target_batch = target_image.view(-1, 3 * 64**2)


def unpack_data(tensor):
    # unpack a [1, 7] tensor into plain python scale, position, rotation
    # values for the external renderer; detach() cuts the autograd graph here
    out_scale, out_position, out_rotation = torch.split(
        tensor.detach().squeeze(0).cpu(), [1, 3, 3])

    scale = out_scale.item()
    position = out_position.tolist()
    rotation = out_rotation.tolist()
    return scale, position, rotation


def is_invalid_vector(vector, min_value=-1.0, max_value=1.0):
    for coord in vector:
        if coord < min_value or coord > max_value:
            return True
    return False


def is_data_invalid(scale, position, rotation):
    # check if data is in bounds / constraints
    # something like:
    # if scale < 0.5 or scale > 2:
    #     return True
    # if is_invalid_vector(position):
    #     return True
    # if is_invalid_vector(rotation, -180, 180):
    #     return True
    return False


def generate_image(scale, position, rotation):
    # generate a new image
    gen_image = torch.rand([1, 3, 64, 64]).to(device)
    gen_image.requires_grad = True
    return gen_image.view(-1, 3 * 64**2)


def train_with_direct_data():
    direct_data = torch.rand([1, 7]).to(device)
    direct_data.requires_grad = True

    direct_optimizer = optim.Adam([direct_data], lr=0.002)

    for step in range(100000):
        # pseudo code
        scale, position, rotation = unpack_data(direct_data)
        is_invalid = is_data_invalid(scale, position, rotation)

        if is_invalid:
            loss_value = invalid_loss
        else:
            gen_batch = generate_image(scale, position, rotation)
            loss_value = criterion(gen_batch, target_batch)

        # train discriminator
        disc_tgt = torch.tensor([[float(loss_value)]]).to(device)
        disc_out = disc_model(direct_data.clone().detach())
        disc_loss = criterion(disc_out, disc_tgt)

        disc_optimizer.zero_grad()
        disc_loss.backward()
        disc_optimizer.step()

        # train direct data
        direct_out = disc_model(direct_data)
        direct_loss = criterion(direct_out, zero_target)

        direct_optimizer.zero_grad()
        direct_loss.backward()
        direct_optimizer.step()

        print('Step: {} Loss: {}'.format(step, direct_loss.item()))


def train_with_generator():
    for step in range(100000):
        # pseudo code
        noise = torch.randn([1, 100]).to(device)
        gen_out = gen_model(noise)

        scale, position, rotation = unpack_data(gen_out)
        is_invalid = is_data_invalid(scale, position, rotation)

        if is_invalid:
            loss_value = invalid_loss
        else:
            gen_batch = generate_image(scale, position, rotation)
            loss_value = criterion(gen_batch, target_batch)

        # train discriminator
        disc_tgt = torch.tensor([[float(loss_value)]]).to(device)
        disc_out = disc_model(gen_out.clone().detach())
        disc_loss = criterion(disc_out, disc_tgt)

        disc_optimizer.zero_grad()
        disc_loss.backward()
        disc_optimizer.step()

        # train generator
        pred_out = disc_model(gen_out)
        pred_loss = criterion(pred_out, zero_target)

        gen_optimizer.zero_grad()
        pred_loss.backward()
        gen_optimizer.step()

        print('Step: {} Loss: {}'.format(step, pred_loss.item()))

what do you think, could that work?

the result of the generator approach was converging towards zeros.

greetings nolan

KFrank · November 1, 2022, 5:11pm

Hi Nolan!

I don’t understand your use case and don’t really follow what you are
doing here.

Nolan241:

def generate_image(scale, position, rotation):
    # generate a new image
    gen_image = torch.rand([1, 3, 64, 64]).to(device)
    gen_image.requires_grad = True
    return gen_image.view(-1, 3 * 64**2)

This looks fishy – you aren’t using the arguments to the function at all.
The returned result, gen_image is just a new tensor (independent of
the arguments), so nothing will backpropagate through the arguments.
(Slapping a .requires_grad = True on gen_image doesn’t fix this;
gen_image is just a new leaf tensor that you can backpropagate up to,
but not through and beyond.)
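
A small sketch of the difference, with made-up values: net_out stands in for the network output, fresh for a tensor created independently of it, and derived for a tensor computed from it with torch operations. These names are hypothetical and only serve the illustration.

```python
import torch

# stands in for the network output
net_out = torch.tensor([[0.5, 0.1, 0.2]], requires_grad=True)

# case 1: a fresh tensor, unrelated to net_out, marked as requiring grad
fresh = torch.rand(1, 3)
fresh.requires_grad = True
fresh.sum().backward()
print(fresh.grad is not None)  # True: the gradient stops at the new leaf
print(net_out.grad)            # None: nothing reached net_out

# case 2: a tensor computed from net_out with torch operations
derived = net_out * 2.0
derived.sum().backward()
print(net_out.grad)            # tensor([[2., 2., 2.]])
```

Only in the second case is there a recorded path from the loss back to net_out, which is what training the network would need.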

Best.

K. Frank

Nolan241 · November 1, 2022, 6:34pm

hi,

knew you would stumble upon it.

def unpack_data(tensor):
    # unpack a [1, 7] tensor into plain python scale, position, rotation
    # values for the external renderer; detach() cuts the autograd graph here
    out_scale, out_position, out_rotation = torch.split(
        tensor.detach().squeeze(0).cpu(), [1, 3, 3])

    scale = out_scale.item()
    position = out_position.tolist()
    rotation = out_rotation.tolist()
    return scale, position, rotation


def is_invalid_vector(vector, min_value=-1.0, max_value=1.0):
    for coord in vector:
        if coord < min_value or coord > max_value:
            return True
    return False


def is_data_invalid(scale, position, rotation):
    # check if data is in bounds / constraints
    # something like:
    # if scale < 0.5 or scale > 2:
    #     return True
    # if is_invalid_vector(position):
    #     return True
    # if is_invalid_vector(rotation, -180, 180):
    #     return True
    return False


def generate_image(scale, position, rotation):
    # generate a new image
    gen_image = torch.rand([1, 3, 64, 64]).to(device)
    gen_image.requires_grad = True
    return gen_image.view(-1, 3 * 64**2)

those are just pseudo code, because the real image rendering code is complex and long.

it's more about a proof of concept of whether it could work this way:

storing the loss as the target for the discriminator and using zero as the target for the generator.

greetings
nolan