Why is my generator model M_gen not training when optimizing based on the classifier model M_cls?

I’m trying to train a generator model M_gen whose output serves as input for a classifier model M_cls. However, M_cls is also trained with additional data that doesn’t come from M_gen, so I combine both datasets into a single DataLoader. The issue is that M_gen doesn’t seem to be training at all: the evaluation metric avg_f1 stays the same across all epochs. This is my code to train M_gen:

while epoch <= num_epochs:
    for phase in ['train', 'val']:
        print(phase + "-Phase")
        if phase == 'train':
            model.train()  # set M_gen to training mode
            dataloader = train_dataloader
        else:
            model.eval()   # set M_gen to evaluation mode
            dataloader = val_dataloader

        # collect the generated samples for this phase
        generated_data_list = []
        for idx, inputs in tqdm(enumerate(dataloader), total=len(dataloader)):
            inputs = inputs.to(device).float()

            regression_output, classification_output = model(inputs)
            generated_data_list.append((regression_output, classification_output))

        # train M_cls on the merged data and turn its F1-score into a loss
        f1 = main_M_cls.main(seed=2025, generated_data_list=generated_data_list)
        loss = torch.tensor(100 - (f1 * 100), requires_grad=True, device=device)

        if phase == 'train':
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    epoch += 1

In main_M_cls.main() I merge the samples from M_gen with the additional data stored in train_dataset:

from torch.utils.data import ConcatDataset, DataLoader

# flatten the generated batches into individual (sample, label) pairs
dataset2 = [(x1, y1)
            for (regression_output, classification_output) in generated_data_list
            for x1, y1 in zip(regression_output, classification_output)]
train_merged_dataset = ConcatDataset([train_dataset, dataset2])
train_loader = DataLoader(train_merged_dataset, batch_size=batch_size, shuffle=True)

Then I start the training and evaluation of M_cls:

for epoch in range(3):
    train_loss, val_loss = [], []

    # Start Training
    model.train()
    for i, (sample, label) in tqdm(enumerate(train_loader), total=len(train_loader)):
        sample = sample.to(device=device)
        label = label.to(device=device)

        output = model(sample)
        loss = criterion(output, label)

        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
        train_loss.append(loss.item())

    # Start Validation
    model.eval()
    true_labels, pred_labels = [], []
    with torch.no_grad():
        for i, (sample, label) in enumerate(val_loader):
            sample = sample.to(device=device, dtype=torch.float)
            label = label.to(device=device, dtype=torch.long)

            output = model(sample)
            loss = criterion(output, label)
            val_loss.append(loss.item())

            true_labels.append(label.cpu().numpy())
            pred_labels.append(output.cpu().numpy())

    y_true = np.concatenate(true_labels, axis=0)
    y_prob = np.concatenate(pred_labels, axis=0)
    y_pred = np.argmax(y_prob, axis=1)
    val_f1 = f1_score(y_true, y_pred, average='macro')  # from sklearn.metrics

return val_f1

I first thought that the F1-score might be the problem, because the computation graph can’t backpropagate gradients through it, since the F1-score is a plain Python scalar. However, I also tested my code using the validation loss instead of the F1-score, but M_gen still doesn’t learn anything; its losses are the same in every epoch.

I appreciate any help or suggestions. Thanks in advance!

Hi Brayn!

torch.tensor() creates a new tensor that is not connected to any previous computation graph, and calling it with requires_grad=True doesn’t change that. (The new tensor then becomes a leaf of a new, separate computation graph.)

So this call to .backward() doesn’t actually backpropagate anything.
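
You can see this directly with a tiny example (a minimal sketch; the Linear layer just stands in for your M_gen):

import torch

gen = torch.nn.Linear(4, 2)                       # stands in for M_gen
out = gen(torch.randn(3, 4))                      # connected to gen's parameters
f1 = 0.5                                          # plain Python float, no graph
loss = torch.tensor(100 - f1 * 100, requires_grad=True)
print(loss.grad_fn)                               # None -- loss is a fresh leaf
loss.backward()                                   # runs without error, but ...
print(gen.weight.grad)                            # None -- nothing reached gen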

More generally, I can’t really follow the details of what you are doing here. Here
is a pytorch GAN tutorial that you might find helpful.

Best.

K. Frank

Hi Frank,
Thanks for your response and your help! I was already afraid that using torch.tensor(...) wouldn’t work in this case. But what would be a better approach?

What I’m trying to do is not a traditional GAN. In a typical GAN setup, we have a generator that produces samples (or a batch of samples) and a discriminator that learns to distinguish between real and generated samples.

However, in my case, the “discriminator” is actually a classifier that I want to improve using the data generated by the generator. To evaluate whether the classifier benefits from the generated data, it needs to be trained to completion, so let’s say for 30 epochs.

What would be the best way to implement this kind of setup in PyTorch? Any suggestions would be greatly appreciated!

Best regards
Brayn

Hi Brayn!

I don’t really understand what you are trying to do here.

As an aside, people do sometimes use generated data to train models, so it can
make sense.

If I understand you correctly, you want to generate data with your generator, use
the generated data to train the classifier, compute a figure of merit for how well
your trained classifier performs, and use that figure of merit as a loss value that
you backpropagate in order to train your generator to generate data that does a
better job of training your classifier. Is this correct?

The problem is that pytorch (for good reasons, if you dig down into it) does not
support backpropagating through the optimizer step that updates the parameters
of the classifier. That is, autograd won’t give you the gradients of your figure of
merit with respect to the parameters of the generator.
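
Here is a toy illustration of where the chain breaks (a minimal sketch with stand-in modules and made-up shapes):

import torch

gen = torch.nn.Linear(2, 2)                       # stands in for M_gen
cls = torch.nn.Linear(2, 2)                       # stands in for M_cls
opt = torch.optim.SGD(cls.parameters(), lr=0.1)

fake = gen(torch.randn(4, 2))                     # generated data, connected to gen
cls_loss = cls(fake).pow(2).mean()                # stand-in classifier training loss
cls_loss.backward()                               # inner training step
opt.step()                                        # in-place, *untracked* weight update
gen.zero_grad(set_to_none=True)                   # discard gen grads from inner step

merit = cls(torch.randn(4, 2)).pow(2).mean()      # figure of merit, updated weights
merit.backward()
print(gen.weight.grad)                            # None -- no path through opt.step()

The optimizer mutates the classifier’s parameters in place, outside of autograd, so the figure of merit computed afterwards has no history that reaches the generator.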

(Whether something else along the lines of what you want could work would depend
on the details of what you actually want to do.)

Best.

K. Frank

Hello Frank,
thanks a lot for your answer.

If I understand you correctly, you want to generate data with your generator, use
the generated data to train the classifier, compute a figure of merit for how well
your trained classifier performs, and use that figure of merit as a loss value that
you backpropagate in order to train your generator to generate data that does a
better job of training your classifier. Is this correct?

Yes, that’s exactly what I want to do 🙂

The problem is that pytorch (for good reasons, if you dig down into it) does not
support backpropagating through the optimizer step that updates the parameters
of the classifier. That is, autograd won’t give you the gradients of your figure of
merit with respect to the parameters of the generator.

OK, too bad. Then I’ll have to find another way to do this, maybe reinforcement learning? I’d need to think for a while about how that could work, though. If I understood you correctly, the problem is that the classification model is trained to completion in between (and the gradients get reset every iteration, so after the classifier training there are no gradients left that could be used), which makes it different from GAN training, right?
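
Thinking out loud: a REINFORCE-style update might be a starting point, treating the classifier’s F1-score as a reward. A very rough sketch (untested; the generator, shapes, baseline, and the train_and_eval_classifier helper are all made up):

import torch

# Stand-ins; the real generator would emit IMU windows.
gen = torch.nn.Linear(8, 2 * 6)                   # outputs mean and log-std jointly
z = torch.randn(4, 8)
baseline = 0.5                                    # e.g. running mean of past F1 scores

def train_and_eval_classifier(samples):
    # made-up helper: would train M_cls on `samples` plus the real data
    # and return its validation F1-score as a plain Python float
    return 0.55

mean, log_std = gen(z).chunk(2, dim=-1)
dist = torch.distributions.Normal(mean, log_std.exp())
windows = dist.sample()                           # sampling has no graph, on purpose
reward = train_and_eval_classifier(windows) - baseline
loss = -dist.log_prob(windows).sum() * reward     # REINFORCE / score-function loss
loss.backward()                                   # gradient reaches gen via log_prob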

Maybe some background on how I came up with this idea: I once heard that some companies optimize aircraft turbines by modifying their geometry to improve, for example, their efficiency. These modified turbines are then tested and evaluated in a simulator.

Now I thought: in my case, a generative model could produce data, and my classifier could take on the role of the simulator. However, my context is entirely different: I’m working with accelerometer data and simply want to generate virtual data to improve classification.

Thanks a lot for your help. Do you have some ideas that I could try?

Best regards
Brayn

Hi Brayn!

Yes, in this regard what you describe is different from typical GAN training. If there were some way you could measure the future benefit to your classifier due to a single sample (or batch) produced by your generator without taking an (official) optimizer step, you might be able to make your training look enough like a GAN that typical GAN techniques would work.
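
For what it’s worth, one direction would be a single unrolled, differentiable inner update (in the spirit of meta-learning approaches), done functionally so that no official optimizer step is taken. A rough, untested sketch; the modules, shapes, labels, and learning rate are all made up, and it assumes PyTorch 2.x for torch.func:

import torch
from torch.func import functional_call

gen = torch.nn.Linear(8, 8)                       # stands in for M_gen
cls = torch.nn.Linear(8, 3)                       # stands in for M_cls
criterion = torch.nn.CrossEntropyLoss()
z = torch.randn(16, 8)
fake_y = torch.randint(0, 3, (16,))               # labels for the generated data
real_x, real_y = torch.randn(16, 8), torch.randint(0, 3, (16,))
lr = 0.1

# Differentiable "inner" update: no official optimizer step is taken.
params = {k: v.detach().clone().requires_grad_(True)
          for k, v in cls.named_parameters()}
fake_x = gen(z)                                   # keeps its graph back to gen
inner_loss = criterion(functional_call(cls, params, (fake_x,)), fake_y)
grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
new_params = {k: p - lr * g for (k, p), g in zip(params.items(), grads)}

# "Outer" figure of merit: the updated classifier evaluated on real data.
outer_loss = criterion(functional_call(cls, new_params, (real_x,)), real_y)
outer_loss.backward()                             # reaches gen through the update
print(gen.weight.grad is not None)                # True

Whether one differentiable step is a useful proxy for training your classifier to completion is exactly the part I can’t answer for your use case.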

I don’t really understand your use case. What it seems like to me is:

You want to use fake (generated) data to train your classifier to do a better job of
classifying … Fake data?

If so, you could simply have your generator spit out a vector of 7.2s. Then your
classifier could classify it as a 7.2 sample. Presumably this is not what you mean.

There must be some interplay between your fake data and the real-world purpose
of your classifier. It’s the details of that interplay that will determine how or whether
you might train your generator to generate better fake data.

Best.

K. Frank

Dear Frank,
thanks again for your answer.

I don’t really understand your use case. What it seems like to me is:

You want to use fake (generated) data to train your classifier to do a better job of
classifying … Fake data?

If so, you could simply have your generator spit out a vector of 7.2s. Then your
classifier could classify it as a 7.2 sample. Presumably this is not what you mean.

There must be some interplay between your fake data and the real-world purpose
of your classifier. It’s the details of that interplay that will determine how or whether
you might train your generator to generate better fake data.

Oh, I’m so sorry. I thought I made this clear earlier. We actually have some real IMU data for Human Activity Recognition, but our data is very limited. We train a classifier on this data and achieve an F1-score of, let’s say, 50%. Now, we want to implement a generator that produces synthetic (or fake) data, which we will use alongside some real data to improve the classifier.

Generating vectors of 7.2s is probably not very helpful, because 7.2s might never occur in the real training, validation, or test data. Moreover, the goal is to generate sliding windows of IMU data together with the corresponding label for each window. Simply generating a feature matrix of 7.2s along with a (random?) label would most likely be useless for improving the classifier.

We have already tried Conditional GANs and VAEs, but neither approach significantly improved the classifier’s performance. Therefore, I wanted to try a different approach.

I hope this clarifies my use case.

Thanks again for your time and help.

Best regards
Brayn