Hi, I am training two models in a GAN-like fashion: one is trained while the other stays frozen. Model A's input is model B's output. Both A and B are auto-regressive models, so their forward method differs slightly from inference, because I use teacher forcing during forward.
Now, when I train model A, I need to run inference with B. I am trying to build an efficient data pipeline on top of Dataset and DataLoader. Should I use the approach below?
class MyDataset(Dataset):
    def __init__(self, model_B):
        # __getitem__ only receives an index, so model_B has to be stored here
        self.model_B = model_B
        ...
    def __len__(self):
        ...
    def __getitem__(self, idx):
        batch = ...  # load the raw sample for idx
        return self.model_B.inference(batch)
train_loader = DataLoader(MyDataset(model_B), ...)
Every couple of epochs I will update model_B, so the data cannot be prepared in advance. I measured the inference time: it takes about 2 seconds per batch, and I have about 10,000 batches for training. Is the approach above efficient? By efficient, I mean: does it allow parallel processing and prefetching, i.e. can model_B keep running inference while model_A is being trained?
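To make concrete the overlap I am hoping for, here is a minimal sketch in plain Python (no torch, and `model_b_inference` is a hypothetical stand-in for `model_B.inference`): a background thread keeps producing model_B outputs into a bounded queue while the main thread consumes them to train model_A.

```python
import queue
import threading

def model_b_inference(batch):
    # Hypothetical stand-in for model_B.inference; in the real pipeline this
    # would run the frozen auto-regressive model (e.g. on the GPU).
    return [x * 2 for x in batch]

def producer(batches, out_q):
    # Runs model_B inference ahead of the training loop; the bounded queue
    # blocks when full, so at most a few batches are kept in flight.
    for batch in batches:
        out_q.put(model_b_inference(batch))
    out_q.put(None)  # sentinel: no more batches

def train_with_prefetch(batches, max_prefetch=4):
    out_q = queue.Queue(maxsize=max_prefetch)
    t = threading.Thread(target=producer, args=(batches, out_q), daemon=True)
    t.start()
    consumed = []
    while True:
        item = out_q.get()
        if item is None:
            break
        # ... train model_A on `item` here ...
        consumed.append(item)
    t.join()
    return consumed
```

Is this kind of producer/consumer overlap what the DataLoader would give me with `num_workers > 0`, or would I have to build it myself like this (e.g. because model_B lives on the GPU and DataLoader workers are separate processes)?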
Thank you very much!