I realized that the device I’m measuring the 512 phases from (these are actually phases that 512 transducers produce, so each phase is assigned to one transducer) is, due to hardware limitations, only capable of producing 128 discrete phases between 0 and 2π. So I believe a regression task is overkill. What I thought instead was to add 512 separate nn.Linear(4096, 128) layers with a softmax activation, i.e. a multi-output classification approach. For each of the 512 layers I calculate a separate loss, with the output from the VGG as input to these layers. My network now looks like this:
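Since the hardware only realizes 128 evenly spaced phase levels, each continuous phase measurement first has to be mapped to one of 128 class indices to serve as a training label. A minimal sketch of that quantization, assuming phases in radians in [0, 2π) and evenly spaced levels (the helper names `phase_to_class` and `class_to_phase` are mine, not from the code below):

```python
import math

NUM_LEVELS = 128  # discrete phases the hardware can produce

def phase_to_class(phase: float) -> int:
    """Map a phase in radians to the index of the nearest of the 128 levels."""
    step = 2 * math.pi / NUM_LEVELS
    return int(round(phase / step)) % NUM_LEVELS  # % wraps 2*pi back to index 0

def class_to_phase(idx: int) -> float:
    """Inverse mapping: class index back to the quantized phase value."""
    return idx * (2 * math.pi / NUM_LEVELS)
```

The class index produced here is exactly what nn.CrossEntropyLoss expects as a target.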
import torch
import torch.nn as nn
import torchvision
from collections import namedtuple, OrderedDict

class MyVgg(nn.Module):
    def __init__(self, version='16', batch_norm=True, pretrained=True):
        super().__init__()
        vgg = namedtuple('vgg', ['version', 'batch_norm', 'pretrained'])
        combinations = {vgg('16', True, True): torchvision.models.vgg16_bn(pretrained=True),
                        vgg('16', True, False): torchvision.models.vgg16_bn(pretrained=False),
                        vgg('16', False, True): torchvision.models.vgg16(pretrained=True),
                        vgg('16', False, False): torchvision.models.vgg16(pretrained=False),
                        vgg('19', True, True): torchvision.models.vgg19_bn(pretrained=True),
                        vgg('19', True, False): torchvision.models.vgg19_bn(pretrained=False),
                        vgg('19', False, True): torchvision.models.vgg19(pretrained=True),
                        vgg('19', False, False): torchvision.models.vgg19(pretrained=False)}
        self.model = combinations[vgg(version, batch_norm, pretrained)]
        # Remove the last fc layer, so the classifier now ends in 4096 features
        self.model.classifier = nn.Sequential(*list(self.model.classifier.children())[:-1])
        # Separate classifier per phase; nn.ModuleDict (unlike a plain OrderedDict)
        # registers the heads as submodules, so .parameters() and .to(device) see them
        self.pc = nn.ModuleDict()  # pc: phase classifiers, 512 in total
        for classifier in range(512):
            # no need for nn.Softmax(), it is encapsulated in nn.CrossEntropyLoss()
            self.pc['PC_{}'.format(classifier)] = nn.Linear(4096, 128, bias=True)

    # Set your own forward pass
    def forward(self, img, extra_info=None):
        # VGG flattens between its features and classifier itself, so pass the image in directly
        pre_split = self.model(img)  # shared (batch_size, 4096) output from the VGG trunk
        outputs = OrderedDict()
        for name, pc in self.pc.items():  # iterate through all 512 classifiers
            outputs[name] = pc(pre_split)  # (batch_size, 128) logits per classifier
        return outputs  # dictionary with the outputs from the 512 classifiers
The output is a dictionary with 512 keys, each mapping to a (batch_size, 128) tensor of logits.
For each classifier I then calculate nn.CrossEntropyLoss() (which encapsulates the softmax activation, so no need to add that to my fully connected layers). My true labels are again vectors of 128 values, with a 1 where the true value is and 0s for the rest (one-hot-encoding-like); note that nn.CrossEntropyLoss expects an integer class index per sample (a LongTensor), not the one-hot vector itself, so the labels have to be converted (e.g. with argmax) first. Then I sum up the 512 losses and backpropagate to train the network like this:
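To make the label format concrete, here is a small sketch of the one-hot-to-index conversion for a single head (the batch size of 4 and the example target indices are my own assumptions; recent PyTorch versions also accept class probabilities as targets, but integer indices are the standard and cheaper path):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 128)  # raw scores from one head, batch of 4
one_hot = torch.zeros(4, 128)  # one-hot labels as described above
one_hot[torch.arange(4), torch.tensor([3, 7, 0, 127])] = 1.0

targets = one_hot.argmax(dim=1)   # tensor([3, 7, 0, 127]), dtype torch.long
loss = criterion(logits, targets)  # logits vs. class indices, as the loss expects
```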
for batch_idx, (images, labels) in enumerate(Bar(loaders['train'])):
    images = images.to(self.device, dtype=torch.float)
    # one-hot labels of shape (batch_size, 512, 128) -> class indices (batch_size, 512)
    labels = labels.to(self.device).argmax(dim=2)
    optimizer.zero_grad()
    preds = network(images)  # dict with 512 outputs of shape (batch_size, 128)
    loss = 0
    for head_idx, output in enumerate(preds.values()):
        # (batch_size, 128) logits vs. (batch_size,) class indices
        loss += self.criterion(output, labels[:, head_idx])
    loss.backward()
    optimizer.step()
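As a side note, the 512-iteration Python loop over the losses can be collapsed into a single nn.CrossEntropyLoss call by stacking the head outputs along an extra dimension, since the loss supports K-dimensional inputs of shape (N, C, d1). A sketch with stand-in tensors (the batch size and random data are my assumptions; with reduction='sum' this equals summing sum-reduced per-head losses, while the default 'mean' only rescales by a constant):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction='sum')

# stand-ins for the real network outputs and converted labels
batch_size, num_heads, num_classes = 2, 512, 128
preds = {f'PC_{i}': torch.randn(batch_size, num_classes) for i in range(num_heads)}
labels = torch.randint(0, num_classes, (batch_size, num_heads))

# stack to (batch_size, num_classes, num_heads): CrossEntropyLoss treats dim 1 as classes
stacked = torch.stack(list(preds.values()), dim=2)
loss = criterion(stacked, labels)  # one call instead of 512 loop iterations
```

Besides being shorter, this avoids 512 separate kernel launches for the loss, which can matter once you scale up the 12k-image experiments.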
Do you think the whole concept makes sense? I generated 12k images today, and I’m going to start experimenting again tomorrow.