I'm extending open-source code based on a paper and am looking for help understanding the loss-related code (the author already has a backlog of questions and comments from me at this point). My hope is that someone with more expertise in losses can make sense of the code, and then I can make it easier for the next person. My fork with the file in question is here: Domain-Agnostic-Sentence-Specificity-Prediction train.py
Link to paper: Domain Agnostic Real-Valued Specificity Prediction
This model involves supervised training followed by unsupervised training, using a teacher/student learning paradigm (the supervised model is the teacher, and the student learns on unlabeled data from a new domain). The supervised training portion uses labels with 2 classes. I'm extending it to 4 classes (there is even a parameter meant to support this, but it didn't work). I'm getting an error at this call to torch.cat:

ou=ou* torch.cat((a,a), 1)
Traceback (most recent call last):
  File "train.py", line 491, in <module>
    train_acc = trainepoch(epoch)
  File "train.py", line 394, in trainepoch
    ou=ou* torch.cat((a,a), 1)
RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 1
As far as I can tell, the mismatch happens because with 4 classes the softmax output ou has 4 columns, while torch.cat((a,a), 1) only ever produces 2. My question is: how can I get past this error while making sure the code still does what it's supposed to, since this is inside the training loop?
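One workaround I've considered is to broadcast the mask over however many class columns there are, instead of hard-coding two copies. This is only a sketch against the snippet below (n_classes, mask, and smask are just names I made up), and it assumes the sole purpose of torch.cat((a,a), 1) is to apply the per-example confidence mask a to every class column:

# Sketch only, not the author's code: broadcast the (N, 1) confidence masks
# over all class columns rather than exactly two.
n_classes = ou.size(1)            # 2 originally, 4 in my setup
mask = a.repeat(1, n_classes)     # same result as torch.cat((a,a), 1) when n_classes == 2
ou = ou * mask
ou2 = ou2 * mask
smask = sa.repeat(1, n_classes)
sou = sou * smask
sou2 = sou2 * smask

That makes the shapes line up, but I can't tell whether it preserves what loss2 is supposed to do, which is why I'd like to understand the loss code first.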
The variable names are hard to understand, so I've added comments with educated guesses about what things mean. All the comments other than "backward" are mine.
output = pdtb_net((s1_batch, s1_len),s1_batchf) # Supervised model
output2 = pdtb_net((s1_batch2, s1_len2),s1_batchf2)
outputu = pdtb_net((su_batch, su_len),su_batchf) # "u" for unsupervised or unlabeled
outputu2 = pdtb_net2((su_batch2, su_len2),su_batchf2)
if params.loss==0:
    pred = output.data.max(1)[1]
else:
    pred=output.data[:,0]>0
assert len(pred) == len(s1[stidx:stidx + params.batch_size])
if params.loss==0: # This code is used because params.loss = 0
    ou = F.softmax(outputu, dim=1) # output unlabeled
    ou2 = F.softmax(outputu2, dim=1)
    sou = F.softmax(output, dim=1) # supervised model output
    sou2 = F.softmax(output2, dim=1)
    a,_=torch.max(ou,1)
    sa,_=torch.max(sou,1)
    # My guess: keep only confident predictions, i.e. max softmax probability above params.th
    a=(a.detach()>params.th).view(-1,1).float()
    sa=(sa.detach()>params.th).view(-1,1).float()
    # The (N, 1) mask is duplicated into exactly 2 columns here, which is where 4 classes breaks
    ou=ou* torch.cat((a,a), 1)
    ou2=ou2* torch.cat((a,a), 1)
    sou=sou* torch.cat((sa,sa), 1)
    sou2=sou2* torch.cat((sa,sa), 1)
else: # This code is not used but may be an alternative to consider
    ou=outputu[:,0]
    ou2=outputu2[:,0]
    a=(ou.detach()>params.th).view(-1,1).float()
    ou=ou* a
    ou2=ou2* a
ou2.require_grad=False
sou2.require_grad=False
# My guess: consistency (self-ensembling) loss between the two forward passes / networks
loss2=( F.mse_loss(ou, ou2.detach(), size_average=False)+F.mse_loss(sou, sou2.detach(), size_average=False)) / params.n_classes/params.batch_size
# loss
if params.loss==0:
    # Builds a 2-column soft target [1 - y, y]; also hard-coded for 2 classes
    tgt_batch=torch.cat([1.0-tgt_batch.view(-1,1),tgt_batch.view(-1,1) ],dim=1)
    oop=F.softmax(output, dim=1)
    oop2=F.softmax(outputu, dim=1)
    loss3=0
    if params.use_gpu:
        pppp=Variable(torch.FloatTensor([1/oop.size(0)]).cuda())
    else:
        pppp=Variable(torch.FloatTensor([1/oop.size(0)]))
    dmiu=torch.mean(oop2[:,1])
    dstd=torch.std(oop2[:,1])
    # My guess: push the mean/std of the predicted class-1 probability toward params.klmiu/params.klsig
    loss3=loss3+torch.abs(torch.mean(oop2[:,1])-params.klmiu)+torch.abs(torch.std(oop2[:,1])-params.klsig)
    kss=float(params.klsig)
    loss1 = loss_fn(oop, tgt_batch.float())
else: # This code is not used
    loss1 = loss_fn(output[:,0], (tgt_batch*2-1).float())
if epoch>=params.se_epoch_start: # I think SE is for self ensembling
    loss=loss1+params.c*loss2+params.c2*loss3
else:
    loss=loss1+params.c2*loss3
all_costs.append(loss.item())
words_count += (s1_batch.nelement()) / params.word_emb_dim
# backward
optimizer.zero_grad()
loss.backward()
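Separately, a couple of other places also look hard-coded for 2 classes, and I'm not sure how they should change for 4. The target construction tgt_batch=torch.cat([1.0-tgt_batch.view(-1,1),tgt_batch.view(-1,1) ],dim=1) builds a [1 - y, y] pair, and loss3 only looks at column 1 of the softmax (oop2[:,1]). For the targets, my guess (again just a sketch of my own, assuming the 4-class labels are integer class indices and that loss_fn accepts an (N, n_classes) float target) would be a one-hot version:

# Sketch only, my guess rather than the author's code: replace the 2-column
# [1 - y, y] target with an (N, n_classes) one-hot target.
tgt_onehot = torch.zeros(tgt_batch.size(0), params.n_classes)
if params.use_gpu:
    tgt_onehot = tgt_onehot.cuda()
tgt_onehot.scatter_(1, tgt_batch.long().view(-1, 1), 1.0)
loss1 = loss_fn(oop, tgt_onehot)

But I have no idea whether loss3, which matches the mean/std of a single probability column, still makes sense once there are 4 classes, so any pointers on the intent of that term would also help.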