Multi-Label Classification in PyTorch

Hi Everyone,

I’m trying to use PyTorch for multi-label classification; has anyone done this yet?

I have a total of 505 target labels, and samples have multiple labels (varying number per sample).

I tried to solve this by binarizing my labels: the output for each sample is a 505-length vector with a 1 at position i if the sample maps to label i, and a 0 if it doesn’t.
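To make that concrete, here’s a minimal sketch of the binarization (to_multi_hot is just a throwaway helper name, and the label indices are made up):

import torch

def to_multi_hot(label_indices, num_labels=505):
    # hypothetical helper: list of label indices -> 0/1 vector of length num_labels
    y = torch.zeros(num_labels)
    for i in label_indices:
        y[i] = 1.0
    return y

print(to_multi_hot([2, 17, 404]).sum())  # tensor(3.)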

Then, I tried to use MultiLabelMarginLoss().

The problem I’m facing: with more iterations, the output of the model should move towards a sparse vector with most elements 0 and a few elements 1, but it is actually moving towards a vector with often very large negative values.

Can someone tell me how to go about this?

Thanks!


Do you have a smaller dataset?

505 target labels with very few samples might just not train well. Maybe add weight decay?
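Something like this, for instance (the 1e-4 value is just an illustrative guess, and the linear model is a stand-in):

import torch
import torch.nn as nn

model = nn.Linear(128, 505)  # stand-in for the real model
# weight_decay adds L2 regularization on the parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)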


It trained well, and gave great results! Thanks for the reply, it was a super silly mistake because I was using the loss function the wrong way. Figured it out after spending some time on the docs :slight_smile: Thank you!


Can you tell us what kind of loss function you used?


Hi @SpandanMadan, great to know you managed to get multi-label classification working in PyTorch, :sunny:

I’d like to try doing the same. Can you suggest any small multi-label datasets (i.e., samples with multiple labels) to start experimenting with?

I’m a novice with this type of problem, so I’m not sure what the simplest place to start would be.

Best regards,

Ajay

OK @AjayTalati, you can try a license number plate dataset.
There are six or seven digits/letters on a plate; if you choose 7, every character has 36 possible classes (A-Z, 0-9), so every number plate has 7*36 = 252 labels as targets. A value of 1 at position i*36+k (0 <= i < num_characters, 0 <= k <= 35) indicates that character i takes value k.
For example, target[49] = 1 means 1*36 + 13, i.e. the 2nd character is ‘N’ (index 13 with A = 0).
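A tiny sketch of that index arithmetic (plate_to_indices is a made-up helper; the CHARS ordering, letters then digits, matches the code further down):

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def plate_to_indices(plate):
    # flat index of character i with value k is i*36 + k
    return [i * 36 + CHARS.index(c) for i, c in enumerate(plate)]

print(plate_to_indices("AN12XYZ"))  # [0, 49, 99, 136, 167, 204, 241]; 'N' as 2nd char -> 1*36+13 = 49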

I’m also learning PyTorch, and I’m taking this as an exercise.

The input is BCHW, using MultiLabelMarginLoss().


Hey @dablyo ,

thanks a lot, great idea, :smile: ! And thank you for the example too!

Have you got a link to the data set you are using please? I’d like to work on it too as an exercise :slight_smile:,

I’ve tried a few experiments using MultiLabelMarginLoss(), but I couldn’t get it to work.

I could still train multiple multi-class classifiers on the same dataset. So, for example, I could train 7 classifiers, one for each digit/letter in a number plate, but that’s not too smart because there’s some structure in the sequence of digits/letters, at least for UK number plates :smile: . A rough sketch of that setup is below.
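Here is that per-position alternative sketched out (7 independent 36-way heads over shared features; SevenHeads is a made-up name and the CNN feature extractor is omitted):

import torch
import torch.nn as nn

class SevenHeads(nn.Module):
    def __init__(self, feat_dim=2048, num_chars=7, num_classes=36):
        super(SevenHeads, self).__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_chars)])

    def forward(self, feats):
        # one (batch, 36) logit tensor per character position
        return [head(feats) for head in self.heads]

heads = SevenHeads()
feats = torch.randn(4, 2048)            # stand-in for CNN features
targets = torch.randint(0, 36, (4, 7))  # one integer class per position
ce = nn.CrossEntropyLoss()
loss = sum(ce(logits, targets[:, i]) for i, logits in enumerate(heads(feats)))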

Kind regards,

Ajay


In Number plate recognition with Tensorflow - Matt's Ramblings, the author generates the dataset from background pictures and a number plate font file; you can learn from it.

I’ve met a problem: while training, in the 2nd minibatch, the output of MultiLabelMarginLoss() is zero, and I can’t find out the reason.

The input image size is 224*224, and the target vector width is 252 (7*36), e.g.:
‘X’ versus 000000000000000000000001000000000000
‘A’ versus 100000000000000000000000000000000000
The source code is as follows:

import os
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as torch_utils_data
import torch.backends.cudnn as cudnn
from torchvision import transforms

DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHARS = LETTERS + DIGITS
NPLEN = 7           # characters per plate
NUM_CLASSES = 252   # 7 characters * 36 classes per character

class anprmodel(nn.Module):

    def __init__(self):
        super(anprmodel, self).__init__()
        self.num_classes = NUM_CLASSES
        self.conv1 = nn.Conv2d(1, 48, kernel_size=5, padding=2)
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2), stride=2)
        self.conv2 = nn.Conv2d(48, 64, kernel_size=5, padding=2)
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        self.conv3 = nn.Conv2d(64, 128, kernel_size=5, padding=2)
        self.pool3 = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        self.fc1 = nn.Linear(28 * 28 * 128, 2048)
        self.fc2 = nn.Linear(2048, NUM_CLASSES)

    def forward(self, x):
        x = F.relu(self.pool1(self.conv1(x)))  # input: 224*224
        x = F.relu(self.pool2(self.conv2(x)))  # input: 112*112
        x = F.relu(self.pool3(self.conv3(x)))  # input: 56*56
        x = x.view(-1, 28 * 28 * 128)          # 28*28 feature maps
        x = F.relu(self.fc1(x))
        x = self.fc2(x)                        # output: 252
        return x

class NPSET(torch_utils_data.Dataset):
    picroot = 'np'

    def __init__(self, root, data_transform=None):
        self.picroot = root
        self.data_transform = data_transform
        if not os.path.exists(self.picroot):
            raise RuntimeError('{} does not exist'.format(self.picroot))
        for root, dnames, filenames in os.walk(self.picroot):
            imgs = []
            labels = []
            for filename in filenames:
                picfilename = os.path.join(self.picroot, filename)
                im = cv2.imread(picfilename, cv2.IMREAD_GRAYSCALE)
                im = cv2.resize(im, (224, 224))
                imgs.append(im)
                m = filename.split('_')  # filename style: xxxxxxxx_yyyyyyy_z.png
                labels.append(m[1])      # the plate string is the middle part
            self.dataset = imgs
            self.labels = labels
            self.len = len(filenames)

    def code_to_vec(self, p, code):
        # one-hot encode each character, then flatten to one 252-wide 0/1 vector
        def char_to_vec(c):
            y = np.zeros((len(CHARS),))
            y[CHARS.index(c)] = 1.0
            return y
        c = np.vstack([char_to_vec(c) for c in code])
        return c.flatten()

    def __getitem__(self, index):
        label, img = self.labels[index], self.dataset[index]
        if self.data_transform is not None:
            img = self.data_transform(img)
        labelarray = self.code_to_vec(1, label)
        return (img, labelarray)

    def __len__(self):
        return self.len

def accuracy(output, target):  # Tensor:Tensor, size: batchsize*252
    batchsize = output.size(0)
    assert batchsize == target.size(0)
    p = torch.chunk(output, 7, 1)  # p[0]..p[6], each batchsize*36
    t = torch.chunk(target, 7, 1)

    a = np.ones((batchsize, 1), np.dtype('i8')) * 7  # 7,7,7,...,7, batchsize times
    ts = torch.from_numpy(a)  # LongTensor, temporary first column, cut below
    ps = torch.from_numpy(a)

    for i in range(0, NPLEN):  # the index of the max value in every segment
        _, pred = torch.max(p[i], 1)
        ps = torch.cat((ps, pred), 1)
        _, pred = torch.max(t[i], 1)
        ts = torch.cat((ts, pred), 1)
    sub = torch.LongTensor([1, 2, 3, 4, 5, 6, 7])
    ts = torch.index_select(ts, 1, sub)  # LongTensor, drop the temporary column
    ps = torch.index_select(ps, 1, sub)  # LongTensor
    tspseq = torch.eq(ts, ps)             # ByteTensor
    tspseqsum = torch.sum(tspseq, 1)      # ByteTensor, equals 7 if all 7 characters match
    a = np.ones((batchsize, 1), np.uint8) * 7  # byte ndarray
    result = torch.eq(tspseqsum, torch.from_numpy(a))
    return batchsize, torch.sum(result)  # batchsize, number of fully correct plates

class recMeter(object):
    def __init__(self):
        self.reset()
        self.is_best = False
        self.best = 0

    def reset(self):
        self.right = 0
        self.sum = 0
        self.current = 0

    def updatecnt(self, n, r):
        self.right += r
        self.sum += n

    def updateaccurate(self):
        self.current = self.right / self.sum
        if self.current > self.best:
            self.is_best = True
            self.best = self.current

if __name__ == "__main__":
    model = anprmodel()
    model.cuda()
    cudnn.benchmark = True
    batch_size = 10
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((107.897212036,), (3893.57887653,)),
    ])
    npset = NPSET(root='/home/wang/git/nppic/nproot/plate', data_transform=data_transform)
    nploader = torch.utils.data.DataLoader(npset, batch_size=batch_size, shuffle=True, num_workers=1)  # train
    npvalset = NPSET(root='/home/wang/git/nppic/npval/plate', data_transform=data_transform)
    npvalloader = torch.utils.data.DataLoader(npvalset, batch_size=batch_size, shuffle=False, num_workers=1)  # validate
    criterion = nn.MultiLabelMarginLoss()
    optimizer = torch.optim.SGD(model.parameters(), 0.1, momentum=0.9)

    meter = recMeter()
    for epoch in range(0, 1):
        # sets the learning rate to the initial LR decayed by 10 every 30 epochs
        lr = 0.1 * (0.1 ** (epoch // 30))
        # for param_group in optimizer.param_groups:
        #     param_group['lr'] = lr
        # train
        model.train()
        for i, data in enumerate(nploader):
            inputs, targets = data               # inputs size: batchsize*224*224
            inputs = torch.unsqueeze(inputs, 1)  # inputs size: batchsize*1*224*224
            targets = torch.LongTensor(np.array(targets.numpy(), np.long))
            targets = targets.cuda()
            inputs = inputs.cuda()
            input_var = torch.autograd.Variable(inputs)
            target_var = torch.autograd.Variable(targets)

            optimizer.zero_grad()
            output_var = model(input_var)
            # compute loss
            character_loss = criterion(output_var, target_var)

            # compute gradient and do SGD step
            character_loss.backward()
            optimizer.step()

I’ve executed the train loop in the Python console this way:

npiter=iter(nploader)

then

(inputs,targets)=npiter.next()
inputs=torch.unsqueeze(inputs,1)
targets=torch.LongTensor(np.array(targets.numpy(),np.long))
targets=targets.cuda()
inputs=inputs.cuda()
input_var=torch.autograd.Variable(inputs)
target_var=torch.autograd.Variable(targets)
optimizer.zero_grad()
output_var=model(input_var)
character_loss=criterion(output_var,target_var)
character_loss.backward()
optimizer.step()
print('Loss: {:.6f}'.format(character_loss.data[0]))

From the 2nd minibatch on, the loss becomes 0.
I’m stuck here.

I got a similar result: zero loss from the second epoch. Did you find the reason?


Hi there,

I can’t get multi-label classification working either, but @bartolsthoorn and @mratsim have found possible ways to do it here.

hope that helps?

Aj

@AjayTalati OK I wrote a simple example here: https://gist.github.com/bartolsthoorn/36c813a4becec1b260392f5353c8b7cc

For accuracy it is important to note that you can pass the output first through nn.Sigmoid, and everything > 0.5 is true (look at the sigmoid function: https://en.wikipedia.org/wiki/Sigmoid_function).
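In code, that amounts to something like this (random logits stand in for real model outputs):

import torch

logits = torch.randn(4, 10)     # stand-in for the raw model output
probs = torch.sigmoid(logits)   # squash each score into (0, 1)
preds = (probs > 0.5).float()   # everything > 0.5 counts as "label present"
print(preds)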


Hey @bartolsthoorn,

that’s really helpful, thank you very much :smile:, really nice example :smile:

Kind regards,

Ajay

@AjayTalati

Either after your last fc you apply a sigmoid and then use BCELoss or F.binary_cross_entropy as your criterion/loss function,

Or you directly use MultiLabelSoftMarginLoss as your loss function (it comes with the sigmoid inside).
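A quick sketch of both options side by side (random tensors stand in for real logits and 0/1 multi-hot targets; the two losses should agree up to floating-point noise):

import torch
import torch.nn as nn

logits = torch.randn(4, 505)
targets = (torch.rand(4, 505) > 0.9).float()  # random 0/1 multi-hot targets

# option 1: explicit sigmoid, then BCELoss
loss1 = nn.BCELoss()(torch.sigmoid(logits), targets)
# option 2: MultiLabelSoftMarginLoss, sigmoid built in
loss2 = nn.MultiLabelSoftMarginLoss()(logits, targets)
print(loss1.item(), loss2.item())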

Now once you have your prediction, you need to threshold. 0.5 is the default naive way but it’s probably not optimal. In any case, once you get there, great!

The next part is technical optimization; you can do multilabel classification without it.

Regarding the threshold, you might want to optimize either a common threshold for all your outputs (it could be 0.2, 0.5, 0.123456, who knows) or a threshold per label class, especially if your classes are unbalanced.
You will need a solid validation set and a multi-label evaluation metric (Hamming loss, F1-score, Fbeta score).

An example code for the first strategy is here on Kaggle.

For the second strategy, I’m deep into various papers myself, so I can’t help yet.
One thing to keep in mind is that your “best threshold” will probably overfit the validation set, so use regularization, cross-validation, or another anti-overfitting strategy.
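A minimal sketch of the first strategy (one shared threshold tuned for micro-F1 on a validation set; best_global_threshold is a made-up helper, scikit-learn is assumed, and the data here is random):

import numpy as np
from sklearn.metrics import f1_score

def best_global_threshold(probs, targets):
    # probs, targets: (num_samples, num_labels) arrays; targets are 0/1
    candidates = np.arange(0.05, 0.95, 0.01)
    scores = [f1_score(targets, (probs > t).astype(int), average='micro')
              for t in candidates]
    return candidates[int(np.argmax(scores))]

probs = np.random.rand(100, 20)                        # stand-in validation predictions
targets = (np.random.rand(100, 20) > 0.8).astype(int)  # stand-in 0/1 targets
print(best_global_threshold(probs, targets))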


Does anyone understand how MultiLabelMarginLoss is calculated exactly? I’m not sure I understand completely.

loss(x, y) = sum_ij(max(0, 1 - (x[y[j]] - x[i]))) / x.size(0)
where i == 0 to x.size(0), j == 0 to y.size(0), y[j] != 0, and i != y[j] for all i and j.

The docs say y is a set of indices. If y[j] != 0 is enforced, how do you check the loss for class 0? Also if x belongs to two or more classes, how does max(0, 1 - (x[y[j]] - x[i])) contribute to the loss when both y[j] and x[i] are classes that x belongs to?

I also don’t know how to find the source code for MultiLabelMarginLoss; the docs link isn’t very informative.


I had the same issue with the loss documentation. My guess is that the current code is probably this one: https://github.com/pytorch/pytorch/blob/master/torch/legacy/nn/MultiLabelMarginCriterion.py and that what is in the docs is a stub from before it was converted to the new APIs.
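For what it’s worth, in recent PyTorch versions the target is padded with -1 rather than 0, which is how class 0 can still appear as a target. A small numeric check of the formula:

import torch
import torch.nn as nn

loss_fn = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
# the target holds class *indices* (here: classes 3 and 0), padded with -1;
# everything after the first -1 is ignored
y = torch.tensor([[3, 0, -1, -1]])
# by hand, with targets {3, 0} and non-targets {1, 2}:
# (max(0,1-(0.8-0.2)) + max(0,1-(0.8-0.4))
#  + max(0,1-(0.1-0.2)) + max(0,1-(0.1-0.4))) / 4 = 0.85
print(loss_fn(x, y))  # tensor(0.8500)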


Hi Everyone,

I’m trying to finetune pre-trained convnets (e.g., resnet50) on a dataset which has 3 categories. In fact, I want to extend the code introduced in the ‘Transfer Learning tutorial’ (Transfer Learning tutorial) to a new dataset which has 3 categories. In addition, in my dataset each image has just one label (i.e., each train/val/test image has just one label). Could you please help me do that?
I have changed the above-mentioned code as follows:

  1. I have changed the parameters of nn.Linear as follows:

num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 3) # 3 means we have 3 class labels

  2. I have changed the loss function:
    criterion = nn.NLLLoss()

  3. I have changed the ‘train_model’ method as follows:


m = nn.LogSoftmax()
outputs = model(inputs)
_, preds = torch.max(outputs.data, 1)
loss = criterion(m(outputs), labels)

However, my obtained results aren’t good at all. So my precise questions are as follows:

  1. In these cases, which loss function must be used?
  2. Are those changes for training the model and computing the loss correct?

@ahkarami I think you should create a separate topic for your issue, which is very different from the original post. You are doing multiclass classification instead of multilabel.

Your loss function is correct btw.
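Side note: nn.CrossEntropyLoss fuses LogSoftmax and NLLLoss, so the explicit m(outputs) can be dropped. A minimal standalone sketch (random tensors stand in for your model outputs and labels):

import torch
import torch.nn as nn

outputs = torch.randn(8, 3)         # stand-in for model(inputs), 3 classes
labels = torch.randint(0, 3, (8,))  # one integer class label per image
loss = nn.CrossEntropyLoss()(outputs, labels)  # LogSoftmax + NLLLoss in one call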


Thank you very much for your help. I agree with you; I will create a new topic (Multiclass Classification in PyTorch).

@AjayTalati
@mratsim

I’ve used MultiLabelSoftMarginLoss with the Adam optimizer, and the loss looked fine.
The SGD optimizer also worked properly, as did a sigmoid after the last fc followed by BCELoss.

MultiLabelMarginLoss doesn’t work; the loss becomes 0 in the 2nd minibatch.

The loss plateaus at 0.08… and can’t get smaller:
Train Epoch: 29 (19%)Loss: 0.081794
Train Epoch: 29 (39%)Loss: 0.080127
Train Epoch: 29 (59%)Loss: 0.083426
Train Epoch: 29 (79%)Loss: 0.086233
Train Epoch: 29 (99%)Loss: 0.082037

But across the 252 (36*7) outputs, the model gives the same prediction for different images; for example, for 10 images from the test set:
0.0006 0.0006 0.0006 … 0.0891 0.0742 0.1139
0.0006 0.0006 0.0006 … 0.0891 0.0742 0.1139
0.0006 0.0006 0.0006 … 0.0891 0.0742 0.1139
… ⋱ …
0.0006 0.0006 0.0006 … 0.0891 0.0742 0.1139
0.0006 0.0006 0.0006 … 0.0891 0.0742 0.1139
0.0006 0.0006 0.0006 … 0.0891 0.0742 0.1139
[torch.FloatTensor of size 10x252]

I’m wholly confused.


Hi,

I used BCELoss. It’s the standard for multi-label classification in many ways. Give it a shot.
