How to do multi-label classification with TorchText?


(WG) #1

I can’t figure out how to properly set up a Field object for multi-label classification with torchtext. Here is what I have in my dataset class:

… where lbl is a one-hot-encoded (OHE) NumPy array (e.g., [0, 1, 0, 0, 1, 1, 0])

My torchtext field object is defined like this:

tt_LABEL = data.Field(sequential=False, use_vocab=False)

But when I try to package everything up into a BucketIterator and get a mini-batch, I get the following exception:

only length-1 arrays can be converted to Python scalars

The error is on line 294 of field.py:
294 arr = [numericalization_func(x) for x in arr]
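
For anyone hitting the same exception, here is a minimal sketch of what seems to be going on. It assumes that numericalization_func resolves to Python's int for a non-sequential, use_vocab=False field; that is an inference from the error message, not something confirmed in this thread. The sketch uses only NumPy, so it does not depend on any torchtext version. The helper names numericalize_scalar and split_columns are made up for illustration and are not torchtext functions.

```python
import numpy as np

# One multi-hot label vector per example, as described in the post.
labels = [
    np.array([0, 1, 0, 0, 1, 1, 0]),
    np.array([1, 0, 0, 0, 0, 1, 0]),
]


def numericalize_scalar(arr):
    """Mimic the failing line, ASSUMING numericalization_func is int."""
    return [int(x) for x in arr]


# int() accepts a scalar, not a 7-element array, so this raises TypeError.
try:
    numericalize_scalar(labels)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)


def split_columns(arr):
    """Hypothetical workaround: treat each class as its own scalar label."""
    stacked = np.stack(arr)  # shape (batch, n_classes)
    # Each column now holds one scalar per example, which int() can handle.
    return [numericalize_scalar(col) for col in stacked.T]


columns = split_columns(labels)

# Reassemble the per-class columns into a (batch, n_classes) target matrix.
batch = np.stack(columns, axis=1)
print(batch.shape)
```

In torchtext terms, the equivalent idea would be to declare one scalar label field per class and concatenate them into a single target tensor after batching. That is one possible design under the assumption above, not necessarily what any poster here ended up using.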


(Hiromi) #2

@wgpubs, did you ever find a workaround?


(WG) #3

Hey @hiromi! I remember ya from fastai.

Here is the code I’m using for the toxic comp. I’d appreciate any feedback on the good and the ugly of it, and on what can be improved. Hope this helps:


(Hiromi) #4

Awesome! Thanks for the example!! You’re way ahead of me :)


(Hiromi) #5

I’m currently trying to see if I can get data.TabularDataset to work kind of like this one:

My brain is too tired to keep going tonight, but I will get back to it tomorrow.


(Hiromi) #6

@wgpubs, I’ve tried many things, and your implementation is the best and cleanest!!!