TypeError: 'Tensor' object is not callable whilst batching

Hello,

I am getting a TypeError: 'Tensor' object is not callable error in the batching stage of my program. I have seen in other posts that this can arise from treating a tensor as a function and attempting to call it, but I am unsure where this is happening in my code. I am new to PyTorch, so any help is greatly appreciated.
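
For reference, a minimal example of what (as I understand it) triggers this error is calling a tensor as if it were a function:

import torch

t = torch.tensor([1.0, 2.0])
t()  # TypeError: 'Tensor' object is not callable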

Below is the program leading up to the error.

import torch
import torch.nn as nn
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import GloVe
# Field / LabelField / TabularDataset / BucketIterator come from the old torchtext API
# (torchtext.legacy.data on torchtext 0.9-0.11, torchtext.data on earlier releases)
from torchtext.legacy import data
from torchtext.legacy.data import BucketIterator

tokenizer = get_tokenizer('spacy', language='en_core_web_sm')

TEXT = data.Field(tokenize=tokenizer, use_vocab=True, lower=True, batch_first=True, include_lengths=True)
LABEL = data.LabelField(dtype=torch.long, batch_first=True, sequential=False)
fields = {'TEXT': ('text', TEXT), 'CONCLUSION': ('label', LABEL)}

training_data = data.TabularDataset(
    path='train.json',
    format='json',
    fields=fields,
    skip_header=True,
)

test_data = data.TabularDataset(
    path='test.json',
    format='json',
    fields=fields,
    skip_header=True,
)
vectors = GloVe(name='6B', dim=200)

TEXT.build_vocab(training_data, vectors=vectors, max_size=10000, min_freq=1)
LABEL.build_vocab(training_data)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

BATCH_SIZE = 2
train_itr, test_itr = BucketIterator.splits(
    (training_data, test_data),
    batch_size=BATCH_SIZE,
    sort_key=lambda x: len(x.text),
    device=device,
    shuffle=True,
    sort_within_batch=True,
    sort=False
)
for batch_no, batch in enumerate(train_itr):
    text, batch_len = batch.text
    print(text, batch_len)
    print(batch.label)

emb = nn.Embedding(2, 4)  # size of vocab = 2, vector length = 4
print(emb.weight)

# `model` is defined elsewhere; its embedding layer is initialised from the pretrained GloVe vectors
model.embed.weight.data.copy_(TEXT.vocab.vectors)
print(model.embed.weight)

for batch in train_itr:
    text, text_lengths = batch.text  # renamed from `len` to avoid shadowing the built-in
    emb = nn.Embedding(VOCAB_SIZE, EMBEDDING_DIM)
    emb.weight.data.copy_(TEXT.vocab.vectors)
    emb_out = emb(text)
    pack_out = nn.utils.rnn.pack_padded_sequence(emb_out,
                                                 text_lengths,
                                                 batch_first=True)
    rnn = nn.RNN(EMBEDDING_DIM, 4, batch_first=True)
    out, hidden = rnn(pack_out)

Based on the release notes, you are using a deprecated API, which was previously available via torchtext.legacy.data.Field. However, this PR from November already removed the torchtext.legacy namespace, so you might need to update your code using the migration guide.
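
As a quick sanity check on versions (a sketch, assuming you can pin the package), the old classes only import while the legacy namespace still exists:

# torchtext 0.9.x-0.11.x: the old API still lives under torchtext.legacy
from torchtext.legacy import data
from torchtext.legacy.data import BucketIterator, TabularDataset

# torchtext >= 0.12: the legacy namespace is gone, so the imports above raise an error
# and the code has to be migrated to torch.utils.data.Dataset / DataLoader instead.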

@kerrangcash Did you find any solution? I am also stuck at exactly the same step!

Yes. Adapting the code to the newer torchtext releases, which no longer provide TabularDataset and BucketIterator, fixed the problem: instead of reading the data in as tabular datasets, I created a custom Dataset class.

import torch

class Dataset(torch.utils.data.Dataset):
    def __init__(self, inputText, labels):
        self.labels = [label for label in labels]
        self.texts = [text for text in inputText]

    def classes(self):
        return self.labels

    def __len__(self):
        return len(self.labels)

    def get_batch_labels(self, idx):
        # Fetch a batch of labels
        return self.labels[idx]

    def get_batch_texts(self, idx):
        # Fetch a batch of inputs
        return self.texts[idx]

    def __getitem__(self, idx):
        batch_texts = self.get_batch_texts(idx)
        batch_y = self.get_batch_labels(idx)

        batch = {'text': batch_texts,
                 'label': batch_y}

        return batch

test_data = Dataset(test_text, test_labels)
train_data = Dataset(train_text, train_labels)
validation_data = Dataset(validation_text, validation_labels)

And then batching as follows:

import functools

import torch
import torch.nn as nn

def collate(batch, pad_index):
    batch_ids = [i['ids'] for i in batch]
    batch_ids = nn.utils.rnn.pad_sequence(batch_ids, padding_value=pad_index, batch_first=True)
    batch_length = [i['length'] for i in batch]
    batch_length = torch.stack(batch_length)
    batch_label = [i['label'] for i in batch]
    batch_label = torch.stack(batch_label)
    batch = {'ids': batch_ids,
             'length': batch_length,
             'label': batch_label}
    return batch

batch_size = 8

# pad_index is the index of the padding token in the vocabulary
collate = functools.partial(collate, pad_index=pad_index)

train_dataloader = torch.utils.data.DataLoader(train_data, 
                                               batch_size=batch_size, 
                                               collate_fn=collate, 
                                               shuffle=True)

valid_dataloader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size, collate_fn=collate)
test_dataloader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, collate_fn=collate)

Just to note, there was some extra processing of the data in between these steps to add a length field, but the process should remain the same without that field.
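
Since the collate function expects 'ids' and 'length' keys rather than the raw 'text', that in-between step also has to turn each text into token ids. Here is a minimal sketch of what it might look like, assuming a spaCy tokenizer and a vocab built with torchtext's build_vocab_from_iterator (the '<unk>'/'<pad>' specials and the numericalize helper are illustrative, not from the original post):

import torch
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tokenizer = get_tokenizer('spacy', language='en_core_web_sm')

def yield_tokens(texts):
    for t in texts:
        yield tokenizer(t)

# Vocabulary built from the training texts; '<pad>' supplies pad_index for the collate function.
vocab = build_vocab_from_iterator(yield_tokens(train_text),
                                  specials=['<unk>', '<pad>'],
                                  min_freq=1)
vocab.set_default_index(vocab['<unk>'])
pad_index = vocab['<pad>']

def numericalize(item):
    # Turn one {'text', 'label'} item from the Dataset into the
    # {'ids', 'length', 'label'} tensors the collate function expects.
    # Assumes the labels are already integer-encoded.
    ids = torch.tensor(vocab(tokenizer(item['text'])), dtype=torch.long)
    return {'ids': ids,
            'length': torch.tensor(len(ids)),
            'label': torch.tensor(item['label'], dtype=torch.long)}

With something like that applied to each item (for example inside __getitem__), iterating train_dataloader yields padded 'ids', 'length' and 'label' tensors that can be fed to an embedding layer and pack_padded_sequence much like the original loop.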


Thank you so much @kerrangcash