Boolean array expected for the condition, not float

Rexedoziem · October 2, 2022, 5:58pm

I got this error when dealing with multiple targets in a multi-output classification problem.

Do I need to convert the targets to Boolean first for targets=train_df[targets]?

ptrblck · October 2, 2022, 8:34pm

I’m not sure which line of code is raising the error, but it seems you might have narrowed it down to the indexing of the pandas.DataFrame? Are you trying to pass floating point values as the index to it?
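For reference, one way pandas can produce this exact message is when the object inside the brackets is itself a DataFrame of floats, for example if the name targets was reassigned from a list of column names to the selected frame and the line was then run a second time. The sketch below is only an assumption about the cause, not a diagnosis of the posted code; the column values are made up.

```python
import pandas as pd

# Hypothetical stand-in for train_df with two float target columns.
train_df = pd.DataFrame({
    "cohesion": [3.0, 2.5, 4.0],
    "syntax": [3.5, 3.0, 2.0],
})

targets = ["cohesion", "syntax"]  # a list of column names: a valid key
targets = train_df[targets]       # `targets` is now a float DataFrame

# Indexing with a DataFrame is treated as a boolean mask, so float values fail.
try:
    train_df[targets]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Keeping the list of column names and the selected values under two different names (for example target_cols and target_values, both hypothetical) avoids this kind of accidental reuse.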

Rexedoziem · October 2, 2022, 9:28pm

The problem is this line: targets=train_df[targets].values. The dataframe contains 6 target variables, all floats, and I tried passing them in as the targets, so calling “targets=train_df[targets].values” gave the error.

Rexedoziem · October 2, 2022, 10:21pm

ptrblck · October 2, 2022, 10:49pm

Based on your screenshot it is indeed an error in the indexing of the pandas.DataFrame, so make sure you are passing valid indices to it.

PS: you can post code snippets by wrapping them into three backticks ``` instead of screenshots which generally makes debugging easier.
If you get stuck, feel free to post a minimal, executable code snippet which would reproduce the error.

Rexedoziem · October 3, 2022, 5:55am

import torch
import torch.nn as nn

class BertDataset:
    def __init__(self, df, texts, targets):
        self.df = df
        self.targets = df['targets']
        self.max_len = config.max_len
        self.texts = df['full_text']
        self.tokenizer = config.TOKENIZER

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, index):
        texts = ' '.join(str(self.texts[index]).split())

        inputs = self.tokenizer.encode(
            texts,
            None,
            max_length = self.max_len,
            padding = 'max_length',
            truncation = True,
            add_special_tokens = True
        )

        resp = {
            'ids': torch.tensor(inputs['input_ids'], dtype=torch.long),
            'mask': torch.tensor(inputs['attention_mask'], dtype=torch.long),
            'token_type_ids': torch.tensor(inputs['token_type_ids'], dtype=torch.long),
            'target': torch.tensor(self.targets[index], dtype=torch.long)
        }

        return resp

I passed the correct index to the dataframe and it’s still giving issues. It says none of the index values are in the columns?

ptrblck · October 3, 2022, 6:00am

Your code is not executable so I cannot debug it, but your error seems to be pandas-specific and unrelated to PyTorch. If you could create a minimal and executable code snippet which we could copy/paste to debug, I’m happy to help. Otherwise, I can only refer to the pandas docs to check how DataFrames should be indexed.
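As a rough illustration of the kind of check meant here, the sketch below shows how column selection behaves when the key is a list of names. The small frame and the name target_cols are made up for the example; only the column names full_text, cohesion and syntax come from this thread, and whether this matches the actual traceback is an assumption.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the real training data.
df = pd.DataFrame({
    "full_text": ["a b", "c d"],
    "cohesion": [3.0, 2.5],
    "syntax": [3.5, 3.0],
})
target_cols = ["cohesion", "syntax"]

# A list of existing column names selects those columns.
selected = df[target_cols]

# A list whose names are not columns raises a KeyError ("None of [...] are in the [columns]").
try:
    df[["targets"]]
    missing_message = None
except KeyError as exc:
    missing_message = str(exc)

print(selected.shape)
print(missing_message)
```

Printing df.columns right before the failing line is usually the quickest way to see whether the key really exists.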

Rexedoziem · October 3, 2022, 9:48am

import pandas as pd

df = pd.read_csv('train_folds.csv')
train_df = df[df.kfold != fold].reset_index(drop=True)
valid_df = df[df.kfold == fold].reset_index(drop=True)

train_dataset = DeBERTadataset(
    train_df,
    texts = train_df.full_text.values,
    target = train_df.loc[:, targets].values
)
train_data_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size = config.TRAIN_BATCH_SIZE,
    shuffle = True,
    num_workers = 2
)
valid_dataset = DeBERTadataset(
    valid_df,
    texts = valid_df.full_text.values,
    target = valid_df.loc[:, targets].values
)
valid_data_loader = torch.utils.data.DataLoader(
    valid_dataset,
    batch_size = config.VALID_BATCH_SIZE,
    shuffle = False,
    num_workers = 1
)

Rexedoziem · October 3, 2022, 9:49am

This is the issue?

ptrblck · October 3, 2022, 4:06pm

The new error now points to an invalid key value (the type is correct this time).

Rexedoziem · October 3, 2022, 6:28pm

I don’t really know where it’s coming from but I feel it might be from the dataset class?

Rexedoziem · October 3, 2022, 6:48pm

Maybe the __getitem__ function?

ptrblck · October 3, 2022, 7:17pm

No, it’s the indexing in the pandas.DataFrame again and unrelated to PyTorch:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(10)})
df[2885]
# KeyError: 2885
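A related case, sketched under the assumption that the integer key reaches a pandas.Series inside the dataset: square brackets on a Series with an integer index look up row labels, not positions, so a filtered frame whose labels were not reset can miss the requested key. The tiny frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"kfold": [0, 1, 0, 1], "cohesion": [3.0, 2.5, 4.0, 3.5]})

# Filtering keeps the original row labels (here 1 and 3).
valid_df = df[df.kfold == 1]
targets = valid_df["cohesion"]

try:
    targets[0]  # label lookup: there is no row labelled 0
    label_lookup_ok = True
except KeyError:
    label_lookup_ok = False

# Position-based access works regardless of the labels.
first_by_position = targets.iloc[0]

# After reset_index(drop=True) the labels are 0..n-1 again.
first_after_reset = valid_df.reset_index(drop=True)["cohesion"][0]

print(label_lookup_ok, first_by_position, first_after_reset)
```

Using .iloc (or converting to a NumPy array once in the constructor) keeps the lookup purely positional, which is what a DataLoader index expects.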

Rexedoziem · October 4, 2022, 8:24am

I guess I didn’t specify the target columns in df=pd.read_csv(input, column=targets), since I defined targets as targets = ['cohesion', 'syntax', 'vocabulary', 'phraseology', 'grammar', 'conventions'].
But I don’t know if it will work fine?
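A minimal sketch of how that could look, with some assumptions: as far as I know pandas.read_csv has no column keyword, and the parameter that restricts the loaded columns is usecols. The in-memory CSV below is a made-up two-row stand-in for train_folds.csv, and the name target_array is hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for train_folds.csv.
csv_text = (
    "full_text,cohesion,syntax,vocabulary,phraseology,grammar,conventions,kfold\n"
    "first essay,3.0,3.5,3.0,3.0,4.0,3.0,0\n"
    "second essay,2.5,2.5,3.0,2.0,2.0,2.5,1\n"
)
targets = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

# usecols only limits which columns are loaded; df[targets] works without it
# as long as the six names exist in the file.
df = pd.read_csv(io.StringIO(csv_text), usecols=["full_text", "kfold"] + targets)

# Selecting with the list of names gives one row of six floats per essay.
target_array = df[targets].to_numpy(dtype="float32")

print(target_array.shape)
```

Since the targets are floats, a float dtype is the natural choice when they are later turned into tensors; casting them to an integer type would truncate the scores.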