DataLoader not working in Colab

I am trying to train a Longformer model for token classification, but something strange is happening on Colab: when I try to iterate through the DataLoader object, it isn't fetching any data.
When I use the same code on Kaggle, it works perfectly fine.
If code is required, I can provide it along with the data I am using.

Have you double-checked that the Dataset you passed to the DataLoader actually returns the items you think it should when you iterate through it, and that it has a nonzero len()?
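For example, something along these lines (train_data is just a placeholder name for whatever Dataset instance you passed to the DataLoader):

    print(len(train_data))   # should be nonzero
    print(train_data[0])     # should return a single example without hanging or raising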

Thanks for the reply!
I am using the same code on Kaggle and there it runs smoothly.
Yes, it has nonzero length; I checked its length before iterating over it, but on Google Colab I am facing this issue.
FYI, my DataLoader returns a tuple of two items. To check whether it is working or not, I unpack three items; ideally this should give me an error as soon as items are fetched from the DataLoader. What I found is that it doesn't give any error, so I speculated that it isn't fetching items from the DataLoader at all.

for batch_idx, (inputs, targets, var1) in enumerate(train_loader):
    # loop body here

Ideally it should throw an error here, but it doesn't (because train_loader returns a tuple of 2 items and I am unpacking 3 items).
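To illustrate the check, unpacking a 2-tuple into 3 names normally fails right away:

    inputs, targets, var1 = (1, 2)   # ValueError: not enough values to unpack (expected 3, got 2)

So if no error shows up at all, the loop body is presumably never being entered.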

Hi Shubham,

That’s weird. Could you run the following two lines and share the output:

print(len(train_data))
print(next(iter(train_loader)))

If those print out reasonable output, then it sounds like maybe your code isn’t getting to this line:

for batch_idx,(inputs,targets,var1) in enumerate(train_loader):

since this should indeed raise an error due to how you are unpacking. You can add a bunch of print statements at previous steps to see where the code is stopping.

Even though it's the same code as on Kaggle, it's possible that it would execute in one place but not another due to differences in the Python runtime, environment, or other factors. By following the above steps (printing something after each line) you'll be able to pinpoint why the code fails to reach that line.

It's printing length=3119, which is correct.
But when I run print(next(iter(train_loader))), I think that command is not getting executed.
Previously, when I tried to figure out where the problem is, my code was executing up to the line where I iterate over train_loader, but I think the iteration over train_loader itself was not running.

If you need, I can provide my code along with the dataset.

Thanks!

Very strange, could you share the code where you define train_loader?

Even better would be if you have a way to share the entire Colab notebook and point to the troublesome section.

Sure, I am sharing my Colab notebook.

This is the link to the data I am using:
https://drive.google.com/file/d/1Xv2mT8JQgwxL1-2x5ZFz6GB1tjjLiWXk/view?usp=sharing

The problem is inside train_fn:
def train_fn(model, train_loader, optimizer, scheduler):
Thanks!

Can’t replicate without this file:
'/content/drive/MyDrive/feedback_price_saurabh.csv'

I would drop a debugger here, inside class TrainData(Dataset), and step through the code to see where it stops, and why it returns nothing:

def __getitem__(self,item):
    import ipdb; ipdb.set_trace()  # go step by step from here
    text=self.df['text'][item]
    ...

Here is the file; you can download the data from here:
https://drive.google.com/file/d/1Xv2mT8JQgwxL1-2x5ZFz6GB1tjjLiWXk/view?usp=sharing

Hi - not sure how this worked on a different runtime with the same data, but your issue is that the code is getting stuck in an infinite loop here:

while i < l:
    while j < len(w_ids):
        if w_ids[j] != None and w_ids[j] == previous:
            i = i - 1
            #print(i,j)
            target[j] = labels[i]
            previous = w_ids[j]
            i = i + 1
        elif w_ids[j] != None and w_ids[j] != previous:
            #print(i,j)
            target[j] = labels[i]
            previous = w_ids[j]
            i = i + 1
        j = j + 1

Specifically, j reaches len(w_ids) before i reaches l, and therefore the outer loop just loops forever with the inner loop not executing anymore.

ipdb> print(j, len(w_ids))
1500 1500
ipdb> print(i, l)
251 919

This has nothing to do with the DataLoader or anything, just the internal logic there.
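As a side note, this kind of alignment can usually be done with a single pass over the token-to-word mapping, which avoids the two-counter bookkeeping entirely. This is only a rough sketch, assuming w_ids comes from the tokenizer's word_ids() output (one word index per token, None for special tokens) and labels holds one label per word, so adapt the names to your actual code:

    # sketch, not your exact code: give every sub-token the label of the word it belongs to
    target = [-100] * len(w_ids)      # -100 is a common "ignore" value for special tokens
    for j, wid in enumerate(w_ids):
        if wid is not None:           # skip special/padding tokens
            target[j] = labels[wid]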

Good luck!


Sorry to bother you, but I figured out my mistake: the data type of the labels column should be list, and on Kaggle I was converting it from str to list, but on Colab I was not doing that conversion.
I appreciate your help a lot!
And my apologies for the trouble I caused unnecessarily.
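For reference, a typical way to do that conversion (assuming the column stores list literals such as "[1, 0, 2]" as strings; the column name here is just illustrative):

    import ast
    # parse a stringified list column back into real Python lists
    df['labels'] = df['labels'].apply(ast.literal_eval)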

How can I close this discussion?
Thanks!

Glad you were able to figure it out! You could close it by marking the post where I spotted the infinite loop as the "solution". Or you can just leave it as is 🙂

Good luck in your Kaggle competitions!


I marked the post where you spotted the infinite loop as the solution.
I appreciate your time a lot.
Thanks!