Text classification

Hi dear friends. I have a dataset for text classification. It is divided into two folders, train and test. Then, each of them is divided into two folders, negative and positive. In every positive and negative folder I have 1000 text files, each containing one sentence.
Next to these, I also have a Notepad file with 56050 words and their vectors. I think it's called a dictionary.
Now I just need to classify the sentences. I haven't worked on this topic at all. I uploaded my dataset to Google Drive and loaded it in Google Colab.

from google.colab import drive

print(f'Number of training examples: {len(train_data)}')
print(f'Number of testing examples: {len(test_data)}')

Now, when I run this code, it gives me 33 for the length. Why does this happen? What is wrong in my procedure? Please help me.
Thank you

Hi neda,

With len(train_data) you do not get the number of files in this directory but the number of characters in your string.
train_data is still only the pathname, not the loaded data.
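
To make the difference concrete, here is a small sketch. It builds a throwaway folder in a temp directory as a stand-in for your dataset (the layout and file names are my assumptions, not your real files) and compares the length of the path string with the number of files found under it.

```python
import os
import tempfile

# Hypothetical stand-in for the real dataset folder, built in a temp directory
root = tempfile.mkdtemp()
train_data = os.path.join(root, "train")
for label in ("positive", "negative"):
    os.makedirs(os.path.join(train_data, label))
    for i in range(3):
        path = os.path.join(train_data, label, f"{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"sample sentence {i}")

# len() on a string counts characters, so this is the length of the path
chars_in_path = len(train_data)

# Counting files means listing each label folder and adding up the entries
n_files = sum(
    len(os.listdir(os.path.join(train_data, label)))
    for label in os.listdir(train_data)
)

print(f"Characters in path: {chars_in_path}")
print(f"Files under path: {n_files}")
```

So the 33 you saw is most likely just the number of characters in your path string.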

Thank you very much. Then how can I get the number of sentences in my train_data?
According to what you said, I think I need code that reads the train folder together with its subfolders, but I don't know how to do that. I would appreciate it if you could help me.
Thank you

import os

train_files = os.listdir(train_data) # returns the entries in the given path; should give you the folders ['positive', 'negative'] (order is not guaranteed)
train_file_positive = os.listdir(os.path.join(train_data, train_files[0])) # returns the list of file names in the first subfolder

Depending on the file format of your text, you need to open it accordingly.
For normal .txt files it will be something like:

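# train_file_positive is a list of file names, so build the full path to one file first
first_path = os.path.join(train_data, train_files[0], train_file_positive[0])
with open(first_path, 'r', encoding='utf-8') as f: # assuming the files are UTF-8 text
    sentence = f.read()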
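
To read the whole train folder rather than a single file, the same steps can be put in a loop. This is only a sketch: load_split is a helper name I made up, and it assumes each subfolder name (positive, negative) is the label and that the files are UTF-8 text.

```python
import os

def load_split(split_dir, encoding="utf-8"):
    """Read every file under split_dir/<label>/ and return (texts, labels)."""
    texts, labels = [], []
    # sorted() makes the order reproducible; os.listdir order is arbitrary
    for label in sorted(os.listdir(split_dir)):
        label_dir = os.path.join(split_dir, label)
        if not os.path.isdir(label_dir):
            continue  # skip stray files next to the label folders
        for name in sorted(os.listdir(label_dir)):
            path = os.path.join(label_dir, name)
            if not os.path.isfile(path):
                continue  # skip nested folders
            with open(path, "r", encoding=encoding) as f:
                texts.append(f.read().strip())
            labels.append(label)  # the folder name serves as the label
    return texts, labels
```

You would call it as train_texts, train_labels = load_split(train_data). With 1000 files in each of the two folders, len(train_texts) should then come out as 2000 instead of 33.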

Thank you so much. I will try it and give you the output.
Many thanks