How to randomly segment lines from one text file into two different text files

I have a text file with, say, 100 lines, and I want to randomly split these lines 80/20. I am using the code below, but it is not doing a proper partition: I am getting a different number of lines than expected. I should get 80 lines in file2 and 20 lines in file1.
Can someone point out the error and suggest a better way, if there is one? Note that total_list.txt is the original file, which needs to be split into file1 and file2.

Regards

```python
import random

def partition(l, pred):
    fid_train = open('meta/file1.txt', 'w')
    fid_test = open('meta/file2.txt', 'w')
    for e in l:
        if pred(e):
            fid_test.write(e)
        else:
            #fid_train.write(e+'\n')
            fid_train.write(e)
    return fid_train, fid_test

lines = open("meta/total_list.txt").readlines()
lines1, lines2 = partition(lines, lambda x: random.random() < 0.2)
```

Your code works correctly using lists:

```python
import random

def partition(l, pred):
    fid_train = []
    fid_test = []
    for e in l:
        if pred(e):
            fid_test.append(e)
        else:
            fid_train.append(e)
    return fid_train, fid_test

lines = ['{}\n'.format(i) for i in range(100)]
lines1, lines2 = partition(lines, lambda x: random.random() < 0.2)
print(len(lines1))  # one run printed: 82
print(len(lines2))  # one run printed: 18
```

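I'm unsure why writing to the files would behave differently, assuming readlines() returns the expected number of lines from the source file.
Note that you are not closing the files, which is not recommended: buffered writes are not guaranteed to reach the files until they are closed.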
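A minimal sketch of the original file-writing partition that closes both files, assuming the same predicate interface; the function name partition_to_files and its path parameters are hypothetical. The with statement closes (and thereby flushes) both files when the block exits, even if an exception is raised.

```python
import random

def partition_to_files(lines, pred, train_path, test_path):
    """Write lines for which pred(line) is true to test_path and the rest to
    train_path. Returns (n_train, n_test). Hypothetical helper name."""
    n_train = n_test = 0
    # Both files are closed automatically when the with block ends.
    with open(train_path, "w") as fid_train, open(test_path, "w") as fid_test:
        for line in lines:
            if pred(line):
                fid_test.write(line)
                n_test += 1
            else:
                fid_train.write(line)
                n_train += 1
    return n_train, n_test
```

For example, partition_to_files(lines, lambda x: random.random() < 0.2, 'meta/file1.txt', 'meta/file2.txt') returns the two line counts; they will still vary between runs, because each line is assigned independently.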


Yes, @ptrblck, it works with lists, but your example only uses numbers; I need to split the actual lines. total_list.txt contains lines like:
```
raw_data\Ses01F_impro01_M013.wav 1 1
raw_data\Ses01F_impro02_F000.wav 2 1
raw_data\Ses01F_impro02_F001.wav 2 1
raw_data\Ses01F_impro02_F002.wav 2 1
raw_data\Ses01F_impro02_F003.wav 3 1
```

I need two separate text files containing a random 80%/20% selection of these lines. My code does that, but the split is not an exact division.
I hope that is clearer now.

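That is expected, since you are splitting randomly: each line goes to the second list independently with probability 0.2, so the size of each split varies from run to run.
If you want an exact 80%/20% division, you could use something like: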

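```python
import torch

lines = [a for a in range(100)]
idx = torch.randperm(len(lines))       # random permutation of 0..len(lines)-1
idx_train = idx[:int(0.8*len(idx))]    # first 80% of the shuffled indices
idx_val = idx[int(0.8*len(idx)):]      # remaining 20%
```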
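By contrast, the per-line coin flip in the original code makes the test-set size a binomial random variable with n = 100 and p = 0.2: 20 lines on average, with a standard deviation of 4. A short standard-library calculation (the helper name prob_exact_count is hypothetical) shows how rarely exactly 20 lines land in the test split:

```python
import math

def prob_exact_count(n, p, k):
    """Probability that exactly k of n lines land in the test split when each
    line is sent there independently with probability p (binomial pmf)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.2
mean = n * p                       # expected test-set size
std = math.sqrt(n * p * (1 - p))   # typical run-to-run spread
print(f"mean={mean:.0f}, std={std:.0f}")
print(f"P(exactly 20 test lines) = {prob_exact_count(n, p, 20):.3f}")
```

The probability comes out at roughly 0.1, so about nine runs in ten give something other than an exact 80/20 split, consistent with the 82/18 result above.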

I tried something similar initially, but how do I then write the two files using these indexes?
The source file, total_list.txt, is created earlier in the same script. I close it and then reopen it.
I have the train/val indexes, but I need to extract the lines with the corresponding indexes from the source file into two text files. I tried this:

```python
import torch

fid_train = open('meta/training1.txt', 'w')
fid_test = open('meta/testing1.txt', 'w')
f1 = open("meta/total_list.txt").readlines()
lines = [a for a in range(100)]
idx = torch.randperm(len(lines))
idx_train = idx[:int(0.8*len(idx))]
idx_val = idx[int(0.8*len(idx)):]

for lines in f1:
    fid_test.write(f1[idx_val])  # raises the error quoted below

fid_test.close()

for lines in f1:
    fid_train.write(f1[idx_train])

fid_train.close()
```

but it throws the error: only integer tensors of a single element can be converted to an index
I also tried replacing f1 with lines:

```python
for lines in f1:
    fid_test.write(lines[idx_val])

fid_test.close()
```

I don't understand this error: why does it say "only integer tensors …" when all the indexes are integers?
Thanks for the reply 🙂
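The error comes from f1[idx_val]: f1 is a plain Python list, which can only be indexed by a single integer or a slice, while idx_val is a tensor holding 20 indices, and PyTorch can only convert a single-element integer tensor to an index. Selecting the lines one index at a time works, e.g. [f1[i] for i in idx_val.tolist()]. The for loops would also write the selection once per source line. The sketch below shows the index-based approach with only the standard library; random.sample stands in for torch.randperm (an assumption so it runs without PyTorch), and the name write_split_by_index is hypothetical.

```python
import random

def write_split_by_index(src_lines, train_path, test_path, train_frac=0.8):
    """Split src_lines by a random permutation of their indexes and write both
    parts. Hypothetical helper name; random.sample replaces torch.randperm."""
    n = len(src_lines)
    idx = random.sample(range(n), n)  # random permutation of 0..n-1
    cut = int(train_frac * n)
    idx_train, idx_val = idx[:cut], idx[cut:]
    # Index the list with one integer at a time; a Python list cannot take
    # a whole collection of indexes the way a tensor can.
    train_lines = [src_lines[i] for i in idx_train]
    val_lines = [src_lines[i] for i in idx_val]
    # Write each file once (no loop over the source lines); with closes it.
    with open(train_path, "w") as f:
        f.writelines(train_lines)
    with open(test_path, "w") as f:
        f.writelines(val_lines)
    return train_lines, val_lines
```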

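@ptrblck, the problem is solved:

```python
import random

with open("meta/total_list.txt") as f:
    lines = f.readlines()
random.shuffle(lines)  # shuffle in place; slicing then gives an exact 80/20 split of 100 lines

with open('meta/training1.txt', 'w') as f:
    f.writelines(lines[:80])
with open('meta/testing1.txt', 'w') as f:
    f.writelines(lines[80:])
```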
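The shuffle-and-slice approach can be generalized so the split point follows the file length instead of the hard-coded 80. A sketch, assuming a hypothetical helper shuffle_split with an optional seed for reproducible splits:

```python
import random

def shuffle_split(lines, train_frac=0.8, seed=None):
    """Return (train, test); train has exactly int(train_frac * len(lines)) items.
    Hypothetical helper name."""
    shuffled = list(lines)                 # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # passing a seed makes the split reproducible
    cut = int(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Reading meta/total_list.txt with readlines() and passing the result to shuffle_split(lines, seed=0) gives the same two lists on every run, which can then be written with writelines() as above.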

Regards