Automated labelling and processing of WAV files to create datasets

Hello everyone
I’m embarking on a project that is, in principle, a bit intimidating (for me, at least): I’m mapping the city where I live with recordings at different landmarks. With them I want to measure the biotic and anthropogenic sound levels of each place in order to carry out an ecoacoustic study.
I need a workflow for audio tagging and processing. This is very important for the later data processing, but I lack solid knowledge of the subject.
I had a ‘chat’ with my friend ‘Copilot’, but after one morning I abandoned that approach because I wasn’t getting satisfactory results. Has anyone here worked on these things and is willing to share their experience and way of working?
Attached is one of my attempts at a script to do this, but it didn’t work.
Thank you very much.

import os
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import torchaudio

class AudioDataset(Dataset):
    def __init__(self, annotations_file, audio_dir, transform=None):
        self.annotations = pd.read_csv(annotations_file)
        self.audio_dir = audio_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        audio_sample_path = os.path.join(self.audio_dir, self.annotations.iloc[index, 0])
        label = self.annotations.iloc[index, 1]
        waveform, sample_rate = torchaudio.load(audio_sample_path)
        if self.transform:
            waveform = self.transform(waveform)
        return waveform, label

# Assuming you already have a trained model called 'modelo'
# and a transform function called 'transformacion_audio'

# Load the unlabelled dataset
dataset_no_etiquetado = AudioDataset('path/to/annotations.csv', 'path/to/audio_dir', transform=transformacion_audio)
dataloader = DataLoader(dataset_no_etiquetado, batch_size=1, shuffle=False)

# Label the new sounds (collect each batch's predictions;
# the original loop never appended them)
predicciones = []
with torch.no_grad():
    for waveform, _ in dataloader:
        outputs = modelo(waveform)
        _, predicted_labels = torch.max(outputs, 1)
        predicciones.extend(predicted_labels.tolist())

# Save the predictions to a CSV file
nuevas_etiquetas = pd.DataFrame(predicciones, columns=['Etiqueta'])
nuevas_etiquetas.to_csv('nuevas_etiquetas.csv', index=False)
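For the biotic-vs-anthropogenic comparison you describe, you don’t necessarily need a trained classifier to get started: a widely used ecoacoustic measure is the Normalized Difference Soundscape Index (NDSI), the power in a "biophony" band (commonly 2–8 kHz) minus the power in an "anthrophony" band (commonly 1–2 kHz), divided by their sum. Here is a minimal sketch with NumPy/SciPy; the function name `ndsi` and the exact band limits are my own choices, not something from your script:

```python
import numpy as np
from scipy import signal

def ndsi(waveform, sample_rate, anthro_band=(1000, 2000), bio_band=(2000, 8000)):
    """Normalized Difference Soundscape Index: (bio - anthro) / (bio + anthro).

    Returns a value in [-1, 1]: close to +1 means the clip is dominated by
    the biophony band, close to -1 by the anthrophony band. Band limits
    follow common ecoacoustic practice but are adjustable.
    """
    # Estimate the power spectral density of the clip
    freqs, psd = signal.welch(waveform, fs=sample_rate, nperseg=1024)
    anthro = psd[(freqs >= anthro_band[0]) & (freqs < anthro_band[1])].sum()
    bio = psd[(freqs >= bio_band[0]) & (freqs < bio_band[1])].sum()
    return (bio - anthro) / (bio + anthro)
```

Computing this per clip gives you a simple per-landmark score to map, which you can later refine with a learned classifier if you need finer categories.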

I’m unsure where exactly you are stuck in your workflow. Are you looking for a system to tag your audio samples first, or have you done that already and would like to create a Dataset now but are stuck?

Hi, sorry if I haven’t been clear in explaining myself. I have recorded soundscapes; they are unprocessed and unlabelled. I want to split them into windows of 3 seconds and label those, so that I can then process them as data for an ecoacoustic job (showing the areas of the city with their biotic and anthropogenic noise levels).