Hi, I am trying to build a machine learning model that converts an orchestral score into a piano score, which is technically called a ‘reduction’.
The first thing I am thinking about is what type of data I should use and, related to that, which approach to take (vision, NLP, etc.).
I have a dataset that contains orchestral scores, each paired with its piano reduction.
The dataset is split into several folders according to the origin of the files:
- Each folder contains subfolders indexed with numbers.
- It also contains another folder with a CSV with the same name and metadata.
- In each indexed subfolder, there are two MIDI files and their corresponding CSVs.
- The MIDI files are an orchestration and its corresponding piano version.
- Each CSV maps the MIDI track names to normalized instrument names, e.g.:
Beethoven_classical_archives/32/beet9m2.csv and Beethoven_classical_archives/32/symphony_9_2_orch.csv.
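For anyone who wants to load this layout, here is a minimal sketch of how the pairs could be collected. It assumes the MIDI files end in `.mid` or `.midi` and that the orchestral file is the one whose name ends in `_orch` (a guess based on the example file names above); `collect_pairs` is a hypothetical helper, not part of the dataset.

```python
from pathlib import Path


def collect_pairs(root):
    """Return one dict per numbered subfolder holding 2 MIDI and 2 CSV files."""
    pairs = []
    for sub in sorted(Path(root).glob("*/*")):
        # Only the subfolders indexed with numbers hold the score pairs.
        if not (sub.is_dir() and sub.name.isdigit()):
            continue
        files = sorted(sub.iterdir())
        midis = [p for p in files if p.suffix.lower() in {".mid", ".midi"}]
        csvs = [p for p in files if p.suffix.lower() == ".csv"]
        if len(midis) != 2 or len(csvs) != 2:
            continue  # skip incomplete folders rather than guess
        # Assumption: the orchestral file has a stem ending in "_orch".
        orch = [p for p in midis if p.stem.endswith("_orch")]
        piano = [p for p in midis if not p.stem.endswith("_orch")]
        if len(orch) == 1 and len(piano) == 1:
            pairs.append({
                "folder": sub,
                "orchestra": orch[0],
                "piano": piano[0],
                "csv": csvs,
            })
    return pairs
```

Calling `collect_pairs("path/to/dataset")` would then give a list that can be wrapped in a dataset class later. Folders that do not match the assumed naming are skipped, so it is worth checking how many pairs come back.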
Could someone give me some advice on this task? Also, if anyone is interested in the project, we could work on it as a team.
I would be very grateful!
A vision model (e.g. “UNet”) could work for what you’re looking to achieve. One drawback, though, might be retaining key and time signature context between “frames”. If you break it up into “lines”, you’d want to ensure that you’re able to fit the sheet music for the entire orchestra as an input, and, from what I understand, that can be pages’ worth of music for a single line. So just be sure to account for edge cases (e.g. full orchestra).
An approach working directly with the MIDI could work, too, either as a hybrid vision approach (e.g. MIDI → learned embedding → transformer encoder/decoder → reverse ResNet) or as a strictly “NLP”-based model architecture. The advantage would be a more rigid structure that is less prone to error.
Given how little data is actually involved in a midi file, you could likely run entire pieces at a time with an NLP approach, instead of breaking it up line by line.
Have you done any work with raw midi files?
Hello and Happy New Year!
I have worked with mel spectrograms and vision models, but I see a lot of problems when working with orchestral audio files. I am not referring to timbre identification, but to specific aspects of the music and the score such as meter, rhythm, harmony, filler voices or doublings, structural voices, etc.
The point is that the model would be trained on orchestral MIDI files as inputs, each associated with its piano reduction.
I have not yet worked with NLP and MIDI. But I think a good place to start might be to build a model architecture based strictly on NLP (as you say) that works directly with MIDI data. This could involve preprocessing the MIDI data to convert it into an input format suitable for an NLP model, such as a sequence of tokens representing musical concepts or events. You could then train that NLP model, such as a transformer, on this preprocessed data to generate new MIDI data.
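To make the preprocessing idea concrete, here is a rough sketch of one possible event-style tokenization. The token names (`NOTE_ON_*`, `NOTE_OFF_*`, `TIME_SHIFT_*`), the `(start_tick, end_tick, pitch)` note format and the `ticks_per_step` value are all assumptions for illustration, not an established scheme from this thread.

```python
def notes_to_tokens(notes, ticks_per_step=120):
    """Turn (start_tick, end_tick, pitch) notes into a flat list of event tokens."""
    events = []
    for start, end, pitch in notes:
        events.append((start, 1, f"NOTE_ON_{pitch}"))
        events.append((end, 0, f"NOTE_OFF_{pitch}"))
    # Sort by time; at equal times note-offs (0) come before note-ons (1).
    events.sort()

    tokens, now = [], 0
    for tick, _, name in events:
        # Times are floored to the step grid, so off-grid notes lose precision.
        steps = (tick - now) // ticks_per_step
        if steps > 0:
            tokens.append(f"TIME_SHIFT_{steps}")
            now += steps * ticks_per_step
        tokens.append(name)
    return tokens


# A C-E dyad for one quarter note (480 ticks), then a G for one quarter note.
print(notes_to_tokens([(0, 480, 60), (0, 480, 64), (480, 960, 67)]))
```

The same function could be run on the orchestral file and on the piano file, giving a source and a target token sequence for a sequence-to-sequence model. Track or instrument information is left out here and would need its own tokens.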
Do you know of any successful applications along these lines?
When you say you have the “orchestral scores”, you mean MIDI files and not sheet music, is that correct? Or are these audio files? Sorry, but when I hear music “scores”, I think of sheet music. That said, it’s easy enough to produce sheet music, too, if you have the MIDI files and a program like Encore or Finale.
Midi files are binary:
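As a small illustration of that binary layout, here is a sketch that decodes the header chunk of a Standard MIDI File using only the standard library. The header is the `MThd` tag, a 4-byte length, then format, track count and time division, all big-endian; `parse_midi_header` is a hypothetical helper name.

```python
import struct


def parse_midi_header(data):
    """Decode the 14-byte header chunk at the start of a Standard MIDI File."""
    chunk_id, length, fmt, ntracks, division = struct.unpack(">4sIHHH", data[:14])
    if chunk_id != b"MThd":
        raise ValueError("not a Standard MIDI File header")
    # If the top bit of `division` is 0, it is ticks per quarter note;
    # otherwise it encodes an SMPTE-based timing (not handled here).
    return {"length": length, "format": fmt, "tracks": ntracks, "division": division}


# Hand-built header: format 1, 2 tracks, 480 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
print(parse_midi_header(header))
```

For a real file you would pass the first 14 bytes read in binary mode, e.g. `open(path, "rb").read(14)`. The track chunks that follow need more work (variable-length delta times, running status), which is where a dedicated library helps.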
It shouldn’t be too tough to set up a dictionary and use nn.Embedding for a learnable set of inputs.
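A minimal sketch of the dictionary step, assuming the MIDI events have already been turned into string tokens (the token names and the special tokens below are made up for illustration). The integer ids it produces are what an embedding layer would look up.

```python
def build_vocab(token_sequences, specials=("<pad>", "<bos>", "<eos>")):
    """Map every distinct token to an integer id, special tokens first."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for seq in token_sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Wrap a token list in <bos>/<eos> and convert it to ids."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]


# Hypothetical event tokens for a single note.
sequence = ["NOTE_ON_60", "TIME_SHIFT_4", "NOTE_OFF_60"]
vocab = build_vocab([sequence])
ids = encode(sequence, vocab)
print(ids)

# In PyTorch these ids could then index a learnable table, for example
# torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=256, padding_idx=0),
# where 256 is an arbitrary size chosen only for this comment.
```

Building the vocabulary over both the orchestral and the piano sequences keeps the ids consistent between the encoder and the decoder.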
And just to double check, is the goal to make the reduction piano outputs as midi, sheet music, or audio?
Also, how complex do you want to get with the subdivision of beats? (e.g. triplet divisions of a 128th note are usually the smallest I’ve seen)
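One way to reason about this is to work out the coarsest time grid that still represents every subdivision you care about exactly. The sketch below does that with durations written as fractions of a quarter note; `ticks_per_quarter` is a hypothetical helper, and reading a “128th-note triplet” as 1/48 of a quarter note is my assumption.

```python
from fractions import Fraction
from math import lcm  # accepts several arguments on Python 3.9+


def ticks_per_quarter(durations):
    """Smallest integer grid per quarter note that fits every duration exactly."""
    return lcm(*(Fraction(d).denominator for d in durations))


def to_ticks(duration, resolution):
    """Convert a duration in quarter notes to an integer number of grid ticks."""
    ticks = Fraction(duration) * resolution
    if ticks.denominator != 1:
        raise ValueError("duration does not fit the grid")
    return int(ticks)


wanted = [
    Fraction(1, 4),   # 16th note
    Fraction(1, 3),   # eighth-note triplet
    Fraction(1, 32),  # 128th note
    Fraction(1, 48),  # 128th-note triplet (assumed meaning)
]
resolution = ticks_per_quarter(wanted)
print(resolution, [to_ticks(d, resolution) for d in wanted])
```

A finer grid means longer sequences or more time-shift tokens, so it may be worth checking which subdivisions actually occur in the dataset before committing to the smallest one.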
Yes, sorry, I don’t think I made myself clear. I meant MIDI files linked to a CSV each. I mean that in each folder I have 4 files, 2 CSV and 2 MIDI: One CSV and one MIDI are the orchestral version and the other two are the piano reduced version.
As for the second question, the goal would be to return a MIDI file…
I see there is a Python library for interacting with MIDI files here:
In particular, the “Parsing Midi Bytes” looks quite useful:
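To be clear, what follows is not that library’s API. It is only a standard-library sketch of what parsing MIDI bytes means for the simplest case, a 3-byte note-on or note-off channel message, where the high nibble of the status byte gives the message type and the low nibble gives the channel. `parse_channel_message` is a hypothetical name.

```python
def parse_channel_message(data):
    """Decode one 3-byte note-on or note-off message into a small dict."""
    status, note, velocity = data
    kind = {0x80: "note_off", 0x90: "note_on"}.get(status & 0xF0)
    if kind is None:
        raise ValueError(f"unsupported status byte: {status:#x}")
    # Note: by convention a note_on with velocity 0 is treated as a note_off;
    # this sketch reports it as received and leaves that choice to the caller.
    return {
        "type": kind,
        "channel": status & 0x0F,
        "note": note,
        "velocity": velocity,
    }


# 0x90 = note_on on channel 0, note 60 (middle C), velocity 64.
print(parse_channel_message(bytes([0x90, 60, 64])))
```

A real parser also has to handle the other message types, running status and the delta times stored between events in a file, so a maintained library is the safer choice for the actual project.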