Research is arguably the most important part of any project: when it is done right, implementation goes far more smoothly. For that reason, most of this week’s work went into research in preparation for my new project.

Music information retrieval (MIR) is a small but rapidly growing field of research concerned with extracting information from music for the purpose of scientific analysis. MIR can be divided into four areas: musical content, musical context, bibliographical content, and user context. Most of this week’s research was on musical content, which can in turn be divided into four parts: timbre, rhythm, melody, and loudness.

Rhythm analysis is concerned with tempo, beat, and the general pattern used across a piece of music. When analyzing rhythm, finding onsets is very important, as they mark the beginning of each musical note. Once the onsets have been found through signal processing, the rhythm information can be extracted and processed further. This might involve spectrogram analysis, signal transforms, tempogram analysis, etc.
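
To make the idea of onset detection concrete, here is a minimal sketch that uses only the Python standard library. It assumes the simplest possible novelty measure, the rise in short-time energy between neighbouring frames, which is not necessarily the method the project will end up using. The function names, the frame sizes, the threshold, and the synthetic test signal are all assumptions made for illustration.

```python
import math


def frame_energies(samples, frame_size, hop):
    """Sum of squared samples for each full frame of the signal."""
    return [
        sum(x * x for x in samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


def detect_onsets(samples, sample_rate, frame_size=512, hop=256, threshold=0.1):
    """Return estimated onset times in seconds.

    The novelty curve is the positive change in frame energy, normalized by
    the loudest frame. Local peaks above `threshold` are reported as onsets.
    """
    energies = frame_energies(samples, frame_size, hop)
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []  # empty or silent signal: nothing to detect
    novelty = [0.0] + [
        max(0.0, (b - a) / peak) for a, b in zip(energies, energies[1:])
    ]
    onsets = []
    for i in range(1, len(novelty) - 1):
        is_peak = novelty[i] >= novelty[i - 1] and novelty[i] > novelty[i + 1]
        if is_peak and novelty[i] > threshold:
            # Report the centre of the frame where the energy rises fastest.
            onsets.append((i * hop + frame_size / 2) / sample_rate)
    return onsets


def synth_notes(sample_rate, duration, note_starts, note_length=0.2, freq=440.0):
    """Hypothetical test signal: silence with short sine bursts as 'notes'."""
    samples = [0.0] * int(sample_rate * duration)
    for start in note_starts:
        first = int(start * sample_rate)
        for n in range(int(note_length * sample_rate)):
            if first + n < len(samples):
                samples[first + n] = math.sin(2 * math.pi * freq * n / sample_rate)
    return samples


# Two notes starting at 0.25 s and 0.75 s in one second of audio at 8 kHz.
signal = synth_notes(8000, 1.0, [0.25, 0.75])
onsets = detect_onsets(signal, 8000)
print(onsets)
```

The time resolution is limited by the hop size, so the reported onsets land near the true note starts rather than exactly on them. Real recordings are much messier than this synthetic signal, and a spectral novelty measure would likely be more robust than raw energy.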

As for melody, the first step in processing it is to separate it from everything surrounding it. Luckily, the melody is normally very prominent, and this is where onset analysis plays a role again: by estimating the pitch at each onset and turning it into a musical note through frequency analysis, the melody of a piece can be extracted. Since this project revolves around judging the performance of a piece, the pitch of each played note is then compared to the ideal frequency of that note, and averaging the differences from the ideal frequency produces a general precision or loss metric.
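
The comparison against the ideal frequency can be sketched in a few lines. This sketch assumes twelve-tone equal temperament with A4 tuned to 440 Hz, and it measures the difference in cents (hundredths of a semitone) rather than in Hz, so that an error counts the same in every octave. Both choices, as well as the function names, are assumptions for illustration and not a settled design.

```python
import math

A4_HZ = 440.0  # assumed reference tuning
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz):
    """Return (note name, deviation in cents) for the closest equal-tempered note."""
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)  # fractional MIDI note number
    nearest = round(midi)
    cents = 100 * (midi - nearest)  # positive means sharp, negative means flat
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


def mean_abs_deviation_cents(freqs_hz):
    """Average absolute distance from the ideal pitch over all detected notes."""
    return sum(abs(nearest_note(f)[1]) for f in freqs_hz) / len(freqs_hz)


# Hypothetical detected pitches: one in tune, one noticeably sharp.
print(nearest_note(440.0))
print(nearest_note(450.0))
print(mean_abs_deviation_cents([440.0, 450.0]))
```

Note that this measures distance to the nearest note, not to the note the score asks for. Judging a performance against a known score would compare each pitch with the expected note instead.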

Timbre is described as the quality or tone of a musical note. Normally, timbre analysis focuses on identifying the source of a note (a human voice or an instrument). As far as we are concerned, however, the importance of timbre lies in relating the tone of the notes in a piece to how well the piece is received. This can be done through different means of analysis, one of which is to represent the note visually (using spectrograms, MFCCs, DWT, etc.) and train a machine learning model on that representation. This way, human preferences might find a place inside a machine.
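
As a rough illustration of what such a visual representation contains, the sketch below computes a small magnitude spectrogram with a naive DFT from the standard library. It covers only the spectrogram case, not MFCCs or the DWT, and it is far too slow for real audio, where an FFT library would be used instead. The function names, the frame size, and the test tone are assumptions for illustration.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_size=64, hop=32):
    """List of magnitude spectra, one per windowed frame (time by frequency)."""
    window = hann_window(frame_size)
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        rows.append(magnitude_spectrum(frame))
    return rows


# Hypothetical input: a pure 1000 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = spectrogram(tone)

# Bin k corresponds to k * sample_rate / frame_size Hz, so 125 Hz per bin here.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(peak_bins)
```

A pure tone puts nearly all of its energy in one bin. A real instrument or voice spreads energy across harmonics, and that pattern is what a model trained on these images would have to learn from.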

Though the field of MIR is in its infancy, great advances have been made in it, perhaps enough to make the project we are planning possible.

