Converting between key signatures is straightforward, so the key signature could be supplied as a user input. The tricky part is reading the signature from an image.
Likewise, converting between clefs is straightforward, so the clef could be a user input too.
Accidentals are beyond the scope of our computer vision algorithm.
This means that the notes we recognize range from A to C. We could extend this range, but it did not seem necessary for our demonstration, so we kept it.
Using the Hough transform, we can determine where the lines on a piece of sheet music are. The Hough transform has many tunable parameters, so our program requires the user to adjust them (the blurring value, the thresholds for weak and strong edges, the minimum count needed to label a line, etc.) until it finds good lines for the image.
We know that the staff lines will be the longest lines on a page of sheet music. Therefore we discard all lines whose length is much smaller than that of the longest line. Likewise, we compare the slope of the longest line to the slopes of all the other lines and discard any line with a significantly different slope. We also rotate the image so that the staff lines are horizontal.
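The filtering step above can be sketched in a few lines of Python. This is only an illustration, not our program's code: the function name, the (x1, y1, x2, y2) line format, and the two default tolerances are assumptions made for the example. It also compares angles rather than raw slopes, so that a vertical line does not cause a division by zero.

```python
import math

def filter_staff_candidates(lines, min_length_ratio=0.8, max_angle_diff_deg=2.0):
    """Keep lines that are nearly as long as, and nearly parallel to, the longest line.

    lines: list of (x1, y1, x2, y2) segments (hypothetical format).
    Both tolerances are illustrative defaults, not values from our program.
    """
    def length(line):
        x1, y1, x2, y2 = line
        return math.hypot(x2 - x1, y2 - y1)

    def angle(line):
        # Direction in degrees, folded into [0, 180) so that the order of
        # the two endpoints does not matter.
        x1, y1, x2, y2 = line
        return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

    if not lines:
        return []

    longest = max(lines, key=length)
    ref_length, ref_angle = length(longest), angle(longest)

    kept = []
    for line in lines:
        diff = abs(angle(line) - ref_angle)
        diff = min(diff, 180.0 - diff)  # 179 degrees and 1 degree are close
        if length(line) >= min_length_ratio * ref_length and diff <= max_angle_diff_deg:
            kept.append(line)
    return kept
```

The angle of the longest line (ref_angle here) is also the amount by which the image would need to be rotated to make the staff lines horizontal.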
We are looking for five lines whose gaps are all equal to within a certain tolerance. We search through our list of lines: if the gap between the y positions of two adjacent parallel lines matches the previous gap, we look at the third gap, and if that also matches we look at the fourth gap. If all four gaps match, we mark a staff.
However, if the second gap does not match the first, or the third does not match the second, or the fourth does not match the third, we start looking for a new staff, with the most recent gap as its first gap.
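A minimal sketch of this search, assuming the lines have already been reduced to a list of y positions. The function name and the tolerance value are hypothetical and chosen for the example.

```python
def find_staffs(ys, tolerance=2.0):
    """Group line y positions into staffs of five evenly spaced lines.

    ys: y positions of the candidate lines (any order).
    tolerance: largest allowed difference between consecutive gaps
               (illustrative value, not taken from our program).
    Returns a list of staffs, each a list of five y positions.
    """
    ys = sorted(ys)
    staffs = []
    i = 0
    while i + 4 < len(ys):
        group = ys[i:i + 5]
        gaps = [group[k + 1] - group[k] for k in range(4)]

        # Index of the first gap that disagrees with the gap before it.
        mismatch = next(
            (k for k in range(1, 4) if abs(gaps[k] - gaps[k - 1]) > tolerance),
            None,
        )

        if mismatch is None:
            staffs.append(group)  # all four gaps agree: mark a staff
            i += 5
        else:
            # Restart so that the mismatching gap becomes the first gap
            # of the next candidate staff.
            i += mismatch
    return staffs
```

For example, with one stray line above the first staff and one between the two staffs, find_staffs([10, 100, 110, 120, 130, 140, 200, 300, 311, 320, 330, 341]) returns the two groups starting at 100 and at 300.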
We create a scan window that travels left to right along each staff (recall that we have rotated the image so that the staffs are horizontal if they were not originally). The width and height of the window are both equal to gap, the spacing between adjacent staff lines. At each x position we move the window down the column of pixels in the staff and compute the average darkness of the window. If the average darkness is below the average threshold (a user input), we declare that there is a note there. When we find a note, we add gap to the x position and continue scanning along the staff from there. Advancing by gap keeps us from counting the same note twice, while being small enough that we do not jump too far and skip a note.
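The scan can be sketched as follows. This is an illustration under stated assumptions rather than our actual code: the image is a list of rows of grayscale values (0 = black, 255 = white), "average darkness below the threshold" is read as "mean pixel value below the threshold", gap is assumed to be even, and the names scan_staff, top_y, and steps_outside are hypothetical.

```python
def scan_staff(image, top_y, gap, avg_thresh, steps_outside=2):
    """Slide a gap x gap window along one staff and report dark windows.

    image: list of rows of grayscale values, 0 = black, 255 = white (assumed).
    top_y: y position of the top staff line.
    gap: spacing between adjacent staff lines (assumed even here).
    avg_thresh: a window counts as a note when its mean value is below this.
    steps_outside: half-gap positions checked above and below the staff.
    Returns a list of (x, center_y) pairs, at most one note per x position.
    """
    height, width = len(image), len(image[0])
    half = gap // 2

    # Window centers every half gap: lines and spaces, plus a few
    # positions above the top line and below the bottom line.
    centers = [top_y + k * half for k in range(-steps_outside, 8 + steps_outside + 1)]

    hits = []
    x = 0
    while x + gap <= width:
        found = None
        for center_y in centers:  # search downward at this x position
            y0 = center_y - half
            if y0 < 0 or y0 + gap > height:
                continue  # window would fall off the image
            total = sum(sum(image[y][x:x + gap]) for y in range(y0, y0 + gap))
            if total / (gap * gap) < avg_thresh:
                found = center_y
                break
        if found is not None:
            hits.append((x, found))
            x += gap  # skip past this note so it is not counted twice
        else:
            x += 1
    return hits
```

Because the search stops at the first dark window in each column, this sketch reports one note per x position and would not pick up chords.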
Also note that we locate the notes above and below the staff by using gap, subtracting it above the staff and adding it below.
When we find notes we label them with a number from -2 to 10 and print out the corresponding note names, assuming C major, treble clef, and no accidentals. Because the labels are just positions on the staff, it would be simple to map them to other key signatures or clefs if we knew what those were.
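To illustrate why the clef mapping is simple, here is a small sketch that turns a position number into a note name. The numbering convention is an assumption made for the example: 0 is taken to be the top staff line and numbers increase downward, so that in treble clef -2 is the A above the staff and 10 is the C below it. The function name and the top_line parameter are hypothetical.

```python
NOTE_NAMES = "CDEFGAB"

def position_to_note(position, top_line=("F", 5)):
    """Map a staff position to a note name with octave, ignoring accidentals.

    position: label from -2 to 10, ASSUMED to count half-gap steps
              downward from the top staff line (0 = top line).
    top_line: (letter, octave) of the top staff line; ("F", 5) is the
              treble clef, ("A", 3) would be the bass clef.
    """
    letter, octave = top_line
    # Count diatonic steps from C0, then move down by `position` steps.
    index = octave * 7 + NOTE_NAMES.index(letter) - position
    return f"{NOTE_NAMES[index % 7]}{index // 7}"
```

Under this assumed numbering, position_to_note(-2) gives "A5" and position_to_note(10) gives "C4" in treble clef, and passing top_line=("A", 3) relabels the same positions for bass clef.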
Since this uses the Hough transform, users enter these parameters on the command line:
./hough sigma lo hi nBlur mincnt avgthresh in.png out.png
avgthresh is specific to our program: it is the average darkness of the scan window at which we consider something a note.
length startx starty endx endy slope staff# staffline
References: Professor Scharstein and "Computer Vision" by Shapiro and Stockman