Adapting learning materials to a student's skill level is important in education. In music training, one essential ability is sight-reading---playing unfamiliar scores at first sight---which benefits from progressive, level-appropriate practice. However, creating exercises at the appropriate level of difficulty demands significant time and effort. We address this challenge as a controlled symbolic music generation task that aims to produce piano scores with a desired difficulty level. Conditioning in symbolic generation is commonly implemented with control tokens, but these do not always have a clear impact on global properties such as difficulty. To improve conditioning, we introduce an auxiliary optimization target for difficulty prediction that helps prevent conditioning collapse---a common issue in which models ignore control signals in the absence of explicit supervision. This auxiliary objective helps the model learn internal representations aligned with the target difficulty, which enables more precise and adaptive score generation. Evaluation with automatic metrics and expert judgments shows better control of difficulty and potential educational value. Our approach represents a step toward personalized music education through the generation of difficulty-aware practice material.
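As a rough illustration of the auxiliary objective described above, the following minimal sketch combines a next-token loss with a weighted difficulty-classification loss. It is not the training code used in this work: the function names (`cross_entropy`, `joint_loss`), the weight `aux_weight`, the use of cross-entropy for both terms, and the choice of three difficulty classes are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target index under softmax(logits)."""
    return -log_softmax(logits)[target]


def joint_loss(token_logits, token_targets,
               difficulty_logits, difficulty_target,
               aux_weight=0.5):
    """Return (total, lm, aux) for one sequence.

    lm  : mean next-token cross-entropy over the sequence.
    aux : cross-entropy of a hypothetical difficulty head that predicts
          the target difficulty from the model's internal state.
    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    lm = sum(
        cross_entropy(logits, target)
        for logits, target in zip(token_logits, token_targets)
    ) / len(token_targets)
    aux = cross_entropy(difficulty_logits, difficulty_target)
    return lm + aux_weight * aux, lm, aux


# Toy usage: two time steps over a 4-token vocabulary and three assumed
# difficulty classes (indices 0, 1, 2).
total, lm, aux = joint_loss(
    token_logits=[[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]],
    token_targets=[0, 2],
    difficulty_logits=[0.2, 1.5, 0.3],
    difficulty_target=1,
)
print(f"total={total:.4f} lm={lm:.4f} aux={aux:.4f}")
```

Under these assumptions, the auxiliary term penalizes the model whenever its internal state does not predict the requested difficulty, which gives the control signal an explicit source of supervision in addition to the token-level loss.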
- You have 10 minutes to study the score.
- Do not play during the study time, only look at the score.
- Play the piece once from beginning to end without stopping.
- Focus on rhythm, tempo, and musicality, even with mistakes.
One of the authors performed these internal examples, using five randomly selected samples from each category. To avoid identification, we reduced the video resolution; if the paper is accepted, the original resolution will be restored. These exercises are not included in the user study, and no author participates as a subject in the user study.
This supplementary material is included only because we find it interesting; it illustrates that the model performs well at generating exercises for sight-reading.
Mistakes clearly increase as the difficulty rises from Easy to Medium to Advanced.