On April 23, 2025, researchers Nicolas Jonason, Luca Casini, and Bob L. T. Sturm published "SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward." The paper explores how reinforcement learning, guided by Meta’s Audiobox Aesthetics ratings, can refine a piano MIDI model to produce more appealing compositions, and what that optimization costs in output diversity.

The study investigates whether aesthetic rating models can be used to fine-tune a symbolic music generation system via reinforcement learning. Using group relative policy optimization (GRPO), the researchers fine-tuned a piano MIDI model with Meta Audiobox Aesthetics ratings as the reward signal. The optimization affected multiple low-level features of the generated output and increased average subjective ratings in a listening test. However, over-optimization significantly reduced the diversity of the model’s outputs.
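To make the "group relative" part of that method concrete, here is a minimal sketch of how GRPO-style advantages are commonly computed: several outputs are sampled for the same prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group. This is not the authors' code. The function name, the example scores, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn from the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a sample
    is credited only for being better or worse than its siblings.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical aesthetic scores for four generations from one prompt.
scores = [6.1, 7.4, 5.8, 6.9]
advantages = group_relative_advantages(scores)
```

In a full training loop these advantages would weight the policy-gradient update for each sampled sequence. Because the normalization is relative to the group, a batch in which every sample receives the same score yields zero advantage for all of them.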

Recent advances in machine learning have significantly improved models’ ability to generate high-quality symbolic music, such as MIDI files or sheet music. A notable development is the use of large language models (LLMs) trained with specialized musical knowledge, exemplified by NotaGen. This approach has been reported to outperform existing methods in terms of musicality and creativity.

The experiments used several soundfonts, including MuseScore, FluidR3, Grandeur, and Yamaha. A soundfont is a collection of instrument samples used to render MIDI as audio. Because the aesthetic reward is computed on audio rather than on the symbolic data itself, the choice of soundfont affects how realistic and expressive the rendered output sounds, and therefore the perceived quality of the generated music.

NotaGen was trained on a diverse dataset featuring works by composers like Chopin, Mozart, and Philip Glass, supporting varied and nuanced music generation. Evaluations using a linear mixed-effects model revealed that NotaGen’s generated music received higher ratings than other systems, with statistically significant results (p < 0.001). This suggests that listeners perceive NotaGen’s output as more appealing or of higher quality.

Looking ahead, integrating real-time feedback and multi-modal approaches could enhance interactivity and creativity. Techniques that preserve distinct musical styles while still allowing for innovation are essential, so that the model doesn’t blend genres into an indistinct mix. Additionally, methods like CLaMP 3 may help maintain coherence across different aspects of music generation.

NotaGen represents a significant advancement in symbolic music generation by leveraging LLMs with specialized training. While the approach is promising, further details on its technical aspects and creativity metrics would provide deeper insight into the model’s capabilities. The work opens possibilities for future developments in AI-generated music, with potential for both artistic expression and practical applications.

👉 More information
🗞 SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward
🧠 DOI: https://doi.org/10.48550/arXiv.2504.16839
