Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure
Published in IJCNN, 2021
The rise of deep learning technologies has rapidly advanced many fields, including generative music systems. A number of existing systems can generate short snippets that sound musical; however, these snippets often lack an overarching, longer-term structure. In this work, we propose CM-HRNN: a conditional melody generation model based on a hierarchical recurrent neural network. This model generates melodies with long-term structure, conditioned on a given chord accompaniment. We also propose a novel and concise event-based representation that encodes musical lead sheets while retaining each note's relative position within the bar with respect to the musical meter. With this new data representation, the proposed architecture can effectively model rhythmic and pitch structure simultaneously. Melodies generated by the proposed model were extensively evaluated in quantitative experiments as well as a user study to assess the musical quality and long-term structure of the output. We also compared the system with the state-of-the-art AttentionRNN. The comparison shows that melodies generated by CM-HRNN contain more repeated patterns (i.e., a higher compression ratio) and lower tonal tension (i.e., they are more tonally concise). Results from our listening test indicate that CM-HRNN outperforms AttentionRNN in terms of long-term structure and overall rating.
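To make the idea of an event-based representation more concrete, here is a minimal sketch of how a melody might be encoded as a sequence of events that each retain their position within the bar. This is an illustrative assumption, not the paper's exact encoding scheme: the function and field names are hypothetical, and a fixed 4/4 meter is assumed.

```python
from fractions import Fraction

# Assumed meter: one bar spans a whole note (4/4). The paper's actual
# representation may handle other meters and additional event attributes.
BAR_LENGTH = Fraction(1)

def encode_melody(notes):
    """Hypothetical event-based encoder.

    notes: list of (midi_pitch, duration) pairs, durations in whole notes.
    Returns a list of (pitch, duration, position_in_bar) events, so each
    event keeps its rhythmic placement relative to the bar line instead
    of relying on a fixed time-step grid.
    """
    events = []
    position = Fraction(0)  # running offset from the start of the piece
    for pitch, duration in notes:
        events.append((pitch, duration, position % BAR_LENGTH))
        position += duration
    return events

# Four notes: three quarters and a half note fill bar 1; the last quarter
# note starts bar 2, so its in-bar position wraps back to 0.
melody = [(60, Fraction(1, 4)), (62, Fraction(1, 4)),
          (64, Fraction(1, 2)), (65, Fraction(1, 4))]
events = encode_melody(melody)
```

Keeping the in-bar offset as an explicit event feature is one way a model could learn metrical regularities (e.g., downbeat emphasis) without quantizing the melody to a uniform grid.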