Daniel Johnson (Harvey Mudd College).
We describe a neural network architecture which enables prediction and composition of polyphonic music in a manner that preserves translation-invariance of the dataset. Specifically, we demonstrate training a probabilistic model of polyphonic music using a set of parallel, tied-weight recurrent networks, inspired by the structure of convolutional neural networks. This model is designed to be invariant to transpositions, but otherwise is intentionally given minimal information about the musical domain, and tasked with discovering patterns present in the source dataset. We present two versions of the model, denoted TP-LSTM-NADE and BALSTM, and also give methods for training the network and for generating novel music. This approach attains high performance at a musical prediction task and successfully creates note sequences which possess measure-level musical structure.
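The core idea above, one set of recurrent weights shared (tied) across every note, analogous to a convolutional filter sliding along the pitch axis, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the recurrence is omitted and all sizes and the single shared weight matrix are illustrative. Each note position sees a window of neighboring pitches as input, so transposing the input simply shifts the output features.

```python
import numpy as np

rng = np.random.default_rng(0)
N_NOTES = 48   # pitches in the playable range (illustrative)
WINDOW = 12    # relative-pitch context: one octave on each side
HIDDEN = 16    # feature size (illustrative)

# One shared (tied) weight matrix, reused at every note position,
# analogous to a convolutional filter sliding over pitch.
W = rng.standard_normal((2 * WINDOW + 1, HIDDEN)) * 0.1

def per_note_features(piano_roll_frame):
    """Apply the same weights at every pitch, each note seeing a
    window of neighboring pitches (hence transposition-invariant)."""
    padded = np.pad(piano_roll_frame, WINDOW)
    out = np.empty((N_NOTES, HIDDEN))
    for n in range(N_NOTES):
        window = padded[n : n + 2 * WINDOW + 1]
        out[n] = np.tanh(window @ W)
    return out

frame = np.zeros(N_NOTES)
frame[20] = 1.0                  # a single sounding note
shifted = np.roll(frame, 5)      # the same music, transposed up 5 semitones
f1 = per_note_features(frame)
f2 = per_note_features(shifted)
# Away from the range boundaries, the features shift along with the notes.
assert np.allclose(f1[20], f2[25])
```

In the actual models, each note position runs a tied-weight LSTM over time (and, for BALSTM, additional LSTMs along the note axis), but the invariance argument is the same: because the weights do not depend on absolute pitch, a transposed piece produces correspondingly transposed activations.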
The extended model was trained on the Piano-Midi.de classical piano MIDI database, restricted to pieces with the 4/4 time signature, with the goal of producing novel musical compositions. See the full paper for a complete description of the additional modifications.
Simpler models were trained on a variety of datasets in order to quantitatively compare the model's accuracy against older, existing models. The datasets used were:

- JSB Chorales
- MuseData
- Nottingham
- Piano-midi.de
The BALSTM network had the best performance on the music prediction task. The LSTM-NADE, a non-parallel model for comparison, had the worst performance. See the full paper for a complete analysis.
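Prediction performance of this kind is typically scored as the negative log-likelihood the model assigns to held-out note sequences. A small illustrative sketch (the exact evaluation is described in the full paper): with binary piano-roll targets and independent per-note Bernoulli predictions, the framewise score is

```python
import numpy as np

def framewise_nll(probs, targets):
    """Mean negative log-likelihood of binary piano-roll targets under
    independent per-note Bernoulli predictions. Lower is better;
    illustrative only -- see the paper for the exact metric."""
    probs = np.clip(probs, 1e-9, 1 - 1e-9)  # guard against log(0)
    return -np.mean(targets * np.log(probs)
                    + (1 - targets) * np.log(1 - probs))
```

A perfect predictor scores approximately 0, while predicting 0.5 everywhere scores log 2 per note, about 0.693.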
| Model | JSB Chorales | MuseData | Nottingham | Piano-midi.de |
| --- | --- | --- | --- | --- |
| LSTM-NADE (non-parallel) | 1, 2, 3 | 1, 2, 3 | 1, 2, 3 | 1, 2, 3 |
| TP-LSTM-NADE | 1, 2, 3 | 1, 2, 3 | 1, 2, 3 | 1, 2, 3 |
| BALSTM | 1, 2, 3 | 1, 2, 3 | 1, 2, 3 | 1, 2, 3 |
You can also download all of the samples in MIDI format, if you wish.
The original training data is available at http://www-etud.iro.umontreal.ca/~boulanni/icml2012.