Daniel Johnson (Harvey Mudd College).

Read the paper (PDF).


We describe a neural network architecture which enables prediction and composition of polyphonic music in a manner that preserves translation-invariance of the dataset. Specifically, we demonstrate training a probabilistic model of polyphonic music using a set of parallel, tied-weight recurrent networks, inspired by the structure of convolutional neural networks. This model is designed to be invariant to transpositions, but otherwise is intentionally given minimal information about the musical domain, and tasked with discovering patterns present in the source dataset. We present two versions of the model, denoted TP-LSTM-NADE and BALSTM, and also give methods for training the network and for generating novel music. This approach attains high performance at a musical prediction task and successfully creates note sequences which possess measure-level musical structure.
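The transposition-invariance idea can be illustrated with a small sketch. This is not the paper's implementation: it leaves out the recurrent (LSTM) and NADE components entirely and keeps only the tied-weight structure across pitches. The names `note_windows`, `tied_note_scores`, and `transpose`, the 88-pitch range, and the one-octave window radius are all assumptions made for illustration. Each pitch is scored by the same weights applied to a window of its neighbours at fixed relative offsets, so shifting the input up by some interval shifts the output by the same interval.

```python
import numpy as np

NUM_PITCHES = 88   # assumed piano range; illustrative only
RADIUS = 12        # assumed window: one octave on either side of each pitch

def note_windows(roll, radius):
    """Row i holds the notes from i - radius to i + radius, zero-padded at the edges."""
    padded = np.pad(roll, radius)
    return np.stack([padded[i:i + 2 * radius + 1] for i in range(len(roll))])

def tied_note_scores(roll, weights, bias):
    """Score every pitch with the same (tied) weights on its relative-offset window."""
    radius = (len(weights) - 1) // 2
    return np.tanh(note_windows(roll, radius) @ weights + bias)

def transpose(roll, semitones):
    """Shift one piano-roll column by `semitones`, filling vacated pitches with silence."""
    shifted = np.zeros_like(roll)
    if semitones >= 0:
        shifted[semitones:] = roll[:len(roll) - semitones]
    else:
        shifted[:semitones] = roll[-semitones:]
    return shifted

rng = np.random.default_rng(0)
weights = rng.normal(size=2 * RADIUS + 1)   # one weight per relative offset
bias = 0.1

chord = np.zeros(NUM_PITCHES)
chord[[40, 44, 47]] = 1.0                   # a major triad in the middle of the range

shift = 5
scores = tied_note_scores(chord, weights, bias)
scores_up = tied_note_scores(transpose(chord, shift), weights, bias)

# Compare the transposed output with the original output, shifted by the same interval.
print(np.allclose(scores_up[shift:], scores[:-shift]))
```

The comparison holds exactly only when no notes are pushed off the top of the range by the shift, which is why the example chord sits in the middle. A fully connected layer with a separate weight for every absolute pitch would not have this property in general.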

Extended Model Results

The extended model was trained on the Piano-Midi.de classical piano MIDI database, restricted to pieces in 4/4 time, with the goal of producing novel musical compositions. See the full paper for a complete description of the additional modifications.

Model and Dataset Comparison

Simpler versions of the model were trained on four datasets in order to compare their accuracy quantitatively against existing models. The datasets used were:

  • JSB Chorales: a corpus of 382 four-part chorales by J.S. Bach.
  • MuseData: an electronic classical music library, from CCARH at Stanford.
  • Nottingham: a collection of 1200 folk tunes in ABC notation, consisting of a simple melody on top of chords.
  • Piano-Midi.de: a classical piano MIDI database. (This is the most complex dataset.)

The BALSTM network had the best performance on the music prediction task. The LSTM-NADE, a non-parallel model for comparison, had the worst performance. See the full paper for a complete analysis.

Model                       JSB Chorales    MuseData        Nottingham      Piano-Midi.de
LSTM-NADE (non-parallel)    1 | 2 | 3       1 | 2 | 3       1 | 2 | 3       1 | 2 | 3
TP-LSTM-NADE                1 | 2 | 3       1 | 2 | 3       1 | 2 | 3       1 | 2 | 3
BALSTM                      1 | 2 | 3       1 | 2 | 3       1 | 2 | 3       1 | 2 | 3

All of the samples are also available for download in MIDI format.

The original training data is available at http://www-etud.iro.umontreal.ca/~boulanni/icml2012.