This model consists of recurrent neural networks with hierarchical attention and conditional variational autoencoder that takes a sequence of note-level score features extracted from MusicXML as input and predicts piano performance features of the corresponding notes.