- Talks
- Datasets
- EGDB-PG: an extended version of EGDB that comes with 256 amp-rendered tones
- EMOPIA+: an extended version of EMOPIA that comes with a functional representation-based tokenization
- EMOPIA: a multimodal dataset comprising audio and MIDI of emotion-annotated pop piano solo pieces
- EGDB_BIAS_FX2: the EGDB dataset rendered with the Positive Grid BIAS FX2 plugin, published at DAFx’24
- EGDB: a dataset of transcriptions of electric guitar performances of 240 tablatures rendered with different tones, published at ICASSP’22
- AILabs.tw Pop1K7: a dataset comprising 1,747 transcribed piano performances of Western, Japanese, and Korean pop songs, compiled in the Compound Word Transformer paper (AAAI’21)
- DadaGP: a dataset of ~26k GuitarPro songs in ~800 genres, converted to a token sequence format for generative language models such as GPT-2 and Transformer-XL
- CCMED & WWMED:
corpora of Western classical music excerpts (WCMED) and Chinese classical
music excerpts (CCMED) annotated with emotional valence and arousal
values (ICASSP’20 paper-a)
- #nowplaying-RS:
a new benchmark dataset for building context-aware
music recommender systems
(SMC’18 paper)
- Symbolic-Musical-Datasets: a list of symbolic musical datasets, including lead sheets and MIDIs
- Lakh Pianoroll Dataset (LPD): a collection of 174,154 unique multi-track piano-rolls derived from the MIDI files in the Lakh MIDI Dataset (LMD), used in our MuseGAN paper (AAAI’18); see the loading sketch after this list
- iKala: 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for MIREX SVS 2014-2016) (ICASSP’15 paper)
- Su Dataset for automatic music transcription in piano solo, piano quintet, string quartet, violin sonata, choir, and symphony (ISMIR’16 and ISMIR’15 papers)
- MACLab Dataset for violin offset detection (ISMIR’15 paper)
- MACLab Dataset for guitar playing techniques (ISMIR’15 and ISMIR’14 papers)
- SCREAM-MAC-EMT Dataset for expression analysis in violin (ISMIR’15 paper)
- Octave dual-tone dataset (SMC’14 paper)
- The AMG1608 dataset for personalized music emotion recognition (ICASSP’15 paper)
- The CH818 dataset for music emotion recognition in Chinese pop songs
- The DEAM and MediaEval dataset for dynamic and static music emotion recognition (used in the ‘Emotion in Music’ task in MediaEval 2013-2015)
- CAL500exp Dataset for time-varying music auto-tagging (ICME’14 paper)
- CAL10k: 10k songs with 140 genre tags (TMM’13 paper)
- LiveJournal: 40k blog articles with user mood labels and music tags (TMM’13 paper)
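A minimal loading sketch for the LPD piano-rolls referenced above, assuming pypianoroll >= 1.0 is installed; "lpd_example.npz" is a placeholder path, not an actual file name shipped with the dataset:

```python
# Inspect one LPD piano-roll (assumes pypianoroll >= 1.0; the path is a placeholder).
import pypianoroll

multitrack = pypianoroll.load("lpd_example.npz")  # Multitrack object from an .npz file
print(multitrack.resolution)                      # time steps per quarter note

for track in multitrack.tracks:
    # Each track holds a (time_steps, 128) piano-roll matrix of note velocities.
    print(track.name, track.program, track.is_drum, track.pianoroll.shape)

# Stack all tracks into one (n_tracks, time_steps, 128) array, the tensor
# shape used by multi-track models such as MuseGAN.
print(multitrack.stack().shape)
```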
- Codes
- MuseControlLite: multifunctional music generation with lightweight conditioners (ICML’25 paper)
- PyNeuralFx: a Python package for neural audio effect modeling
- EMO-Disentanger: emotion-driven piano music generation via two-stage disentanglement and functional representation (ISMIR’24 paper)
- EMO_Harmonizer: an early version of EMO-Disentanger, for emotion-controllable melody harmonization
- MusiConGen: rhythm and chord control for Transformer-based text-to-music generation (ISMIR’24 paper)
- AP-adapter (audio prompt adapter): unleashing music editing abilities for text-to-music with lightweight finetuning (ISMIR’24 paper)
- PiCoGen2: piano cover generation with a transfer learning approach and weakly aligned data (ISMIR’24 paper)
- Compose &
Embellish: Well-structured piano performance generation via a
two-stage approach (ICASSP’23
paper)
- MuseMorphose: a
Transformer-VAE architecture for per-bar music style transfer
- Variable-length
piano infilling: a XLNet-based model for inpainting a piano sequence
with variable number of notes (up to 128 notes) (ISMIR’21 paper)
- LoopTest: a
benchmark of audio-domain musical phrase generation using drum loops
(ISMIR’21 paper)
- drum-aware4beat: a drum-aware ensemble architecture for improved joint musical beat and downbeat tracking (SPL’21 paper)
- CP Transformer: the world’s first neural sequence model for music generation at full-song length (AAAI’21 paper)
- Pop Music Transformer: a neural sequence model for beat-based automatic piano music composition (MM’20 paper)
- MIDI toolkit: designed for handling MIDI in symbolic timing (ticks), the native format of MIDI timing; we keep the MIDI parser as simple as possible and offer several useful utility functions (see the parsing sketch after this list)
- Singer-identification-in-artist20: a convolutional recurrent neural network with a melody model for singer identification, using the shuffle-and-remix data augmentation technique (ICASSP’20 paper-c)
- Speech-to-Singing Conversion: an end-to-end model for converting speech voice into singing voice (ICASSP’20 paper-b)
- Latent inspector for the LeadsheetVAE model
- DrumVAE: a recurrent VAE model for generating regular drum patterns (MILC’19 paper)
- musical-ml-web-demo-minimal-template (MILC’19 paper)
- DANtest: a simple framework based on discriminative adversarial networks for testing different adversarial losses (arXiv paper)
- Learning to match transient sound events using attentional similarity for few-shot sound recognition (ICASSP’19 paper-c)
- Audio-to-midi: a faster version of melodic-segnet
- melodic-segnet: for vocal melody and general melody extraction (ICASSP’19 paper-b)
- Hung’s instrument streaming model (ICASSP’19 paper-a)
- Hypergraph embedding: an implementation of a graph embedding learning method for hypergraphs (CIKM’18 paper)
- LeadsheetVAE: a recurrent VAE model for generating lead sheets (ISMIR-LBD’18 paper-b)
- Lead sheet generation and arrangement: a generative adversarial network for generating lead sheets and their arrangements (ICMLA’18 paper)
- Pypianoroll: an open-source Python package for handling multitrack pianorolls (ISMIR-LBD’18 paper)
- BMuseGAN: an extended version of MuseGAN that uses binary neurons (ISMIR’18 paper)
- Hung’s instrument recognizer: a CNN-based model that performs frame-level instrument prediction (i.e., instrument activity detection) (ISMIR’18 paper)
- M&mnet: a model that
uses attentional supervision to deal with transient sound event detection
(IJCAI’18 paper)
- pop-music-highlighter: a convolutional
attention network for music highlight detection (i.e. thumbnailing),
based on emotion labels (arxiv’18 paper)
- SEN: the
similarity embedding network we proposed to deal with music medley and
other music puzzle games (AAAI’18 paper)
- MuseGAN: multi-track
sequential generative adversarial networks for symbolic music generation
and accompaniment (AAAI’18 paper)
- MidiNet: a
convolutional generative adversarial network for symbolic-domain music
(melody) generation (ISMIR’17 paper) [the PyTorch version]
- The clip2frame CNN algorithm for event localization in music auto-tagging (MM’16 paper)
- Adaptive linear mapping model (ALMM) for content-based next-item recommendation (RecSys’16 paper)
- Informed group-sparse representation for singing voice separation (SPL’17 paper)
- Polar n-complex and n-bicomplex singular value decomposition and principal component pursuit (TSP’16 paper)
- The SPORCO library for convolutional sparse coding algorithms, developed by Brendt Wohlberg (TASLP’16 paper)
- Complex and quaternionic principal component pursuit for source separation (SPL’15 paper)
- Musical onset detection using constrained linear reconstruction (SPL’15 paper)
- The Acoustic Emotion Gaussians (AEG) model for music emotion recognition of valence and arousal values (TAC’15 and MM’12 papers)
- AWtoolbox for characterizing audio information using sparse-coding-based audio words (MM’14 paper)
- Multiple low-rank representation (MLRR) for source separation (ISMIR’13 paper) (related report by Alex Berrian in 2014)
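A minimal parsing sketch for the MIDI toolkit entry above, assuming the miditoolkit package is installed; "song.mid" and "song_copy.mid" are placeholder paths:

```python
# Tick-based (symbolic-timing) MIDI handling with miditoolkit
# (assumes miditoolkit is installed; the file paths are placeholders).
from miditoolkit.midi import parser as mid_parser

midi_obj = mid_parser.MidiFile("song.mid")
print(midi_obj.ticks_per_beat)            # tick resolution per quarter note

for inst in midi_obj.instruments:
    for note in inst.notes[:5]:
        # Note onsets and offsets stay in ticks, the native MIDI timing unit.
        print(inst.name, note.pitch, note.velocity, note.start, note.end)

midi_obj.dump("song_copy.mid")            # write the (possibly edited) MIDI back out
```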