- Talks
- Datasets
- EGDB-PG: an extended version of EGDB that comes with 256 amp-rendered tones
- EMOPIA+: an extended version of EMOPIA that comes with a functional representation-based tokenization
- EMOPIA: a multimodal dataset comprising audio and MIDI of emotion-annotated pop piano solo pieces
- EGDB_BIAS_FX2: the EGDB dataset rendered with the Positive Grid BIAS FX2 plugin, published at DAFx’24
- EGDB: a dataset of transcriptions of electric guitar performances of 240 tablatures rendered with different tones, published at ICASSP’22
- AILabs.tw Pop1K7: a dataset comprising 1,747 transcribed piano performances of Western, Japanese, and Korean pop songs, compiled in the Compound Word Transformer paper (AAAI’21)
- DadaGP: a dataset of ~26k GuitarPro songs in ~800 genres, converted to a token sequence format for generative language models such as GPT-2 and Transformer-XL
- CCMED & WWMED:
corpora of Western classical music excerpts (WCMED) and Chinese classical
music excerpts (CCMED) annotated with emotional valence and arousal
values (ICASSP’20 paper-a)
- #nowplaying-RS:
a new benchmark dataset for building context-aware
music recommender systems
(SMC’18 paper)
- Symbolic-Musical-Datasets: a list of symbolic musical datasets, including lead sheets and MIDIs
- Lakh Pianoroll Dataset (LPD): a collection of 174,154 unique multi-track piano-rolls derived from the MIDI files in the Lakh MIDI Dataset (LMD), used in our MuseGAN paper (AAAI’18); see the loading sketch after this list
- iKala: 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for MIREX SVS 2014-2016) (ICASSP’15 paper)
- Su Dataset for automatic music transcription in piano solo, piano quintet, string quartet, violin sonata, choir, and symphony (ISMIR’16 and ISMIR’15 papers)
- MACLab Dataset for violin offset detection (ISMIR’15 paper)
- MACLab Dataset for guitar playing techniques (ISMIR’15 and ISMIR’14 papers)
- SCREAM-MAC-EMT Dataset for expression analysis in violin (ISMIR’15 paper)
- Octave dual-tone dataset (SMC’14 paper)
- The AMG1608 dataset for personalized music emotion recognition (ICASSP’15 paper)
- The CH818 dataset for music emotion recognition in Chinese pop songs
- The DEAM and MediaEval dataset for dynamic and static music emotion recognition (used in the ‘Emotion in Music’ task in MediaEval 2013-2015)
- CAL500exp Dataset for time-varying music auto-tagging (ICME’14 paper)
- CAL10k: 10k songs with 140 genre tags (TMM’13 paper)
- LiveJournal: 40k blog articles with user mood labels and music tags (TMM’13 paper)
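A minimal loading sketch for the LPD piano-rolls referenced above, assuming pypianoroll >= 1.0 is installed; "lpd_example.npz" is a placeholder path, not an actual file name shipped with the dataset:

```python
# Inspect one LPD piano-roll (assumes pypianoroll >= 1.0; the path is a placeholder).
import pypianoroll

multitrack = pypianoroll.load("lpd_example.npz")  # Multitrack object from an .npz file
print(multitrack.resolution)                      # time steps per quarter note

for track in multitrack.tracks:
    # Each track holds a (time_steps, 128) piano-roll matrix of note velocities.
    print(track.name, track.program, track.is_drum, track.pianoroll.shape)

# Stack all tracks into one (n_tracks, time_steps, 128) array, the tensor
# shape used by multi-track models such as MuseGAN.
print(multitrack.stack().shape)
```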
- Codes
- MuseControlLite: multifunctional music generation with lightweight conditioners (ICML’25 paper)
- PyNeuralFx: a Python package for neural audio effect modeling
- EMO-Disentanger: emotion-driven piano music generation via two-stage disentanglement and functional representation (ISMIR’24 paper)
- EMO_Harmonizer: an early version of EMO-Disentanger, for emotion-controllable melody harmonization
- MusiConGen: rhythm and chord control for Transformer-based text-to-music generation (ISMIR’24 paper)
- AP-adapter (audio prompt adapter): unleashing music editing abilities for text-to-music with lightweight finetuning (ISMIR’24 paper)
- PiCoGen2: piano cover generation with a transfer learning approach and weakly aligned data (ISMIR’24 paper)
- Compose &
Embellish: Well-structured piano performance generation via a
two-stage approach (ICASSP’23
paper)
- MuseMorphose: a
Transformer-VAE architecture for per-bar music style transfer
- Variable-length
piano infilling: a XLNet-based model for inpainting a piano sequence
with variable number of notes (up to 128 notes) (ISMIR’21 paper)
- LoopTest: a
benchmark of audio-domain musical phrase generation using drum loops
(ISMIR’21 paper)
- drum-aware4beat: a drum-aware ensemble architecture for improved joint musical beat and downbeat tracking (SPL’21 paper)
- CP Transformer: the world’s first neural sequence model for music generation at full-song length (AAAI’21 paper)
- Pop Music Transformer: a neural sequence model for beat-based automatic piano music composition (MM’20 paper)
- MIDI toolkit: designed for handling MIDI in symbolic timing (ticks), the native format of MIDI timing; we keep the MIDI parser as simple as possible and offer several useful utility functions (see the parsing sketch after this list)
- Singer-identification-in-artist20: a convolutional recurrent neural network with a melody model for singer identification, using the shuffle-and-remix data augmentation technique (ICASSP’20 paper-c)
- Speech-to-Singing Conversion: an end-to-end model for converting speech voice into singing voice (ICASSP’20 paper-b)
- Latent inspector for the LeadsheetVAE model
- DrumVAE: a recurrent VAE model for generating regular drum patterns (MILC’19 paper)
- musical-ml-web-demo-minimal-template (MILC’19 paper)
- DANtest: a simple framework based on discriminative adversarial networks for testing different adversarial losses (arXiv paper)
- Learning to match transient sound events using attentional similarity for few-shot sound recognition (ICASSP’19 paper-c)
- Audio-to-midi: a faster version of melodic-segnet
- melodic-segnet: for vocal melody and general melody extraction (ICASSP’19 paper-b)
- Hung’s instrument streaming model (ICASSP’19 paper-a)
- Hypergraph embedding: an implementation of a graph embedding learning method for hypergraphs (CIKM’18 paper)
- LeadsheetVAE: a recurrent VAE model for generating lead sheets (ISMIR-LBD’18 paper-b)
- Lead sheet generation and arrangement: a generative adversarial network for generating lead sheets and their arrangements (ICMLA’18 paper)
- Pypianoroll: an open-source Python package for handling multitrack pianorolls (ISMIR-LBD’18 paper)
- BMuseGAN: an extended version of MuseGAN that uses binary neurons (ISMIR’18 paper)
- Hung’s instrument recognizer: a CNN-based model that performs frame-level instrument prediction (i.e., instrument activity detection) (ISMIR’18 paper)
- M&mnet: a model that
uses attentional supervision to deal with transient sound event detection
(IJCAI’18 paper)
- pop-music-highlighter: a convolutional
attention network for music highlight detection (i.e. thumbnailing),
based on emotion labels (arxiv’18 paper)
- SEN: the
similarity embedding network we proposed to deal with music medley and
other music puzzle games (AAAI’18 paper)
- MuseGAN: multi-track
sequential generative adversarial networks for symbolic music generation
and accompaniment (AAAI’18 paper)
- MidiNet: a
convolutional generative adversarial network for symbolic-domain music
(melody) generation (ISMIR’17 paper) [the PyTorch version]
- The clip2frame CNN algorithm for event localization in music auto-tagging (MM’16 paper)
- Adaptive linear mapping model (ALMM) for content-based next-item recommendation (RecSys’16 paper)
- Informed group-sparse representation for singing voice separation (SPL’17 paper)
- Polar n-complex and n-bicomplex singular value decomposition and principal component pursuit (TSP’16 paper)
- The SPORCO library for convolutional sparse coding algorithms, developed by Brendt Wohlberg (TASLP’16 paper)
- Complex and quaternionic principal component pursuit for source separation (SPL’15 paper)
- Musical onset detection using constrained linear reconstruction (SPL’15 paper)
- The Acoustic Emotion Gaussians (AEG) model for music emotion recognition of valence and arousal values (TAC’15 and MM’12 papers)
- AWtoolbox for characterizing audio information using sparse-coding-based audio words (MM’14 paper)
- Multiple low-rank representation (MLRR) for source separation (ISMIR’13 paper) (related report by Alex Berrian in 2014)
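A minimal parsing sketch for the MIDI toolkit entry above, assuming the miditoolkit package is installed; "song.mid" and "song_copy.mid" are placeholder paths:

```python
# Tick-based (symbolic-timing) MIDI handling with miditoolkit
# (assumes miditoolkit is installed; the file paths are placeholders).
from miditoolkit.midi import parser as mid_parser

midi_obj = mid_parser.MidiFile("song.mid")
print(midi_obj.ticks_per_beat)            # tick resolution per quarter note

for inst in midi_obj.instruments:
    for note in inst.notes[:5]:
        # Note onsets and offsets stay in ticks, the native MIDI timing unit.
        print(inst.name, note.pitch, note.velocity, note.start, note.end)

midi_obj.dump("song_copy.mid")            # write the (possibly edited) MIDI back out
```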