- Deep Learning for Music Analysis and Generation, NTU COMME5070, 2025 Fall
- Official course website at NTU (access limited): https://cool.ntu.edu.tw/courses/52013
- Slides: https://github.com/affige/DeepMIR/
- Syllabus
  - W1. Introductions & fundamentals of musical audio
  - W2. Music classification & music foundation models
  - W3. Audio codec models & audio language models & audio captioning
  - W4. Audio effect modeling
  - W5. Transformer-based music generation
  - W6. Diffusion-based music generation
  - W7. Singing voice generation & song generation
  - W8. Differentiable DSP models and automatic mixing
  - W9. Fundamentals of symbolic music & symbolic MIDI generation
  - W10. Advanced symbolic MIDI generation
  - W11. Cover generation & MIDI-to-audio generation
  - W13. Miscellaneous topics (transcription, source separation, etc.)
  - W14. Miscellaneous topics (transcription, source separation, etc.)
- List of final projects presented by students in the class
  - BLT: An IPA-Aware Agentic Framework for Controllable Lyric Translation
  - High-level Semantic Direction Discovery – an extension of this project has been accepted for publication at KDD’26: “AnchorSteer: Self-discovered concept injection for structure-preserving music editing”
  - Style Transfer of Audio Effects with Differentiable Signal Processing
  - Unified Audio and Text-Controlled Automatic Music Mixing via CLAP Embeddings – extensions of this project are still ongoing
  - Make Spoken Language Models Sing: A Preliminary Study using Text-Aligned Tokenization – extensions of this project are still ongoing
  - Rap2Beat: Generating Drum Grooves from A Cappella Rap Vocals
  - Playability-Constrained MIDI-to-Tab Transcription for Guitar
  - Critic-Guided Music Generation via Test-Time Compute
  - Simulated Phone Recording Restoration
  - Transition as Inpainting: Context-Aware DJ Mixing with Latent Diffusion – extensions of this project are still ongoing
  - Meow The Song
  - MyGO2MUSIC: Generating Music for Anime Memes
  - Multi-Emotion Singing Voice Synthesis via Emelodygen and DiffSinger
  - Prompt-Driven Music Generation with Diffusion Refinement
  - Improving Large-Vocabulary Chord Recognition via Fine-tuning and HCQT
  - Dual-Stream Cover Song Identification: Dynamically Fusing Semantic Embeddings with SOTA Audio Models
  - Enhancing Music Genre and Emotion Classification via Parameter-Efficient Fine-Tuning of Large Language Models