- Deep Learning for Music Analysis and Generation, NTU COMME5070, 2025 Fall
- Official course website at NTU (access limited): https://cool.ntu.edu.tw/courses/52013
- Slides: https://github.com/affige/DeepMIR/
- Syllabus
- W1. Introductions & fundamentals of musical audio
- W2. Music classification & music foundation models
- W3. Audio codec models & audio language models & audio captioning
- W4. Audio effect modeling
- W5. Transformer-based music generation
- W6. Diffusion-based music generation
- W7. Singing voice generation & song generation
- W8. Differentiable DSP models and automatic mixing
- W9. Fundamentals of symbolic music & symbolic MIDI generation
- W10. Advanced symbolic MIDI generation
- W11. Cover generation & MIDI-to-audio generation
- W13. Miscellaneous topics (transcription, source separation, etc.)
- W14. Miscellaneous topics (transcription, source separation, etc.)
- List of final projects presented by the class students
- BLT: An IPA-Aware Agentic Framework for Controllable Lyric Translation
- High-level Semantic Direction Discovery
- Style Transfer of Audio Effects with Differentiable Signal Processing
- Unified Audio and Text-Controlled Automatic Music Mixing via CLAP Embeddings
- Make Spoken Language Models Sing: A Preliminary Study using Text-Aligned Tokenization
- Rap2Beat: Generating Drum Grooves from Cappella Rap Vocals
- Playability-Constrained MIDI-to-Tab Transcription for Guitar
- Critic-Guided Music Generation via Test-Time Compute
- Simulated Phone Recording Restoration
- Transition as Inpainting: Context-Aware DJ Mixing with Latent Diffusion
- Meow The Song
- MyGO2MUSIC: Generating Music for Anime Memes
- Multi-Emotion Singing Voice Synthesis via Emelodygen and DiffSinger
- Prompt-Driven Music Generation with Diffusion Refinement
- Improving Large-Vocabulary Chord Recognition via Fine-tuning and HCQT
- Dual-Stream Cover Song Identification: Dynamically Fusing Semantic Embeddings with SOTA Audio Models
- Enhancing Music Genre and Emotion Classification via Parameter-Efficient Fine-Tuning of Large Language Models