Showing 1–12 of 12 results for author: Nistal, J

  1. arXiv:2601.01294

    cs.SD cs.AI eess.AS

    Diffusion Timbre Transfer Via Mutual Information Guided Inpainting

    Authors: Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini, George Fazekas

    Abstract: We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic an…

    Submitted 28 January, 2026; v1 submitted 3 January, 2026; originally announced January 2026.

    Comments: 5 pages, 2 figures, 3 tables

  2. arXiv:2506.11476

    cs.SD cs.LG eess.AS

    LiLAC: A Lightweight Latent ControlNet for Musical Audio Generation

    Authors: Tom Baker, Javier Nistal

    Abstract: Text-to-audio diffusion models produce high-quality and diverse music, but many, if not most, of the SOTA models lack the fine-grained, time-varying controls essential for music production. ControlNet enables attaching external controls to a pre-trained generative model by cloning and fine-tuning its encoder on new conditionings. However, this approach incurs a large memory footprint and restricts…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted at ISMIR 2025

  3. arXiv:2503.06346

    cs.SD eess.AS

    Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems

    Authors: Maarten Grachten, Javier Nistal

    Abstract: Generative systems for musical accompaniment are rapidly growing, yet there are no standardized metrics to evaluate how well generations align with the conditional audio prompt. We introduce a distribution-based measure called "Accompaniment Prompt Adherence" (APA), and validate it through objective experiments on synthetic data perturbations and human listening tests. Results show that APA align…

    Submitted 8 April, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: Accepted for publication at the 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

  4. arXiv:2411.18447

    cs.LG cs.AI cs.SD eess.AS

    Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

    Authors: Marco Pasini, Javier Nistal, Stefan Lattner, George Fazekas

    Abstract: Autoregressive models are typically applied to sequences of discrete tokens, but recent research indicates that generating sequences of continuous embeddings in an autoregressive manner is also feasible. However, such Continuous Autoregressive Models (CAMs) can suffer from a decline in generation quality over extended sequences due to error accumulation during inference. We introduce a novel metho…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024 - Audio Imagination Workshop

  5. arXiv:2410.23005

    cs.SD eess.AS

    Improving Musical Accompaniment Co-creation via Diffusion Transformers

    Authors: Javier Nistal, Marco Pasini, Stefan Lattner

    Abstract: Building upon Diff-A-Riff, a latent diffusion model for musical instrument accompaniment generation, we present a series of improvements targeting quality, diversity, inference speed, and text-driven control. First, we upgrade the underlying autoencoder to a stereo-capable model with superior fidelity and replace the latent U-Net with a Diffusion Transformer. Additionally, we refine text prompting…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: 5 pages; 1 table

  6. arXiv:2406.08384

    cs.SD cs.AI eess.AS

    Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models

    Authors: Javier Nistal, Marco Pasini, Cyran Aouameur, Maarten Grachten, Stefan Lattner

    Abstract: Recent advancements in deep generative models present new opportunities for music production but also pose challenges, such as high computational demands and limited audio quality. Moreover, current systems frequently rely solely on text input and typically focus on producing complete musical pieces, which is incompatible with existing workflows in music production. To address these issues, we int…

    Submitted 30 October, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages, 2 figures, 3 tables

    Journal ref: Proc. of the 25th International Society for Music Information Retrieval, 2024

  7. Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks

    Authors: Stefan Lattner, Javier Nistal

    Abstract: Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep learning techniques. However, only a few works tackle the…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 21 pages, 5 figures, published in MDPI Electronics Special Issue "Machine Learning Applied to Music/Audio Signal Processing"

    Journal ref: MDPI Electronics 2021, 10, 1349

  8. arXiv:2206.14723

    cs.SD cs.LG eess.AS

    DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks

    Authors: Javier Nistal, Cyran Aouameur, Ithan Velarde, Stefan Lattner

    Abstract: In contemporary popular music production, drum sound design is commonly performed by cumbersome browsing and processing of pre-recorded samples in sound libraries. One can also use specialized synthesis hardware, typically controlled through low-level, musically meaningless parameters. Today, the field of Deep Learning offers methods to control the synthesis process via learned high-level features…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 7 pages, 2 figures, 3 tables, ICML2022 Machine Learning for Audio Synthesis (MLAS) Workshop, for sound examples visit https://cslmusicteam.sony.fr/drumgan-vst/

  9. arXiv:2108.01216

    cs.SD eess.AS

    DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: Generative Adversarial Networks (GANs) have achieved excellent audio synthesis quality in recent years. However, making them operable with semantically meaningful controls remains an open challenge. An obvious approach is to control the GAN by conditioning it on metadata contained in audio datasets. Unfortunately, audio datasets often lack the desired annotations, especially in the musical domai…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 9 pages, 3 figures, 2 tables, accepted to ISMIR2021

    Journal ref: 22nd International Society for Music Information Retrieval (ISMIR 2021)

  10. arXiv:2105.01531

    cs.SD cs.AI cs.LG eess.AS

    VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding

    Authors: Javier Nistal, Cyran Aouameur, Stefan Lattner, Gaël Richard

    Abstract: Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image data". However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio…

    Submitted 30 July, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: 5 pages, 1 figure, 1 table; accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

    Journal ref: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021

  11. arXiv:2008.12073

    eess.AS cs.SD

    DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks

    Authors: J. Nistal, S. Lattner, G. Richard

    Abstract: Synthetic creation of drum sounds (e.g., in drum machines) is commonly performed using analog or digital synthesis, allowing a musician to sculpt the desired timbre by modifying various parameters. Typically, such parameters control low-level features of the sound and often have no musical meaning or perceptual correspondence. With the rise of Deep Learning, data-driven processing of audio emerges as…

    Submitted 28 June, 2022; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: 8 pages, 1 figure, 3 tables, accepted in Proc. of the 21st International Society for Music Information Retrieval (ISMIR2020)

  12. arXiv:2006.09266

    eess.AS cs.SD

    Comparing Representations for Audio Synthesis Using Generative Adversarial Networks

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiments on a subset of the NSynth dataset. The architecture follows the benchmark Progressive Growing Wasserstein GAN. We perform experiments both in a full…

    Submitted 17 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 5 pages, 1 figure, 5 tables, to be published in European Signal Processing Conference (EUSIPCO)