
Showing 1–40 of 40 results for author: Lattner, S

  1. arXiv:2601.01294  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Diffusion Timbre Transfer Via Mutual Information Guided Inpainting

    Authors: Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini, George Fazekas

    Abstract: We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic an…

    Submitted 28 January, 2026; v1 submitted 3 January, 2026; originally announced January 2026.

    Comments: 5 pages, 2 figures, 3 tables

  2. arXiv:2511.05350  [pdf, ps, other]

    cs.SD cs.AI

    Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders

    Authors: Mathias Rose Bjare, Giorgia Cantisani, Marco Pasini, Stefan Lattner, Gerhard Widmer

    Abstract: We argue that training autoencoders to reconstruct inputs from noised versions of their encodings, when combined with perceptual losses, yields encodings that are structured according to a perceptual hierarchy. We demonstrate the emergence of this hierarchical structure by showing that, after training an audio autoencoder in this manner, perceptually salient information is captured in coarser repr…

    Submitted 10 November, 2025; v1 submitted 7 November, 2025; originally announced November 2025.

    Comments: Accepted at NeurIPS 2025 - AI for Music Workshop, 11 pages, 5 figures, 1 table

  3. arXiv:2509.09836  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS

    CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Efficiently representing audio signals in a compressed latent space is critical for latent generative modelling. However, existing autoencoders often force a choice between continuous embeddings and discrete tokens. Furthermore, achieving high compression ratios while maintaining audio fidelity remains a challenge. We introduce CoDiCodec, a novel audio autoencoder that overcomes these limitations…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: Accepted to ISMIR 2025

  4. arXiv:2508.05306  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces

    Authors: Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer

    Abstract: Recently, the information content (IC) of predictions from a Generative Infinite-Vocabulary Transformer (GIVT) has been used to model musical expectancy and surprisal in audio. We investigate the effectiveness of such modelling using IC calculated with autoregressive diffusion models (ADMs). We empirically show that IC estimates of models based on two different diffusion ordinary differential equa…

    Submitted 7 August, 2025; originally announced August 2025.

    Comments: 9 pages, 1 figure, 5 tables. Accepted at the 26th International Society for Music Information Retrieval Conference (ISMIR), Daejeon, South Korea, 2025

  5. arXiv:2508.01488  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS

    PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective

    Authors: Alain Riou, Bernardo Torres, Ben Hayes, Stefan Lattner, Gaëtan Hadjeres, Gaël Richard, Geoffroy Peeters

    Abstract: In this paper, we introduce PESTO, a self-supervised learning approach for single-pitch estimation using a Siamese architecture. Our model processes individual frames of a Variable-Q Transform (VQT) and predicts pitch distributions. The neural network is designed to be equivariant to translations, notably thanks to a Toeplitz fully-connected layer. In addition, we construct pitch-shifted pairs b…

    Submitted 27 October, 2025; v1 submitted 2 August, 2025; originally announced August 2025.

    Journal ref: Transactions of the International Society for Music Information Retrieval, 8(1): 334-352 (2025)

  6. arXiv:2507.07764  [pdf, ps, other]

    cs.SD eess.AS

    Assessing the Alignment of Audio Representations with Timbre Similarity Ratings

    Authors: Haokun Tian, Stefan Lattner, Charalampos Saitis

    Abstract: Psychoacoustical so-called "timbre spaces" map perceptual similarity ratings of instrument sounds onto low-dimensional embeddings via multidimensional scaling, but suffer from scalability issues and are incapable of generalization. Recent results from audio (music and speech) quality assessment as well as image similarity have shown that deep learning is able to produce embeddings that align well…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted to ISMIR 2025

  7. arXiv:2501.17578  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Efficiently compressing high-dimensional audio signals into a compact and informative latent space is crucial for various tasks, including generative modeling and music information retrieval (MIR). Existing audio autoencoders, however, often struggle to achieve high compression ratios while preserving audio fidelity and facilitating efficient downstream applications. We introduce Music2Latent2, a…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: Accepted to ICASSP 2025

  8. arXiv:2501.12796  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Hybrid Losses for Hierarchical Embedding Learning

    Authors: Haokun Tian, Stefan Lattner, Brian McFee, Charalampos Saitis

    Abstract: In traditional supervised learning, the cross-entropy loss treats all incorrect predictions equally, ignoring the relevance or proximity of wrong labels to the correct answer. By leveraging a tree hierarchy for fine-grained labels, we investigate hybrid losses, such as generalised triplet and cross-entropy losses, to enforce similarity between labels within a multi-task learning framework. We prop…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted to ICASSP 2025

  9. arXiv:2501.07474  [pdf, other]

    cs.SD cs.AI eess.AS

    Estimating Musical Surprisal in Audio

    Authors: Mathias Rose Bjare, Giorgia Cantisani, Stefan Lattner, Gerhard Widmer

    Abstract: In modeling musical surprisal expectancy with computational methods, it has been proposed to use the information content (IC) of one-step predictions from an autoregressive model as a proxy for surprisal in symbolic music. With an appropriately chosen model, the IC of musical events has been shown to correlate with human perception of surprise and complexity aspects, including tonal and rhythmic c…

    Submitted 13 January, 2025; originally announced January 2025.

    Comments: 5 pages, 2 figures, 1 table. Accepted at the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), Hyderabad, India

  10. arXiv:2411.19806  [pdf, other]

    cs.SD cs.AI eess.AS

    Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures

    Authors: Alain Riou, Antonin Gagneré, Gaëtan Hadjeres, Stefan Lattner, Geoffroy Peeters

    Abstract: In this paper, we tackle the task of musical stem retrieval. Given a musical mix, it consists in retrieving a stem that would fit with it, i.e., that would sound pleasant if played together. To do so, we introduce a new method based on Joint-Embedding Predictive Architectures, where an encoder and a predictor are jointly trained to produce latent representations of a context and predict latent rep…

    Submitted 24 February, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: Accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)

  11. arXiv:2411.18447  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

    Authors: Marco Pasini, Javier Nistal, Stefan Lattner, George Fazekas

    Abstract: Autoregressive models are typically applied to sequences of discrete tokens, but recent research indicates that generating sequences of continuous embeddings in an autoregressive manner is also feasible. However, such Continuous Autoregressive Models (CAMs) can suffer from a decline in generation quality over extended sequences due to error accumulation during inference. We introduce a novel metho…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024 - Audio Imagination Workshop

  12. arXiv:2410.23005  [pdf, ps, other]

    cs.SD eess.AS

    Improving Musical Accompaniment Co-creation via Diffusion Transformers

    Authors: Javier Nistal, Marco Pasini, Stefan Lattner

    Abstract: Building upon Diff-A-Riff, a latent diffusion model for musical instrument accompaniment generation, we present a series of improvements targeting quality, diversity, inference speed, and text-driven control. First, we upgrade the underlying autoencoder to a stereo-capable model with superior fidelity and replace the latent U-Net with a Diffusion Transformer. Additionally, we refine text prompting…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: 5 pages; 1 table

  13. The evolution of inharmonicity and noisiness in contemporary popular music

    Authors: Emmanuel Deruty, David Meredith, Stefan Lattner

    Abstract: Much of Western classical music relies on instruments based on acoustic resonance, which produce harmonic or quasi-harmonic sounds. In contrast, since the mid-twentieth century, popular music has increasingly been produced in recording studios, where it is not bound by the constraints of harmonic sounds. In this study, we use modified MPEG-7 features to explore and characterise the evolution of no…

    Submitted 6 December, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: 41 pages, 23 figures

    MSC Class: 68T05; 42C40. ACM Class: I.5.4; H.5.5

    Journal ref: Journal of New Music Research, 1-28 (2024)

  14. arXiv:2408.06500  [pdf, other]

    cs.SD cs.LG eess.AS

    Music2Latent: Consistency Autoencoders for Latent Audio Compression

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Efficient audio representations in a compressed continuous latent space are critical for generative audio modeling and Music Information Retrieval (MIR) tasks. However, some existing audio autoencoders have limitations, such as multi-stage training procedures, slow iterative sampling, or low reconstruction quality. We introduce Music2Latent, an audio autoencoder that overcomes these limitations by…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to ISMIR 2024

  15. arXiv:2408.06022  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Controlling Surprisal in Music Generation via Information Content Curve Matching

    Authors: Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer

    Abstract: In recent years, the quality and public interest in music generation systems have grown, encouraging research into various ways to control these systems. We propose a novel method for controlling surprisal in music generation using sequence models. To achieve this goal, we define a metric called Instantaneous Information Content (IIC). The IIC serves as a proxy function for the perceived musical s…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures, 2 tables, accepted at the 25th Int. Society for Music Information Retrieval Conf., San Francisco, USA, 2024

  16. arXiv:2408.02514  [pdf, other]

    cs.SD cs.LG eess.AS

    Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Michael Anslow, Geoffroy Peeters

    Abstract: This paper explores the automated process of determining stem compatibility by identifying audio recordings of single instruments that blend well with a given musical context. To tackle this challenge, we present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset using a self-supervised learning approach. Our model comprises two networks: an encode…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Proceedings of the 25th International Society for Music Information Retrieval Conference, ISMIR 2024

  17. arXiv:2406.08384  [pdf, other]

    cs.SD cs.AI eess.AS

    Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models

    Authors: Javier Nistal, Marco Pasini, Cyran Aouameur, Maarten Grachten, Stefan Lattner

    Abstract: Recent advancements in deep generative models present new opportunities for music production but also pose challenges, such as high computational demands and limited audio quality. Moreover, current systems frequently rely solely on text input and typically focus on producing complete musical pieces, which is incompatible with existing workflows in music production. To address these issues, we int…

    Submitted 30 October, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages, 2 figures, 3 tables

    Journal ref: Proc. of the 25th International Society for Music Information Retrieval, 2024

  18. arXiv:2405.08679  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

    Abstract: This paper addresses the problem of self-supervised general-purpose audio representation learning. We explore the use of Joint-Embedding Predictive Architectures (JEPA) for this task, which consists of splitting an input mel-spectrogram into two parts (context and target), computing neural representations for each, and training the neural network to predict the target representations from the cont…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Self-supervision in Audio, Speech and Beyond workshop, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024

  19. arXiv:2402.01412  [pdf, other]

    cs.SD cs.LG eess.AS

    Bass Accompaniment Generation via Latent Diffusion

    Authors: Marco Pasini, Maarten Grachten, Stefan Lattner

    Abstract: The ability to automatically generate music that appropriately matches an arbitrary input track is a challenging task. We present a novel controllable system for generating single stems to accompany musical mixes of arbitrary length. At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent dif…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: ICASSP 2024

  20. arXiv:2401.05064  [pdf, other]

    cs.SD cs.LG eess.AS

    Singer Identity Representation Learning using Self-Supervised Techniques

    Authors: Bernardo Torres, Stefan Lattner, Gaël Richard

    Abstract: Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted at the ISMIR conference, Milan, Italy, 2023

    Journal ref: Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

  21. arXiv:2311.13058  [pdf, other]

    cs.SD eess.AS

    Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Music source separation is focused on extracting distinct sonic elements from composite tracks. Historically, many methods have been grounded in supervised learning, necessitating labeled data, which is occasionally constrained in its diversity. More recent methods have delved into N-shot techniques that utilize one or more audio samples to aid in the separation. However, a challenge with some of…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 4 pages, 2 figures, 1 table; Accepted at the 37th Conference on Neural Information Processing Systems (2023), Machine Learning for Audio Workshop

  22. PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

    Abstract: In this paper, we address the problem of pitch estimation using Self Supervised Learning (SSL). The SSL paradigm we use is equivariance to pitch transposition, which enables our model to accurately perform pitch estimation on monophonic audio after being trained only on a small unlabeled dataset. We use a lightweight (<30k parameters) Siamese neural network that takes as inputs two different pi…

    Submitted 27 October, 2025; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Best Paper Award of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023

    Journal ref: ISMIR 2023: 535-544

  23. arXiv:2308.09454  [pdf, other]

    cs.SD cs.CL eess.AS

    Exploring Sampling Techniques for Generating Melodies with a Transformer Language Model

    Authors: Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer

    Abstract: Research in natural language processing has demonstrated that the quality of generations from trained autoregressive language models is significantly influenced by the used sampling strategy. In this study, we investigate the impact of different sampling techniques on musical qualities such as diversity and structure. To accomplish this, we train a high-capacity transformer model on a vast collect…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 7 pages, 5 figures, 1 table, accepted at the 24th Int. Society for Music Information Retrieval Conf., Milan, Italy, 2023

  24. arXiv:2211.13016  [pdf, other]

    cs.SD cs.AI eess.AS

    On the Typicality of Musical Sequences

    Authors: Mathias Rose Bjare, Stefan Lattner

    Abstract: It has been shown in a recent publication that words in human-produced English language tend to have an information content close to the conditional entropy. In this paper, we show that the same is true for events in human-produced monophonic musical sequences. We also show how "typical sampling" influences the distribution of information around the entropy for single events and sequences.

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 2 pages, 1 figure, Accepted at the Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022

  25. arXiv:2208.01141  [pdf, other]

    cs.SD cs.LG eess.AS

    SampleMatch: Drum Sample Retrieval by Musical Context

    Authors: Stefan Lattner

    Abstract: Modern digital music production typically involves combining numerous acoustic elements to compile a piece of music. Important types of such elements are drum samples, which determine the characteristics of the percussive components of the piece. Artists must use their aesthetic judgement to assess whether a given drum sample fits the current musical context. However, selecting drum samples from a…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: 8 pages, 3 figures, 1 table; Accepted at the ISMIR conference, Bengaluru, India, 2022

  26. Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks

    Authors: Stefan Lattner, Javier Nistal

    Abstract: Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep learning techniques. However, only a few works tackle the…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 21 pages, 5 figures, published in MDPI Electronics Special Issue "Machine Learning Applied to Music/Audio Signal Processing"

    Journal ref: MDPI Electronics 2021, 10, 1349

  27. arXiv:2206.14723  [pdf, other]

    cs.SD cs.LG eess.AS

    DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks

    Authors: Javier Nistal, Cyran Aouameur, Ithan Velarde, Stefan Lattner

    Abstract: In contemporary popular music production, drum sound design is commonly performed by cumbersome browsing and processing of pre-recorded samples in sound libraries. One can also use specialized synthesis hardware, typically controlled through low-level, musically meaningless parameters. Today, the field of Deep Learning offers methods to control the synthesis process via learned high-level features…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 7 pages, 2 figures, 3 tables, ICML2022 Machine Learning for Audio Synthesis (MLAS) Workshop, for sound examples visit https://cslmusicteam.sony.fr/drumgan-vst/

  28. arXiv:2108.01216  [pdf, other]

    cs.SD eess.AS

    DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: Generative Adversarial Networks (GANs) have achieved excellent audio synthesis quality in the last years. However, making them operable with semantically meaningful controls remains an open challenge. An obvious approach is to control the GAN by conditioning it on metadata contained in audio datasets. Unfortunately, audio datasets often lack the desired annotations, especially in the musical domai…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 9 pages, 3 figures, 2 tables, accepted to ISMIR2021

    Journal ref: 22nd International Society for Music Information Retrieval (ISMIR 2021)

  29. arXiv:2105.01531  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding

    Authors: Javier Nistal, Cyran Aouameur, Stefan Lattner, Gaël Richard

    Abstract: Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image data". However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio…

    Submitted 30 July, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: 5 pages, 1 figure, 1 table; accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

    Journal ref: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021

  30. arXiv:2008.12073  [pdf, other]

    eess.AS cs.SD

    DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks

    Authors: J. Nistal, S. Lattner, G. Richard

    Abstract: Synthetic creation of drum sounds (e.g., in drum machines) is commonly performed using analog or digital synthesis, allowing a musician to sculpt the desired timbre modifying various parameters. Typically, such parameters control low-level features of the sound and often have no musical meaning or perceptual correspondence. With the rise of Deep Learning, data-driven processing of audio emerges as…

    Submitted 28 June, 2022; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: 8 pages, 1 figure, 3 tables, accepted in Proc. of the 21st International Society for Music Information Retrieval (ISMIR2020)

  31. arXiv:2006.09266  [pdf, other]

    eess.AS cs.SD

    Comparing Representations for Audio Synthesis Using Generative Adversarial Networks

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiments on a subset of the NSynth dataset. The architecture follows the benchmark Progressive Growing Wasserstein GAN. We perform experiments both in a full…

    Submitted 17 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 5 pages, 1 figure, 5 tables, to be published in European Signal Processing Conference (EUSIPCO)

  32. arXiv:2001.01720  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Modeling Musical Structure with Artificial Neural Networks

    Authors: Stefan Lattner

    Abstract: In recent years, artificial neural networks (ANNs) have become a universal tool for tackling real-world problems. ANNs have also shown great success in music-related tasks including music summarization and classification, similarity estimation, computer-aided or autonomous composition, and automatic music analysis. As structure is a fundamental characteristic of Western music, it plays a role in a…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 152 pages, 28 figures, 10 tables. PhD thesis, Johannes Kepler University Linz, October 2019. Includes results from https://www.ijcai.org/Proceedings/15/Papers/348.pdf, arXiv:1612.04742, arXiv:1708.05325, arXiv:1806.08236, and arXiv:1806.08686 (see Section 1.2 for detailed information)

  33. arXiv:1908.00948  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS

    High-Level Control of Drum Track Generation Using Learned Patterns of Rhythmic Interaction

    Authors: Stefan Lattner, Maarten Grachten

    Abstract: Spurred by the potential of deep learning, computational music generation has gained renewed academic interest. A crucial issue in music generation is that of user control, especially in scenarios where the music generation process is conditioned on existing musical material. Here we propose a model for conditional kick drum track generation that takes existing musical material as input, in additi…

    Submitted 2 August, 2019; originally announced August 2019.

    Comments: Paper accepted at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019), New Paltz, New York, U.S.A., October 20-23; 6 pages, 3 figures, 1 table

  34. arXiv:1907.05982  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Learning Complex Basis Functions for Invariant Representations of Audio

    Authors: Stefan Lattner, Monika Dörfler, Andreas Arzt

    Abstract: Learning features from data has shown to be more successful than using hand-crafted features for many machine learning tasks. In music information retrieval (MIR), features learned from windowed spectrograms are highly variant to transformations like transposition or time-shift. Such variances are undesirable when they are irrelevant for the respective MIR task. We propose an architecture called C…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: Paper accepted at the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, November 4-8; 8 pages, 4 figures, 4 tables

  35. arXiv:1807.07278  [pdf, other]

    cs.SD cs.MM eess.AS

    Audio-to-Score Alignment using Transposition-invariant Features

    Authors: Andreas Arzt, Stefan Lattner

    Abstract: Audio-to-score alignment is an important pre-processing step for in-depth analysis of classical music. In this paper, we apply novel transposition-invariant audio features to this task. These low-dimensional features represent local pitch intervals and are learned in an unsupervised fashion by a gated autoencoder. Our results show that the proposed features are indeed fully transposition-invariant…

    Submitted 19 July, 2018; originally announced July 2018.

    Comments: 19th International Society for Music Information Retrieval Conference, Paris, France, 2018

  36. arXiv:1806.08686  [pdf, other]

    cs.SD cs.AI eess.AS

    A Predictive Model for Music Based on Learned Interval Representations

    Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

    Abstract: Connectionist sequence models (e.g., RNNs) applied to musical sequences suffer from two known problems: First, they have strictly "absolute pitch perception". Therefore, they fail to generalize over musical concepts which are commonly perceived in terms of relative distances between pitches (e.g., melodies, scale types, modes, cadences, or chord types). Second, they fall short of capturing the con…

    Submitted 22 June, 2018; originally announced June 2018.

    Comments: Paper accepted at the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27; 8 pages, 3 figures

  37. arXiv:1806.08236  [pdf, other]

    cs.SD cs.LG eess.AS

    Learning Transposition-Invariant Interval Features from Symbolic Music and Audio

    Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

    Abstract: Many music theoretical constructs (such as scale types, modes, cadences, and chord types) are defined in terms of pitch intervals, i.e., relative distances between pitches. Therefore, when computer models are employed in music tasks, it can be useful to operate on interval representations rather than on the raw musical surface. Moreover, interval representations are transposition-invariant, valuable fo…

    Submitted 4 February, 2019; v1 submitted 21 June, 2018; originally announced June 2018.

    Comments: Paper accepted at the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27; 8 pages, 5 figures

  38. arXiv:1708.05325  [pdf, other]

    cs.SD cs.AI cs.LG

    Learning Musical Relations using Gated Autoencoders

    Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

    Abstract: Music is usually highly structured and it is still an open question how to design models which can successfully learn to recognize and represent musical structure. A fundamental problem is that structurally related patterns can have very distinct appearances, because the structural relationships are often based on transformations of musical material, like chromatic or diatonic transposition, inver…

    Submitted 17 August, 2017; originally announced August 2017.

    Comments: In Proceedings of the 2nd Conference on Computer Simulation of Musical Creativity (CSMC 2017)

  39. arXiv:1707.01357  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Improving Content-Invariance in Gated Autoencoders for 2D and 3D Object Rotation

    Authors: Stefan Lattner, Maarten Grachten

    Abstract: Content-invariance in mapping codes learned by GAEs is a useful feature for various relation learning tasks. In this paper we show that the content-invariance of mapping codes for images of 2D and 3D rotated objects can be substantially improved by extending the standard GAE loss (symmetric reconstruction error) with a regularization term that penalizes the symmetric cross-reconstruction error. Th…

    Submitted 5 July, 2017; originally announced July 2017.

    Comments: 10 pages

  40. arXiv:1612.04742  [pdf, other]

    cs.SD cs.AI cs.NE

    Imposing higher-level Structure in Polyphonic Music Generation using Convolutional Restricted Boltzmann Machines and Constraints

    Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

    Abstract: We introduce a method for imposing higher-level structure on generated, polyphonic music. A Convolutional Restricted Boltzmann Machine (C-RBM) as a generative model is combined with gradient descent constraint optimisation to provide further control over the generation process. Among other things, this allows for the use of a "template" piece, from which some structural properties can be extracted…

    Submitted 14 April, 2018; v1 submitted 14 December, 2016; originally announced December 2016.

    Comments: 31 pages, 11 figures

    Journal ref: Journal of Creative Music Systems, Volume 2, Issue 1, March 2018