-
Learning Long-term Motion Embeddings for Efficient Kinematics Generation
Authors:
Nick Stracke,
Kolja Bauer,
Stefan Andreas Baumann,
Miguel Angel Bautista,
Josh Susskind,
Björn Ommer
Abstract:
Understanding and predicting motion is a fundamental component of visual intelligence. Although modern video models exhibit strong comprehension of scene dynamics, exploring multiple possible futures through full video synthesis remains prohibitively inefficient. We model scene dynamics orders of magnitude more efficiently by directly operating on a long-term motion embedding that is learned from large-scale trajectories obtained from tracker models. This enables efficient generation of long, realistic motions that fulfill goals specified via text prompts or spatial pokes. To achieve this, we first learn a highly compressed motion embedding with a temporal compression factor of 64x. In this space, we train a conditional flow-matching model to generate motion latents conditioned on task descriptions. The resulting motion distributions outperform those of both state-of-the-art video models and specialized task-specific approaches.
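The conditional flow-matching recipe described above (regress a velocity field on interpolants between prior noise and data latents) can be sketched minimally. This is an illustrative NumPy toy, not the authors' code: the linear interpolation path, the zero-velocity baseline model, and the 16-dimensional latent are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_pair(x1, rng):
    """Sample one flow-matching training triple on the linear path.

    x_t = (1 - t) * x0 + t * x1 moves a prior sample x0 toward the data
    latent x1; the regression target is the constant velocity x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)  # Gaussian prior sample
    t = rng.uniform()                   # random time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1
    return t, xt, x1 - x0

def cfm_loss(model, x1, rng):
    """Mean-squared error between predicted and target velocity."""
    t, xt, target = cfm_pair(x1, rng)
    return float(np.mean((model(xt, t) - target) ** 2))

# A trivial model that always predicts zero velocity: its loss is the
# mean squared norm of the target, so it is strictly positive.
zero_model = lambda xt, t: np.zeros_like(xt)
x1 = rng.standard_normal(16)            # stand-in for one motion latent
loss = cfm_loss(zero_model, x1, rng)
```

In the paper's setting the same objective would be conditioned on text prompts or spatial pokes; here the conditioning is omitted to keep the sketch self-contained.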
Submitted 13 April, 2026;
originally announced April 2026.
-
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
Authors:
Jiatao Gu,
Ying Shen,
Tianrong Chen,
Laurent Dinh,
Yuyang Wang,
Miguel Angel Bautista,
David Berthelot,
Josh Susskind,
Shuangfei Zhai
Abstract:
Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting STARFlow-V, a normalizing flow-based video generator with substantial benefits such as end-to-end learning, robust causal prediction, and native likelihood estimation. Building upon the recently proposed STARFlow, STARFlow-V operates in the spatiotemporal latent space with a global-local architecture which restricts causal dependencies to a global latent space while preserving rich local within-frame interactions. This eases error accumulation over time, a common pitfall of standard autoregressive diffusion model generation. Additionally, we propose flow-score matching, which equips the model with a lightweight causal denoiser to improve the video generation consistency in an autoregressive fashion. To improve the sampling efficiency, STARFlow-V employs a video-aware Jacobi iteration scheme that recasts inner updates as parallelizable iterations without breaking causality. Thanks to the invertible structure, the same model can natively support text-to-video, image-to-video, and video-to-video generation tasks. Empirically, STARFlow-V achieves strong visual fidelity and temporal consistency with practical sampling throughput relative to diffusion-based baselines. These results present the first evidence, to our knowledge, that NFs are capable of high-quality autoregressive video generation, establishing them as a promising research direction for building world models. Code and generated samples are available at https://github.com/apple/ml-starflow.
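The video-aware Jacobi scheme itself is not spelled out in the abstract, but the underlying idea (inverting a strictly causal autoregressive transform with parallel fixed-point sweeps instead of a sequential scan) can be illustrated on a toy 1-D flow. All function names and the tanh shift below are assumptions for illustration only.

```python
import numpy as np

def ar_forward(x, w=0.3):
    """Toy causal flow: z_i = x_i + tanh(w * sum_{j<i} x_j)."""
    prefix = np.concatenate(([0.0], np.cumsum(x)[:-1]))
    return x + np.tanh(w * prefix)

def jacobi_invert(z, w=0.3, n_iter=None):
    """Invert the flow with parallel (Jacobi) fixed-point sweeps.

    Every position is refreshed simultaneously using the previous
    iterate as causal context; because the shift is strictly causal,
    sweep k makes position k exact, so len(z) sweeps recover x exactly.
    """
    n_iter = len(z) if n_iter is None else n_iter
    x = z.copy()
    for _ in range(n_iter):
        prefix = np.concatenate(([0.0], np.cumsum(x)[:-1]))
        x = z - np.tanh(w * prefix)
    return x

rng = np.random.default_rng(1)
x_true = rng.standard_normal(8)
z = ar_forward(x_true)
x_rec = jacobi_invert(z)
err = float(np.max(np.abs(x_rec - x_true)))
```

Strict causality guarantees exact recovery in at most one sweep per position; in practice far fewer sweeps often suffice for an accurate approximation, which is what makes this attractive for parallel sampling.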
Submitted 25 November, 2025; v1 submitted 25 November, 2025;
originally announced November 2025.
-
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Authors:
Ming Gui,
Johannes Schusterbauer,
Timy Phan,
Felix Krause,
Josh Susskind,
Miguel Angel Bautista,
Björn Ommer
Abstract:
We introduce Representation Tokenizer (RepTok), a generative modeling framework that represents an image using a single continuous latent token obtained from self-supervised vision transformers. Building on a pre-trained SSL encoder, we fine-tune only the semantic token embedding and pair it with a generative decoder trained jointly using a standard flow matching objective. This adaptation enriches the token with low-level, reconstruction-relevant details, enabling faithful image reconstruction. To preserve the favorable geometry of the original SSL space, we add a cosine-similarity loss that regularizes the adapted token, ensuring the latent space remains smooth and suitable for generation. Our single-token formulation resolves spatial redundancies of 2D latent spaces and significantly reduces training costs. Despite its simplicity and efficiency, RepTok achieves competitive results on class-conditional ImageNet generation and naturally extends to text-to-image synthesis, reaching competitive zero-shot performance on MS-COCO under extremely limited training budgets. Our findings highlight the potential of fine-tuned SSL representations as compact and effective latent spaces for efficient generative modeling.
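The cosine-similarity regularizer mentioned above can be sketched as follows; the exact weighting and where it enters the training objective are not given here, so this NumPy version is only illustrative.

```python
import numpy as np

def cosine_reg(adapted, frozen, eps=1e-8):
    """Penalty 1 - cos(adapted, frozen), averaged over the batch.

    Zero when the fine-tuned token keeps the direction of the frozen
    SSL embedding, and maximal (2) when it flips to the opposite one.
    """
    num = np.sum(adapted * frozen, axis=-1)
    den = (np.linalg.norm(adapted, axis=-1)
           * np.linalg.norm(frozen, axis=-1) + eps)
    return float(np.mean(1.0 - num / den))

frozen = np.array([[1.0, 0.0], [0.0, 1.0]])
same = cosine_reg(3.0 * frozen, frozen)   # rescaling is not penalised
flipped = cosine_reg(-frozen, frozen)     # a direction flip costs ~2
```

Because the penalty depends only on direction, the adapted token is free to absorb low-level reconstruction detail in its magnitude and fine structure while its orientation in the SSL space is preserved.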
Submitted 16 October, 2025;
originally announced October 2025.
-
SimpleFold: Folding Proteins is Simpler than You Think
Authors:
Yuyang Wang,
Jiarui Lu,
Navdeep Jaitly,
Josh Susskind,
Miguel Angel Bautista
Abstract:
Protein folding models have achieved groundbreaking results, typically by integrating domain knowledge into architectural blocks and training pipelines. Nonetheless, given the success of generative models across different but related problems, it is natural to question whether these architectural designs are a necessary condition for building performant models. In this paper, we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general-purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving triangular updates, explicit pair representations, or multiple training objectives curated for this specific domain. Instead, SimpleFold employs standard transformer blocks with adaptive layers and is trained via a generative flow-matching objective with an additional structural term. We scale SimpleFold to 3B parameters and train it on approximately 9M distilled protein structures together with experimental PDB data. On standard folding benchmarks, SimpleFold-3B achieves competitive performance compared to state-of-the-art baselines; in addition, SimpleFold demonstrates strong performance in ensemble prediction, which is typically difficult for models trained via deterministic reconstruction objectives. Due to its general-purpose architecture, SimpleFold is efficient to deploy and run on consumer-level hardware. SimpleFold challenges the reliance on complex domain-specific architecture designs in protein folding, opening up an alternative design space for future progress.
Submitted 9 December, 2025; v1 submitted 22 September, 2025;
originally announced September 2025.
-
On the classification of Jacobi curves and their conformal curvatures
Authors:
A. Bautista,
A. Ibort,
J. Lafuente
Abstract:
This paper describes the theory of Jacobi curves, a far-reaching extension of the spaces of Jacobi fields along Riemannian geodesics, developed by Agrachev and Zelenko. Jacobi curves are curves in the Lagrangian Grassmannian of a symplectic space satisfying appropriate regularity conditions. It is shown that they are fully characterised in terms of a family of conformal symplectic invariant curvatures. In addition to a new derivation of the Ricci curvature tensor of a Jacobi curve, a Cartan-like theory of Jacobi curves is presented that makes it possible to associate to any admissible Jacobi curve a reduced normal Cartan matrix. A reconstruction theorem is obtained, proving that an admissible Jacobi curve is characterised, up to conformal symplectic transformations, by a reduced normal Cartan matrix and a geometric parametrization. The theory of cycles is studied, proving that they correspond to flat Jacobi curves.
Submitted 19 September, 2025;
originally announced September 2025.
-
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows
Authors:
Ruixiang Zhang,
Shuangfei Zhai,
Jiatao Gu,
Yizhe Zhang,
Huangjie Zheng,
Tianrong Chen,
Miguel Angel Bautista,
Josh Susskind,
Navdeep Jaitly
Abstract:
Autoregressive models have driven remarkable progress in language modeling. Their foundational reliance on discrete tokens, unidirectional context, and single-pass decoding, while central to their success, also inspires the exploration of a design space that could offer new axes of modeling flexibility. In this work, we explore an alternative paradigm, shifting language modeling from a discrete token space to a continuous latent space. We propose a novel framework, TarFlowLM, that employs transformer-based autoregressive normalizing flows to model these continuous representations. This approach unlocks substantial flexibility, enabling the construction of models that can capture global bi-directional context through stacked, alternating-direction autoregressive transformations, support block-wise generation with flexible token patch sizes, and facilitate a hierarchical multi-pass generation process. We further propose new mixture-based coupling transformations designed to capture complex dependencies within the latent space shaped by discrete data, and demonstrate theoretical connections to conventional discrete autoregressive models. Extensive experiments on language modeling benchmarks demonstrate strong likelihood performance and highlight the flexible modeling capabilities inherent in our framework.
Submitted 1 July, 2025;
originally announced July 2025.
-
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Authors:
Jiatao Gu,
Tianrong Chen,
David Berthelot,
Huangjie Zheng,
Yuyang Wang,
Ruixiang Zhang,
Laurent Dinh,
Miguel Angel Bautista,
Josh Susskind,
Shuangfei Zhai
Abstract:
We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model's representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.
Submitted 6 June, 2025;
originally announced June 2025.
-
Opinion dynamics in bounded confidence models with manipulative agents: Moving the Overton window
Authors:
A. Bautista
Abstract:
This paper focuses on opinion dynamics under the influence of manipulative agents. This type of agent is characterized by the fact that its opinion follows a trajectory that does not respond to the dynamics of the model, although it does influence the rest of the normal agents. Simulations were carried out to study how one manipulative group modifies the natural dynamics of several bounded-confidence opinion models. We study which strategies, based on the number of manipulative agents and their common opinion trajectory, a manipulative group can carry out to influence normal agents and attract them to its opinions. In certain weighted models, effects are observed in which normal agents move in the opposite direction to the manipulator group. Moreover, the conditions that ensure the influence of a manipulative group on a group of normal agents over time are also established for the Hegselmann-Krause model.
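A minimal sketch of a bounded-confidence (Hegselmann-Krause) update with one manipulative agent may clarify the setup; the confidence bound, step count, and the manipulator's linear opinion trajectory below are illustrative choices, not the paper's experimental settings.

```python
import numpy as np

def hk_step(opinions, eps, fixed_mask, fixed_values):
    """One synchronous Hegselmann-Krause update.

    Each agent moves to the mean opinion of all agents within its
    confidence bound eps (itself included). Agents flagged in
    fixed_mask are manipulators: they enter the averages of the
    others but then follow their own scripted opinion trajectory.
    """
    new = np.empty_like(opinions)
    for i, xi in enumerate(opinions):
        new[i] = opinions[np.abs(opinions - xi) <= eps].mean()
    new[fixed_mask] = fixed_values
    return new

# Two normal agents plus one manipulator slowly dragging opinions upward.
ops = np.array([0.40, 0.50, 0.45])
fixed = np.array([False, False, True])
for t in range(30):
    target = min(0.45 + 0.01 * t, 0.90)   # manipulator's scripted path
    ops = hk_step(ops, eps=0.2, fixed_mask=fixed,
                  fixed_values=np.array([target]))
```

With a slow enough trajectory the normal agents stay inside the confidence bound and trail the manipulator with a small, roughly constant lag; a faster script would break contact and leave them behind.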
Submitted 27 January, 2025; v1 submitted 21 January, 2025;
originally announced January 2025.
-
Cosmic-ray acceleration and escape from supernova remnant W44 as probed by Fermi-LAT and MAGIC
Authors:
S. Abe,
J. Abhir,
A. Abhishek,
V. A. Acciari,
A. Aguasca-Cabot,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
K. Asano,
A. Babić,
A. Baquero,
U. Barres de Almeida,
J. A. Barrio,
I. Batković,
A. Bautista,
J. Baxter,
J. Becerra González,
W. Bednarek,
E. Bernardini,
J. Bernete,
A. Berti,
J. Besenrieder
, et al. (196 additional authors not shown)
Abstract:
Context. The supernova remnant (SNR) W44 and its surroundings are a prime target for studying the acceleration of cosmic rays (CRs). Several previous studies established an extended gamma-ray emission that is set apart from the radio shell of W44. This emission is thought to originate from escaped high-energy CRs that interact with a surrounding dense molecular cloud complex. Aims. We present a detailed analysis of Fermi-LAT data with an emphasis on the spatial and spectral properties of W44 and its surroundings. We also report the results of the observations performed with the MAGIC telescopes of the northwestern region of W44. Finally, we present an interpretation model to explain the gamma-ray emission of the SNR and its surroundings. Methods. We first performed a detailed spatial analysis of 12 years of Fermi-LAT data at energies above 1 GeV, in order to exploit the better angular resolution, while we set a threshold of 100 MeV for the spectral analysis. We performed a likelihood analysis of 174 hours of MAGIC data above 130 GeV using the spatial information obtained with Fermi-LAT. Results. The combined spectra of Fermi-LAT and MAGIC, extending from 100 MeV to several TeV, were used to derive constraints on the escape of CRs. Using a time-dependent model to describe the particle acceleration and escape from the SNR, we show that the maximum energy of the accelerated particles has to be $\simeq 40$ GeV. However, our gamma-ray data suggest that a small number of lower-energy particles also needs to escape. We propose a novel model, the broken-shock scenario, to account for this effect and explain the gamma-ray emission.
Submitted 7 January, 2025;
originally announced January 2025.
-
Characterization of Markarian 421 during its most violent year: Multiwavelength variability and correlations
Authors:
K. Abe,
S. Abe,
J. Abhir,
A. Abhishek,
V. A. Acciari,
A. Aguasca-Cabot,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
K. Asano,
D. Baack,
A. Babić,
U. Barres de Almeida,
J. A. Barrio,
I. Batković,
A. Bautista,
J. Baxter,
J. Becerra González,
W. Bednarek,
E. Bernardini,
J. Bernete,
A. Berti
, et al. (190 additional authors not shown)
Abstract:
Mrk 421 was in its most active state around early 2010, which led to the highest TeV gamma-ray flux ever recorded from any active galactic nucleus. We aim to characterize the multiwavelength behavior during this exceptional year for Mrk 421, and evaluate whether it is consistent with the picture derived with data from other less exceptional years. We investigated the period from November 5, 2009 (MJD 55140), until July 3, 2010 (MJD 55380), with extensive coverage from very-high-energy (VHE; E$\,>\,$100$\,$GeV) gamma rays to radio with MAGIC, VERITAS, Fermi-LAT, RXTE, Swift, GASP-WEBT, VLBA, and a variety of additional optical and radio telescopes. We investigated the variability and correlation behavior among different energy bands in great detail. We find the strongest variability in X-rays and VHE gamma rays, and power spectral densities (PSDs) compatible with power-law functions. We observe strong correlations between X-rays and VHE gamma rays. We also report a marginally significant positive correlation between high-energy (HE; E$\,>\,$100$\,$MeV) gamma rays and the ultraviolet band. We detected marginally significant correlations between the HE and VHE gamma rays, and between HE gamma rays and the X-rays, which disappear when the large flare in February 2010 is excluded from the correlation study. The activity of Mrk 421 also yielded the first ejection of features in the VLBA images of the jet of Mrk 421. Yet the large uncertainties in the ejection times of these radio features prevent us from firmly associating them with the specific flares recorded during the campaign. We also show that the collected multi-instrument data are consistent with a scenario where the emission is dominated by two regions, a compact and an extended zone, which can be considered a simplified implementation of an energy-stratified jet as suggested by recent IXPE observations.
Submitted 7 January, 2025;
originally announced January 2025.
-
Time-dependent modelling of short-term variability in the TeV-blazar VER J0521+211 during the major flare in 2020
Authors:
MAGIC Collaboration,
S. Abe,
J. Abhir,
A. Abhishek,
V. A. Acciari,
A. Aguasca-Cabot,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
M. Artero,
K. Asano,
D. Baack,
A. Babić,
U. Barres de Almeida,
J. A. Barrio,
I. Batković,
A. Bautista,
J. Baxter,
J. Becerra González,
W. Bednarek,
E. Bernardini,
J. Bernete
, et al. (206 additional authors not shown)
Abstract:
The BL Lacertae object VER J0521+211 underwent a notable flaring episode in February 2020. A short-term monitoring campaign, led by the MAGIC (Major Atmospheric Gamma Imaging Cherenkov) collaboration, covering a wide energy range from radio to very-high-energy (VHE, 100 GeV < E < 100 TeV) gamma rays was organised to study its evolution. These observations resulted in a consistent detection of the source over six consecutive nights in the VHE gamma-ray domain. Combining these nightly observations with an extensive set of multiwavelength data made modelling of the blazar's spectral energy distribution (SED) possible during the flare. This modelling was performed with a focus on two plausible emission mechanisms: i) a leptonic two-zone synchrotron-self-Compton scenario, and ii) a lepto-hadronic one-zone scenario. Both models effectively replicated the observed SED from radio to the VHE gamma-ray band. Furthermore, by introducing a set of evolving parameters, both models were successful in reproducing the evolution of the fluxes measured in different bands throughout the observing campaign. Notably, the lepto-hadronic model predicts enhanced photon and neutrino fluxes at ultra-high energies (E > 100 TeV). While the photon component, generated via decay of neutral pions, is not directly observable as it is subject to intense pair production (and therefore extinction) through interactions with the cosmic microwave background photons, neutrino detectors (e.g. IceCube) can probe the predicted neutrino component. Finally, the analysis of the gamma-ray spectra, as observed by MAGIC and the Fermi-LAT telescopes, yielded a conservative 95% confidence upper limit of $z \leq 0.244$ for the redshift of this blazar.
Submitted 20 December, 2024;
originally announced December 2024.
-
Normalizing Flows are Capable Generative Models
Authors:
Shuangfei Zhai,
Ruixiang Zhang,
Preetum Nakkiran,
David Berthelot,
Jiatao Gu,
Huangjie Zheng,
Tianrong Chen,
Miguel Angel Bautista,
Navdeep Jaitly,
Josh Susskind
Abstract:
Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. TarFlow can be thought of as a Transformer-based variant of Masked Autoregressive Flows (MAFs): it consists of a stack of autoregressive Transformer blocks on image patches, alternating the autoregression direction between layers. TarFlow is straightforward to train end-to-end, and capable of directly modeling and generating pixels. We also propose three key techniques to improve sample quality: Gaussian noise augmentation during training, a post-training denoising procedure, and an effective guidance method for both class-conditional and unconditional settings. Putting these together, TarFlow sets new state-of-the-art results on likelihood estimation for images, beating the previous best methods by a large margin, and generates samples with quality and diversity comparable to diffusion models, for the first time with a stand-alone NF model. We make our code available at https://github.com/apple/ml-tarflow.
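The MAF-style construction underlying TarFlow admits exact likelihoods through the change-of-variables formula. The toy 1-D step below illustrates only the mechanics (causal context via a cumulative sum, bounded log-scale, triangular Jacobian); it is not the TarFlow architecture, and all weights and names are illustrative.

```python
import numpy as np

def maf_step(x, w_mu=0.1, w_s=0.1):
    """One masked-autoregressive affine step on a 1-D sequence.

    The shift mu_i and log-scale log_s_i for position i depend only on
    x_{<i} (here via a cumulative sum), so the Jacobian is triangular
    and log|det dz/dx| = -sum_i log_s_i is exact and cheap.
    """
    prefix = np.concatenate(([0.0], np.cumsum(x)[:-1]))  # causal context
    mu = w_mu * prefix
    log_s = np.tanh(w_s * prefix)                        # bounded log-scale
    z = (x - mu) * np.exp(-log_s)
    return z, -np.sum(log_s)

def log_likelihood(x):
    """Exact log p(x) under a standard normal base distribution."""
    z, log_det = maf_step(x)
    log_base = -0.5 * np.sum(z ** 2) - 0.5 * len(z) * np.log(2.0 * np.pi)
    return float(log_base + log_det)

ll = log_likelihood(np.zeros(4))   # at the origin: z = 0, log-det = 0
```

Stacking such steps while alternating the autoregression direction, as the abstract describes, lets later layers condition on context the earlier layers could not see, while each layer's log-determinant stays exactly computable.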
Submitted 6 June, 2025; v1 submitted 9 December, 2024;
originally announced December 2024.
-
INRFlow: Flow Matching for INRs in Ambient Space
Authors:
Yuyang Wang,
Anurag Ranjan,
Josh Susskind,
Miguel Angel Bautista
Abstract:
Flow matching models have emerged as a powerful method for generative modeling on domains like images or videos, and even on irregular or unstructured data such as 3D point clouds or protein structures. These models are commonly trained in two stages: first, a data compressor is trained, and in a subsequent training stage a flow matching generative model is trained in the latent space of the data compressor. This two-stage paradigm poses obstacles to unifying models across data domains, as hand-crafted compressor architectures are used for different data modalities. To this end, we introduce INRFlow, a domain-agnostic approach to learn flow matching transformers directly in ambient space. Drawing inspiration from INRs, we introduce a conditionally independent point-wise training objective that enables INRFlow to make predictions continuously in coordinate space. Our empirical results demonstrate that INRFlow effectively handles different data modalities such as images, 3D point clouds and protein structure data, achieving strong performance in different domains and outperforming comparable approaches. INRFlow is a promising step towards domain-agnostic flow matching generative models that can be trivially adopted in different data domains.
Submitted 28 May, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
World-consistent Video Diffusion with Explicit 3D Modeling
Authors:
Qihang Zhang,
Shuangfei Zhai,
Miguel Angel Bautista,
Kevin Miao,
Alexander Toshev,
Joshua Susskind,
Jiatao Gu
Abstract:
Recent advancements in diffusion models have set new benchmarks in image and video generation, enabling realistic visual synthesis across single- and multi-frame contexts. However, these models still struggle with efficiently and explicitly generating 3D-consistent content. To address this, we propose World-consistent Video Diffusion (WVD), a novel framework that incorporates explicit 3D supervision using XYZ images, which encode global 3D coordinates for each image pixel. More specifically, we train a diffusion transformer to learn the joint distribution of RGB and XYZ frames. This approach supports multi-task adaptability via a flexible inpainting strategy. For example, WVD can estimate XYZ frames from ground-truth RGB or generate novel RGB frames using XYZ projections along a specified camera trajectory. In doing so, WVD unifies tasks like single-image-to-3D generation, multi-view stereo, and camera-controlled video generation. Our approach demonstrates competitive performance across multiple benchmarks, providing a scalable solution for 3D-consistent video and image generation with a single pretrained model.
Submitted 2 December, 2024;
originally announced December 2024.
-
Standardised formats and open-source analysis tools for the MAGIC telescopes data
Authors:
S. Abe,
J. Abhir,
A. Abhishek,
V. A. Acciari,
A. Aguasca-Cabot,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
M. Artero,
K. Asano,
A. Babić,
U. Barres de Almeida,
J. A. Barrio,
I. Batković,
A. Bautista,
J. Baxter,
J. Becerra González,
W. Bednarek,
E. Bernardini,
J. Bernete,
A. Berti,
J. Besenrieder
, et al. (186 additional authors not shown)
Abstract:
Instruments for gamma-ray astronomy at Very High Energies ($E>100\,{\rm GeV}$) have traditionally derived their scientific results through proprietary data and software. Data standardisation has become a prominent issue in this field both as a requirement for the dissemination of data from the next generation of gamma-ray observatories and as an effective solution to realise public data legacies of current-generation instruments. Specifications for a standardised gamma-ray data format have been proposed as a community effort and have already been successfully adopted by several instruments.
We present the first production of standardised data from the Major Atmospheric Gamma-ray Imaging Cherenkov (MAGIC) telescopes. We converted $166\,{\rm h}$ of observations from different sources and validated their analysis with the open-source software Gammapy.
We consider six data sets representing different scientific and technical analysis cases and compare the results obtained analysing the standardised data with open-source software against those produced with the MAGIC proprietary data and software. Aiming at a systematic production of MAGIC data in this standardised format, we also present the implementation of a database-driven pipeline automatically performing the MAGIC data reduction from the calibrated down to the standardised data level.
In all the cases selected for the validation, we obtain results compatible with the MAGIC proprietary software, both for the manual and for the automatic data productions. Part of the validation data set is also made publicly available, thus representing the first large public release of MAGIC data.
This effort and this first data release represent a technical milestone toward the realisation of a public MAGIC data legacy.
Submitted 7 October, 2024; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Constraints on Lorentz invariance violation from the extraordinary Mrk 421 flare of 2014 using a novel analysis method
Authors:
MAGIC Collaboration,
S. Abe,
J. Abhir,
A. Abhishek,
V. A. Acciari,
A. Aguasca-Cabot,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
M. Artero,
K. Asano,
A. Babić,
A. Baquero,
U. Barres de Almeida,
J. A. Barrio,
I. Batković,
A. Bautista,
J. Baxter,
J. Becerra González,
W. Bednarek,
E. Bernardini,
J. Bernete,
et al. (192 additional authors not shown)
Abstract:
The Lorentz Invariance Violation (LIV), a proposed consequence of certain quantum gravity (QG) scenarios, could instigate an energy-dependent group velocity for ultra-relativistic particles. This energy dependence, although suppressed by the massive QG energy scale $E_\mathrm{QG}$, expected to be on the level of the Planck energy $1.22 \times 10^{19}$ GeV, is potentially detectable in astrophysical observations. In this scenario, the cosmological distances traversed by photons act as an amplifier for this effect. By leveraging the observation of a remarkable flare from the blazar Mrk 421, recorded at energies above 100 GeV by the MAGIC telescopes on the night of April 25 to 26, 2014, we look for time delays scaling linearly and quadratically with the photon energies. Using, for the first time in LIV studies, a binned-likelihood approach, we set constraints on the QG energy scale. For the linear scenario, we set $95\%$ lower limits $E_\mathrm{QG}>2.7\times10^{17}$ GeV for the subluminal case and $E_\mathrm{QG}>3.6\times10^{17}$ GeV for the superluminal case. For the quadratic scenario, the $95\%$ lower limits for the subluminal and superluminal cases are $E_\mathrm{QG}>2.6\times10^{10}$ GeV and $E_\mathrm{QG}>2.5\times10^{10}$ GeV, respectively.
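The scale of the effect being constrained is easy to gauge at linear order with a toy computation. This is a simplified sketch that ignores the cosmological redshift integral used in the actual likelihood analysis; the source distance and photon energy below are illustrative assumptions, not the paper's values:

```python
# Toy linear-order LIV time delay: dt ~ s * (E / E_QG) * (D / c).
# Ignores the cosmological redshift integral of the real analysis;
# the distance and photon energy below are illustrative assumptions.

C = 2.998e8          # speed of light, m/s
MPC = 3.086e22       # one megaparsec, m

def linear_liv_delay(e_gev, e_qg_gev, distance_m, s=+1):
    """Arrival delay (s) of a photon of energy e_gev relative to a
    low-energy photon, for QG scale e_qg_gev (s=+1: subluminal)."""
    return s * (e_gev / e_qg_gev) * (distance_m / C)

# Mrk 421 sits at z ~ 0.031, very roughly 130 Mpc away. A 1 TeV photon,
# with E_QG placed at the paper's linear subluminal limit:
dt = linear_liv_delay(1.0e3, 2.7e17, 130 * MPC)
print(f"delay ~ {dt:.0f} s")
```

The residual delay still allowed at the quoted limit is of order tens of seconds over the whole path, which is why a fast, bright TeV flare is such a sensitive probe.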
Submitted 11 June, 2024;
originally announced June 2024.
-
CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
Authors:
Nick Stracke,
Stefan Andreas Baumann,
Joshua M. Susskind,
Miguel Angel Bautista,
Björn Ommer
Abstract:
Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images. However, guiding the generative process of these models to consider detailed forms of conditioning reflecting style and/or structure information remains an open problem. In this paper, we present LoRAdapter, an approach that unifies both style and structure conditioning under the same formulation using a novel conditional LoRA block that enables zero-shot control. LoRAdapter is an efficient, powerful, and architecture-agnostic approach to condition text-to-image diffusion models, which enables fine-grained control conditioning during generation and outperforms recent state-of-the-art approaches.
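The conditional low-rank update can be sketched in a few lines. This is a minimal illustration in the spirit of the abstract, not the paper's implementation: here the conditioning enters as a single scale on the low-rank path, and all shapes and values are toy assumptions:

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def conditional_lora(w, a, b, x, cond_scale):
    """y = W x + B (s_c * (A x)): a low-rank (LoRA-style) update whose
    bottleneck activation is modulated by a conditioning-derived scale
    s_c, so a single adapter behaves differently per condition."""
    base = matvec(w, x)
    low = [cond_scale * u for u in matvec(a, x)]   # rank-r bottleneck
    up = matvec(b, low)
    return [p + q for p, q in zip(base, up)]

# Toy 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]             # 1 x 2 down-projection
B = [[0.5], [-0.5]]          # 2 x 1 up-projection
x = [2.0, 3.0]

unconditioned = conditional_lora(W, A, B, x, cond_scale=0.0)  # == W x
conditioned = conditional_lora(W, A, B, x, cond_scale=1.0)
```

With `cond_scale = 0` the adapter is inert and the base model is recovered exactly, which is what makes zero-shot toggling of the conditioning possible.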
Submitted 8 October, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Broadband Multi-wavelength Properties of M87 during the 2018 EHT Campaign including a Very High Energy Flaring Episode
Authors:
J. C. Algaba,
M. Balokovic,
S. Chandra,
W. Y. Cheong,
Y. Z. Cui,
F. D'Ammando,
A. D. Falcone,
N. M. Ford,
M. Giroletti,
C. Goddi,
M. A. Gurwell,
K. Hada,
D. Haggard,
S. Jorstad,
A. Kaur,
T. Kawashima,
S. Kerby,
J. Y. Kim,
M. Kino,
E. V. Kravchenko,
S. S. Lee,
R. S. Lu,
S. Markoff,
J. Michail,
J. Neilsen,
et al. (721 additional authors not shown)
Abstract:
The nearby elliptical galaxy M87 contains one of the only two supermassive black holes whose emission surrounding the event horizon has been imaged by the Event Horizon Telescope (EHT). In 2018, more than two dozen multi-wavelength (MWL) facilities (from radio to gamma-ray energies) took part in the second M87 EHT campaign. The goal of this extensive MWL campaign was to better understand the physics of the accreting black hole M87*, the relationship between the inflow and inner jets, and the high-energy particle acceleration. Understanding the complex astrophysics is also a necessary first step towards performing further tests of general relativity. The MWL campaign took place in April 2018, overlapping with the EHT M87* observations. We present a new, contemporaneous spectral energy distribution (SED) ranging from radio to very high energy (VHE) gamma-rays, as well as details of the individual observations and light curves. We also conduct phenomenological modelling to investigate the basic source properties. We present the first VHE gamma-ray flare from M87 detected since 2010. The flux above 350 GeV has more than doubled within a period of about 36 hours. We find that the X-ray flux is enhanced by about a factor of two compared to 2017, while the radio and millimetre core fluxes are consistent between 2017 and 2018. We detect evidence for a monotonically increasing jet position angle that corresponds to variations in the bright spot of the EHT image. Our results show the value of continued MWL monitoring together with precision imaging for addressing the origins of high-energy particle acceleration. While we cannot currently pinpoint the precise location where such acceleration takes place, the new VHE gamma-ray flare already presents a challenge to simple one-zone leptonic emission model approaches, and emphasises the need for combined image and spectral modelling.
Submitted 5 December, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
PROJECT-J: JWST observations of HH46 IRS and its outflow. Overview and first results
Authors:
B. Nisini,
M. G. Navarro,
T. Giannini,
S. Antoniucci,
P. J. Kavanagh,
P. Hartigan,
F. Bacciotti,
A. Caratti o Garatti,
A. Noriega Crespo,
E. van Dishoeck,
E. Whelan,
H. G. Arce,
S. Cabrit,
D. Coffey,
D. Fedele,
J. Eisloeffel,
M. E. Palumbo,
L. Podio,
T. P. Ray,
M. Schultze,
R. G. Urso,
J. M. Alcalá,
M. A. Bautista,
C. Codella,
T. G. Greene,
et al. (1 additional author not shown)
Abstract:
We present the first results of the JWST program PROJECT-J (PROtostellar JEts Cradle Tested with JWST), designed to study the Class I source HH46 IRS and its outflow through NIRSpec and MIRI spectroscopy (1.66 to 28 micron). The data provide line-images (~ 6.6" in length with NIRSpec, and up to 20" with MIRI) revealing unprecedented details within the jet, the molecular outflow and the cavity. We detect, for the first time, the red-shifted jet within ~ 90 au from the source. Dozens of shock-excited forbidden lines are observed, including highly ionized species such as [Ne III] 15.5 micron, suggesting that the gas is excited by high velocity (> 80 km/s) shocks in a relatively high density medium. Images of H2 lines at different excitations outline a complex molecular flow, where a bright cavity, molecular shells, and a jet-driven bow-shock interact with and are shaped by the ambient conditions. Additional NIRCam 2 micron images resolve the HH46 IRS ~ 110 au binary system and suggest that the large asymmetries observed between the jet and the H2 wide angle emission could be due to two separate outflows being driven by the two sources. The spectra of the unresolved binary show deep ice bands and plenty of gaseous lines in absorption, likely originating in a cold envelope or disk. In conclusion, JWST has unraveled for the first time the origin of the HH46 IRS complex outflow demonstrating its capability to investigate embedded regions around young stars, which remain elusive even at near-IR wavelengths.
Submitted 10 April, 2024;
originally announced April 2024.
-
Performance and first measurements of the MAGIC Stellar Intensity Interferometer
Authors:
MAGIC Collaboration,
S. Abe,
J. Abhir,
V. A. Acciari,
A. Aguasca-Cabot,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
M. Artero,
K. Asano,
A. Babić,
A. Baquero,
U. Barres de Almeida,
J. A. Barrio,
I. Batković,
A. Bautista,
J. Baxter,
J. Becerra González,
E. Bernardini,
M. Bernardos,
J. Bernete,
A. Berti,
et al. (195 additional authors not shown)
Abstract:
In recent years, a new generation of optical intensity interferometers has emerged, leveraging the existing infrastructure of Imaging Atmospheric Cherenkov Telescopes (IACTs). The MAGIC telescopes host the MAGIC-SII system (Stellar Intensity Interferometer), implemented to investigate the feasibility and potential of this technique on IACTs. After the first successful measurements in 2019, the system was upgraded and now features a real-time, dead-time-free, 4-channel, GPU-based correlator. These hardware modifications allow seamless transitions between MAGIC's standard very-high-energy gamma-ray observations and optical interferometry measurements within seconds. We establish the feasibility and potential of employing IACTs as competitive optical intensity interferometers with minimal hardware adjustments. Measurements of a total of 22 stellar diameters are reported: 9 correspond to reference stars with previous comparable measurements, and 13 have no prior measurements. A prospective implementation involving telescopes from the forthcoming Cherenkov Telescope Array Observatory's northern hemisphere array, such as the first prototype of its Large-Sized Telescopes, LST-1, is technically viable. This integration would significantly enhance the sensitivity of the current system and broaden the UV-plane coverage. This advancement would enable the system to achieve competitive sensitivity with the current generation of long-baseline optical interferometers over blue wavelengths.
Submitted 7 February, 2024;
originally announced February 2024.
-
Insights into the broad-band emission of the TeV blazar Mrk 501 during the first X-ray polarization measurements
Authors:
S. Abe,
J. Abhir,
V. A. Acciari,
A. Aguasca-Cabot,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
K. Asano,
A. Babić,
A. Baquero,
U. Barres de Almeida,
J. A. Barrio,
I. Batković,
A. Bautista,
J. Baxter,
J. Becerra González,
W. Bednarek,
E. Bernardini,
M. Bernardos,
J. Bernete,
A. Berti,
J. Besenrieder,
et al. (239 additional authors not shown)
Abstract:
We present the first multi-wavelength study of Mrk 501 including very-high-energy (VHE) gamma-ray observations simultaneous to X-ray polarization measurements from the Imaging X-ray Polarimetry Explorer (IXPE). We use radio-to-VHE data from a multi-wavelength campaign organized between 2022-03-01 and 2022-07-19. The observations were performed by MAGIC, Fermi-LAT, NuSTAR, Swift (XRT and UVOT), and several instruments covering the optical and radio bands. During the IXPE pointings, the VHE state is close to the average behavior with a 0.2-1 TeV flux of 20%-50% the emission of the Crab Nebula. Despite the average VHE activity, an extreme X-ray behavior is measured for the first two IXPE pointings in March 2022 with a synchrotron peak frequency >1 keV. For the third IXPE pointing in July 2022, the synchrotron peak shifts towards lower energies and the optical/X-ray polarization degrees drop. The X-ray polarization is systematically higher than at lower energies, suggesting an energy-stratification of the jet. While during the IXPE epochs the polarization angle in the X-ray, optical and radio bands align well, we find a clear discrepancy in the optical and radio polarization angles in the middle of the campaign. We model the broad-band spectra simultaneous to the IXPE pointings assuming a compact zone dominating in the X-rays and VHE, and an extended zone stretching further downstream the jet dominating the emission at lower energies. NuSTAR data allow us to precisely constrain the synchrotron peak and therefore the underlying electron distribution. The change between the different states observed in the three IXPE pointings can be explained by a change of magnetization and/or emission region size, which directly connects the shift of the synchrotron peak to lower energies with the drop in polarization degree.
Submitted 1 September, 2025; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Scalable Pre-training of Large Autoregressive Image Models
Authors:
Alaaeldin El-Nouby,
Michal Klein,
Shuangfei Zhai,
Miguel Angel Bautista,
Alexander Toshev,
Vaishaal Shankar,
Joshua M Susskind,
Armand Joulin
Abstract:
This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e., Large Language Models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks. We illustrate the practical implication of these findings by pre-training a 7-billion-parameter AIM on 2 billion images, which achieves 84.0% on ImageNet-1k with a frozen trunk. Interestingly, even at this scale, we observe no sign of saturation in performance, suggesting that AIM potentially represents a new frontier for training large-scale vision models. The pre-training of AIM is similar to the pre-training of LLMs, and does not require any image-specific strategy to stabilize the training at scale.
Submitted 16 January, 2024;
originally announced January 2024.
-
Swallowing the Bitter Pill: Simplified Scalable Conformer Generation
Authors:
Yuyang Wang,
Ahmed A. Elhag,
Navdeep Jaitly,
Joshua M. Susskind,
Miguel Angel Bautista
Abstract:
We present a novel way to predict molecular conformers through a simple formulation that sidesteps many of the heuristics of prior works and achieves state-of-the-art results by using the advantages of scale. By training a diffusion generative model directly on 3D atomic positions without making assumptions about the explicit structure of molecules (e.g. modeling torsional angles) we are able to radically simplify structure learning, and make it trivial to scale up the model sizes. This model, called Molecular Conformer Fields (MCF), works by parameterizing conformer structures as functions that map elements from a molecular graph directly to their 3D location in space. This formulation allows us to boil down the essence of structure prediction to learning a distribution over functions. Experimental results show that scaling up the model capacity leads to large gains in generalization performance without enforcing inductive biases like rotational equivariance. MCF represents an advance in extending diffusion models to handle complex scientific problems in a conceptually simple, scalable and effective manner.
Submitted 10 May, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Authors:
Samira Abnar,
Omid Saremi,
Laurent Dinh,
Shantel Wilson,
Miguel Angel Bautista,
Chen Huang,
Vimal Thilak,
Etai Littwin,
Jiatao Gu,
Josh Susskind,
Samy Bengio
Abstract:
Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that standard transformers face challenges in solving these tasks. These tasks are variations of pointer value retrieval previously introduced by Zhang et al. (2021). We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph). Based on our observations, we propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hypernetworks with adaptive depth from Universal Transformers. This model demonstrates higher accuracy and a fairer allocation of computational resources when generalizing to higher numbers of computation steps. We conclude that mechanisms for adaptive depth and modularity complement each other in improving efficient generalization concerning example complexity. Additionally, to emphasize the broad applicability of our findings, we illustrate that in a standard image recognition task, Hyper-UT's performance matches that of a ViT model but with considerably reduced computational demands (achieving over 70% average savings by effectively using fewer layers).
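The adaptive-depth idea can be illustrated with an ACT-style halting loop in the spirit of Universal Transformers. This is a toy sketch, not the paper's Hyper-UT: the layer and halting score below are stand-in functions chosen only to make the behavior visible:

```python
def adaptive_depth(x, layer, halt_score, max_steps=16, threshold=0.99):
    """Apply `layer` repeatedly, accumulating a per-example halting
    probability; stop once it crosses `threshold` (or at max_steps).
    Returns the final state and the number of steps actually spent."""
    total, steps = 0.0, 0
    while total < threshold and steps < max_steps:
        x = layer(x)
        total += halt_score(x)
        steps += 1
    return x, steps

# Toy computation: the layer nudges the state toward an answer, and the
# halting score grows with the state, so an example that starts closer
# to its answer consumes fewer layers than one that starts farther away.
step = lambda s: s + 0.1
confidence = lambda s: s / 4
_, easy_steps = adaptive_depth(0.5, step, confidence)
_, hard_steps = adaptive_depth(0.1, step, confidence)
```

Hard examples draw more compute while easy ones exit early, which is the per-example resource allocation the abstract refers to.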
Submitted 13 October, 2023;
originally announced October 2023.
-
Pseudo-Generalized Dynamic View Synthesis from a Video
Authors:
Xiaoming Zhao,
Alex Colburn,
Fangchang Ma,
Miguel Angel Bautista,
Joshua M. Susskind,
Alexander G. Schwing
Abstract:
Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep net forward pass on a test scene. In contrast, for dynamic scenes, scene-specific optimization techniques exist, but, to our best knowledge, there is currently no generalized method for dynamic novel view synthesis from a given monocular video. To answer whether generalized dynamic novel view synthesis from monocular videos is possible today, we establish an analysis framework based on existing techniques and work toward the generalized approach. We find a pseudo-generalized process without scene-specific appearance optimization is possible, but geometrically and temporally consistent depth estimates are needed. Despite no scene-specific appearance optimization, the pseudo-generalized approach improves upon some scene-specific methods.
Submitted 19 February, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Value function estimation using conditional diffusion models for control
Authors:
Bogdan Mazoure,
Walter Talbott,
Miguel Angel Bautista,
Devon Hjelm,
Alexander Toshev,
Josh Susskind
Abstract:
A fairly reliable trend in deep reinforcement learning is that the performance scales with the number of parameters, provided a complementary scaling in the amount of training data. As the appetite for large models increases, it is imperative to address, sooner rather than later, the potential problem of running out of high-quality demonstrations. In this case, instead of collecting only new data via costly human demonstrations or risking a simulation-to-real transfer with uncertain effects, it would be beneficial to leverage vast amounts of readily available low-quality data. Since classical control algorithms such as behavior cloning or temporal difference learning cannot be used on reward-free or action-free data out-of-the-box, this solution warrants novel training paradigms for continuous control. We propose a simple algorithm called Diffused Value Function (DVF), which learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model. This model can be efficiently learned from state sequences (i.e., without access to reward functions or actions), and subsequently used to estimate the value of each action out-of-the-box. We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers, and show promising qualitative and quantitative results on challenging robotics benchmarks.
Submitted 9 June, 2023;
originally announced June 2023.
-
Manifold Diffusion Fields
Authors:
Ahmed A. Elhag,
Yuyang Wang,
Joshua M. Susskind,
Miguel Angel Bautista
Abstract:
We present Manifold Diffusion Fields (MDF), an approach that unlocks learning of diffusion models of data in general non-Euclidean geometries. Leveraging insights from spectral geometry analysis, we define an intrinsic coordinate system on the manifold via the eigenfunctions of the Laplace-Beltrami Operator. MDF represents functions using an explicit parametrization formed by a set of multiple input-output pairs. Our approach allows sampling continuous functions on manifolds and is invariant with respect to rigid and isometric transformations of the manifold. In addition, we show that MDF generalizes to the case where the training set contains functions on different manifolds. Empirical results on multiple datasets and manifolds, including challenging scientific problems like weather prediction or molecular conformation, show that MDF can capture distributions of such functions with better diversity and fidelity than previous approaches.
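The intrinsic-coordinate idea is easiest to see on the simplest closed manifold, the circle, where the Laplace-Beltrami eigenfunctions are just sines and cosines. This is a toy sketch of the encoding, with the truncation level and the circle example as illustrative assumptions rather than the paper's setup:

```python
import math

# Encode a circle point by its first Laplace-Beltrami eigenfunction
# values (a spectral positional encoding): on the circle these are
# sin(k*t) and cos(k*t). The truncation k_max is an assumption.

def spectral_coords(theta, k_max=4):
    """Embed a circle point via its first 2*k_max eigenfunction values."""
    coords = []
    for k in range(1, k_max + 1):
        coords += [math.cos(k * theta), math.sin(k * theta)]
    return coords

def dist(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Rotating both points by the same angle is an isometry of the circle;
# the embedding distance is unchanged, illustrating the invariance to
# isometric transformations claimed in the abstract.
a, b, shift = 0.3, 1.1, 2.0
d0 = dist(spectral_coords(a), spectral_coords(b))
d1 = dist(spectral_coords(a + shift), spectral_coords(b + shift))
assert abs(d0 - d1) < 1e-9
```

The same construction carries over to general manifolds by replacing the closed-form sines and cosines with numerically computed Laplace-Beltrami eigenfunctions.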
Submitted 19 January, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Ground truth clustering is not the optimum clustering
Authors:
Lucia Absalom Bautista,
Timotej Hrga,
Janez Povh,
Shudian Zhao
Abstract:
The clustering of data is one of the most important and challenging topics in data science. The minimum sum-of-squares clustering (MSSC) problem asks to cluster the data points into $k$ clusters such that the sum of squared distances between the data points and their cluster centers (centroids) is minimized. This problem is NP-hard, but there exist exact solvers that can solve such problems to optimality for small or medium-sized instances.
In this paper, we use a branch-and-bound solver based on semidefinite programming relaxations called SOS-SDP to compute the optimum solutions of the MSSC problem for various $k$ and for multiple datasets, with real and artificial data, for which the data provider has provided ground truth clustering.
Next, we use several extrinsic and intrinsic measures to evaluate how well the optimum clustering and the ground truth clustering match, and how well these clusterings perform with respect to the criteria underlying the intrinsic measures. Our calculations show that the ground truth clusterings are generally far from the optimum solution to the MSSC problem. Moreover, the intrinsic measures evaluated on the ground truth clusterings are generally significantly worse than for the optimum clusterings. However, when the ground truth clustering is in the form of convex sets, e.g., ellipsoids, that are well separated from each other, the ground truth clustering comes very close to the optimum clustering.
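The objective being solved to optimality fits in a few lines. The toy data below is assumed for illustration (it is not from the paper's benchmarks, and no SOS-SDP solver is involved); it shows how a plausible ground-truth labeling can lose to an alternative labeling under the MSSC criterion:

```python
# MSSC objective: sum of squared distances from each point to its
# cluster centroid. Data and labelings below are toy assumptions.

def mssc(points, labels):
    """Sum of squared distances to cluster centroids."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for pts in clusters.values():
        dim = len(pts[0])
        centroid = [sum(p[i] for p in pts) / len(pts) for i in range(dim)]
        total += sum(sum((p[i] - centroid[i]) ** 2 for i in range(dim))
                     for p in pts)
    return total

# Two well-separated pairs plus a stray point at x = 3.0. The assumed
# ground truth groups the stray point with the distant left pair; the
# MSSC optimum prefers to group it with the nearby right pair.
pts = [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0), (4.2, 0.0), (3.0, 0.0)]
ground_truth = [0, 0, 1, 1, 0]
alternative = [0, 0, 1, 1, 1]
assert mssc(pts, alternative) < mssc(pts, ground_truth)
```

This mirrors the paper's point in miniature: the labeling a data provider calls "ground truth" need not minimize the sum-of-squares objective.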
Submitted 22 May, 2023;
originally announced May 2023.
-
Diffusion Probabilistic Fields
Authors:
Peiye Zhuang,
Samira Abnar,
Jiatao Gu,
Alex Schwing,
Joshua M. Susskind,
Miguel Ángel Bautista
Abstract:
Diffusion probabilistic models have quickly become a major approach for generative modeling of images, 3D geometry, video and other domains. However, to adapt diffusion generative modeling to these domains the denoising network needs to be carefully designed for each domain independently, oftentimes under the assumption that data lives in a Euclidean grid. In this paper we introduce Diffusion Probabilistic Fields (DPF), a diffusion model that can learn distributions over continuous functions defined over metric spaces, commonly known as fields. We extend the formulation of diffusion probabilistic models to deal with this field parametrization in an explicit way, enabling us to define an end-to-end learning algorithm that side-steps the requirement of representing fields with latent vectors as in previous approaches (Dupont et al., 2022a; Du et al., 2021). We empirically show that, while using the same denoising network, DPF effectively deals with different modalities like 2D images and 3D geometry, in addition to modeling distributions over fields defined on non-Euclidean metric spaces.
Submitted 28 February, 2023;
originally announced March 2023.
-
f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
Authors:
Jiatao Gu,
Shuangfei Zhai,
Yizhe Zhang,
Miguel Angel Bautista,
Josh Susskind
Abstract:
Diffusion models (DMs) have recently emerged as SoTA tools for generative modeling in various domains. Standard DMs can be viewed as an instantiation of hierarchical variational autoencoders (VAEs) where the latent variables are inferred from input-centered Gaussian distributions with fixed scales and variances. Unlike VAEs, this formulation limits DMs from changing the latent spaces and learning abstract representations. In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation. More precisely, we extend DMs to incorporate a set of (hand-designed or learned) transformations, where the transformed input is the mean of each diffusion step. We propose a generalized formulation and derive the corresponding de-noising objective with a modified sampling algorithm. As a demonstration, we apply f-DM in image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations based on the encoder of pretrained VAEs. In addition, we identify the importance of adjusting the noise levels whenever the signal is sub-sampled and propose a simple rescaling recipe. f-DM can produce high-quality samples on standard image generation benchmarks like FFHQ, AFHQ, LSUN, and ImageNet with better efficiency and semantic interpretation.
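The mechanism can be sketched for the down-sampling case. This is a minimal illustration: the 2x averaging transform, the noise schedule, and the per-stage noise rescaling factor are assumptions in the spirit of the abstract's recipe, not the paper's exact formulation:

```python
import random

# f-DM-style noising sketch: the diffusion mean at each stage is a
# transformed signal f_t(x_0) (here, progressive 2x average-downsampling)
# rather than x_0 itself, with the noise level rescaled whenever the
# signal is sub-sampled. The 1/sqrt(2) factor is one plausible choice.

def downsample(x):
    """Average adjacent pairs: one stage of the progressive transform."""
    return [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]

def noisy_stage(x0, stage, sigma, rng):
    """Apply `stage` downsamplings, then add Gaussian noise of scale sigma."""
    z = x0
    for _ in range(stage):
        z = downsample(z)
        sigma = sigma / (2 ** 0.5)   # assumed rescaling as resolution halves
    return [v + rng.gauss(0.0, sigma) for v in z]

rng = random.Random(0)
x0 = [float(i) for i in range(8)]
x1 = noisy_stage(x0, 1, 0.1, rng)    # length 4: mean is downsample(x0)
x2 = noisy_stage(x0, 2, 0.1, rng)    # length 2: coarser signal, less noise
```

Later stages carry progressively coarser versions of the signal, so the reverse process reconstructs detail stage by stage rather than from a single fixed-scale latent.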
Submitted 10 October, 2022;
originally announced October 2022.
-
GAUDI: A Neural Architect for Immersive 3D Scene Generation
Authors:
Miguel Angel Bautista,
Pengsheng Guo,
Samira Abnar,
Walter Talbott,
Alexander Toshev,
Zhuoyuan Chen,
Laurent Dinh,
Shuangfei Zhai,
Hanlin Goh,
Daniel Ulbricht,
Afshin Dehghan,
Josh Susskind
Abstract:
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene.
Submitted 27 July, 2022;
originally announced July 2022.
-
L-extensions and L-boundary of conformal spacetimes
Authors:
A. Bautista,
A. Ibort,
J. Lafuente
Abstract:
The notion of L-boundary, a new causal boundary proposed by R. Low based on constructing a `sky at infinity' for any light ray, is discussed in detail. The analysis of the notion of L-boundary will be done in the 3-dimensional situation for ease of presentation. The proposed notion of causal boundary is intrinsically conformal and, as it will be proved in the paper, under natural conditions provides a natural extension $\bar{M}$ of the given spacetime $M$ with smooth boundary $\partial M = \bar{M} \backslash M$. The extensions $\bar{M}$ of any conformal manifold $M$ constructed in this way are characterised exclusively in terms of local properties at the boundary points. Such extensions are called L-extensions and it is proved that, if they exist, they are essentially unique. Finally, it is shown that in the 3-dimensional case, any L-extension is equivalent to the canonical extension obtained by using the L-boundary of the manifold.
Submitted 4 July, 2022;
originally announced July 2022.
-
A conformal boundary for space-times based on light-like geodesics: the 3-dimensional case
Authors:
A. Bautista,
A. Ibort,
J. Lafuente,
R. Low
Abstract:
A new causal boundary, which we will term the $l$-boundary, inspired by the geometry of the space of light rays and invariant by conformal diffeomorphisms for space-times of any dimension $m\geq 3$, proposed by one of the authors (R.J. Low, The space of null geodesics (and a new causal boundary), Lecture Notes in Physics, 692, Springer, 2006, 35--50) is analyzed in detail for space-times of dimension 3. Under some natural assumptions it is shown that the completed space-time becomes a smooth manifold with boundary and its relation with Geroch-Kronheimer-Penrose causal boundary is discussed. A number of examples illustrating the properties of this new causal boundary as well as a discussion on the obtained results will be provided.
Submitted 4 July, 2022;
originally announced July 2022.
-
Atomic Radiative Data for Oxygen and Nitrogen for Solar Photospheric Studies
Authors:
Manuel A. Bautista,
Maria Bergemann,
Helena Carvajal Gallego,
Sébastien Gamrath,
Patrick Palmeri,
Pascal Quinet
Abstract:
Our recent re-analysis of the solar photospheric spectra with non-local thermodynamic equilibrium (non-LTE) models resulted in higher metal abundances compared to previous works. When applying the new chemical abundances to Standard Solar Model calculations, the new composition resolves the long-standing discrepancies with independent constraints on the solar structure from helioseismology. Critical to the determination of chemical abundances is the accuracy of the atomic data, especially the $f$-values, used in the radiative transfer models. Here we describe in detail the calculations of $f$-values for neutral oxygen and nitrogen used in our non-LTE models. Our calculations of $f$-values are based on a multi-method, multi-code approach and are the most detailed and extensive of their kind for the spectral lines of interest. We also report in this paper the details of an extensive R-matrix calculation of photo-ionization cross sections for oxygen. Our calculations resulted in reliable $f$-values with well constrained uncertainties. We compare our results with previous theoretical and experimental determinations of atomic data. We also quantify the influence of the adopted photo-ionization cross sections on the spectroscopic estimate of the solar O abundance, using data from different sources. We confirm that our 3D non-LTE value is robust and unaffected by the choice of photo-ionization data, contrary to the recent claim made by Nahar.
Submitted 28 June, 2022;
originally announced June 2022.
-
The sky invariant: A new conformal invariant for Schwarzschild spacetime
Authors:
A. Bautista,
A. Ibort,
J. Lafuente
Abstract:
A new class of conformal invariants for a given spacetime $M$ is introduced exploiting the conformal geometry of any light ray $Γ$. Each congruence of light rays passing through a given point $p$ defines the sky $S(p)$ of such point. The new conformal invariants are defined on the bundle of skies of the spacetime $M$, and are accordingly called sky invariants. The natural conformal covariant derivative defined on a light ray and its associated covariant calculus allow us to show the existence of a natural conformal invariant differential of arc that, together with the restriction of the curvature of the conformal covariant derivative, can be used to construct a sky invariant that we call the sky curvature. An algorithm, which can be implemented in any symbolic manipulation software system, to compute the sky curvature will be discussed, and the main ideas and the explicit computation of the sky curvature are illustrated in Schwarzschild spacetime.
Submitted 10 June, 2022;
originally announced June 2022.
-
The space of light rays: Causality and $L$-boundary
Authors:
A. Bautista,
A. Ibort,
J. Lafuente
Abstract:
The space of light rays $\mathcal{N}$ of a conformal Lorentz manifold $(M,\mathcal{C})$ is, under some topological conditions, a manifold whose basic elements are unparametrized null geodesics. This manifold $\mathcal{N}$, strongly inspired by R. Penrose's twistor theory, retains all the information of $M$ and could be used as a space complementing the spacetime model. In the present review, the geometry and related structures of $\mathcal{N}$, such as the space of skies $Σ$ and the contact structure $\mathcal{H}$, are introduced. The causal structure of $M$ is characterized as part of the geometry of $\mathcal{N}$. A new causal boundary for spacetimes $M$ put forward by R. Low, the $L$-boundary, is constructed in the case of $3$-dimensional manifolds $M$ and proposed as a model of its construction for general dimension. Its definition depends only on the geometry of $\mathcal{N}$ and not on the geometry of the spacetime $M$. The properties satisfied by the $L$-boundary $\partial M$ make it possible to characterize the obtained extension $\overline{M}=M\cup \partial M$, and this characterization is also proposed for general dimension.
Submitted 10 June, 2022;
originally announced June 2022.
-
FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction
Authors:
Zhenpei Yang,
Zhile Ren,
Miguel Angel Bautista,
Zaiwei Zhang,
Qi Shan,
Qixing Huang
Abstract:
Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. State-of-the-art approaches typically assume accurate camera poses as input, which could be difficult to obtain in realistic settings. In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses. The core of our approach is a fast and robust multi-view reconstruction algorithm to jointly refine 3D geometry and camera pose estimation using learnable neural network modules. We provide a thorough benchmark of state-of-the-art approaches for this problem on ShapeNet. Our approach achieves best-in-class results. It is also two orders of magnitude faster than the recent optimization-based approach IDR. Our code is released at \url{https://github.com/zhenpeiyang/FvOR/}
Submitted 16 May, 2022;
originally announced May 2022.
-
Time Dependent Photoionization Modeling of Warm Absorbers in Active Galactic Nuclei
Authors:
Dev R. Sadaula,
Manuel A. Bautista,
Javier A. Garcia,
Timothy R. Kallman
Abstract:
Warm absorber spectra contain bound-bound and bound-free absorption features seen in the X-ray and UV spectra from many active galactic nuclei (AGN). The widths and centroid energies of these features indicate they occur in outflowing gas, and the outflow can affect the gas within the host galaxy. Thus the warm absorber mass and energy budgets are of great interest. Estimates for these properties depend on models which connect the observed strengths of the absorption features with the density, composition, and ionization state of the absorbing gas. Such models assume that the ionization and heating of the gas come primarily from the strong continuum near the central black hole. They also assume that the various heating, cooling, ionization, and recombination processes are in a time-steady balance. This assumption may not be valid, owing to the intrinsic time-variability of the illuminating continuum, or other factors which change the cloud environment. This paper presents models for warm absorbers which follow the time dependence of the ionization, temperature, and radiation field in warm absorber gas clouds in response to a changing continuum illumination. We show that the effects of time variability are important over a range of parameter values, that time dependent models differ from equilibrium models in important ways, and that these effects should be included in models which derive properties of warm absorber outflows.
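Schematically, the time-steady balance that these time-dependent models relax is the ionization rate equation, written here in a generic form (not the paper's notation); $R^{\mathrm{ion}}_i$ and $R^{\mathrm{rec}}_i$ denote the total ionization and recombination rates out of charge state $i$:

```latex
\frac{dn_i}{dt} \;=\; n_{i-1}\,R^{\mathrm{ion}}_{i-1}
  \;+\; n_{i+1}\,R^{\mathrm{rec}}_{i+1}
  \;-\; n_i\left(R^{\mathrm{ion}}_{i} + R^{\mathrm{rec}}_{i}\right)
```

Equilibrium models correspond to setting $dn_i/dt = 0$; the rates themselves depend on the instantaneous radiation field and temperature, which is why a varying continuum drives the populations away from this balance.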
Submitted 17 February, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Plasma environment effects on K lines of astrophysical interest. V. Universal formulae for ionization potential and K-threshold shifts
Authors:
P. Palmeri,
J. Deprince,
M. A. Bautista,
S. Fritzsche,
J. A. Garcia,
T. R. Kallman,
C. Mendoza,
P. Quinet
Abstract:
Aims. We calculate the plasma environment effects on the ionization potentials (IPs) and K-thresholds used in the modeling of K lines for all the ions belonging to the isonuclear sequences of abundant elements apart from oxygen and iron, namely: carbon, silicon, calcium, chromium, and nickel. These calculations are used to extend the data points for the fits of the universal formulae, first proposed in our fourth paper of this series, to predict the IP and K-threshold lowerings in any elemental ion.
Methods. We used the fully relativistic multi-configuration Dirac-Fock (MCDF) method and approximated the plasma electron-nucleus and electron-electron screenings with a time-averaged Debye-Hückel potential.
Results. We report the modified ionization potentials and K-threshold energies for plasmas characterized by electron temperatures and densities in the ranges $10^5$ - $10^7$ K and $10^{18}$ - $10^{22}$ cm$^{-3}$. In addition, the improved universal fitting formulae are obtained.
Conclusions. We conclude that, since explicit calculations of the atomic structures for each ion of each element under different plasma conditions are impractical, the use of these universal formulae for predicting the IP and K-threshold lowerings in plasma modeling codes is still recommended. However, their comparatively moderate to low accuracies may affect the predicted opacities in certain cases under extreme plasma conditions, characterized by a plasma screening parameter of $μ > 0.2$ a.u., especially for the K-thresholds.
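For reference, the screened potential named above has the standard Debye-Hückel form; the expression below is the textbook screened-Coulomb sketch in atomic units for an $N$-electron ion (the papers in this series define the precise time-averaged version):

```latex
V^{\mathrm{DH}} \;=\; -\sum_{i=1}^{N} \frac{Z\,e^{-\mu r_i}}{r_i}
  \;+\; \sum_{i<j}^{N} \frac{e^{-\mu r_{ij}}}{r_{ij}},
\qquad
\mu \;=\; \sqrt{\frac{4\pi n_e}{k_B T_e}}
```

Here $\mu$ is the plasma screening parameter quoted in the conclusions ($\mu > 0.2$ a.u. marks the extreme regime), $n_e$ the electron density, and $T_e$ the electron temperature.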
Submitted 19 October, 2021;
originally announced October 2021.
-
Fast and Explicit Neural View Synthesis
Authors:
Pengsheng Guo,
Miguel Angel Bautista,
Alex Colburn,
Liang Yang,
Daniel Ulbricht,
Joshua M. Susskind,
Qi Shan
Abstract:
We study the problem of novel view synthesis from sparse source observations of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. Our approach explicitly encodes observations into a volumetric representation that enables amortized rendering. We demonstrate that although continuous radiance field representations have gained a lot of attention due to their expressive power, our simple approach obtains comparable or even better novel view reconstruction quality compared with state-of-the-art baselines while increasing rendering speed by over 400x. Our model is trained in a category-agnostic manner and does not require scene-specific optimization. Therefore, it is able to generalize novel view synthesis to object categories not seen during training. In addition, we show that with our simple formulation, we can use view synthesis as a self-supervision signal for efficient learning of 3D geometry without explicit 3D supervision.
Submitted 8 December, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Unconstrained Scene Generation with Locally Conditioned Radiance Fields
Authors:
Terrance DeVries,
Miguel Angel Bautista,
Nitish Srivastava,
Graham W. Taylor,
Joshua M. Susskind
Abstract:
We tackle the challenge of learning a distribution over complex, realistic, indoor scenes. In this paper, we introduce Generative Scene Networks (GSN), which learns to decompose scenes into a collection of many local radiance fields that can be rendered from a free moving camera. Our model can be used as a prior to generate new scenes, or to complete a scene given only sparse 2D observations. Recent work has shown that generative models of radiance fields can capture properties such as multi-view consistency and view-dependent lighting. However, these models are specialized for constrained viewing of single objects, such as cars or faces. Due to the size and complexity of realistic indoor environments, existing models lack the representational capacity to adequately capture them. Our decomposition scheme scales to larger and more complex scenes while preserving details and diversity, and the learned prior enables high-quality rendering from viewpoints that are significantly different from observed viewpoints. When compared to existing models, GSN produces quantitatively higher-quality scene renderings across several different scene datasets.
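A minimal sketch of the local-conditioning idea, assuming a coarse 2D "floorplan" of latent codes. All names and the nearest-cell lookup are illustrative: GSN learns the codes and decodes them with a neural radiance field rather than a table lookup.

```python
import numpy as np

# A scene represented as a 4x4 floorplan of 8-dimensional local latent codes.
grid = np.random.default_rng(0).normal(size=(4, 4, 8))

def local_code(xy):
    # Return the latent code conditioning the radiance field near the query
    # point xy in [0, 1)^2 (nearest-cell lookup; a learned model would
    # interpolate neighboring codes instead of snapping to one cell).
    i, j = (np.clip(np.asarray(xy), 0.0, 0.999) * grid.shape[0]).astype(int)
    return grid[i, j]

code = local_code((0.3, 0.7))  # the local code for this region of the scene
```

The point of the decomposition is that each code only has to describe a small region, which is what lets the representation scale to whole indoor scenes.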
Submitted 1 April, 2021;
originally announced April 2021.
-
The XSTAR Atomic Database
Authors:
Claudio Mendoza,
Manuel A. Bautista,
Jérôme Deprince,
Javier A. García,
Efraín Gatuzz,
Thomas W. Gorczyca,
Timothy R. Kallman,
Patrick Palmeri,
Pascal Quinet,
Michael C. Witthoeft
Abstract:
We describe the atomic database of the XSTAR spectral modeling code, summarizing the systematic upgrades carried out in the past twenty years to enable the modeling of K lines from chemical elements with atomic number $Z\leq 30$ and recent extensions to handle high-density plasmas. Such plasma environments are found, for instance, in the inner region of accretion disks around compact objects (neutron stars and black holes), which emit rich information about the system physical properties. Our intention is to offer a reliable modeling tool to take advantage of the outstanding spectral capabilities of the new generation of X-ray space telescopes (e.g., XRISM and ATHENA) to be launched in the coming years. Data curatorial aspects are discussed and an updated list of reference sources is compiled to improve the database provenance metadata. Two XSTAR spin-offs -- the ISMabs absorption model and the uaDB database -- are also described.
Submitted 3 December, 2020;
originally announced December 2020.
-
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
Authors:
Mike Roberts,
Jason Ramapuram,
Anurag Ranjan,
Atulit Kumar,
Miguel Angel Bautista,
Nathan Paczan,
Russ Webb,
Joshua M. Susskind
Abstract:
For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our dataset: (1) relies exclusively on publicly available 3D assets; (2) includes complete scene geometry, material information, and lighting information for every scene; (3) includes dense per-pixel semantic instance segmentations and complete camera information for every image; and (4) factors every image into diffuse reflectance, diffuse illumination, and a non-diffuse residual term that captures view-dependent lighting effects.
We analyze our dataset at the level of scenes, objects, and pixels, and we analyze costs in terms of money, computation time, and annotation effort. Remarkably, we find that it is possible to generate our entire dataset from scratch, for roughly half the cost of training a popular open-source natural language processing model. We also evaluate sim-to-real transfer performance on two real-world scene understanding tasks - semantic segmentation and 3D shape prediction - where we find that pre-training on our dataset significantly improves performance on both tasks, and achieves state-of-the-art performance on the most challenging Pix3D test set. All of our rendered image data, as well as all the code we used to generate our dataset and perform our experiments, is available online.
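The per-image factorization in point (4) can be illustrated with a toy example. Array shapes and value ranges here are assumptions for illustration; Hypersim stores these factors per pixel at full resolution.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 4  # toy image size

# Hypothetical per-pixel factors in the style of the dataset's decomposition.
diffuse_reflectance = rng.uniform(0.0, 1.0, (H, W, 3))
diffuse_illumination = rng.uniform(0.0, 2.0, (H, W, 3))
non_diffuse_residual = rng.normal(0.0, 0.05, (H, W, 3))

# The factorization: final color = reflectance * illumination + residual,
# where the residual captures view-dependent (non-diffuse) lighting effects.
image = diffuse_reflectance * diffuse_illumination + non_diffuse_residual

# The decomposition is exact by construction, so it can be inverted:
recovered = image - diffuse_reflectance * diffuse_illumination
assert np.allclose(recovered, non_diffuse_residual)
```

Having all three factors per pixel is what makes the dataset usable for intrinsic-image and inverse-rendering tasks, not just segmentation.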
Submitted 17 August, 2021; v1 submitted 4 November, 2020;
originally announced November 2020.
-
Atomic Data Assessment with PyNeb
Authors:
Christophe Morisset,
Valentina Luridiana,
Jorge García-Rojas,
Verónica Gómez-Llanos,
Manuel A. Bautista,
Claudio Mendoza
Abstract:
PyNeb is a Python package widely used to model emission lines in gaseous nebulae. We take advantage of its object-oriented architecture, class methods, and historical atomic database to structure a practical environment for atomic data assessment. Our aim is to reduce the uncertainties in parameter space (line-ratio diagnostics, electron density and temperature, and ionic abundances) arising from the underlying atomic data by critically selecting the PyNeb default datasets. We evaluate the questioned radiative-rate accuracy of the collisionally excited forbidden lines of the N- and P-like ions (O II, Ne IV, S II, Cl III, and Ar IV), which are used as density diagnostics. With the aid of observed line ratios in the dense NGC 7027 planetary nebula and careful data analysis, we arrive at emissivity-ratio uncertainties from the radiative rates within 10\%, a considerable improvement over a previously predicted 50\%. We also examine the accuracy of an extensive dataset of electron-impact effective collision strengths for the carbon isoelectronic sequence recently published. By estimating the impact of the new data on the pivotal temperature diagnostics of [N II] and [O III] and by benchmarking the collision strength with a measured resonance position, we question their usefulness in nebular modeling. We confirm that the effective-collision-strength scatter of selected datasets for these two ions does not lead to uncertainties in the temperature diagnostics larger than 10\%.
Submitted 8 October, 2020; v1 submitted 22 September, 2020;
originally announced September 2020.
-
On the generalization of learning-based 3D reconstruction
Authors:
Miguel Angel Bautista,
Walter Talbott,
Shuangfei Zhai,
Nitish Srivastava,
Joshua M Susskind
Abstract:
State-of-the-art learning-based monocular 3D reconstruction methods learn priors over object categories on the training set, and as a result struggle to achieve reasonable generalization to object categories unseen during training. In this paper we study the inductive biases encoded in the model architecture that impact the generalization of learning-based 3D reconstruction methods. We find that three inductive biases impact performance: the spatial extent of the encoder, the use of the underlying geometry of the scene to describe point features, and the mechanism to aggregate information from multiple views. Additionally, we propose mechanisms to enforce those inductive biases: a point representation that is aware of camera position, and a variance cost to aggregate information across views. Our model achieves state-of-the-art results on the standard ShapeNet 3D reconstruction benchmark in various settings.
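The variance cost for multi-view aggregation can be sketched as follows. This is a toy stand-in for the learned mechanism; the function name and shapes are assumptions.

```python
import numpy as np

def aggregate_views(features):
    # features: (num_views, feature_dim) descriptors of one 3D point, one per
    # view. A point on the true surface should look consistent from all views,
    # so low per-dimension variance signals agreement; the mean is the fused
    # feature passed downstream.
    mean = features.mean(axis=0)
    variance_cost = float(features.var(axis=0).mean())
    return mean, variance_cost

views = np.stack([np.ones(16), np.ones(16), np.ones(16)])
fused, cost = aggregate_views(views)  # perfect agreement -> zero cost
```

The variance term acts like a soft photo-consistency check: points whose features disagree across views are penalized rather than averaged blindly.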
Submitted 27 June, 2020;
originally announced June 2020.
-
Set Distribution Networks: a Generative Model for Sets of Images
Authors:
Shuangfei Zhai,
Walter Talbott,
Miguel Angel Bautista,
Carlos Guestrin,
Josh M. Susskind
Abstract:
Images with shared characteristics naturally form sets. For example, in a face verification benchmark, images of the same identity form sets. For generative models, the standard way of dealing with sets is to represent each as a one hot vector, and learn a conditional generative model $p(\mathbf{x}|\mathbf{y})$. This representation assumes that the number of sets is limited and known, such that the distribution over sets reduces to a simple multinomial distribution. In contrast, we study a more generic problem where the number of sets is large and unknown. We introduce Set Distribution Networks (SDNs), a novel framework that learns to autoencode and freely generate sets. We achieve this by jointly learning a set encoder, set discriminator, set generator, and set prior. We show that SDNs are able to reconstruct image sets that preserve salient attributes of the inputs in our benchmark datasets, and are also able to generate novel objects/identities. We examine the sets generated by SDN with a pre-trained 3D reconstruction network and a face verification network, respectively, as a novel way to evaluate the quality of generated sets of images.
Submitted 18 June, 2020;
originally announced June 2020.
-
Equivariant Neural Rendering
Authors:
Emilien Dupont,
Miguel Angel Bautista,
Alex Colburn,
Aditya Sankar,
Carlos Guestrin,
Josh Susskind,
Qi Shan
Abstract:
We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference. In addition, we introduce two challenging new datasets for scene representation and neural rendering, including scenes with complex lighting and backgrounds. Through experiments, we show that our model achieves compelling results on these datasets as well as on standard ShapeNet benchmarks.
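The equivariance loss can be sketched with a toy encoder and a 90-degree rotation standing in for a general 3D transformation. Everything here is illustrative: the paper uses learned encoders/renderers and continuous transformations.

```python
import numpy as np

def rotate_scene(vol):
    # A "scene transformation": rotate a volume-like representation 90 degrees
    # in its first two axes (a discrete stand-in for a general 3D transform).
    return np.rot90(vol, k=1, axes=(0, 1))

def encoder(img):
    # Toy linear "encoder" lifting a 2D image into a 2-channel volume; a
    # learned network would replace this.
    return np.stack([img, 0.5 * img], axis=-1)

def equivariance_loss(img):
    # The representation of a transformed input should match the transformed
    # representation of the input: encode(T(x)) == T(encode(x)).
    lhs = encoder(np.rot90(img))
    rhs = rotate_scene(encoder(img))
    return float(np.mean((lhs - rhs) ** 2))

loss = equivariance_loss(np.arange(16, dtype=float).reshape(4, 4))  # 0.0 here
```

Because this toy encoder commutes with rotation, the loss is exactly zero; training drives a learned encoder toward the same property, which is what imposes 3D structure without 3D supervision.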
Submitted 21 December, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
On the changes in the physical properties of the ionized region around the Weigelt structures in Eta Carinae over the 5.54-yr spectroscopic cycle
Authors:
M. Teodoro,
T. R. Gull,
M. A. Bautista,
D. J. Hillier,
G. Weigelt,
M. Corcoran
Abstract:
We present HST/STIS observations and analysis of two prominent nebular structures around the central source of Eta Carinae, the knots C and D. The former is brighter than the latter for emission lines from intermediate or high ionization potential ions. The brightness of lines from intermediate and high ionization potential ions significantly decreases at phases around periastron. We do not see conspicuous changes in the brightness of lines from ions with low ionization potential (<13.6 eV). We estimate that the total extinction towards the Weigelt structures is $A_V = 2.0$. Weigelt C and D are characterized by an electron density of $10^{6.9}$ cm$^{-3}$ that does not significantly change throughout the orbital cycle. The electron temperature varies from 5500 K (around periastron) to 7200 K (around apastron). The relative changes in the brightness of He I lines are well reproduced by the variations in the electron temperature alone. We found that, at phases around periastron, the electron temperature seems to be higher for Weigelt C than for D. The Weigelt structures are located close to the Homunculus equatorial plane, at a distance of about 1240 AU from the central source. From the analysis of proper motion and age, the Weigelt complex can be associated with the equatorial structure called the Butterfly Nebula surrounding the central binary system.
△ Less
Submitted 5 May, 2020;
originally announced May 2020.
-
Plasma-environment effects on K lines of astrophysical interest III. IPs, K thresholds, radiative rates, and Auger widths in Fe ix - Fe xvi
Authors:
J. Deprince,
M. A. Bautista,
S. Fritzsche,
J. A. Garcia,
T. R. Kallman,
C. Mendoza,
P. Palmeri,
P. Quinet
Abstract:
Aims. In the context of black-hole accretion disks, we aim to compute the plasma-environment effects on the atomic parameters used to model the decay of K-vacancy states in moderately charged iron ions, namely Fe ix - Fe xvi. Methods. We used the fully relativistic multiconfiguration Dirac-Fock (MCDF) method approximating the plasma electron-nucleus and electron-electron screenings with a time-ave…
▽ More
Aims. In the context of black-hole accretion disks, we aim to compute the plasma-environment effects on the atomic parameters used to model the decay of K-vacancy states in moderately charged iron ions, namely Fe ix - Fe xvi. Methods. We used the fully relativistic multiconfiguration Dirac-Fock (MCDF) method approximating the plasma electron-nucleus and electron-electron screenings with a time-averaged Debye-Hückel potential. Results. We report modified ionization potentials, K-threshold energies, wavelengths, radiative emission rates, and Auger widths for plasmas characterized by electron temperatures and densities in the ranges $10^5$ - $10^7$ K and $10^{18}$ - $10^{22}$ cm$^{-3}$. Conclusions. This study confirms that the high-resolution X-ray spectrometers onboard the future XRISM and ATHENA space missions will be capable of detecting the lowering of the K edges of these ions due to the extreme plasma conditions occurring in accretion disks around compact objects.
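For reference, the time-averaged Debye-Hückel screening of the electron-nucleus interaction mentioned above is conventionally written (in atomic units) as the following sketch; the paper's exact implementation, in particular the electron-electron screening terms, may differ:

```latex
% Debye-Hückel screened electron-nucleus potential (atomic units)
V_{\mathrm{DH}}(r) = -\frac{Z}{r}\,e^{-\mu r},
\qquad
\mu = \lambda_{\mathrm{D}}^{-1} = \sqrt{\frac{4\pi n_e}{k_B T_e}},
```

where $\mu$ is the inverse Debye length set by the electron density $n_e$ and temperature $T_e$, so screening strengthens as the plasma becomes denser or cooler.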
△ Less
Submitted 31 January, 2020;
originally announced January 2020.
-
Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment
Authors:
Chen Huang,
Shuangfei Zhai,
Walter Talbott,
Miguel Angel Bautista,
Shih-Yu Sun,
Carlos Guestrin,
Josh Susskind
Abstract:
In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. We propose a sample efficient reinforcement learning approach for adapting the loss dynamically during training. We empir…
▽ More
In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. We propose a sample-efficient reinforcement learning approach for adapting the loss dynamically during training. We empirically show how this formulation improves performance by simultaneously optimizing the evaluation metric and smoothing the loss landscape. We verify our method in metric learning and classification scenarios, showing considerable improvements over the state-of-the-art on a diverse set of tasks. Importantly, our method is applicable to a wide range of loss functions and evaluation metrics. Furthermore, the learned policies are transferable across tasks and data, demonstrating the versatility of the method.
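The abstract does not specify the paper's RL algorithm; as a toy illustration of the general idea only (a controller adjusting loss-term weights based on observed improvement in the evaluation metric, here reduced to simple hill climbing rather than a learned policy), one might sketch:

```python
# Toy sketch of metric-driven loss adaptation (NOT the paper's algorithm).
# A controller perturbs the loss-term weights and keeps a perturbation only
# if the evaluation metric improves -- a crude stand-in for the RL policy
# described in the abstract. `eval_metric` is a hypothetical callable that
# scores a weight configuration (higher is better).

def adapt_weights(weights, eval_metric, step=0.1):
    """Greedy coordinate search over non-negative loss weights."""
    best = list(weights)
    best_score = eval_metric(best)
    for i in range(len(best)):
        for delta in (+step, -step):
            trial = list(best)
            trial[i] = max(0.0, trial[i] + delta)
            score = eval_metric(trial)
            if score > best_score:
                best, best_score = trial, score
    return best, best_score

# Hypothetical metric that prefers weights near (0.7, 0.3):
metric = lambda w: -((w[0] - 0.7) ** 2 + (w[1] - 0.3) ** 2)
w, s = adapt_weights([0.5, 0.5], metric)
print(w)  # weights move toward the metric-preferred setting
```

In the paper's setting the controller is a learned RL policy rather than this greedy search, and the metric is evaluated on held-out data during training.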
△ Less
Submitted 14 May, 2019;
originally announced May 2019.