Showing 1–50 of 84 results for author: Demir, B

Searching in archive cs.
  1. arXiv:2604.08171  [pdf, ps, other]

    cs.CV cs.AI

    OceanMAE: A Foundation Model for Ocean Remote Sensing

    Authors: Viola-Joanna Stamer, Panagiotis Agrafiotis, Behnood Rasti, Begüm Demir

    Abstract: Accurate ocean mapping is essential for applications such as bathymetry estimation, seabed characterization, marine litter detection, and ecosystem monitoring. However, ocean remote sensing (RS) remains constrained by limited labeled data and by the reduced transferability of models pre-trained mainly on land-dominated Earth observation imagery. In this paper, we propose OceanMAE, an ocean-specifi… ▽ More

    Submitted 9 April, 2026; originally announced April 2026.

  2. arXiv:2603.29630  [pdf, ps, other]

    cs.CV

    BigEarthNet.txt: A Large-Scale Multi-Sensor Image-Text Dataset and Benchmark for Earth Observation

    Authors: Johann-Ludwig Herzog, Mathis Jürgen Adler, Leonard Hackel, Yan Shu, Angelos Zavras, Ioannis Papoutsis, Paolo Rota, Begüm Demir

    Abstract: Vision-language models (VLMs) have shown strong performance in computer vision (CV), yet their performance on remote sensing (RS) data remains limited due to the lack of large-scale, multi-sensor RS image-text datasets with diverse textual annotations. Existing datasets predominantly include aerial Red-Green-Blue imagery, with short or weakly grounded captions, and provide limited diversity in an… ▽ More

    Submitted 1 April, 2026; v1 submitted 31 March, 2026; originally announced March 2026.

    Comments: For details, see https://txt.bigearth.net

  3. arXiv:2603.26468  [pdf, ps, other]

    cs.CV

    HyVIC: A Metric-Driven Spatio-Spectral Hyperspectral Image Compression Architecture Based on Variational Autoencoders

    Authors: Martin Hermann Paul Fuchs, Behnood Rasti, Begüm Demir

    Abstract: The rapid growth of hyperspectral data archives in remote sensing (RS) necessitates effective compression methods for storage and transmission. Recent advances in learning-based hyperspectral image (HSI) compression have significantly enhanced both reconstruction fidelity and compression efficiency. However, existing methods typically adapt variational image compression models designed for natural… ▽ More

    Submitted 27 March, 2026; originally announced March 2026.

  4. arXiv:2603.19039  [pdf, ps, other]

    cs.CV

    TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

    Authors: Yan Shu, Bin Ren, Zhitong Xiong, Xiao Xiang Zhu, Begüm Demir, Nicu Sebe, Paolo Rota

    Abstract: Vision-language models (VLMs) have shown promise in earth observation (EO), yet they struggle with tasks that require grounding complex spatial reasoning in precise pixel-level visual representations. To address this problem, we introduce TerraScope, a unified VLM that delivers pixel-grounded geospatial reasoning with two key capabilities: (1) modality-flexible reasoning: it handles single-modalit… ▽ More

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026 (Main Track)

  5. arXiv:2601.22841  [pdf, ps, other]

    cs.CV

    How Much of a Model Do We Need? Redundancy and Slimmability in Remote Sensing Foundation Models

    Authors: Leonard Hackel, Tom Burgert, Begüm Demir

    Abstract: Large-scale foundation models (FMs) in remote sensing (RS) are developed based on the paradigms established in computer vision (CV) and have shown promise for various Earth observation applications. However, the direct transfer of scaling assumptions from CV to RS has not been adequately examined. We hypothesize that RS FMs enter an overparameterized regime at substantially smaller scales than the… ▽ More

    Submitted 30 January, 2026; originally announced January 2026.

  6. arXiv:2601.16383  [pdf, ps, other]

    eess.IV cs.CV

    On The Robustness of Foundational 3D Medical Image Segmentation Models Against Imprecise Visual Prompts

    Authors: Soumitri Chattopadhyay, Basar Demir, Marc Niethammer

    Abstract: While 3D foundational models have shown promise for promptable segmentation of medical volumes, their robustness to imprecise prompts remains under-explored. In this work, we aim to address this gap by systematically studying the effect of various controlled perturbations of dense visual prompts, that closely mimic real-world imprecision. By conducting experiments with two recent foundational mode… ▽ More

    Submitted 22 January, 2026; originally announced January 2026.

    Comments: Accepted at ISBI 2026

  7. arXiv:2601.08446  [pdf, ps, other]

    cs.CV cs.LG

    Noise-Adaptive Regularization for Robust Multi-Label Remote Sensing Image Classification

    Authors: Tom Burgert, Julia Henkel, Begüm Demir

    Abstract: The development of reliable methods for multi-label classification (MLC) has become a prominent research direction in remote sensing (RS). As the scale of RS data continues to expand, annotation procedures increasingly rely on thematic products or crowdsourced procedures to reduce the cost of manual annotation. While cost-effective, these strategies often introduce multi-label noise in the form of… ▽ More

    Submitted 13 January, 2026; originally announced January 2026.

    Comments: Submitted to TGRS

  8. arXiv:2601.02289  [pdf, ps, other]

    cs.CV

    Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery

    Authors: Tom Burgert, Leonard Hackel, Paolo Rota, Begüm Demir

    Abstract: Self-supervised learning (SSL) has become a powerful paradigm for learning from large, unlabeled datasets, particularly in computer vision (CV). However, applying SSL to multispectral remote sensing (RS) images presents unique challenges and opportunities due to the geographical and temporal variability of the data. In this paper, we introduce GeoRank, a novel regularization method for contrastive… ▽ More

    Submitted 5 January, 2026; originally announced January 2026.

    Comments: accepted for publication at IEEE/CVF Winter Conference on Applications of Computer Vision

  9. arXiv:2511.17442  [pdf, ps, other]

    cs.CV cs.AI

    REMSA: Foundation Model Selection for Remote Sensing via a Constraint-Aware Agent

    Authors: Binger Chen, Tacettin Emre Bök, Behnood Rasti, Volker Markl, Begüm Demir

    Abstract: Foundation Models (FMs) are increasingly integrated into remote sensing (RS) pipelines. These models include unimodal vision encoders and multimodal architectures. FMs are adapted to diverse perception tasks, such as image classification, change detection, and visual question answering. However, selecting the most suitable remote sensing foundation model (RSFM) for a specific task remains challeng… ▽ More

    Submitted 11 March, 2026; v1 submitted 21 November, 2025; originally announced November 2025.

    Comments: Code and data available at https://github.com/be-chen/REMSA

  10. Seabed-Net: A multi-task network for joint bathymetry estimation and seabed classification from remote sensing imagery in shallow waters

    Authors: Panagiotis Agrafiotis, Begüm Demir

    Abstract: Accurate, detailed, and regularly updated bathymetry, coupled with complex semantic content, is essential for under-mapped shallow-water environments facing increasing climatological and anthropogenic pressures. However, existing approaches that derive either depth or seabed classes from remote sensing imagery treat these tasks in isolation, forfeiting the mutual benefits of their interaction and… ▽ More

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: Submitted to ISPRS Journal of Photogrammetry and Remote Sensing

  11. arXiv:2509.20234  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression

    Authors: Tom Burgert, Oliver Stoll, Paolo Rota, Begüm Demir

    Abstract: The hypothesis that Convolutional Neural Networks (CNNs) are inherently texture-biased has shaped much of the discourse on feature use in deep learning. We revisit this hypothesis by examining limitations in the cue-conflict experiment by Geirhos et al. To address these limitations, we propose a domain-agnostic framework that quantifies feature reliance through systematic suppression of shape, tex… ▽ More

    Submitted 9 January, 2026; v1 submitted 24 September, 2025; originally announced September 2025.

    Comments: Accepted at NeurIPS 2025 (oral)

  12. arXiv:2509.14104  [pdf, ps, other]

    cs.CV

    CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts

    Authors: Leonard Hackel, Tom Burgert, Begüm Demir

    Abstract: Self-supervised learning through masked autoencoders has attracted great attention for remote sensing (RS) foundation model (FM) development, enabling improved representation learning across diverse sensors and downstream tasks. However, existing RS FMs often either suffer from substantial computational complexity during both training and inference or exhibit limited representational capacity. The… ▽ More

    Submitted 17 September, 2025; originally announced September 2025.

  13. arXiv:2508.07760  [pdf, ps, other]

    eess.IV cs.CV cs.GR

    Sea-Undistort: A Dataset for Through-Water Image Restoration in High Resolution Airborne Bathymetric Mapping

    Authors: Maximilian Kromer, Panagiotis Agrafiotis, Begüm Demir

    Abstract: Accurate image-based bathymetric mapping in shallow waters remains challenging due to the complex optical distortions such as wave induced patterns, scattering and sunglint, introduced by the dynamic water surface, the water column properties, and solar illumination. In this work, we introduce Sea-Undistort, a comprehensive synthetic dataset of 1200 paired 512x512 through-water scenes rendered in… ▽ More

    Submitted 11 August, 2025; originally announced August 2025.

    Comments: Under review in IEEE Geoscience and Remote Sensing Letters

  14. arXiv:2508.06256  [pdf, ps, other]

    cs.CV

    FedX: Explanation-Guided Pruning for Communication-Efficient Federated Learning in Remote Sensing

    Authors: Barış Büyüktaş, Jonas Klotz, Begüm Demir

    Abstract: Federated learning (FL) enables the collaborative training of deep neural networks across decentralized data archives (i.e., clients), where each client stores data locally and only shares model updates with a central server. This makes FL a suitable learning paradigm for remote sensing (RS) image classification tasks, where data centralization may be restricted due to legal and privacy constraint… ▽ More

    Submitted 17 February, 2026; v1 submitted 8 August, 2025; originally announced August 2025.

    Comments: Accepted at the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

  15. Adjustable Spatio-Spectral Hyperspectral Image Compression Network

    Authors: Martin Hermann Paul Fuchs, Behnood Rasti, Begüm Demir

    Abstract: With the rapid growth of hyperspectral data archives in remote sensing (RS), the need for efficient storage has become essential, driving significant attention toward learning-based hyperspectral image (HSI) compression. However, a comprehensive investigation of the individual and joint effects of spectral and spatial compression on learning-based HSI compression has not been thoroughly examined y… ▽ More

    Submitted 3 November, 2025; v1 submitted 31 July, 2025; originally announced July 2025.

  16. arXiv:2507.05916  [pdf, ps, other]

    cs.CV cs.AI

    On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification

    Authors: Jonas Klotz, Tom Burgert, Begüm Demir

    Abstract: The development of explainable artificial intelligence (xAI) methods for scene classification problems has attracted great attention in remote sensing (RS). Most xAI methods and the related evaluation metrics in RS are initially developed for natural images considered in computer vision (CV), and their direct usage in RS may not be suitable. To address this issue, in this paper, we investigate the… ▽ More

    Submitted 22 October, 2025; v1 submitted 8 July, 2025; originally announced July 2025.

    Comments: The code of this work will be publicly available at https://git.tu-berlin.de/rsim/xai4rs. Accepted at IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

  17. Continual Self-Supervised Learning with Masked Autoencoders in Remote Sensing

    Authors: Lars Möllenbrok, Behnood Rasti, Begüm Demir

    Abstract: The development of continual learning (CL) methods, which aim to learn new tasks in a sequential manner from the training data acquired continuously, has gained great attention in remote sensing (RS). The existing CL methods in RS, while learning new tasks, enhance robustness towards catastrophic forgetting. This is achieved by using a large number of labeled training samples, which is costly and… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to IEEE Geoscience and Remote Sensing Letters. Our code is available at https://git.tu-berlin.de/rsim/CoSMAE

  18. arXiv:2506.02419  [pdf, ps, other]

    cs.CV

    Guiding Registration with Emergent Similarity from Pre-Trained Diffusion Models

    Authors: Nurislam Tursynbek, Hastings Greer, Basar Demir, Marc Niethammer

    Abstract: Diffusion models, while trained for image generation, have emerged as powerful foundational feature extractors for downstream tasks. We find that off-the-shelf diffusion models, trained exclusively to generate natural RGB images, can identify semantically meaningful correspondences in medical images. Building on this observation, we propose to leverage diffusion model features as a similarity meas… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: MICCAI 2025

  19. arXiv:2506.01667  [pdf, ps, other]

    cs.CV

    EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM

    Authors: Yan Shu, Bin Ren, Zhitong Xiong, Danda Pani Paudel, Luc Van Gool, Begüm Demir, Nicu Sebe, Paolo Rota

    Abstract: Earth Observation (EO) data analysis is vital for monitoring environmental and human dynamics. Recent Multimodal Large Language Models (MLLMs) show potential in EO understanding but remain restricted to single-sensor inputs, overlooking the complementarity across heterogeneous modalities. We propose EarthMind, a unified vision-language framework that handles both single- and cross-sensor inputs vi… ▽ More

    Submitted 28 September, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

  20. arXiv:2505.11121  [pdf, ps, other]

    cs.CV

    Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing

    Authors: Mathis Jürgen Adler, Leonard Hackel, Gencer Sumbul, Begüm Demir

    Abstract: The development of foundation models through pretraining of vision-language models (VLMs) has recently attracted great attention in remote sensing (RS). VLM pretraining aims to learn image and language alignments from a large number of image-text pairs. Each pretraining image is often associated with multiple captions containing redundant information due to repeated or semantically similar phrases… ▽ More

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2025. Our code is available at https://git.tu-berlin.de/rsim/redundacy-aware-rs-vlm

  21. Deep Learning-based Bathymetry Retrieval without In-situ Depths using Remote Sensing Imagery and SfM-MVS DSMs with Data Gaps

    Authors: Panagiotis Agrafiotis, Begüm Demir

    Abstract: Accurate, detailed, and high-frequent bathymetry is crucial for shallow seabed areas facing intense climatological and anthropogenic pressures. Current methods utilizing airborne or satellite optical imagery to derive bathymetry primarily rely on either SfM-MVS with refraction correction or Spectrally Derived Bathymetry (SDB). However, SDB methods often require extensive manual fieldwork or costly… ▽ More

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Accepted for publication in ISPRS Journal of Photogrammetry and Remote Sensing

  22. arXiv:2504.00264  [pdf, other]

    eess.IV cs.CV stat.ML

    DiffDenoise: Self-Supervised Medical Image Denoising with Conditional Diffusion Models

    Authors: Basar Demir, Yikang Liu, Xiao Chen, Eric Z. Chen, Lin Zhao, Boris Mailhe, Terrence Chen, Shanhui Sun

    Abstract: Many self-supervised denoising approaches have been proposed in recent years. However, these methods tend to overly smooth images, resulting in the loss of fine structures that are essential for medical applications. In this paper, we propose DiffDenoise, a powerful self-supervised denoising approach tailored for medical images, designed to preserve high-frequency details. Our approach comprises t… ▽ More

    Submitted 31 March, 2025; originally announced April 2025.

  23. arXiv:2503.24088   

    cs.CV

    A Plasticity-Aware Method for Continual Self-Supervised Learning in Remote Sensing

    Authors: Lars Möllenbrok, Behnood Rasti, Begüm Demir

    Abstract: Continual self-supervised learning (CSSL) methods have gained increasing attention in remote sensing (RS) due to their capability to learn new tasks sequentially from continuous streams of unlabeled data. Existing CSSL methods, while learning new tasks, focus on preventing catastrophic forgetting. To this end, most of them use regularization strategies to retain knowledge of previous tasks. This… ▽ More

    Submitted 16 May, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: We found the reported results of the compared method to be misleading

  24. arXiv:2503.22862  [pdf, other]

    cs.CV

    Zero-shot Domain Generalization of Foundational Models for 3D Medical Image Segmentation: An Experimental Study

    Authors: Soumitri Chattopadhyay, Basar Demir, Marc Niethammer

    Abstract: Domain shift, caused by variations in imaging modalities and acquisition protocols, limits model generalization in medical image segmentation. While foundation models (FMs) trained on diverse large-scale data hold promise for zero-shot generalization, their application to volumetric medical data remains underexplored. In this study, we examine their ability towards domain generalization (DG), by c… ▽ More

    Submitted 28 March, 2025; originally announced March 2025.

  25. arXiv:2503.16842  [pdf, other]

    eess.IV cs.CV

    Downstream Analysis of Foundational Medical Vision Models for Disease Progression

    Authors: Basar Demir, Soumitri Chattopadhyay, Thomas Hastings Greer, Boqi Chen, Marc Niethammer

    Abstract: Medical vision foundational models are used for a wide variety of tasks, including medical image segmentation and registration. This work evaluates the ability of these models to predict disease progression using a simple linear probe. We hypothesize that intermediate layer features of segmentation models capture structural information, while those of registration models encode knowledge of change… ▽ More

    Submitted 21 March, 2025; originally announced March 2025.

  26. arXiv:2503.10262  [pdf, other]

    cs.CV

    A Multi-Modal Federated Learning Framework for Remote Sensing Image Classification

    Authors: Barış Büyüktaş, Gencer Sumbul, Begüm Demir

    Abstract: Federated learning (FL) enables the collaborative training of deep neural networks across decentralized data archives (i.e., clients) without sharing the local data of the clients. Most of the existing FL methods assume that the data distributed across all clients is associated with the same data modality. However, remote sensing (RS) images present in different clients can be associated with dive… ▽ More

    Submitted 13 March, 2025; originally announced March 2025.

  27. arXiv:2502.09598  [pdf, ps, other]

    cs.CV

    GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis

    Authors: Angelos Zavras, Dimitrios Michail, Xiao Xiang Zhu, Begüm Demir, Ioannis Papoutsis

    Abstract: Existing Vision-Language Models (VLMs) are predominantly trained on web-scraped, noisy image-text data, exhibiting limited exposure to the specialized domain of RS. This deficiency results in poor performance on RS-specific tasks, as commonly used datasets often lack detailed, scientifically accurate textual descriptions and instead emphasize solely on attributes like date and location. To bridge… ▽ More

    Submitted 21 January, 2026; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: 26 pages, 14 figures

  28. arXiv:2501.19043  [pdf, ps, other]

    cs.CV

    Self-Supervised Cross-Modal Text-Image Time Series Retrieval in Remote Sensing

    Authors: Genc Hoxha, Olivér Angyal, Begüm Demir

    Abstract: The development of image time series retrieval (ITSR) methods is a growing research interest in remote sensing (RS). Given a user-defined image time series (i.e., the query time series), ITSR methods search and retrieve from large archives the image time series that have similar content to the query time series. Existing ITSR methods in RS are designed for unimodal retrieval problems, relying on a… ▽ More

    Submitted 15 July, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

  29. arXiv:2501.11493  [pdf, other]

    cs.CV cs.AI

    Communication-Efficient Federated Learning Based on Explanation-Guided Pruning for Remote Sensing Image Classification

    Authors: Jonas Klotz, Barış Büyüktaş, Begüm Demir

    Abstract: Federated learning (FL) is a decentralized machine learning paradigm in which multiple clients collaboratively train a global model by exchanging only model updates with the central server without sharing the local data of the clients. Due to the large volume of model updates required to be transmitted between clients and the central server, most FL systems are associated with high transfer costs… ▽ More

    Submitted 16 May, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

    Comments: Accepted at the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2025

  30. arXiv:2410.17264  [pdf, ps, other]

    eess.SP cs.LG

    Radio Map Prediction from Aerial Images and Application to Coverage Optimization

    Authors: Fabian Jaensch, Giuseppe Caire, Begüm Demir

    Abstract: Several studies have explored deep learning algorithms to predict large-scale signal fading, or path loss, in urban communication networks. The goal is to replace costly measurement campaigns, inaccurate statistical models, or computationally expensive ray-tracing simulations with machine learning models that deliver quick and accurate predictions. We focus on predicting path loss radio maps using… ▽ More

    Submitted 23 June, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: 13 pages, 8 Figures, To appear in IEEE Transactions on Wireless Communications. arXiv admin note: substantial text overlap with arXiv:2402.00878

  31. HyCoT: A Transformer-Based Autoencoder for Hyperspectral Image Compression

    Authors: Martin Hermann Paul Fuchs, Behnood Rasti, Begüm Demir

    Abstract: The development of learning-based hyperspectral image (HSI) compression models has recently attracted significant interest. Existing models predominantly utilize convolutional filters, which capture only local dependencies. Furthermore, they often incur high training costs and exhibit substantial computational complexity. To address these limitations, in this paper we propose Hyperspectral Compress… ▽ More

    Submitted 14 November, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted at 14th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2024

  32. arXiv:2408.00221  [pdf, other]

    eess.IV cs.CV

    multiGradICON: A Foundation Model for Multimodal Medical Image Registration

    Authors: Basar Demir, Lin Tian, Thomas Hastings Greer, Roland Kwitt, Francois-Xavier Vialard, Raul San Jose Estepar, Sylvain Bouix, Richard Jarrett Rushmore, Ebrahim Ebrahim, Marc Niethammer

    Abstract: Modern medical image registration approaches predict deformations using deep networks. These approaches achieve state-of-the-art (SOTA) registration accuracy and are generally fast. However, deep learning (DL) approaches are, in contrast to conventional non-deep-learning-based approaches, anatomy-specific. Recently, a universal deep registration approach, uniGradICON, has been proposed. However, u… ▽ More

    Submitted 7 February, 2025; v1 submitted 31 July, 2024; originally announced August 2024.

  33. arXiv:2407.03653  [pdf, other]

    cs.CV eess.IV

    reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis

    Authors: Kai Norman Clasen, Leonard Hackel, Tom Burgert, Gencer Sumbul, Begüm Demir, Volker Markl

    Abstract: This paper presents refined BigEarthNet (reBEN) that is a large-scale, multi-modal remote sensing dataset constructed to support deep learning (DL) studies for remote sensing image analysis. The reBEN dataset consists of 549,488 pairs of Sentinel-1 and Sentinel-2 image patches. To construct reBEN, we initially consider the Sentinel-1 and Sentinel-2 tiles used to construct the BigEarthNet dataset a… ▽ More

    Submitted 16 May, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2025. Our code is available at https://github.com/rsim-tu-berlin/bigearthnet-pipeline

  34. arXiv:2406.16513  [pdf, other]

    cs.CV

    Multi-Modal Vision Transformers for Crop Mapping from Satellite Image Time Series

    Authors: Theresa Follath, David Mickisch, Jan Hemmerling, Stefan Erasmi, Marcel Schwieder, Begüm Demir

    Abstract: Using images acquired by different satellite sensors has shown to improve classification performance in the framework of crop mapping from satellite image time series (SITS). Existing state-of-the-art architectures use self-attention mechanisms to process the temporal dimension and convolutions for the spatial dimension of SITS. Motivated by the success of purely attention-based architectures in c… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 1 table. Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2024. Our code is available at https://git.tu-berlin.de/rsim/mmtsvit

  35. arXiv:2406.10107  [pdf, other]

    cs.CV

    Annotation Cost-Efficient Active Learning for Deep Metric Learning Driven Remote Sensing Image Retrieval

    Authors: Genc Hoxha, Gencer Sumbul, Julia Henkel, Lars Möllenbrok, Begüm Demir

    Abstract: Deep metric learning (DML) has shown to be effective for content-based image retrieval (CBIR) in remote sensing (RS). Most of DML methods for CBIR rely on a high number of annotated images to accurately learn model parameters of deep neural networks (DNNs). However, gathering such data is time-consuming and costly. To address this, we propose an annotation cost-efficient active learning (ANNEAL) m… ▽ More

    Submitted 5 August, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in the IEEE Transactions on Geoscience and Remote Sensing (TGRS)

  36. MagicBathyNet: A Multimodal Remote Sensing Dataset for Bathymetry Prediction and Pixel-based Classification in Shallow Waters

    Authors: Panagiotis Agrafiotis, Łukasz Janowski, Dimitrios Skarlatos, Begüm Demir

    Abstract: Accurate, detailed, and high-frequent bathymetry, coupled with complex semantic content, is crucial for the undermapped shallow seabed areas facing intense climatological and anthropogenic pressures. Current methods exploiting remote sensing images to derive bathymetry or seabed classes mainly exploit non-open data. This lack of openly accessible benchmark archives prevents the wider use of deep l… ▽ More

    Submitted 4 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures, 5 tables. Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2024

    Journal ref: IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 2024, pp. 249-253

  37. arXiv:2405.15405  [pdf, other]

    cs.CV

    Transformer-based Federated Learning for Multi-Label Remote Sensing Image Classification

    Authors: Barış Büyüktaş, Kenneth Weitzel, Sebastian Völkers, Felix Zailskas, Begüm Demir

    Abstract: Federated learning (FL) aims to collaboratively learn deep learning model parameters from decentralized data archives (i.e., clients) without accessing training data on clients. However, the training data across clients might be not independent and identically distributed (non-IID), which may result in difficulty in achieving optimal model convergence. In this work, we investigate the capability o… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2024. Our code is available at https://git.tu-berlin.de/rsim/FL-Transformer

  38. A Label Propagation Strategy for CutMix in Multi-Label Remote Sensing Image Classification

    Authors: Tom Burgert, Kai Norman Clasen, Jonas Klotz, Tim Siebert, Begüm Demir

    Abstract: The development of supervised deep learning-based methods for multi-label scene classification (MLC) is one of the prominent research directions in remote sensing (RS). However, collecting annotations for large RS image archives is time-consuming and costly. To address this issue, several data augmentation methods have been introduced in RS. Among others, the CutMix data augmentation technique, wh… ▽ More

    Submitted 5 November, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted at IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

  39. Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images

    Authors: Tom Burgert, Begüm Demir

    Abstract: The application of data augmentation for deep learning (DL) methods plays an important role in achieving state-of-the-art results in supervised, semi-supervised, and self-supervised image classification. In particular, channel transformations (e.g., solarize, grayscale, brightness adjustments) are integrated into data augmentation pipelines for remote sensing (RS) image classification tasks. Howev… ▽ More

    Submitted 24 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted at the IEEE International Geoscience and Remote Sensing Symposium

  40. arXiv:2402.09816  [pdf, ps, other]

    cs.CV

    Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment

    Authors: Angelos Zavras, Dimitrios Michail, Begüm Demir, Ioannis Papoutsis

    Abstract: Deep Learning (DL) is undergoing a paradigm shift with the emergence of foundation models. In this work, we focus on Contrastive Language-Image Pre-training (CLIP), a Vision-Language foundation model that achieves high accuracy across various image classification tasks and often rivals fully supervised baselines, despite not being explicitly trained for those tasks. Nevertheless, there are still d… ▽ More

    Submitted 18 July, 2025; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted at the ISPRS Journal of Photogrammetry and Remote Sensing. Our code implementation and weights for all experiments are publicly available at https://github.com/Orion-AI-Lab/MindTheModalityGap

  41. arXiv:2402.00878  [pdf, other]

    cs.NI cs.LG eess.SP

    Radio Map Estimation -- An Open Dataset with Directive Transmitter Antennas and Initial Experiments

    Authors: Fabian Jaensch, Giuseppe Caire, Begüm Demir

    Abstract: Over the last years, several works have explored the application of deep learning algorithms to determine the large-scale signal fading (also referred to as "path loss") between transmitter and receiver pairs in urban communication networks. The central idea is to replace costly measurement campaigns, inaccurate statistical models or computationally expensive ray-tracing simulations by machine l…

    Submitted 12 January, 2024; originally announced February 2024.

    Comments: 13 pages, 121 figures, This work has been submitted to the IEEE for possible publication

  42. Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing

    Authors: Jakob Hackstein, Gencer Sumbul, Kai Norman Clasen, Begüm Demir

    Abstract: Self-supervised learning through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) image representation learning, and thus embodies a significant potential for content-based image retrieval (CBIR) from ever-growing RS image archives. However, the existing MAE based CBIR studies in RS assume that the considered RS images are acquired by a single image sensor,…

    Submitted 11 December, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted at the IEEE Transactions on Geoscience and Remote Sensing. Our code is available at https://github.com/jakhac/CSMAE

  43. Federated Learning Across Decentralized and Unshared Archives for Remote Sensing Image Classification

    Authors: Barış Büyüktaş, Gencer Sumbul, Begüm Demir

    Abstract: Federated learning (FL) enables the collaboration of multiple deep learning models to learn from decentralized data archives (i.e., clients) without accessing data on clients. Although FL offers ample opportunities in knowledge discovery from distributed image archives, it is seldom considered in remote sensing (RS). In this paper, as a first time in RS, we present a comparative study of state-of-…

    Submitted 14 June, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted at the IEEE Geoscience and Remote Sensing Magazine

  44. arXiv:2307.01741  [pdf, other]

    cs.CV

    Ben-ge: Extending BigEarthNet with Geographical and Environmental Data

    Authors: Michael Mommert, Nicolas Kesseli, Joëlle Hanna, Linus Scheibenreif, Damian Borth, Begüm Demir

    Abstract: Deep learning methods have proven to be a powerful tool in the analysis of large amounts of complex Earth observation data. However, while Earth observation data are multi-modal in most cases, only single or few modalities are typically considered. In this work, we present the ben-ge dataset, which supplements the BigEarthNet-MM dataset by compiling freely and globally available geographical and e…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Accepted for presentation at the IEEE International Geoscience and Remote Sensing Symposium 2023

  45. arXiv:2306.11605  [pdf, other]

    cs.CV

    Annotation Cost Efficient Active Learning for Content Based Image Retrieval

    Authors: Julia Henkel, Genc Hoxha, Gencer Sumbul, Lars Möllenbrok, Begüm Demir

    Abstract: Deep metric learning (DML) based methods have been found very effective for content-based image retrieval (CBIR) in remote sensing (RS). For accurately learning the model parameters of deep neural networks, most of the DML methods require a high number of annotated training images, which can be costly to gather. To address this problem, in this paper we present an annotation cost efficient active…

    Submitted 26 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2023. Our code is available at https://git.tu-berlin.de/rsim/ANNEAL

  46. arXiv:2306.08575  [pdf, other]

    cs.CV

    Label Noise Robust Image Representation Learning based on Supervised Variational Autoencoders in Remote Sensing

    Authors: Gencer Sumbul, Begüm Demir

    Abstract: Due to the publicly available thematic maps and crowd-sourced data, remote sensing (RS) image annotations can be gathered at zero cost for training deep neural networks (DNNs). However, such annotation sources may increase the risk of including noisy labels in training data, leading to inaccurate RS image representation learning (IRL). To address this issue, in this paper we propose a label noise…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2023. Our code is available at https://git.tu-berlin.de/rsim/RS-IRL-SVAE

  47. arXiv:2306.06908  [pdf, other]

    cs.CV

    Active Learning Guided Fine-Tuning for enhancing Self-Supervised Based Multi-Label Classification of Remote Sensing Images

    Authors: Lars Möllenbrok, Begüm Demir

    Abstract: In recent years, deep neural networks (DNNs) have been found very successful for multi-label classification (MLC) of remote sensing (RS) images. Self-supervised pre-training combined with fine-tuning on a randomly selected small training set has become a popular approach to minimize annotation efforts of data-demanding DNNs. However, fine-tuning on a small and biased training set may limit model p…

    Submitted 21 June, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium 2023

  48. arXiv:2306.01523  [pdf, other]

    cs.CV cs.LG

    Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification

    Authors: David Hoffmann, Kai Norman Clasen, Begüm Demir

    Abstract: In this paper, we introduce a novel Synchronized Class Token Fusion (SCT Fusion) architecture in the framework of multi-modal multi-label classification (MLC) of remote sensing (RS) images. The proposed architecture leverages modality-specific attention-based transformer encoders to process varying input modalities, while exchanging information across modalities by synchronizing the special class…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium 2023

  49. arXiv:2306.00792  [pdf, other]

    cs.CV

    Learning Across Decentralized Multi-Modal Remote Sensing Archives with Federated Learning

    Authors: Barış Büyüktaş, Gencer Sumbul, Begüm Demir

    Abstract: The development of federated learning (FL) methods, which aim to learn from distributed databases (i.e., clients) without accessing data on clients, has recently attracted great attention. Most of these methods assume that the clients are associated with the same data modality. However, remote sensing (RS) images in different clients can be associated with different data modalities that can improv…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2023. Our code is available at https://git.tu-berlin.de/rsim/MM-FL

  50. arXiv:2306.00758  [pdf, other]

    cs.CV cs.LG

    LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing

    Authors: Leonard Hackel, Kai Norman Clasen, Mahdyar Ravanbakhsh, Begüm Demir

    Abstract: Visual question answering (VQA) methods in remote sensing (RS) aim to answer natural language questions with respect to an RS image. Most of the existing methods require a large amount of computational resources, which limits their application in operational scenarios in RS. To address this issue, in this paper we present an effective lightweight transformer-based VQA in RS (LiT-4-RSVQA) architect…

    Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium 2023