
Showing 1–50 of 88 results for author: Pal, U

Searching in archive cs.
  1. arXiv:2603.26836 [pdf, ps, other]

    eess.IV cs.CV

    Reliability-Aware Weighted Multi-Scale Spatio-Temporal Maps for Heart Rate Monitoring

    Authors: Arpan Bairagi, Rakesh Dey, Siladittya Manna, Umapada Pal

    Abstract: Remote photoplethysmography (rPPG) allows for the contactless estimation of physiological signals from facial videos by analyzing subtle skin color changes. However, rPPG signals are extremely susceptible to illumination changes, motion, shadows, and specular reflections, resulting in low-quality signals in unconstrained environments. To overcome these issues, we present a Reliability-Aware Weight…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: 6 pages, 4 figures. Under review at ICIP 2026

  2. arXiv:2601.09497 [pdf, ps, other]

    cs.CV cs.LG

    Towards Robust Cross-Dataset Object Detection Generalization under Domain Specificity

    Authors: Ritabrata Chakraborty, Hrishit Mitra, Shivakumara Palaiahnakote, Umapada Pal

    Abstract: Object detectors often perform well in-distribution, yet degrade sharply on a different benchmark. We study cross-dataset object detection (CD-OD) through a lens of setting specificity. We group benchmarks into setting-agnostic datasets with diverse everyday scenes and setting-specific datasets tied to a narrow environment, and evaluate a standard detector family across all train–test pairs. This…

    Submitted 14 January, 2026; originally announced January 2026.

    Comments: 15 pages, 4 figures, 6 tables

  3. arXiv:2510.21887 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications

    Authors: Shamim Yazdani, Akansha Singh, Nripsuta Saxena, Zichong Wang, Avash Palikhe, Deng Pan, Umapada Pal, Jie Yang, Wenbin Zhang

    Abstract: In recent years, deep learning based generative models, particularly Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models (DMs), have been instrumental in generating diverse, high-quality content across various domains, such as image and video synthesis. This capability has led to widespread adoption of these models and has captured strong public interes…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: Accepted by the Journal of Big Data

  4. arXiv:2509.04123 [pdf, ps, other]

    cs.CV

    TaleDiffusion: Multi-Character Story Generation with Dialogue Rendering

    Authors: Ayan Banerjee, Josep Lladós, Umapada Pal, Anjan Dutta

    Abstract: Text-to-story visualization is challenging due to the need for consistent interaction among multiple characters across frames. Existing methods struggle with character consistency, leading to artifact generation and inaccurate dialogue rendering, which results in disjointed storytelling. In response, we introduce TaleDiffusion, a novel framework for generating multi-character stories with an itera…

    Submitted 4 September, 2025; originally announced September 2025.

  5. arXiv:2508.10737 [pdf, ps, other]

    cs.CV

    Privacy-enhancing Sclera Segmentation Benchmarking Competition: SSBC 2025

    Authors: Matej Vitek, Darian Tomašević, Abhijit Das, Sabari Nathan, Gökhan Özbulak, Gözde Ayşe Tataroğlu Özbulak, Jean-Paul Calbimonte, André Anjos, Hariohm Hemant Bhatt, Dhruv Dhirendra Premani, Jay Chaudhari, Caiyong Wang, Jian Jiang, Chi Zhang, Qi Zhang, Iyyakutti Iyappan Ganapathi, Syed Sadaf Ali, Divya Velayudan, Maregu Assefa, Naoufel Werghi, Zachary A. Daniels, Leeon John, Ritesh Vyas, Jalil Nourmohammadi Khiarak, Taher Akbari Saeed, et al. (10 additional authors not shown)

    Abstract: This paper presents a summary of the 2025 Sclera Segmentation Benchmarking Competition (SSBC), which focused on the development of privacy-preserving sclera-segmentation models trained using synthetically generated ocular images. The goal of the competition was to evaluate how well models trained on synthetic data perform in comparison to those trained on real-world datasets. The competition featu…

    Submitted 14 August, 2025; originally announced August 2025.

    Comments: IEEE International Joint Conference on Biometrics (IJCB) 2025, 13 pages

  6. arXiv:2506.20255 [pdf, ps, other]

    cs.CV cs.LG

    A Transformer Based Handwriting Recognition System Jointly Using Online and Offline Features

    Authors: Ayush Lodh, Ritabrata Chakraborty, Shivakumara Palaiahnakote, Umapada Pal

    Abstract: We posit that handwriting recognition benefits from complementary cues carried by the rasterized complex glyph and the pen's trajectory, yet most systems exploit only one modality. We introduce an end-to-end network that performs early fusion of offline images and online stroke data within a shared latent space. A patch encoder converts the grayscale crop into fixed-length visual tokens, while a l…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 15 pages, 7 figures

  7. arXiv:2503.23819 [pdf, other]

    cs.LG cs.AI cs.CV

    Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics

    Authors: Swarnava Bhattacharyya, Umapada Pal, Tapabrata Chakraborti

    Abstract: Deep learning based diagnostic AI systems based on medical images are starting to achieve performance similar to that of human experts. However, these data-hungry complex systems are inherently black boxes and are therefore slow to be adopted for high-risk applications like healthcare. This problem of lack of transparency is exacerbated in the case of recent large foundation models, which are trained in a self…

    Submitted 31 March, 2025; originally announced March 2025.

  8. arXiv:2503.15639 [pdf, other]

    cs.CV cs.AI

    A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition

    Authors: Ritabrata Chakraborty, Shivakumara Palaiahnakote, Umapada Pal, Cheng-Lin Liu

    Abstract: Modern scene text recognition systems often depend on large end-to-end architectures that require extensive training and are prohibitively expensive for real-time scenarios. In such cases, the deployment of heavy models becomes impractical due to constraints on memory, computational resources, and latency. To address these challenges, we propose a novel, training-free plug-and-play framework that…

    Submitted 19 March, 2025; originally announced March 2025.

  9. arXiv:2502.14007 [pdf, other]

    cs.GR cs.AI cs.CV cs.MM eess.IV

    d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining

    Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

    Abstract: Structural guidance in an image-to-image translation allows intricate control over the shapes of synthesized images. Generating high-quality realistic images from user-specified rough hand-drawn sketches is one such task that aims to impose a structural constraint on the conditional generation process. While the premise is intriguing for numerous use cases of content creation and academic research…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted in The International Conference on Pattern Recognition (ICPR) 2024

  10. arXiv:2502.13637 [pdf, other]

    cs.CV cs.MM

    Exploring Mutual Cross-Modal Attention for Context-Aware Human Affordance Generation

    Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

    Abstract: Human affordance learning investigates contextually relevant novel pose prediction such that the estimated pose represents a valid human action within the scene. While the task is fundamental to machine perception and automated interactive navigation agents, the exponentially large number of probable pose and action variations makes the problem challenging and non-trivial. However, the existing dat…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 11 pages

  11. arXiv:2411.16783 [pdf, other]

    cs.CV

    CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis

    Authors: Aravindan Sundaram, Ujjayan Pal, Abhimanyu Chauhan, Aishwarya Agarwal, Srikrishna Karanam

    Abstract: Despite recent advancements in text-to-image models, achieving semantically accurate images in text-to-image diffusion models is a persistent challenge. While existing initial latent optimization methods have demonstrated impressive performance, we identify two key limitations: (a) attention neglect, where the synthesized image omits certain subjects from the input prompt because they do not have…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: 15 pages, 12 figures

  12. arXiv:2410.01441 [pdf, other]

    cs.CV

    Decorrelation-based Self-Supervised Visual Representation Learning for Writer Identification

    Authors: Arkadip Maitra, Shree Mitra, Siladittya Manna, Saumik Bhattacharya, Umapada Pal

    Abstract: Self-supervised learning has developed rapidly over the last decade and has been applied in many areas of computer vision. Decorrelation-based self-supervised pretraining has shown great promise among non-contrastive algorithms, yielding performance on par with supervised and contrastive self-supervised baselines. In this work, we explore the decorrelation-based paradigm of self-supervised learnin…

    Submitted 2 October, 2024; originally announced October 2024.

  13. arXiv:2408.17171 [pdf, other]

    cs.LG

    SafeTail: Efficient Tail Latency Optimization in Edge Service Scheduling via Computational Redundancy Management

    Authors: Jyoti Shokhanda, Utkarsh Pal, Aman Kumar, Soumi Chattopadhyay, Arani Bhattacharya

    Abstract: Optimizing tail latency while efficiently managing computational resources is crucial for delivering high-performance, latency-sensitive services in edge computing. Emerging applications, such as augmented reality, require low-latency computing services with high reliability on user devices, which often have limited computational capabilities. Consequently, these devices depend on nearby edge serv…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  14. arXiv:2408.14998 [pdf, other]

    cs.CV

    FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting

    Authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós, Saumik Bhattacharya

    Abstract: The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions. This paper presents FastTextSpotter, a framework that integrates a Swin Transformer visual backbone with a Transformer Encoder-Decoder architecture, enhanced by a novel, faster self-…

    Submitted 12 March, 2025; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted in ICPR 2024

  15. arXiv:2408.06235 [pdf, other]

    cs.CV

    Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images

    Authors: Siladittya Manna, Saumik Bhattacharya, Umapada Pal

    Abstract: Medical image segmentation is one of the domains where sufficient annotated data is not available. This necessitates the application of low-data frameworks like few-shot learning. Contemporary prototype-based frameworks often do not account for the variation in features within the support and query images, giving rise to a large variance in prototype alignment. In this work, we adopt a prototype-b…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to ICPR 2024

  16. arXiv:2408.00684 [pdf]

    cs.CL

    Assessing the Variety of a Concept Space Using an Unbiased Estimate of Rao's Quadratic Index

    Authors: Anubhab Majumder, Ujjwal Pal, Amaresh Chakrabarti

    Abstract: Past research relates design creativity to 'divergent thinking,' i.e., how well the concept space is explored during the early phase of design. Researchers have argued that generating several concepts would increase the chances of producing better design solutions. 'Variety' is one of the parameters by which one can quantify the breadth of a concept space explored by the designers. It is useful to…

    Submitted 1 August, 2024; originally announced August 2024.

  17. MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification

    Authors: Miguel A. Ferrer, Abhijit Das, Moises Diaz, Aythami Morales, Cristina Carmona-Duarte, Umapada Pal

    Abstract: Script identification plays a vital role in applications that involve handwriting and document analysis within a multi-script and multi-lingual environment. Moreover, it exhibits a profound connection with human cognition. This paper provides a new database for benchmarking script identification algorithms, which contains both printed and handwritten documents collected from a wide variety of scri…

    Submitted 29 May, 2024; originally announced May 2024.

    Journal ref: Cognitive Computation, Volume 16, pages 131–157 (2024)

  18. arXiv:2404.00412 [pdf, ps, other]

    cs.CV cs.LG

    CraftSVG: Multi-Object Text-to-SVG Synthesis via Layout Guided Diffusion

    Authors: Ayan Banerjee, Nityanand Mathur, Josep Llados, Umapada Pal, Anjan Dutta

    Abstract: Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vect…

    Submitted 28 November, 2025; v1 submitted 30 March, 2024; originally announced April 2024.

  19. arXiv:2402.11401 [pdf, other]

    cs.CV cs.LG

    GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constr…

    Submitted 20 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  20. Static and Dynamic Synthesis of Bengali and Devanagari Signatures

    Authors: Miguel A. Ferrer, Sukalpa Chanda, Moises Diaz, Chayan Kr. Banerjee, Anirban Majumdar, Cristina Carmona-Duarte, Parikshit Acharya, Umapada Pal

    Abstract: Developing an automatic signature verification system is challenging and demands a large number of training samples. This is why synthetic handwriting generation is an emerging topic in document image analysis. Some handwriting synthesizers use the motor equivalence model, the well-established hypothesis from neuroscience, which analyses how a human being accomplishes movement. Specifically, a mot…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted version. Published in IEEE Transactions on Cybernetics [ISSN 2168-2267], v. 48(10), p. 2896-2907

    Journal ref: IEEE Transactions on Cybernetics, v. 48(10), p. 2896-2907, 2018

  21. arXiv:2312.03946 [pdf, other]

    cs.CV

    A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement

    Authors: Risab Biswas, Swalpa Kumar Roy, Umapada Pal

    Abstract: Document image enhancement is a fundamental and important stage for attaining the best performance in any document analysis assignment because there are many degradation situations that could harm document images, making it more difficult to recognize and analyze them. In this paper, we propose T2T-BinFormer, which is a novel document binarization encoder-decoder architecture based on a To…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: text overlap with arXiv:2312.03568

  22. arXiv:2312.03568 [pdf, other]

    cs.CV

    DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

    Authors: Risab Biswas, Swalpa Kumar Roy, Ning Wang, Umapada Pal, Guang-Bin Huang

    Abstract: In real life, various degradation scenarios exist that might damage document images, making it harder to recognize and analyze them; thus, binarization is a fundamental and crucial step for achieving optimal performance in any document analysis task. We propose DocBinFormer (Document Binarization Transformer), a novel two-level vision transformer (TL-ViT) architecture based on vision trans…

    Submitted 6 December, 2023; originally announced December 2023.

  23. arXiv:2310.00917 [pdf, other]

    cs.CV

    Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

    Authors: Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya

    Abstract: The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here…

    Submitted 1 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

  24. arXiv:2310.00558 [pdf, other]

    cs.CV

    Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

    Authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós

    Abstract: When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of d…

    Submitted 17 February, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted to ICRA 2024

  25. arXiv:2308.02905 [pdf, other]

    cs.CV cs.MM

    FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework

    Authors: Alloy Das, Sanket Biswas, Prasun Roy, Subhankar Ghosh, Umapada Pal, Michael Blumenstein, Josep Lladós, Saumik Bhattacharya

    Abstract: Scene Text Editing (STE) is a challenging research problem that primarily aims to modify existing text in an image while preserving the background and the font style of the original text. Despite its utility in numerous real-world applications, existing style-transfer-based approaches have shown sub-par editing performance due to (1) complex image backgrounds, (2) diverse font attributes…

    Submitted 5 November, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted in WACV 2025

  26. arXiv:2308.01140 [pdf, other]

    cs.LG cs.CV

    Dynamically Scaled Temperature in Self-Supervised Contrastive Learning

    Authors: Siladittya Manna, Soumitri Chattopadhyay, Rakesh Dey, Saumik Bhattacharya, Umapada Pal

    Abstract: In contemporary self-supervised contrastive algorithms like SimCLR, MoCo, etc., the task of balancing attraction between two semantically similar samples and repulsion between two samples of different classes is primarily affected by the presence of hard negative samples. While the InfoNCE loss has been shown to impose penalties based on hardness, the temperature hyper-parameter is the key to regu…

    Submitted 10 May, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  27. SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Instance-level segmentation of documents consists of assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing and understanding. In this paper, we present a unified transformer encoder-decoder architecture for end-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed qu…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to ICDAR 2023 (San Jose, California)

  28. arXiv:2305.04524 [pdf, other]

    cs.CV

    Scene Text Recognition with Image-Text Matching-guided Dictionary

    Authors: Jiajun Wei, Hongjian Zhan, Xiao Tu, Yue Lu, Umapada Pal

    Abstract: Employing a dictionary can efficiently rectify the deviation between the visual prediction and the ground truth in scene text recognition methods. However, the independence of the dictionary from the visual features may lead to incorrect rectification of accurate visual predictions. In this paper, we propose a new dictionary language model leveraging the Scene Image-Text Matching (SITM) network, whic…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at ICDAR2023

  29. SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

    Authors: Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

    Abstract: Document layout analysis is a well-known problem in the document research community and has been extensively explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity to personal…

    Submitted 20 August, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

    Journal ref: ICDAR 2023 (International Conference on Document Analysis and Recognition) Lecture Notes in Computer Science, vol 14187, pp. 342-360. Springer Nature

  30. arXiv:2304.11993 [pdf, other]

    cs.CV cs.MM

    MMC: Multi-Modal Colorization of Images using Textual Descriptions

    Authors: Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, Michael Blumenstein

    Abstract: Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of…

    Submitted 25 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 9 pages

  31. arXiv:2304.04376 [pdf, other]

    cs.CV

    ICDAR 2023 Video Text Reading Competition for Dense and Small Text

    Authors: Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

    Abstract: Recently, video text detection, tracking, and recognition in natural scenes are becoming very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a…

    Submitted 10 April, 2023; originally announced April 2023.

    Journal ref: ICDAR 2023 competition

  32. arXiv:2303.07989 [pdf, other]

    cs.CV cs.HC

    A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing

    Authors: Prasun Roy, Subhankar Ghosh, Umapada Pal

    Abstract: Air-writing refers to virtually writing linguistic characters through hand gestures in three-dimensional space with six degrees of freedom. This paper proposes a generic video camera-aided convolutional neural network (CNN) based air-writing framework. Gestures are performed using a marker of fixed color in front of a generic video camera, followed by color-based segmentation to identify the marke…

    Submitted 18 February, 2025; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted in The International Conference on Frontiers of Handwriting Recognition (ICFHR) 2018

  33. arXiv:2302.14728 [pdf, other]

    cs.CV cs.MM

    Semantically Consistent Person Image Generation

    Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

    Abstract: We propose a data-driven approach for context-aware person image generation. Specifically, we attempt to generate a person image such that the synthesized instance can blend into a complex scene. In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene. The proposed technique is divided into three sequential steps.…

    Submitted 18 February, 2025; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted in The International Conference on Pattern Recognition (ICPR) 2024

  34. arXiv:2208.02843 [pdf, other]

    cs.CV

    TIC: Text-Guided Image Colorization

    Authors: Subhankar Ghosh, Prasun Roy, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

    Abstract: Image colorization is a well-known problem in computer vision. However, due to the ill-posed nature of the task, image colorization is inherently challenging. Though several attempts have been made by researchers to make the colorization pipeline automatic, these processes often produce unrealistic results due to a lack of conditioning. In this work, we attempt to integrate textual descriptions as…

    Submitted 4 August, 2022; originally announced August 2022.

  35. arXiv:2207.11718 [pdf, other]

    cs.CV cs.MM

    TIPS: Text-Induced Pose Synthesis

    Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

    Abstract: In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying…

    Submitted 18 February, 2025; v1 submitted 24 July, 2022; originally announced July 2022.

    Comments: Accepted in The European Conference on Computer Vision (ECCV) 2022

  36. arXiv:2207.10256 [pdf, other]

    cs.CV

    SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

    Authors: Dajian Zhong, Shujing Lyu, Palaiahnakote Shivakumara, Bing Yin, Jiajia Wu, Umapada Pal, Yue Lu

    Abstract: Scene text recognition is a challenging task due to the complex backgrounds and diverse variations of text instances. In this paper, we propose a novel Semantic GAN and Balanced Attention Network (SGBANet) to recognize the texts in scene images. The proposed method first generates the simple semantic feature using Semantic GAN and then recognizes the scene text with the Balanced Attention Module.…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  37. arXiv:2206.02717 [pdf, other]

    cs.CV cs.MM

    Scene Aware Person Image Generation through Global Contextual Conditioning

    Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

    Abstract: Person image generation is an intriguing yet challenging problem. However, this task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the pe…

    Submitted 18 February, 2025; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted in The International Conference on Pattern Recognition (ICPR) 2022

  38. arXiv:2202.13078 [pdf, other]

    cs.CV cs.LG eess.IV

    SWIS: Self-Supervised Representation Learning For Writer Independent Offline Signature Verification

    Authors: Siladittya Manna, Soumitri Chattopadhyay, Saumik Bhattacharya, Umapada Pal

    Abstract: Writer independent offline signature verification is one of the most challenging tasks in pattern recognition as there is often a scarcity of training data. To handle this data scarcity problem, in this paper, we propose a novel self-supervised learning (SSL) framework for writer independent offline signature verification. To our knowledge, this is the first attempt to utilize self-supervised sett…

    Submitted 12 July, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Accepted at IEEE ICIP 2022

  39. arXiv:2202.06777 [pdf, other]

    cs.CV cs.MM

    Multi-scale Attention Guided Pose Transfer

    Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal

    Abstract: Pose transfer refers to the probabilistic image generation of a person with a previously unseen novel pose from another image of that person having a different pose. Due to potential academic and commercial applications, this problem has been extensively studied in recent years. Among the various approaches to the problem, attention guided progressive generation has been shown to produce state-of-the-art resu…

    Submitted 18 February, 2025; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted in Pattern Recognition (PR) 2023

  40. arXiv:2201.11438 [pdf, other]

    cs.CV

    DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

    Authors: Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal

    Abstract: Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require the extraction of useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (title, sections, figures etc.) has emerged as an interesting problem fo…

    Submitted 21 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Preprint

  41. arXiv:2201.10252 [pdf, other]

    cs.CV

    DocEnTr: An End-to-End Document Image Enhancement Transformer

    Authors: Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal

    Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: submitted to ICPR 2022

  42. arXiv:2201.10138  [pdf, other]

    cs.CV

    SURDS: Self-Supervised Attention-guided Reconstruction and Dual Triplet Loss for Writer Independent Offline Signature Verification

    Authors: Soumitri Chattopadhyay, Siladittya Manna, Saumik Bhattacharya, Umapada Pal

    Abstract: Offline Signature Verification (OSV) is a fundamental biometric task across various forensic, commercial and legal applications. The underlying task at hand is to carefully model fine-grained features of the signatures to distinguish between genuine and forged ones, which differ only in minute deformities. This makes OSV more challenging compared to other verification problems. In this work, we pr…

    Submitted 26 June, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: Accepted at ICPR 2022

  43. arXiv:2111.12664  [pdf, other]

    cs.CV stat.ML

    MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning

    Authors: Siladittya Manna, Umapada Pal, Saumik Bhattacharya

    Abstract: Self-supervised contrastive learning frameworks have progressed rapidly over the last few years. In this paper, we propose a novel loss function for contrastive learning. We model our pre-training task as a binary classification problem to induce an implicit contrastive effect. We further improve the naïve loss function after removing the effect of the positive-positive repulsion and incorporating…

    Submitted 14 April, 2025; v1 submitted 24 November, 2021; originally announced November 2021.

  44. arXiv:2111.10618  [pdf, other]

    eess.IV cs.CV

    PAANet: Progressive Alternating Attention for Automatic Medical Image Segmentation

    Authors: Abhishek Srivastava, Sukalpa Chanda, Debesh Jha, Michael A. Riegler, Pål Halvorsen, Dag Johansen, Umapada Pal

    Abstract: Medical image segmentation can provide detailed information for clinical analysis which can be useful for scenarios where the detailed location of a finding is important. Knowing the location of disease can play a vital role in treatment and decision-making. Convolutional neural network (CNN) based encoder-decoder techniques have advanced the performance of automated medical image segmentation sys…

    Submitted 20 November, 2021; originally announced November 2021.

  45. arXiv:2111.10614  [pdf, other]

    eess.IV cs.CV

    GMSRF-Net: An improved generalizability with global multi-scale residual fusion network for polyp segmentation

    Authors: Abhishek Srivastava, Sukalpa Chanda, Debesh Jha, Umapada Pal, Sharib Ali

    Abstract: Colonoscopy is a gold standard procedure but is highly operator-dependent. Efforts have been made to automate the detection and segmentation of polyps, a precancerous precursor, to effectively minimize the miss rate. Widely used computer-aided polyp segmentation systems based on encoder-decoder architectures have achieved high performance in terms of accuracy. However, polyp segmentation datasets collected fro…

    Submitted 20 November, 2021; originally announced November 2021.

  46. arXiv:2111.10605  [pdf, other]

    cs.CV

    Exploiting Multi-Scale Fusion, Spatial Attention and Patch Interaction Techniques for Text-Independent Writer Identification

    Authors: Abhishek Srivastava, Sukalpa Chanda, Umapada Pal

    Abstract: Text independent writer identification is a challenging problem that differentiates between different handwriting styles to decide the author of the handwritten text. Earlier writer identification relied on handcrafted features to reveal the differences between writers. With the advent of convolutional neural networks, deep learning-based methods have evolved. In this paper, three…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: 14 pages, 4 figures

  47. arXiv:2111.10591  [pdf, other]

    cs.CV

    AGA-GAN: Attribute Guided Attention Generative Adversarial Network with U-Net for Face Hallucination

    Authors: Abhishek Srivastava, Sukalpa Chanda, Umapada Pal

    Abstract: The performance of facial super-resolution methods relies on their ability to recover facial structures and salient features effectively. Even though the convolutional neural network and generative adversarial network-based methods deliver impressive performances on face hallucination tasks, the ability to use attributes associated with the low-resolution images to improve performance is unsatisfa…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: 27 pages, 9 figures

  48. arXiv:2108.09335  [pdf, other]

    cs.CV cs.LG

    LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning

    Authors: Bhavya Vasudeva, Puneesh Deora, Saumik Bhattacharya, Umapada Pal, Sukalpa Chanda

    Abstract: Deep metric learning has been effectively used to learn distance metrics for different visual tasks like image retrieval, clustering, etc. In order to aid the training process, existing methods either use a hard mining strategy to extract the most informative samples or seek to generate hard synthetics using an additional network. Such approaches face different challenges and can lead to biased em…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: 17 pages, 9 figures, 5 tables. Accepted at The IEEE/CVF International Conference on Computer Vision (ICCV) 2021

  49. arXiv:2107.04357  [pdf, other]

    cs.CV cs.LG

    Graph-based Deep Generative Modelling for Document Layout Generation

    Authors: Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

    Abstract: One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real-world scenarios, the principal information of its content is stored in the layout itself. In this work, we propose an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted by ICDAR Workshops-GLESDO 2021

  50. arXiv:2107.02638  [pdf, other]

    cs.CV

    DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

    Authors: Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

    Abstract: Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Accepted by ICDAR 2021