Showing 1–50 of 213 results for author: Dutta, A

Searching in archive cs.

  1. arXiv:2604.02108  [pdf, ps, other]

    cs.RO cs.LG

    Cross-Modal Visuo-Tactile Object Perception

    Authors: Anirvan Dutta, Simone Tasciotti, Claudia Cusseddu, Ang Li, Panayiota Poirazi, Julijana Gjorgjieva, Etienne Burdet, Patrick van der Smagt, Mohsen Kaboli

    Abstract: Estimating physical properties is critical for safe and efficient autonomous robotic manipulation, particularly during contact-rich interactions. In such settings, vision and tactile sensing provide complementary information about object geometry, pose, inertia, stiffness, and contact dynamics, such as stick-slip behavior. However, these properties are only indirectly observable and cannot always…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: 23 pages, 8 figures, 1 table. Submitted for review to journal

  2. arXiv:2603.23849  [pdf, ps, other]

    cs.IR

    VILLA: Versatile Information Retrieval From Scientific Literature Using Large LAnguage Models

    Authors: Blessy Antony, Amartya Dutta, Sneha Aggarwal, Vasu Gatne, Ozan Gökdemir, Samantha Grimes, Adam Lauring, Brian R. Wasik, Anuj Karpatne, T. M. Murali

    Abstract: The lack of high-quality ground truth datasets to train machine learning (ML) models impedes the potential of artificial intelligence (AI) for science research. Scientific information extraction (SIE) from the literature using LLMs is emerging as a powerful approach to automate the creation of these datasets. However, existing LLM-based approaches and benchmarking studies for SIE focus on broad to…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: Under review at ACM KDD 2026 (AI for Sciences Track)

  3. arXiv:2603.19225  [pdf, ps, other]

    cs.CE cs.AI cs.CL cs.IR q-fin.CP

    FinTradeBench: A Financial Reasoning Benchmark for LLMs

    Authors: Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan, Santu Karmaker, Aritra Dutta

    Abstract: Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial qu…

    Submitted 20 March, 2026; v1 submitted 19 March, 2026; originally announced March 2026.

    Comments: 8 pages main text, 22 pages total (including references and appendix). 5 figures, 14 tables. Preprint under review. Code and data will be made available upon publication

    MSC Class: 68T50 (Primary); 91Gxx (Secondary) ACM Class: I.2.7

  4. arXiv:2603.10352  [pdf, ps, other]

    cs.RO

    Adaptive Manipulation Potential and Haptic Estimation for Tool-Mediated Interaction

    Authors: Lin Yang, Anirvan Dutta, Yuan Ji, Yanxin Zhou, Shilin Shan, Lv Chen, Etienne Burdet, Domenico Campolo

    Abstract: Achieving human-level dexterity in contact-rich, tool-mediated manipulation remains a significant challenge due to visual occlusion and the underdetermined nature of haptic sensing. This paper introduces a parameterized Equilibrium Manifold (EM) as a unified representation for tool-mediated interaction, and develops a closed-loop framework that integrates haptic estimation, online planning, and ad…

    Submitted 10 March, 2026; originally announced March 2026.

  5. arXiv:2603.06731  [pdf, ps, other]

    cs.PL cs.LG

    PolyBlocks: A Compiler Infrastructure for AI Chips and Programming Frameworks

    Authors: Uday Bondhugula, Akshay Baviskar, Navdeep Katel, Vimal Patel, Anoop JS, Arnab Dutta

    Abstract: We present the design and implementation of PolyBlocks, a modular and reusable MLIR-based compiler infrastructure for AI programming frameworks and AI chips. PolyBlocks is based on pass pipelines that compose transformations on loop nests and SSA, primarily relying on lightweight affine access analysis; the transformations are stitched together in specialized ways to realize high-performance code…

    Submitted 10 March, 2026; v1 submitted 5 March, 2026; originally announced March 2026.

    Comments: Fixed the "Acknowledgments" section that was missing phrases

  6. arXiv:2603.02223  [pdf]

    cs.LG cs.AI

    Characterizing and Predicting Wildfire Evacuation Behavior: A Dual-Stage ML Approach

    Authors: Sazzad Bin Bashar Polock, Anandi Dutta, Subasish Das

    Abstract: Wildfire evacuation behavior is highly variable and influenced by complex interactions among household resources, preparedness, and situational cues. Using a large-scale MTurk survey of residents in California, Colorado, and Oregon, this study integrates unsupervised and supervised machine learning methods to uncover latent behavioral typologies and predict key evacuation outcomes. Multiple Corres…

    Submitted 10 February, 2026; originally announced March 2026.

    Comments: This is the author's preprint version of a paper accepted for presentation at SoutheastCon 2026. The final published version will appear in the official conference proceedings. Conference site: https://ieeesoutheastcon.org/

  7. arXiv:2602.17280  [pdf, ps, other]

    cs.CY

    Security at the Border? The Lived Experiences of Refugees and Asylum Seekers in the UK

    Authors: Arshia Dutta, Rikke Bjerg Jensen

    Abstract: We bring to light how some asylum seekers and refugees arriving in the UK experience border control and wider immigration systems, as well as the impact that these have on their subsequent lives in the UK. We do so through participant observation in a support organisation and interviews with caseworkers, asylum seekers and refugees. Specifically, our findings show how the first meeting with the bo…

    Submitted 19 February, 2026; originally announced February 2026.

    Comments: To appear at ACM CHI 2026

  8. arXiv:2602.12819  [pdf, ps, other]

    cs.IR cs.CV

    WISE: A Multimodal Search Engine for Visual Scenes, Audio, Objects, Faces, Speech, and Metadata

    Authors: Prasanna Sridhar, Horace Lee, David M. S. Pinto, Andrew Zisserman, Abhishek Dutta

    Abstract: In this paper, we present WISE, an open-source audiovisual search engine which integrates a range of multimodal retrieval capabilities into a single, practical tool accessible to users without machine learning expertise. WISE supports natural-language and reverse-image queries at both the scene level (e.g. empty street) and object level (e.g. horse) across images and videos; face-based search for…

    Submitted 13 February, 2026; originally announced February 2026.

    Comments: Software: https://www.robots.ox.ac.uk/~vgg/software/wise/ , Online demos: https://www.robots.ox.ac.uk/~vgg/software/wise/demo/ , Example Queries: https://www.robots.ox.ac.uk/~vgg/software/wise/examples/

    ACM Class: H.3.3

  9. arXiv:2602.07219  [pdf, ps, other]

    cs.LG cs.AI

    The Median is Easier than it Looks: Approximation with a Constant-Depth, Linear-Width ReLU Network

    Authors: Abhigyan Dutta, Itay Safran, Paul Valiant

    Abstract: We study the approximation of the median of $d$ inputs using ReLU neural networks. We present depth-width tradeoffs under several settings, culminating in a constant-depth, linear-width construction that achieves exponentially small approximation error with respect to the uniform distribution over the unit hypercube. By further establishing a general reduction from the maximum to the median, our r…

    Submitted 6 February, 2026; originally announced February 2026.

  10. arXiv:2602.06806  [pdf, ps, other]

    cs.CV cs.LG

    RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

    Authors: Silpa Vadakkeeveetil Sreelatha, Dan Wang, Serge Belongie, Muhammad Awais, Anjan Dutta

    Abstract: Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias id…

    Submitted 6 February, 2026; originally announced February 2026.

  11. arXiv:2601.05009  [pdf, ps, other]

    cs.AI

    An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions

    Authors: Avik Dutta, Harshit Nigam, Hosein Hasanbeig, Arjun Radhakrishna, Sumit Gulwani

    Abstract: We investigate how large language models (LLMs) fail when tabular data in an otherwise canonical representation is subjected to semantic and structural distortions. Our findings reveal that LLMs lack an inherent ability to detect and correct subtle distortions in table representations. Only when provided with an explicit prior, via a system prompt, do models partially adjust their reasoning strate…

    Submitted 8 January, 2026; originally announced January 2026.

    Comments: 4 pages, 1 figure, 1 table

  12. arXiv:2512.00321  [pdf, ps, other]

    cs.LG eess.SY

    Introducing AI-Driven IoT Energy Management Framework

    Authors: Shivani Mruthyunjaya, Anandi Dutta, Kazi Sifatul Islam

    Abstract: Power consumption has become a critical aspect of modern life due to the consistent reliance on technological advancements. Reducing power consumption or following power usage predictions can lead to lower monthly costs and improved electrical reliability. The proposal of a holistic framework to establish a foundation for IoT systems with a focus on contextual decision making, proactive adaptation…

    Submitted 29 November, 2025; originally announced December 2025.

    Comments: Accepted in IEEE Smart World Congress 2025, Calgary, Canada

  13. arXiv:2511.16629  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    Stabilizing Policy Gradient Methods via Reward Profiling

    Authors: Shihab Ahmed, El Houcine Bergou, Aritra Dutta, Yue Wang

    Abstract: Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances can often be unsatisfactory, suffering from unreliable reward improvements and slow convergence, due to high variance in gradient estimations. In this paper, we propose a universal reward profiling framework tha…

    Submitted 24 January, 2026; v1 submitted 20 November, 2025; originally announced November 2025.

  14. arXiv:2511.08596  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency Via Adversarial Nudge

    Authors: Arka Dutta, Sujan Dutta, Rijul Magu, Soumyajit Datta, Munmun De Choudhury, Ashiqur R. KhudaBukhsh

    Abstract: Hallucinations pose a critical challenge to the real-world deployment of large language models (LLMs) in high-stakes domains. In this paper, we present a framework for stress testing factual fidelity in LLMs in the presence of adversarial nudge. Our framework consists of three steps. In the first step, we instruct the LLM to produce sets of truths and lies consistent with the closed domain in ques…

    Submitted 30 October, 2025; originally announced November 2025.

  15. arXiv:2510.21876  [pdf]

    cs.CV cs.AI

    AI Powered Urban Green Infrastructure Assessment Through Aerial Imagery of an Industrial Township

    Authors: Anisha Dutta

    Abstract: Accurate assessment of urban canopy coverage is crucial for informed urban planning, effective environmental monitoring, and mitigating the impacts of climate change. Traditional practices often face limitations due to inadequate technical requirements, difficulties in scaling and data processing, and the lack of specialized expertise. This study presents an efficient approach for estimating green…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: Presented at IIIE Conference 2024, Jamshedpur

  16. arXiv:2510.21831  [pdf]

    cs.IR cs.SE

    Development of an Automated Web Application for Efficient Web Scraping: Design and Implementation

    Authors: Alok Dutta, Nilanjana Roy, Rhythm Sen, Sougata Dutta, Prabhat Das

    Abstract: This paper presents the design and implementation of a user-friendly, automated web application that simplifies and optimizes the web scraping process for non-technical users. The application breaks down the complex task of web scraping into three main stages: fetching, extraction, and execution. In the fetching stage, the application accesses target websites using the HTTP protocol, leveraging th…

    Submitted 22 October, 2025; originally announced October 2025.

  17. arXiv:2510.17425  [pdf, ps, other]

    cs.CY cs.LG

    Quantifying Climate Policy Action and Its Links to Development Outcomes: A Cross-National Data-Driven Analysis

    Authors: Aditi Dutta

    Abstract: Addressing climate change effectively requires more than cataloguing the number of policies in place; it calls for tools that can reveal their thematic priorities and their tangible impacts on development outcomes. Existing assessments often rely on qualitative descriptions or composite indices, which can mask crucial differences between key domains such as mitigation, adaptation, disaster risk ma…

    Submitted 13 November, 2025; v1 submitted 20 October, 2025; originally announced October 2025.

    Comments: This paper/proposal has been accepted as a poster at NeurIPS 2025

  18. arXiv:2510.13835  [pdf, ps, other]

    cs.CL cs.AI

    ConDABench: Interactive Evaluation of Language Models for Data Analysis

    Authors: Avik Dutta, Priyanshu Gupta, Hosein Hasanbeig, Rahul Pratap Singh, Harshit Nigam, Sumit Gulwani, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari

    Abstract: Real-world data analysis tasks often come with under-specified goals and unclean data. User interaction is necessary to understand and disambiguate a user's intent, and hence, essential to solving these complex tasks. Existing benchmarks for evaluating LLMs on data analysis tasks do not capture these complexities or provide first-class support for interactivity. We introduce ConDABench, a framewor…

    Submitted 10 October, 2025; originally announced October 2025.

  19. arXiv:2510.08563  [pdf, ps, other]

    math.NA cs.LG math.OC

    Where Have All the Kaczmarz Iterates Gone?

    Authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma

    Abstract: The randomized Kaczmarz (RK) algorithm is one of the most computationally and memory-efficient iterative algorithms for solving large-scale linear systems. However, practical applications often involve noisy and potentially inconsistent systems. While the convergence of RK is well understood for consistent systems, the study of RK on noisy, inconsistent linear systems is limited. This paper invest…

    Submitted 9 October, 2025; originally announced October 2025.

    MSC Class: 15A06; 15A09; 15A10; 15A18; 65F10; 65Y20; 68Q25; 68W20; 68W40

  20. arXiv:2509.24802  [pdf, ps, other]

    cs.CV cs.CG cs.LG

    TACO-Net: Topological Signatures Triumph in 3D Object Classification

    Authors: Anirban Ghosh, Ayan Dutta

    Abstract: 3D object classification is a crucial problem due to its significant practical relevance in many fields, including computer vision, robotics, and autonomous driving. Although deep learning methods applied to point clouds sampled on CAD models of the objects and/or captured by LiDAR or RGBD cameras have achieved remarkable success in recent years, achieving high classification accuracy remains a ch…

    Submitted 29 September, 2025; originally announced September 2025.

  21. arXiv:2509.23241  [pdf, ps, other]

    cs.DC

    Memory Efficient and Staleness Free Pipeline Parallel DNN Training Framework with Improved Convergence Speed

    Authors: Ankita Dutta, Nabendu Chaki, Rajat K. De

    Abstract: High resource requirement for Deep Neural Network (DNN) training across multiple GPUs necessitates development of various parallelism techniques. In this paper, we introduce two interconnected DNN training frameworks, namely, V-TiMePReSt and I-TiMePReSt, based on pipeline parallelism, a variant of model parallelism. V-TiMePReSt is a completely staleness-free system which enables the DNNs to be tra…

    Submitted 27 September, 2025; originally announced September 2025.

  22. arXiv:2509.15257  [pdf, ps, other]

    cs.CV cs.LG

    RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation

    Authors: Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais, Serge Belongie, Anjan Dutta

    Abstract: The rapid advancement of diffusion models has enabled high-fidelity and semantically rich text-to-image generation; however, ensuring fairness and safety remains an open challenge. Existing methods typically improve fairness and safety at the expense of semantic fidelity and image quality. In this work, we propose RespoDiff, a novel framework for responsible text-to-image generation that incorpora…

    Submitted 8 October, 2025; v1 submitted 18 September, 2025; originally announced September 2025.

    Comments: Accepted at NeurIPS 2025

  23. arXiv:2509.14256  [pdf, ps, other]

    cs.CL cs.AI

    JU-NLP at Touché: Covert Advertisement in Conversational AI-Generation and Detection Strategies

    Authors: Arka Dutta, Agrik Majumdar, Sombrata Biswas, Dipankar Das, Sivaji Bandyopadhyay

    Abstract: This paper proposes a comprehensive framework for the generation of covert advertisements within Conversational AI systems, along with robust techniques for their detection. It explores how subtle promotional content can be crafted within AI-generated responses and introduces methods to identify and mitigate such covert advertising strategies. For generation (Sub-Task 1), we propose a novel framew…

    Submitted 12 September, 2025; originally announced September 2025.

  24. arXiv:2509.11444  [pdf, ps, other]

    cs.CL cs.SI

    CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media

    Authors: Gaurab Chhetri, Anandi Dutta, Subasish Das

    Abstract: The emergence of decentralized social media platforms presents new opportunities and challenges for real-time analysis of public discourse. This study introduces CognitiveSky, an open-source and scalable framework designed for sentiment, emotion, and narrative analysis on Bluesky, a federated Twitter or X.com alternative. By ingesting data through Bluesky's Application Programming Interface (API),…

    Submitted 14 September, 2025; originally announced September 2025.

    Comments: This is the author's preprint version of a paper accepted for presentation at HICSS 59 (Hawaii International Conference on System Sciences), 2026, Hawaii, USA. The final published version will appear in the official conference proceedings. Conference site: https://hicss.hawaii.edu/

  25. arXiv:2509.05273  [pdf, ps, other]

    cs.LG cs.PF

    Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks

    Authors: Jason Gardner, Ayan Dutta, Swapnoneel Roy, O. Patrick Kreidl, Ladislau Boloni

    Abstract: The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. While algorithmic efficiency in terms of learning performance has been extensively studied, the energy requirements, greenhouse gas emissions, and monetary costs of DRL algorithms remain largely unexplored. In this work, we present a…

    Submitted 5 September, 2025; originally announced September 2025.

    Comments: Submitted to a journal - under review

  26. arXiv:2509.04456  [pdf]

    cs.CL

    Mentalic Net: Development of RAG-based Conversational AI and Evaluation Framework for Mental Health Support

    Authors: Anandi Dutta, Shivani Mruthyunjaya, Jessica Saddington, Kazi Sifatul Islam

    Abstract: The emergence of large language models (LLMs) has unlocked boundless possibilities, along with significant challenges. In response, we developed a mental health support chatbot designed to augment professional healthcare, with a strong emphasis on safe and meaningful application. Our approach involved rigorous evaluation, covering accuracy, empathy, trustworthiness, privacy, and bias. We employed…

    Submitted 26 August, 2025; originally announced September 2025.

    Comments: Preprint Version, Accepted in ISEMV 2025

  27. arXiv:2509.04123  [pdf, ps, other]

    cs.CV

    TaleDiffusion: Multi-Character Story Generation with Dialogue Rendering

    Authors: Ayan Banerjee, Josep Lladós, Umapada Pal, Anjan Dutta

    Abstract: Text-to-story visualization is challenging due to the need for consistent interaction among multiple characters across frames. Existing methods struggle with character consistency, leading to artifact generation and inaccurate dialogue rendering, which results in disjointed storytelling. In response, we introduce TaleDiffusion, a novel framework for generating multi-character stories with an itera…

    Submitted 4 September, 2025; originally announced September 2025.

  28. arXiv:2508.19724  [pdf, ps, other]

    cs.CL cs.AI

    NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks

    Authors: Aritra Dutta, Swapnanil Mukherjee, Deepanway Ghosal, Somak Aditya

    Abstract: Commonsense visual-question answering often hinges on knowledge that is missing from the image or the question. Small vision-language models (sVLMs) such as ViLT, VisualBERT and FLAVA therefore lag behind their larger generative counterparts. To study the effect of careful commonsense knowledge integration on sVLMs, we present an end-to-end framework (NLKI) that (i) retrieves natural language fact…

    Submitted 28 August, 2025; v1 submitted 27 August, 2025; originally announced August 2025.

  29. arXiv:2508.16644  [pdf, ps, other]

    cs.CV

    CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance

    Authors: Anindya Mondal, Ayan Banerjee, Sauradip Nag, Josep Lladós, Xiatian Zhu, Anjan Dutta

    Abstract: Diffusion models excel at photorealistic synthesis but struggle with precise object counts, especially in high-density settings. We introduce COUNTLOOP, a training-free framework that achieves precise instance control through iterative, structured feedback. Our method alternates between synthesis and evaluation: a VLM-based planner generates structured scene layouts, while a VLM-based critic provi…

    Submitted 16 March, 2026; v1 submitted 18 August, 2025; originally announced August 2025.

  30. arXiv:2508.15127  [pdf, ps, other]

    cs.LG

    Towards Source-Free Machine Unlearning

    Authors: Sk Miraj Ahmed, Umit Yigit Basaran, Dripta S. Raychaudhuri, Arindam Dutta, Rohit Kundu, Fahim Faisal Niloy, Basak Guler, Amit K. Roy-Chowdhury

    Abstract: As machine learning becomes more pervasive and data privacy regulations evolve, the ability to remove private or copyrighted information from trained models is becoming an increasingly critical requirement. Existing unlearning methods often rely on the assumption of having access to the entire training dataset during the forgetting process. However, this assumption may not hold true in practical s…

    Submitted 20 August, 2025; originally announced August 2025.

    Comments: Accepted by CVPR 2025

  31. arXiv:2508.11434  [pdf, ps, other]

    cs.CL cs.CY

    Online Anti-sexist Speech: Identifying Resistance to Gender Bias in Political Discourse

    Authors: Aditi Dutta, Susan Banducci

    Abstract: Anti-sexist speech, i.e., public expressions that challenge or resist gendered abuse and sexism, plays a vital role in shaping democratic debate online. Yet automated content moderation systems, increasingly powered by large language models (LLMs), may struggle to distinguish such resistance from the sexism it opposes. This study examines how five LLMs classify sexist, anti-sexist, and neutral pol…

    Submitted 15 August, 2025; originally announced August 2025.

  32. arXiv:2508.09836  [pdf, ps, other]

    cs.RO

    Embodied Tactile Perception of Soft Objects Properties

    Authors: Anirvan Dutta, Alexis WM Devillard, Zhihuan Zhang, Xiaoxiao Cheng, Etienne Burdet

    Abstract: To enable robots to develop human-like fine manipulation, it is essential to understand how mechanical compliance, multi-modal sensing, and purposeful interaction jointly shape tactile perception. In this study, we use a dedicated modular e-Skin with tunable mechanical compliance and multi-modal sensing (normal, shear forces and vibrations) to systematically investigate how sensing embodiment and…

    Submitted 13 August, 2025; originally announced August 2025.

  33. arXiv:2508.06757  [pdf, ps, other]

    cs.CV cs.GR

    VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

    Authors: Yash Garg, Saketh Bachu, Arindam Dutta, Rohit Lal, Sarosij Bose, Calvin-Khang Ta, M. Salman Asif, Amit Roy-Chowdhury

    Abstract: Human pose and shape (HPS) estimation methods have been extensively studied, with many demonstrating high zero-shot performance on in-the-wild images and videos. However, these methods often struggle in challenging scenarios involving complex human poses or significant occlusions. Although some studies address 3D human pose estimation under occlusion, they typically evaluate performance on dataset…

    Submitted 8 August, 2025; originally announced August 2025.

    Journal ref: ICCV 2025

  34. arXiv:2507.20786  [pdf, ps, other]

    cs.CL

    Automating Thematic Review of Prevention of Future Deaths Reports: Replicating the ONS Child Suicide Study using Large Language Models

    Authors: Sam Osian, Arpan Dutta, Sahil Bhandari, Iain E. Buchan, Dan W. Joyce

    Abstract: Prevention of Future Deaths (PFD) reports, issued by coroners in England and Wales, flag systemic hazards that may lead to further loss of life. Analysis of these reports has previously been constrained by the manual effort required to identify and code relevant cases. In 2025, the Office for National Statistics (ONS) published a national thematic review of child-suicide PFD reports ($\leq$ 18 yea…

    Submitted 28 July, 2025; originally announced July 2025.

    Comments: 8 pages, 1 figure

  35. arXiv:2507.09850  [pdf, ps, other]

    cs.AI

    The Challenge of Teaching Reasoning to LLMs Without RL or Distillation

    Authors: Wei Du, Branislav Kisacanin, George Armstrong, Shubham Toshniwal, Ivan Moshkov, Alexan Ayrapetyan, Sadegh Mahdavi, Dan Zhao, Shizhe Diao, Dragan Masulovic, Marius Stanean, Advaith Avadhanam, Max Wang, Ashmit Dutta, Shitij Govil, Sri Yanamandara, Mihir Tandon, Sriram Ananthakrishnan, Vedant Rathi, David Zhang, Joonseok Kang, Leon Luo, Titu Andreescu, Boris Ginsburg, Igor Gitman

    Abstract: Reasoning-capable language models achieve state-of-the-art performance in diverse complex tasks by generating long, explicit Chain-of-Thought (CoT) traces. While recent works show that base models can acquire such reasoning traces via reinforcement learning or distillation from stronger models like DeepSeek-R1, previous works demonstrate that even short CoT prompting without fine-tuning is able to…

    Submitted 16 July, 2025; v1 submitted 13 July, 2025; originally announced July 2025.

    Comments: Accepted at the Second AI for Math Workshop at the 42nd International Conference on Machine Learning (ICML 2025)

  36. arXiv:2507.07251  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms

    Authors: Aaron Goldstein, Ayan Dutta

    Abstract: Traditional recommendation algorithms are not designed to provide personalized recommendations based on user preferences provided through text, e.g., "I enjoy light-hearted comedies with a lot of humor". Large Language Models (LLMs) have emerged as one of the most promising tools for natural language processing in recent years. This research proposes a novel framework that mimics how a close frien…

    Submitted 9 July, 2025; originally announced July 2025.

  37. arXiv:2507.06960  [pdf, ps, other]

    cs.RO

    Bounomodes: the grazing ox algorithm for exploration of clustered anomalies

    Authors: Samuel Matloob, Ayan Dutta, O. Patrick Kreidl, Swapnoneel Roy, Ladislau Bölöni

    Abstract: A common class of algorithms for informative path planning (IPP) follows boustrophedon ("as the ox turns") patterns, which aim to achieve uniform area coverage. However, IPP is often applied in scenarios where anomalies, such as plant diseases, pollution, or hurricane damage, appear in clusters. In such cases, prioritizing the exploration of anomalous regions over uniform coverage is beneficial. T…

    Submitted 9 July, 2025; originally announced July 2025.

  38. From Tiny Machine Learning to Tiny Deep Learning: A Survey

    Authors: Shriyank Somvanshi, Md Monzurul Islam, Gaurab Chhetri, Rohit Chakraborty, Mahmuda Sultana Mimi, Sawgat Ahmed Shuvo, Kazi Sifatul Islam, Syed Aaqib Javed, Sharif Ahmed Rafat, Anandi Dutta, Subasish Das

    Abstract: The rapid growth of edge devices has driven the demand for deploying artificial intelligence (AI) at the edge, giving rise to Tiny Machine Learning (TinyML) and its evolving counterpart, Tiny Deep Learning (TinyDL). While TinyML initially focused on enabling simple inference tasks on microcontrollers, the emergence of TinyDL marks a paradigm shift toward deploying deep learning models on severely…

    Submitted 25 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

    Journal ref: ACM CS 2025

  39. arXiv:2506.18725  [pdf, ps, other]

    cs.RO cs.CG cs.CV

    TopoRec: Point Cloud Recognition Using Topological Data Analysis

    Authors: Anirban Ghosh, Iliya Kulbaka, Ian Dahlin, Ayan Dutta

    Abstract: Point cloud-based object/place recognition remains a problem of interest in applications such as autonomous driving, scene reconstruction, and localization. Extracting a meaningful global descriptor from a query point cloud that can be matched with the descriptors of the database point clouds is a challenging problem. Furthermore, when the query point cloud is noisy or has been transformed (e.g.,…

    Submitted 31 July, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  40. arXiv:2506.13910  [pdf]

    cs.CV cs.AI

    Intelligent Image Sensing for Crime Analysis: A ML Approach towards Enhanced Violence Detection and Investigation

    Authors: Aritra Dutta, Pushpita Boral, G Suseela

    Abstract: The increasing global crime rate, coupled with substantial human and property losses, highlights the limitations of traditional surveillance methods in promptly detecting diverse and unexpected acts of violence. Addressing this pressing need for automatic violence detection, we leverage Machine Learning to detect and categorize violent events in video streams. This paper introduces a comprehensive…

    Submitted 16 June, 2025; originally announced June 2025.

  41. arXiv:2506.08189  [pdf, ps, other]

    cs.CV cs.CL

    Open World Scene Graph Generation using Vision Language Models

    Authors: Amartya Dutta, Kazi Sajeed Mehrab, Medha Sawhney, Abhilash Neog, Mridul Khurana, Sepideh Fatemi, Aanish Pradhan, M. Maruf, Ismini Lourentzou, Arka Daw, Anuj Karpatne

    Abstract: Scene-Graph Generation (SGG) seeks to recognize objects in an image and distill their salient pairwise relationships. Most methods depend on dataset-specific supervision to learn the variety of interactions, restricting their usefulness in open-world settings, involving novel objects and/or relations. Even methods that leverage large Vision Language Models (VLMs) typically require benchmark-specif…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted in CVPR 2025 Workshop (CVinW)

  42. arXiv:2506.05508  [pdf, ps, other]

    cs.DC cs.AI

    Beyond the Buzz: A Pragmatic Take on Inference Disaggregation

    Authors: Tiyasa Mitra, Ritika Borkar, Nidhi Bhatia, Ramon Matas, Shivam Raj, Dheevatsa Mudigere, Ritchie Zhao, Maximilian Golub, Arpan Dutta, Sailaja Madduri, Dharmesh Jani, Brian Pharris, Bita Darvish Rouhani

    Abstract: As inference scales to multi-node deployments, disaggregation - splitting inference into distinct phases - offers a promising path to improving the throughput-interactivity Pareto frontier. Despite growing enthusiasm and a surge of open-source efforts, practical deployment of disaggregated serving remains limited due to the complexity of the optimization search space and system-level coordination.…

    Submitted 5 June, 2025; originally announced June 2025.

  43. arXiv:2506.04238  [pdf, ps, other]

    cs.NE cs.LG

    A Review on Influx of Bio-Inspired Algorithms: Critique and Improvement Needs

    Authors: Shriyank Somvanshi, Md Monzurul Islam, Syed Aaqib Javed, Gaurab Chhetri, Kazi Sifatul Islam, Tausif Islam Chowdhury, Sazzad Bin Bashar Polock, Anandi Dutta, Subasish Das

    Abstract: Bio-inspired algorithms utilize natural processes such as evolution, swarm behavior, foraging, and plant growth to solve complex, nonlinear, high-dimensional optimization problems. However, a plethora of these algorithms require a more rigorous review before making them applicable to the relevant fields. This survey categorizes these algorithms into eight groups: evolutionary, swarm intelligence,…

    Submitted 15 September, 2025; v1 submitted 25 May, 2025; originally announced June 2025.

  44. arXiv:2506.03160  [pdf]

    cs.LG

    Applying MambaAttention, TabPFN, and TabTransformers to Classify SAE Automation Levels in Crashes

    Authors: Shriyank Somvanshi, Anannya Ghosh Tusti, Mahmuda Sultana Mimi, Md Monzurul Islam, Sazzad Bin Bashar Polock, Anandi Dutta, Subasish Das

    Abstract: The increasing presence of automated vehicles (AVs) presents new challenges for crash classification and safety analysis. Accurately identifying the SAE automation level involved in each crash is essential to understanding crash dynamics and system accountability. However, existing approaches often overlook automation-specific factors and lack model sophistication to capture distinctions between d…

    Submitted 22 May, 2025; originally announced June 2025.

  45. arXiv:2505.22483  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    A Closer Look at Multimodal Representation Collapse

    Authors: Abhra Chaudhuri, Anjan Dutta, Tu Bui, Serban Georgescu

    Abstract: We aim to develop a fundamental understanding of modality collapse, a recently observed empirical phenomenon wherein models trained for multimodal fusion tend to rely only on a subset of the modalities, ignoring the rest. We show that modality collapse happens when noisy features from one modality are entangled, via a shared set of neurons in the fusion head, with predictive features from another,…

    Submitted 14 August, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: International Conference on Machine Learning (ICML) 2025 (Spotlight)

  46. arXiv:2504.06160  [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.LG cs.SI

    Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups

    Authors: Rijul Magu, Arka Dutta, Sean Kim, Ashiqur R. KhudaBukhsh, Munmun De Choudhury

    Abstract: Large Language Models (LLMs) have been shown to demonstrate imbalanced biases against certain groups. However, the study of unprovoked targeted attacks by LLMs towards at-risk populations remains underexplored. Our paper presents three novel contributions: (1) the explicit evaluation of LLM-generated attacks on highly vulnerable mental health groups; (2) a network-based framework to study the prop…

    Submitted 28 January, 2026; v1 submitted 8 April, 2025; originally announced April 2025.

    ACM Class: J.4; K.4.1; K.4.2

    Journal ref: Second Conference on Language Modeling (COLM 2025)

  47. arXiv:2504.05789  [pdf, other]

    cs.CV

    Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation

    Authors: Sarosij Bose, Hannah Dela Cruz, Arindam Dutta, Elena Kokkoni, Konstantinos Karydis, Amit K. Roy-Chowdhury

    Abstract: Human pose estimation is a critical tool across a variety of healthcare applications. Despite significant progress in pose estimation algorithms targeting adults, such developments for infants remain limited. Existing algorithms for infant pose estimation, despite achieving commendable performance, depend on fully supervised approaches that require large amounts of labeled data. These algorithms a…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Accepted at ABAW@CVPR 2025

  48. arXiv:2503.16423  [pdf, ps, other]

    cs.CV cs.LG

    GAEA: A Geolocation Aware Conversational Assistant

    Authors: Ron Campos, Ashmal Vayani, Parth Parag Kulkarni, Rohit Gupta, Aizan Zafar, Aritra Dutta, Mubarak Shah

    Abstract: Image geolocalization, in which an AI model traditionally predicts the precise GPS coordinates of an image, is a challenging task with many downstream applications. However, the user cannot utilize the model to further their knowledge beyond the GPS coordinates; the model lacks an understanding of the location and the conversational ability to communicate with the user. In recent days, with the tr…

    Submitted 2 September, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

    Comments: The dataset and code used in this submission are available at: https://ucf-crcv.github.io/GAEA/

    ACM Class: I.4; I.2.7; I.5

  49. arXiv:2503.15742  [pdf, ps, other]

    cs.CV

    Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes

    Authors: Sarosij Bose, Arindam Dutta, Sayak Nag, Junge Zhang, Jiachen Li, Konstantinos Karydis, Amit K. Roy Chowdhury

    Abstract: Reconstructing 3D scenes from a single image is a fundamentally ill-posed task due to the severely under-constrained nature of the problem. Consequently, when the scene is rendered from novel camera views, existing single image to 3D reconstruction methods render incoherent and blurry views. This problem is exacerbated when the unseen regions are far away from the input camera. In this work, we ad…

    Submitted 8 October, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: ICCV 2025

  50. arXiv:2503.15671  [pdf, ps, other]

    cs.CV

    CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image

    Authors: Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesha Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu

    Abstract: Reconstructing clothed humans from a single image is a fundamental task in computer vision with wide-ranging applications. Although existing monocular clothed human reconstruction solutions have shown promising results, they often rely on the assumption that the human subject is in an occlusion-free environment. Thus, when encountering in-the-wild occluded images, these algorithms produce multivie…

    Submitted 17 October, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Accepted at ICCV 2025