Showing 1–50 of 292 results for author: Kumar, K

Searching in archive cs.
  1. arXiv:2604.12374  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

    Authors: NVIDIA: Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, et al. (522 additional authors not shown)

    Abstract: We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, a… ▽ More

    Submitted 14 April, 2026; originally announced April 2026.

  2. arXiv:2604.06170  [pdf, ps, other]

    cs.CL

    Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

    Authors: Komal Kumar, Aman Chadha, Salman Khan, Fahad Shahbaz Khan, Hisham Cholakkal

    Abstract: The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize various tools. In this paper, we introduce Paper Circle, a multi-agent research disc… ▽ More

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: 19 pages, 7 figures, 8 tables, ACL main (Oral)

  3. arXiv:2604.03449  [pdf, ps, other]

    cs.LG eess.SY

    Neural Operators for Multi-Task Control and Adaptation

    Authors: David Sewell, Xingjian Li, Stepan Tretiakov, Krishna Kumar, David Fridovich-Keil

    Abstract: Neural operator methods have emerged as powerful tools for learning mappings between infinite-dimensional function spaces, yet their potential in optimal control remains largely unexplored. We focus on multi-task control problems, whose solution is a mapping from task description (e.g., cost or dynamics functions) to optimal control law (e.g., feedback policy). We approximate these solution operat… ▽ More

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 25 pages, 10 figures, 2 tables

  4. arXiv:2604.03231  [pdf, ps, other]

    cs.CV

    CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

    Authors: Ankan Deria, Komal Kumar, Xilin He, Imran Razzak, Hisham Cholakkal, Fahad Shahbaz Khan, Salman Khan

    Abstract: Recent vision-language models (VLMs) typically rely on a single vision encoder trained with contrastive image-text objectives, such as CLIP-style pretraining. While contrastive encoders are effective for cross-modal alignment and retrieval, self-supervised visual encoders often capture richer dense semantics and exhibit stronger robustness on recognition and understanding tasks. In this work, we i… ▽ More

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 16 pages, 10 figures, 5 tables

  5. arXiv:2604.02276  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules

    Authors: Keerat Guliani, Deepkamal Gill, David Landsman, Nima Eshraghi, Krishna Kumar, Lovedeep Gondara

    Abstract: Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive process. We present De Jure, a fully automated, domain-agnostic pipeline for extracting structured regulatory rules from raw documents, requiring no human annotation, domain-specific pr… ▽ More

    Submitted 2 April, 2026; originally announced April 2026.

  6. arXiv:2603.17175  [pdf, ps, other]

    cs.LG physics.geo-ph

    Domain-informed explainable boosting machines for trustworthy lateral spread predictions

    Authors: Cheng-Hsi Hsiao, Krishna Kumar, Ellen M. Rathje

    Abstract: Explainable Boosting Machines (EBMs) provide transparent predictions through additive shape functions, enabling direct inspection of feature contributions. However, EBMs can learn non-physical relationships that reduce their reliability in natural hazard applications. This study presents a domain-informed framework to improve the physical consistency of EBMs for lateral spreading prediction. Our a… ▽ More

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 33 pages, 16 figures

  7. arXiv:2603.16983  [pdf, ps, other]

    cs.LG cs.LO

    Formal verification of tree-based machine learning models for lateral spreading

    Authors: Krishna Kumar

    Abstract: Machine learning models for geotechnical hazard prediction can achieve high accuracy while learning physically inconsistent relationships from sparse or biased training data. Current remedies (post-hoc explainability, such as SHAP and LIME, and training-time constraints) either diagnose individual predictions approximately or restrict model capacity without providing exhaustive guarantees. This pa… ▽ More

    Submitted 17 March, 2026; originally announced March 2026.

    ACM Class: I.2.6; D.2.4; J.2

  8. arXiv:2603.16839  [pdf, ps, other]

    cs.AI

    Learning to Present: Inverse Specification Rewards for Agentic Slide Generation

    Authors: Karthik Ragunath Ananda Kumar, Subrahmanyam Arunachalam

    Abstract: Automated presentation generation remains a challenging task requiring coherent content creation, visual design, and audience-aware communication. This work proposes an OpenEnv-compatible reinforcement learning environment where LLM agents learn to research topics, plan content, and generate professional HTML slide presentations through tool use. We introduce a multi-component reward system combin… ▽ More

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 12 pages, 11 figures, 13 tables, 26 references. Code: https://github.com/pushing-the-frontier/slide-forge-llm Dataset: https://huggingface.co/datasets/KarthikRagunathAnandaKumar/sliderl-multi-turn-rollouts

  9. arXiv:2603.08468  [pdf, ps, other]

    eess.SY cs.LG

    Integrating Lagrangian Neural Networks into the Dyna Framework for Reinforcement Learning

    Authors: Shreya Das, Kundan Kumar, Muhammad Iqbal, Outi Savolainen, Dominik Baumann, Laura Ruotsalainen, Simo Särkkä

    Abstract: Model-based reinforcement learning (MBRL) is sample-efficient but depends on the accuracy of the learned dynamics, which are often modeled using black-box methods that do not adhere to physical laws. Those methods tend to produce inaccurate predictions when presented with data that differ from the original training set. In this work, we employ Lagrangian neural networks (LNNs), which enforce an un… ▽ More

    Submitted 9 March, 2026; originally announced March 2026.

    Comments: 5 pages, 3 figures

  10. "Write in English, Nobody Understands Your Language Here": A Study of Non-English Trends in Open-Source Repositories

    Authors: Masudul Hasan Masud Bhuiyan, Manish Kumar Bala Kumar, Cristian-Alexandru Staicu

    Abstract: The open-source software (OSS) community has historically been dominated by English as the primary language for code, documentation, and developer interactions. However, with growing global participation and better support for non-Latin scripts through standards like Unicode, OSS is gradually becoming more multilingual. This study investigates the extent to which OSS is becoming more multilingual,… ▽ More

    Submitted 22 February, 2026; originally announced February 2026.

  11. arXiv:2602.16935  [pdf, ps, other]

    cs.AI cs.ET cs.LG

    DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

    Authors: Justin Albrethsen, Yash Datta, Kunal Kumar, Sharath Rajasekar

    Abstract: While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of disconnected events. This lack of temporal awareness facilitates a "Safety Gap" where adversarial tactics, like Crescendo and ActorAttack, slowly bleed malicious intent across turn boundaries to bypass stateless filters. We introduce DeepContext, a sta… ▽ More

    Submitted 18 February, 2026; originally announced February 2026.

    Comments: 18 pages, 7 tables, 1 figure

    ACM Class: F.2.2; I.2.7

  12. arXiv:2602.15866  [pdf, ps, other]

    cs.CL cs.AI cs.CR cs.CY cs.HC

    NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey

    Authors: Dhiman Goswami, Jai Kruthunz Naveen Kumar, Sanchari Das

    Abstract: Natural Language Processing (NLP) is integral to social media analytics but often processes content containing Personally Identifiable Information (PII), behavioral cues, and metadata, raising privacy risks such as surveillance, profiling, and targeted advertising. To systematically assess these risks, we review 203 peer-reviewed papers and propose the NLP Privacy Risk Identification in Social Medi… ▽ More

    Submitted 26 January, 2026; originally announced February 2026.

    Journal ref: In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL) 2026

  13. arXiv:2602.11232  [pdf, ps, other]

    cs.CR cs.PL cs.SE

    Yaksha-Prashna: Understanding eBPF Bytecode Network Function Behavior

    Authors: Animesh Singh, K Shiv Kumar, S. VenkataKeerthy, Pragna Mamidipaka, R V B R N Aaseesh, Sayandeep Sen, Palanivel Kodeswaran, Theophilus A. Benson, Ramakrishna Upadrasta, Praveen Tammana

    Abstract: Many cloud infrastructure organizations increasingly rely on third-party eBPF-based network functions for use cases like security, observability, and load balancing, so that not everyone requires a team of highly skilled eBPF experts. However, the network functions from third parties (e.g., F5, Palo Alto) are available in bytecode format to cloud operators, giving little or no understanding of the… ▽ More

    Submitted 11 February, 2026; originally announced February 2026.

  14. arXiv:2602.06965  [pdf, ps, other]

    cs.CV

    MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images

    Authors: Ankan Deria, Komal Kumar, Adinath Madhavrao Dukre, Eran Segal, Salman Khan, Imran Razzak

    Abstract: Multimodal large language models have advanced rapidly, but their adoption in medicine is constrained by limited domain coverage, imperfect modality alignment, and insufficient grounded reasoning. We introduce MedMO, a medical multimodal foundation model built on a general MLLM architecture and trained exclusively on large-scale domain-specific data. MedMO uses a multi-stage training recipe that i… ▽ More

    Submitted 11 March, 2026; v1 submitted 6 February, 2026; originally announced February 2026.

    Comments: 21 pages, 6 figures and 4 tables

  15. arXiv:2601.15118  [pdf, ps, other]

    cs.SD cs.CL cs.LG

    WavLink: Compact Audio-Text Embeddings with a Global Whisper Token

    Authors: Gokul Karthik Kumar, Ludovick Lepauloux, Hakim Hacid

    Abstract: Whisper has become the de-facto encoder for extracting general-purpose audio features in large audio-language models, where a 30-second clip is typically represented by 1500 frame features projected into an LLM. In contrast, audio-text embedding models like CLAP-based models have largely relied on alternative audio encoders (e.g., HTS-AT, PaSST), and have not leveraged Whisper effectively. We pres… ▽ More

    Submitted 22 January, 2026; v1 submitted 21 January, 2026; originally announced January 2026.

    Comments: Accepted at ICASSP 2026

  16. arXiv:2601.11801  [pdf, ps, other]

    cs.RO cs.AI

    RobotDesignGPT: Automated Robot Design Synthesis using Vision Language Models

    Authors: Nitish Sontakke, K. Niranjan Kumar, Sehoon Ha

    Abstract: Robot design is a nontrivial process that involves careful consideration of multiple criteria, including user specifications, kinematic structures, and visual appearance. Therefore, the design process often relies heavily on domain expertise and significant human effort. The majority of current methods are rule-based, requiring the specification of a grammar or a set of primitive components and mo… ▽ More

    Submitted 16 January, 2026; originally announced January 2026.

  17. arXiv:2512.24365  [pdf, ps, other]

    physics.geo-ph cs.LG

    Deep Learning in Geotechnical Engineering: A Critical Assessment of PINNs and Operator Learning

    Authors: Krishna Kumar

    Abstract: Deep learning methods -- physics-informed neural networks (PINNs), deep operator networks (DeepONet), and graph network simulators (GNS) -- are increasingly proposed for geotechnical problems. This paper tests these methods against traditional solvers on canonical problems: wave propagation and beam-foundation interaction. PINNs run 90,000 times slower than finite difference with larger errors. De… ▽ More

    Submitted 30 December, 2025; originally announced December 2025.

  18. arXiv:2512.18120  [pdf, ps, other]

    cs.LG

    Learning Generalizable Neural Operators for Inverse Problems

    Authors: Adam J. Thorpe, Stepan Tretiakov, Dibakar Roy Sarkar, Krishna Kumar, Ufuk Topcu

    Abstract: Inverse problems challenge existing neural operator architectures because ill-posed inverse maps violate continuity, uniqueness, and stability assumptions. We introduce B2B${}^{-1}$, an inverse basis-to-basis neural operator framework that addresses this limitation. Our key innovation is to decouple function representation from the inverse map. We learn neural basis functions for the input and out… ▽ More

    Submitted 19 December, 2025; originally announced December 2025.

  19. arXiv:2512.15945  [pdf, ps, other]

    cs.CY

    Privacy Discourse and Emotional Dynamics in Mental Health Information Interaction on Reddit

    Authors: Jai Kruthunz Naveen Kumar, Aishwarya Umeshkumar Surani, Harkirat Singh, Sanchari Das

    Abstract: Reddit is a major venue for mental-health information interaction and peer support, where privacy concerns increasingly surface in user discourse. Thus, we analyze privacy-related discussions across 14 mental-health and regulatory subreddits, comprising 10,119 posts and 65,385 comments collected with a custom web scraper. Using lexicon-based sentiment analysis, we quantify emotional alignment betw… ▽ More

    Submitted 17 December, 2025; originally announced December 2025.

  20. arXiv:2512.10078  [pdf, ps, other]

    cs.MA

    Empirical Hardness in Multi-Agent Pathfinding: Research Challenges and Opportunities

    Authors: Jingyao Ren, Eric Ewing, T. K. Satish Kumar, Sven Koenig, Nora Ayanian

    Abstract: Multi-agent pathfinding (MAPF) is the problem of finding collision-free paths for a team of agents on a map. Although MAPF is NP-hard, the hardness of solving individual instances varies significantly, revealing a gap between theoretical complexity and actual hardness. This paper outlines three key research challenges in MAPF empirical hardness to understand such phenomena. The first challenge, kn… ▽ More

    Submitted 10 December, 2025; originally announced December 2025.

    Comments: Published in AAMAS-25

    Journal ref: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, 2025, Pages 2885-2889

  21. arXiv:2512.08545  [pdf]

    cs.CL cs.AI cs.CV cs.MA

    Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks

    Authors: Indrajit Kar, Kalathur Chenchu Kishore Kumar

    Abstract: Large Language Models and multi-agent systems have shown promise in decomposing complex tasks, yet they struggle with long-horizon reasoning tasks and escalating computation cost. This work introduces a hierarchical multi-agent architecture that distributes reasoning across a 64*64 grid of lightweight agents, supported by a selective oracle. A spatial curriculum progressively expands the operation… ▽ More

    Submitted 9 December, 2025; originally announced December 2025.

    Comments: 22 pages, 2 tables, 9 figures

  22. arXiv:2512.01059  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction

    Authors: Anantha Padmanaban Krishna Kumar

    Abstract: Although scaling laws and many empirical results suggest that increasing the size of Vision Transformers often improves performance, model accuracy and training behavior are not always monotonically increasing with scale. Focusing on ViT-B/16 trained on ImageNet-1K, we study two simple parameter-reduction strategies applied to the MLP blocks, each removing 32.7\% of the baseline parameters. Our \e… ▽ More

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: 7 pages total (6 pages main text, 1 page references), 1 figure, 2 tables. Code available at https://github.com/AnanthaPadmanaban-KrishnaKumar/parameter-efficient-vit-mlps
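
    The arithmetic behind such MLP parameter reduction is easy to make concrete. The sketch below counts the parameters of a two-layer transformer MLP at ViT-B/16 width and compares it with a hypothetical half-width variant; the halved hidden size is an assumption for illustration, not the exact reduction scheme studied in the paper.

```python
# Illustrative sketch: parameter count of a transformer MLP block at
# ViT-B/16 width (d_model=768, hidden=3072) versus an assumed half-width
# variant. Numbers are for intuition only.

def mlp_params(d_model: int, d_hidden: int) -> int:
    """Two-layer MLP: W1 (d_model x d_hidden) + b1, then W2 (d_hidden x d_model) + b2."""
    return d_model * d_hidden + d_hidden + d_hidden * d_model + d_model

baseline = mlp_params(768, 3072)  # standard 4x expansion used in ViT-B
reduced = mlp_params(768, 1536)   # hypothetical half-width MLP

print(f"baseline={baseline}, reduced={reduced}, saved={1 - reduced / baseline:.1%}")
```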

  23. arXiv:2511.21635  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    Mechanisms of Non-Monotonic Scaling in Vision Transformers

    Authors: Anantha Padmanaban Krishna Kumar

    Abstract: Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNet, we identify a consistent three-phase Cliff-Plateau-Climb pattern that governs how representations evolve with depth. We observe that better performance is associated with progressive marginalization of the [… ▽ More

    Submitted 26 November, 2025; originally announced November 2025.

    Comments: 16 pages total (11 pages main text, 1 page references, 4 pages appendix), 5 figures, 11 tables. Code available at https://github.com/AnanthaPadmanaban-KrishnaKumar/Cliff-Plateau-Climb

  24. arXiv:2511.21038  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels

    Authors: Anantha Padmanaban Krishna Kumar

    Abstract: Can in-context learning (ICL) override pre-trained label semantics, or does it merely refine an existing semantic backbone? We address this question by treating LLMs as prompt-induced classifiers and contrasting their behavior under \emph{natural} demonstrations (with correct labels) and \emph{inverted} demonstrations (systematically flipping label meanings). We decompose ICL behavior into three a… ▽ More

    Submitted 25 November, 2025; originally announced November 2025.

    Comments: 13 pages total (7 pages main text, 3 pages references, 3 pages appendix), 2 figures, 14 tables. Code available at https://github.com/AnanthaPadmanaban-KrishnaKumar/semantic-anchors-icl

  25. arXiv:2511.06143  [pdf, ps, other]

    cs.LG

    Enhancing Robustness of Graph Neural Networks through p-Laplacian

    Authors: Anuj Kumar Sirohi, Subhanu Halder, Kabir Kumar, Sandeep Kumar

    Abstract: With the increase of data in day-to-day life, businesses and different stakeholders need to analyze the data for better predictions. Traditionally, relational data has been a source of various insights, but with the increase in computational power and the need to understand deeper relationships between entities, the need to design new techniques has arisen. For this graph data analysis has become… ▽ More

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: Accepted at 5th Workshop on Graphs and more Complex Structures For Learning and Reasoning (GCLR), The 40th AAAI Conference on Artificial Intelligence (AAAI-26)
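
    The p-Laplacian operator this abstract builds on has a simple graph form. The sketch below is a minimal NumPy implementation of the standard graph p-Laplacian, not the authors' GNN layer; for p = 2 it reduces to the ordinary graph Laplacian.

```python
import numpy as np

def p_laplacian(W: np.ndarray, x: np.ndarray, p: float) -> np.ndarray:
    """Graph p-Laplacian of a node signal x on a weighted graph W:
        (Delta_p x)_i = sum_j W_ij |x_i - x_j|**(p-2) * (x_i - x_j).
    For p = 2 this reduces to the ordinary graph Laplacian (D - W) @ x."""
    diff = x[:, None] - x[None, :]            # pairwise differences x_i - x_j
    mag = np.abs(diff) ** (p - 2) * diff      # |.|**(p-2) * (.)
    # For p < 2, zero differences produce inf * 0 = nan; treat them as 0.
    return (W * np.nan_to_num(mag)).sum(axis=1)
```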

  26. arXiv:2511.05757  [pdf, ps, other]

    eess.SY cs.LG

    Zero-Shot Function Encoder-Based Differentiable Predictive Control

    Authors: Hassan Iqbal, Xingjian Li, Tyler Ingebrand, Adam Thorpe, Krishna Kumar, Ufuk Topcu, Ján Drgoňa

    Abstract: We introduce a differentiable framework for zero-shot adaptive control over parametric families of nonlinear dynamical systems. Our approach integrates a function encoder-based neural ODE (FE-NODE) for modeling system dynamics with a differentiable predictive control (DPC) for offline self-supervised learning of explicit control policies. The FE-NODE captures nonlinear behaviors in state transitio… ▽ More

    Submitted 14 April, 2026; v1 submitted 7 November, 2025; originally announced November 2025.

  27. arXiv:2511.05456  [pdf, ps, other]

    cs.LG

    Parameter-Efficient Conditioning for Material Generalization in Graph-Based Simulators

    Authors: Naveen Raj Manoharan, Hassan Iqbal, Krishna Kumar

    Abstract: Graph network-based simulators (GNS) have demonstrated strong potential for learning particle-based physics (such as fluids, deformable solids, and granular flows) while generalizing to unseen geometries due to their inherent inductive biases. However, existing models are typically trained for a single material type and fail to generalize across distinct constitutive behaviors, limiting their appl… ▽ More

    Submitted 7 November, 2025; originally announced November 2025.

  28. arXiv:2510.24081  [pdf, ps, other]

    cs.CL

    Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

    Authors: Tyler A. Chang, Catherine Arnett, Abdelrahman Eldesokey, Abdelrahman Sadallah, Abeer Kashar, Abolade Daud, Abosede Grace Olanihun, Adamu Labaran Mohammed, Adeyemi Praise, Adhikarinayum Meerajita Sharma, Aditi Gupta, Afitab Iyigun, Afonso Simplício, Ahmed Essouaied, Aicha Chorana, Akhil Eppa, Akintunde Oladipo, Akshay Ramesh, Aleksei Dorkin, Alfred Malengo Kondoro, Alham Fikri Aji, Ali Eren Çetintaş, Allan Hanbury, Alou Dembele, Alp Niksarli , et al. (313 additional authors not shown)

    Abstract: To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five co… ▽ More

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: Preprint

  29. arXiv:2510.09904  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Stability of Transformers under Layer Normalization

    Authors: Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Krishna Kumar, Markos A. Katsoulakis

    Abstract: Despite their widespread use, training deep Transformers can be unstable. Layer normalization, a standard component, improves training stability, but its placement has often been ad-hoc. In this paper, we conduct a principled study on the forward (hidden states) and backward (gradient) stability of Transformers under different layer normalization placements. Our theory provides key insights into t… ▽ More

    Submitted 10 October, 2025; originally announced October 2025.

  30. arXiv:2509.22793  [pdf, ps, other]

    cs.CV

    DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models

    Authors: Komal Kumar, Rao Muhammad Anwer, Fahad Shahbaz Khan, Salman Khan, Ivan Laptev, Hisham Cholakkal

    Abstract: Efficient fine-tuning of pre-trained Text-to-Image (T2I) models involves adjusting the model to suit a particular task or dataset while minimizing computational resources and limiting the number of trainable parameters. However, it often faces challenges in striking a trade-off between aligning with the target distribution: learning a novel concept from a limited image for personalization and reta… ▽ More

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 21 pages, 13 figures, accepted at NeurIPS 2025

  31. arXiv:2509.21464  [pdf, ps, other]

    cs.CV cs.RO

    Residual Vector Quantization For Communication-Efficient Multi-Agent Perception

    Authors: Dereje Shenkut, B. V. K Vijaya Kumar

    Abstract: Multi-agent collaborative perception (CP) improves scene understanding by sharing information across connected agents such as autonomous vehicles, unmanned aerial vehicles, and robots. Communication bandwidth, however, constrains scalability. We present ReVQom, a learned feature codec that preserves spatial identity while compressing intermediate features. ReVQom is an end-to-end method that compr… ▽ More

    Submitted 7 February, 2026; v1 submitted 25 September, 2025; originally announced September 2025.

    Comments: Accepted at ICASSP 2026. 5 pages
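
    Residual vector quantization itself is straightforward to sketch. The toy below implements generic multi-stage RVQ with random codebooks (each padded with a zero codeword so a stage can abstain); it illustrates the principle only and is not ReVQom's learned codec.

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq(v: np.ndarray, codebooks: list) -> tuple:
    """Generic residual VQ: stage k quantizes the residual left by stages
    1..k-1, so the summed codewords approximate v progressively."""
    approx = np.zeros_like(v)
    residual = v.copy()
    errors = []
    for cb in codebooks:
        q = cb[np.argmin(np.linalg.norm(cb - residual, axis=1))]  # nearest codeword
        approx += q
        residual -= q
        errors.append(np.linalg.norm(v - approx))  # reconstruction error so far
    return approx, errors

# Toy codebooks: a zero codeword lets a stage abstain, and later stages
# use smaller scales to refine what earlier stages missed.
dim = 8
codebooks = [np.vstack([np.zeros(dim), rng.normal(size=(63, dim)) * s])
             for s in (1.0, 0.5, 0.25)]
v = rng.normal(size=dim)
approx, errors = rvq(v, codebooks)
```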

  32. arXiv:2509.18404  [pdf, ps, other]

    math.OC cs.LG

    Zero-Shot Transferable Solution Method for Parametric Optimal Control Problems

    Authors: Xingjian Li, Kelvin Kan, Deepanshu Verma, Krishna Kumar, Stanley Osher, Ján Drgoňa

    Abstract: This paper presents a transferable solution method for optimal control problems with varying objectives using function encoder (FE) policies. Traditional optimization-based approaches must be re-solved whenever objectives change, resulting in prohibitive computational costs for applications requiring frequent evaluation and adaptation. The proposed method learns a reusable set of neural basis func… ▽ More

    Submitted 11 March, 2026; v1 submitted 22 September, 2025; originally announced September 2025.

    Comments: 11 pages, 6 figures, 3 tables

  33. arXiv:2509.10452  [pdf, ps, other]

    cs.CL cs.LG

    WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers

    Authors: Akshat Pandey, Karun Kumar, Raphael Tang

    Abstract: Pretrained automatic speech recognition (ASR) models such as Whisper perform well but still need domain adaptation to handle unseen vocabulary and parlance. In many real-world settings, collecting speech data is impractical, necessitating text-only adaptation. We propose WhisTLE, a deeply supervised, text-only adaptation method for pretrained encoder-decoder ASR models. WhisTLE trains a variationa… ▽ More

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: 5 pages, 2 figures

  34. arXiv:2509.07526  [pdf, ps, other]

    cs.SD cs.AI cs.CL cs.LG

    Competitive Audio-Language Models with Data-Efficient Single-Stage Training on Public Data

    Authors: Gokul Karthik Kumar, Rishabh Saraf, Ludovick Lepauloux, Abdul Muneer, Billel Mokeddem, Hakim Hacid

    Abstract: Large language models (LLMs) have transformed NLP, yet their integration with audio remains underexplored despite audio's centrality to human communication. We introduce Falcon3-Audio, a family of Audio-Language Models (ALMs) built on instruction-tuned LLMs and Whisper encoders. Using a remarkably small amount of public audio data, less than 30K hours (5K unique), Falcon3-Audio-7B matches the best… ▽ More

    Submitted 22 January, 2026; v1 submitted 9 September, 2025; originally announced September 2025.

    Comments: Accepted at ASRU 2025

  35. arXiv:2509.02923  [pdf, ps, other]

    cs.LG

    A Narrative Review of Clinical Decision Support Systems in Offloading Footwear for Diabetes-Related Foot Ulcers

    Authors: Kunal Kumar, Muhammad Ashad Kabir, Luke Donnan, Sayed Ahmed

    Abstract: Offloading footwear helps prevent and treat diabetic foot ulcers (DFUs) by lowering plantar pressure (PP), yet prescription decisions remain fragmented: feature selection varies, personalization is limited, and evaluation practices differ. We performed a narrative review of 45 studies (12 guidelines/protocols, 25 knowledge-based systems, 8 machine-learning applications) published to Aug 2025. We t… ▽ More

    Submitted 2 September, 2025; originally announced September 2025.

    Comments: 44 pages, 2 figures, and 3 tables

  36. arXiv:2507.09005  [pdf, ps, other]

    cs.CV physics.geo-ph

    From images to properties: a NeRF-driven framework for granular material parameter inversion

    Authors: Cheng-Hsi Hsiao, Krishna Kumar

    Abstract: We introduce a novel framework that integrates Neural Radiance Fields (NeRF) with Material Point Method (MPM) simulation to infer granular material properties from visual observations. Our approach begins by generating synthetic experimental data, simulating a plow interacting with sand. The experiment is rendered into realistic images as photographic observations. These observations include… ▽ More

    Submitted 11 July, 2025; originally announced July 2025.

  37. arXiv:2507.07247  [pdf, ps, other]

    cs.LG cs.AI cs.NE

    Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention

    Authors: Zhengyu Tian, Anantha Padmanaban Krishna Kumar, Hemant Krishnakumar, Reza Rawassizadeh

    Abstract: As large language models (LLMs) and visual language models (VLMs) grow in scale and application, attention mechanisms have become a central computational bottleneck due to their high memory and time complexity. While many efficient attention variants have been proposed, there remains a lack of rigorous evaluation on their actual energy usage and hardware resource demands during training. In this w… ▽ More

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: 6 pages, 8 figures

  38. arXiv:2507.04137  [pdf]

    cs.CL cs.LG

    Detecting Token-Level Hallucinations Using Variance Signals: A Reference-Free Approach

    Authors: Keshav Kumar

    Abstract: Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations, confidently generated yet factually incorrect outputs. We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple stochastic generations. Unlike prior methods that re… ▽ More

    Submitted 16 October, 2025; v1 submitted 5 July, 2025; originally announced July 2025.
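
    The variance signal described here can be sketched in a few lines. The toy below assumes the K stochastic generations are already token-aligned (which the actual method must handle) and uses an arbitrary variance threshold; it illustrates the idea, not the paper's detector.

```python
import numpy as np

def flag_unstable_tokens(logprobs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """logprobs: shape (K, T), per-token log-probabilities of the same
    T-token answer scored under K stochastic generations (assumed aligned).
    Tokens whose log-probability varies a lot across samples are flagged
    as potential hallucinations."""
    return logprobs.var(axis=0) > threshold

# Token 2 is scored very differently across the three samples -> flagged.
lp = np.array([[-0.10, -0.10, -5.0],
               [-0.12, -0.10, -0.2],
               [-0.09, -0.10, -2.5]])
flags = flag_unstable_tokens(lp)
```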

  39. arXiv:2506.08885  [pdf, ps, other]

    cs.CL cs.LG

    AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)

    Authors: Danush Khanna, Gurucharan Marthi Krishna Kumar, Basab Ghosh, Yaswanth Narsupalli, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das

    Abstract: Adversarial threats against LLMs are escalating faster than current defenses can adapt. We expose a critical geometric blind spot in alignment: adversarial prompts exploit latent camouflage, embedding perilously close to the safe representation manifold while encoding unsafe intent thereby evading surface level defenses like Direct Preference Optimization (DPO), which remain blind to the latent ge… ▽ More

    Submitted 28 September, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  40. arXiv:2505.13550  [pdf, ps, other]

    cs.IR cs.AI

    JIR-Arena: The First Benchmark Dataset for Just-in-time Information Recommendation

    Authors: Ke Yang, Kevin Ros, Shankar Kumar Senthil Kumar, ChengXiang Zhai

    Abstract: Just-in-time Information Recommendation (JIR) is a service designed to deliver the most relevant information precisely when users need it, addressing their knowledge gaps with minimal effort and boosting decision-making and efficiency in daily life. Advances in device-efficient deployment of foundation models and the growing use of intelligent wearable devices have made always-on JIR assistants… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

  41. arXiv:2505.04738  [pdf, ps, other

    cs.LG

    SetONet: A Set-Based Operator Network for Solving PDEs with Variable-Input Sampling

    Authors: Stepan Tretiakov, Xingjian Li, Krishna Kumar

    Abstract: Most neural-operator surrogates for PDEs inherit from DeepONet-style formulations the requirement that the input function be sampled at a fixed, ordered set of sensors. This assumption limits applicability to problems with variable sensor layouts, missing data, point sources, and sample-based representations of densities. We propose SetONet, which addresses this gap by recasting the operator input… ▽ More
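The core idea of recasting the operator input as a set, so the encoding no longer depends on a fixed, ordered sensor layout, is usually realized with a Deep Sets-style permutation-invariant encoder. The sketch below is an assumption about that mechanism, not SetONet's actual architecture; the weight shapes and pooling choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-sensor feature map phi and readout rho (random fixed weights
# here; a trained model would learn these).
W_phi = rng.normal(size=(2, 16))   # (location, value) -> 16-dim feature
W_rho = rng.normal(size=(16, 8))   # pooled feature -> 8-dim code

def encode(sensors):
    """sensors: (n, 2) array of (x, u(x)) pairs. Sum pooling makes the
    code independent of sensor ordering and tolerant of a variable
    number of sensors -- the property a fixed-sensor DeepONet lacks."""
    h = np.tanh(sensors @ W_phi)   # (n, 16) per-sensor features
    pooled = h.sum(axis=0)         # order-invariant aggregation
    return pooled @ W_rho          # (8,) fixed-size code

sensors = rng.normal(size=(5, 2))
shuffled = sensors[rng.permutation(5)]
print(np.allclose(encode(sensors), encode(shuffled)))  # -> True
```

Because the pooling is a sum, dropping a sensor or adding a new one only changes the pooled feature smoothly, which is what makes missing data and variable layouts tractable.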

    Submitted 1 April, 2026; v1 submitted 7 May, 2025; originally announced May 2025.

    ACM Class: I.2; G.1.8

  42. arXiv:2505.04572  [pdf, other

    cs.RO

    Stow: Robotic Packing of Items into Fabric Pods

    Authors: Nicolas Hudson, Josh Hooks, Rahul Warrier, Curt Salisbury, Ross Hartley, Kislay Kumar, Bhavana Chandrashekhar, Paul Birkmeyer, Bosch Tang, Matt Frost, Shantanu Thakar, Tony Piaskowy, Petter Nilsson, Josh Petersen, Neel Doshi, Alan Slatter, Ankit Bhatia, Cassie Meeker, Yuechuan Xue, Dylan Cox, Alex Kyriazis, Bai Lou, Nadeem Hasan, Asif Rana, Nikhil Chacko , et al. (12 additional authors not shown)

    Abstract: This paper presents a compliant manipulation system capable of placing items onto densely packed shelves. The wide diversity of items and strict business requirements for high production rates and low defect generation have prohibited warehouse robotics from performing this task. Our innovations in hardware, perception, decision-making, motion planning, and control have enabled this system to perfo… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

  43. arXiv:2504.11397  [pdf, other

    cs.LG physics.comp-ph

    MLPs and KANs for data-driven learning in physical problems: A performance comparison

    Authors: Raghav Pant, Sikan Li, Xingjian Li, Hassan Iqbal, Krishna Kumar

    Abstract: There is increasing interest in solving partial differential equations (PDEs) by casting them as machine learning problems. Recently, there has been a spike in exploring Kolmogorov-Arnold Networks (KANs) as an alternative to traditional neural networks represented by Multi-Layer Perceptrons (MLPs). While showing promise, their performance advantages in physics-based problems remain largely unexplo… ▽ More

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 30 pages, 18 figures, 8 tables

  44. arXiv:2504.08766  [pdf, other

    cond-mat.soft cs.LG physics.comp-ph

    Towards scientific machine learning for granular material simulations -- challenges and opportunities

    Authors: Marc Fransen, Andreas Fürst, Deepak Tunuguntla, Daniel N. Wilke, Benedikt Alkin, Daniel Barreto, Johannes Brandstetter, Miguel Angel Cabrera, Xinyan Fan, Mengwu Guo, Bram Kieskamp, Krishna Kumar, John Morrissey, Jonathan Nuttall, Jin Ooi, Luisa Orozco, Stefanos-Aldo Papanicolopulos, Tongming Qu, Dingena Schott, Takayuki Shuku, WaiChing Sun, Thomas Weinhart, Dongwei Ye, Hongyang Cheng

    Abstract: Micro-scale mechanisms, such as inter-particle and particle-fluid interactions, govern the behaviour of granular systems. While particle-scale simulations provide detailed insights into these interactions, their computational cost is often prohibitive. Attended by researchers from both the granular materials (GM) and machine learning (ML) communities, a recent Lorentz Center Workshop on "Machine L… ▽ More

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 35 pages, 17 figures

  45. arXiv:2503.22678  [pdf, ps, other

    cs.CL

    Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions

    Authors: Mohammad Almansoori, Komal Kumar, Hisham Cholakkal

    Abstract: In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature… ▽ More

    Submitted 1 October, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

    Comments: 14 pages, 4 figures, 61 references, presented in MICCAI (Oral)

  46. arXiv:2503.13389  [pdf

    cs.LG physics.geo-ph

    Investigating the effect of CPT in lateral spreading prediction using Explainable AI

    Authors: Cheng-Hsi Hsiao, Ellen Rathje, Krishna Kumar

    Abstract: This study proposes an autoencoder approach to extract latent features from cone penetration test profiles to evaluate the potential of incorporating CPT data in an AI model. We employ autoencoders to compress 200 CPT profiles of soil behavior type index (Ic) and normalized cone resistance (qc1Ncs) into ten latent features while preserving critical information. We then utilize the extracted latent… ▽ More
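The compression step described, an autoencoder reducing CPT profiles to ten latent features, can be sketched with a minimal linear autoencoder trained by gradient descent. Everything here is illustrative: the data is synthetic stand-in noise rather than real Ic/qc1Ncs profiles, and the profile length, learning rate, and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 200 CPT profiles sampled at 50 depths
# (the real inputs would be Ic and qc1Ncs traces).
X = rng.normal(size=(200, 50))

k = 10                                  # latent dimension, as in the study
W_enc = rng.normal(size=(50, k)) * 0.1  # encoder weights
W_dec = rng.normal(size=(k, 50)) * 0.1  # decoder weights
lr = 0.1

def loss(W_enc, W_dec):
    Z = X @ W_enc                       # encode: 50 -> 10 features
    R = Z @ W_dec                       # decode: 10 -> 50
    return ((R - X) ** 2).mean()

loss0 = loss(W_enc, W_dec)
for _ in range(300):                    # plain gradient descent on MSE
    Z = X @ W_enc
    R = Z @ W_dec
    G = 2.0 * (R - X) / X.size          # dMSE/dR
    g_dec = Z.T @ G                     # gradient w.r.t. decoder
    g_enc = X.T @ (G @ W_dec.T)         # gradient w.r.t. encoder
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
print(loss0, "->", loss(W_enc, W_dec))  # reconstruction error drops
```

After training, the ten-dimensional codes `X @ W_enc` would serve as the latent features fed to the downstream lateral-spreading model; a practical version would use nonlinear layers and a deep-learning framework rather than this hand-rolled linear case.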

    Submitted 17 March, 2025; originally announced March 2025.

  47. arXiv:2503.11698  [pdf, ps, other

    cs.AR

    A Comparison of the Cerebras Wafer-Scale Integration Technology with Nvidia GPU-based Systems for Artificial Intelligence

    Authors: Yudhishthira Kundu, Manroop Kaur, Tripty Wig, Kriti Kumar, Pushpanjali Kumari, Vivek Puri, Manish Arora

    Abstract: Cerebras' wafer-scale engine (WSE) technology merges multiple dies on a single wafer. It addresses the challenges of memory bandwidth, latency, and scalability, making it suitable for artificial intelligence. This work evaluates the WSE-3 architecture and compares it with leading GPU-based AI accelerators, notably Nvidia's H100 and B200. The work highlights the advantages of WSE-3 in performance p… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 11 pages

  48. arXiv:2502.21321  [pdf, other

    cs.CL cs.CV

    LLM Post-Training: A Deep Dive into Reasoning Large Language Models

    Authors: Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Fahad Shahbaz Khan, Salman Khan

    Abstract: Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Pretraining on vast web-scale data has laid the foundation for these models, yet the research community is now increasingly shifting focus toward post-training techniques to achieve further breakthroughs. While pretraining provides a broad linguistic foundation, post-tr… ▽ More

    Submitted 24 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: 32 pages, 7 figures, 3 tables, 377 references. Github Repo: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training

  49. arXiv:2502.16095  [pdf, ps, other

    cs.CV cs.LG

    Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning

    Authors: Swadhin Das, Saarthak Gupta, Kamal Kumar, Raksha Sharma

    Abstract: Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. Recently, it has gained significant attention, with encoder-decoder models serving as the backbone for generating meaningful captions. The encoder extracts essential visual features from the input image, transforming them into a compact representation, while the decoder utilizes… ▽ More

    Submitted 3 July, 2025; v1 submitted 22 February, 2025; originally announced February 2025.

  50. arXiv:2502.13982  [pdf, other

    eess.AS cs.LG

    Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics

    Authors: Kabir Kumar

    Abstract: Natural Language Processing (NLP) and Voice Recognition agents are rapidly transforming healthcare by enabling efficient, accessible, and professional patient support while automating grunt work. This report documents my personal project, in which models fine-tuned on medical call recordings are analysed through a two-stage system: Automatic Speech Recognition (ASR) for speech transcription and a Large Langu… ▽ More

    Submitted 18 February, 2025; originally announced February 2025.