-
On the Fundamental Limits of LLMs at Scale
Authors:
Muhammad Ahmed Mohsin,
Muhammad Umer,
Ahsan Bilal,
Zeeshan Memon,
Muhammad Ibtsaam Qadir,
Sagnik Bhattacharya,
Hassan Rizwan,
Abhiram R. Gorle,
Maahe Zehra Kazmi,
Ayesha Mohsin,
Muhammad Usman Rafique,
Zihao He,
Pulkit Mehta,
Muhammad Ali Jamshed,
John M. Cioffi
Abstract:
Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning degradation, (4) retrieval fragility, and (5) multimodal misalignment. While existing surveys describe these phenomena empirically, they lack a rigorous theoretical synthesis connecting them to the foundational limits of computation, information, and learning. This work closes that gap by presenting a unified, proof-informed framework that formalizes the innate theoretical ceilings of LLM scaling. First, computability and uncomputability imply an irreducible residue of error: for any computably enumerable model family, diagonalization guarantees inputs on which some model must fail, and undecidable queries (e.g., halting-style tasks) induce infinite failure sets for all computable predictors. Second, information-theoretic and statistical constraints bound attainable accuracy even on decidable tasks: finite description length enforces compression error, and long-tail factual knowledge requires prohibitive sample complexity. Third, geometric and computational effects compress long contexts far below their nominal size due to positional under-training, encoding attenuation, and softmax crowding. We further show how likelihood-based training favors pattern completion over inference, how retrieval under token limits suffers from semantic drift and coupling noise, and how multimodal scaling inherits shallow cross-modal alignment. Across sections, we pair theorems with empirical evidence to outline where scaling helps, where it saturates, and where it cannot progress, providing both theoretical foundations and practical mitigation paths such as bounded-oracle retrieval, positional curricula, and sparse or hierarchical attention.
Submitted 16 November, 2025;
originally announced November 2025.
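The abstract above lists softmax crowding as one reason long contexts are compressed. The toy calculation below is a minimal sketch of that intuition under a simplifying assumption of our own: one relevant key whose logit exceeds each of n - 1 equal distractor logits (all set to 0) by a fixed margin. The helper names `top_key_weight` and `margin_needed` are hypothetical and do not come from the paper's formal treatment.

```python
import math

def top_key_weight(n_keys: int, margin: float) -> float:
    """Softmax weight on a single relevant key whose logit beats each of the
    n_keys - 1 distractor logits (all set to 0) by `margin`."""
    return math.exp(margin) / (math.exp(margin) + (n_keys - 1))

def margin_needed(n_keys: int, target: float) -> float:
    """Logit margin required to keep the relevant key's weight at `target`.
    Solves exp(m) / (exp(m) + n_keys - 1) = target for m."""
    return math.log(target * (n_keys - 1) / (1.0 - target))

# As the context grows, a fixed margin buys less and less attention mass.
for n in (128, 1024, 8192, 65536):
    print(f"n={n:>6}  weight@margin5={top_key_weight(n, 5.0):.4f}  "
          f"margin for 0.5 weight={margin_needed(n, 0.5):.2f}")
```

In this toy setting the weight at a fixed margin falls roughly as 1/n once n dominates exp(margin), and the margin needed to hold the weight at 0.5 is ln(n - 1), so it must keep growing with context length.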
-
Handling Image and Label Resolution Mismatch in Remote Sensing
Authors:
Scott Workman,
Armin Hadzic,
M. Usman Rafique
Abstract:
Though semantic segmentation has been heavily explored in vision literature, unique challenges remain in the remote sensing domain. One such challenge is how to handle resolution mismatch between overhead imagery and ground-truth label sources, due to differences in ground sample distance. To illustrate this problem, we introduce a new dataset and use it to showcase weaknesses inherent in existing strategies that naively upsample the target label to match the image resolution. Instead, we present a method that is supervised using low-resolution labels (without upsampling), but takes advantage of an exemplar set of high-resolution labels to guide the learning process. Our method incorporates region aggregation, adversarial learning, and self-supervised pretraining to generate fine-grained predictions, without requiring high-resolution annotations. Extensive experiments demonstrate the real-world applicability of our approach.
Submitted 28 November, 2022;
originally announced November 2022.
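The abstract above mentions supervising with low-resolution labels without upsampling them, using region aggregation, but does not spell out the operator. The following is a minimal sketch under our own assumption that region aggregation means averaging the network's per-pixel class probabilities over each low-resolution label cell and scoring the result with cross-entropy. `aggregate_to_label_grid`, `low_res_cross_entropy`, and the integer downsampling `factor` are hypothetical names; the adversarial and self-supervised components are not shown.

```python
import numpy as np

def aggregate_to_label_grid(probs: np.ndarray, factor: int) -> np.ndarray:
    """Average per-pixel class probabilities (C, H, W) over non-overlapping
    factor x factor blocks, giving (C, H // factor, W // factor)."""
    c, h, w = probs.shape
    assert h % factor == 0 and w % factor == 0, "H and W must be divisible by factor"
    blocks = probs.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))

def low_res_cross_entropy(probs: np.ndarray, low_res_labels: np.ndarray,
                          factor: int, eps: float = 1e-8) -> float:
    """Cross-entropy between block-averaged predictions and integer labels (h, w)."""
    agg = aggregate_to_label_grid(probs, factor)  # (C, h, w)
    h, w = low_res_labels.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    picked = agg[low_res_labels, rows, cols]  # probability of the labelled class per cell
    return float(-np.log(picked + eps).mean())

# Hypothetical example: 3 classes, 8x8 predictions, 2x2 label grid (factor 4).
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8, 8))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
labels = rng.integers(0, 3, size=(2, 2))
loss = low_res_cross_entropy(probs, labels, factor=4)
```

Because the loss only constrains block averages, the high-resolution prediction inside each cell remains free; in the abstract's method that freedom is what the exemplar high-resolution labels and adversarial learning are meant to shape.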
-
Revisiting Near/Remote Sensing with Geospatial Attention
Authors:
Scott Workman,
M. Usman Rafique,
Hunter Blanton,
Nathan Jacobs
Abstract:
This work addresses the task of overhead image segmentation when auxiliary ground-level images are available. Recent work has shown that performing joint inference over these two modalities, often called near/remote sensing, can yield significant accuracy improvements. Extending this line of work, we introduce the concept of geospatial attention, a geometry-aware attention mechanism that explicitly considers the geospatial relationship between the pixels in a ground-level image and a geographic location. We propose an approach for computing geospatial attention that incorporates geometric features and the appearance of the overhead and ground-level imagery. We introduce a novel architecture for near/remote sensing that is based on geospatial attention and demonstrate its use for five segmentation tasks. The results demonstrate that our method significantly outperforms the previous state-of-the-art methods.
Submitted 4 April, 2022;
originally announced April 2022.
-
Dynamic Image for 3D MRI Image Alzheimer's Disease Classification
Authors:
Xin Xing,
Gongbo Liang,
Hunter Blanton,
Muhammad Usman Rafique,
Chris Wang,
Ai-Ling Lin,
Nathan Jacobs
Abstract:
We propose applying a 2D CNN architecture to Alzheimer's disease classification from 3D MRI images. Training a 3D convolutional neural network (CNN) is time-consuming and computationally expensive. We use approximate rank pooling to transform the 3D MRI volume into a 2D image that serves as input to a 2D CNN. We show that our proposed CNN model achieves 9.5% better Alzheimer's disease classification accuracy than the baseline 3D models. We also show that our method allows for efficient training, requiring only 20% of the training time of 3D CNN models. The code is available online: https://github.com/UkyVision/alzheimer-project.
Submitted 30 November, 2020;
originally announced December 2020.
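The abstract above collapses a 3D volume into a 2D image with approximate rank pooling. As a minimal sketch, assuming the closed-form coefficients used for dynamic images by Bilen et al. (alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), with H_t the t-th harmonic number) and assuming the MRI slices along one axis play the role of video frames, the pooling is a weighted sum of slices. The function names, the slicing axis, and the random stand-in volume are our assumptions; the authors' released code may differ in preprocessing and normalization.

```python
import numpy as np

def approx_rank_pooling_coeffs(T: int) -> np.ndarray:
    """alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}) for t = 1..T, with H_0 = 0."""
    harmonic = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, T + 1))])  # H_0..H_T
    t = np.arange(1, T + 1)
    return 2.0 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])

def dynamic_image(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    """Collapse a 3D volume to 2D by treating slices along `axis` as ordered frames."""
    frames = np.moveaxis(volume, axis, 0).astype(np.float64)  # (T, H, W)
    alpha = approx_rank_pooling_coeffs(frames.shape[0])
    return np.tensordot(alpha, frames, axes=1)  # weighted sum over slices -> (H, W)

# Hypothetical stand-in for a preprocessed MRI volume (slices, height, width).
rng = np.random.default_rng(0)
mri = rng.random((64, 96, 96))
img = dynamic_image(mri, axis=0)  # (96, 96) image for a 2D CNN
```

The coefficients sum to zero, so content shared by every slice cancels and the output emphasizes how intensities change across the slice order. In practice the result would typically be rescaled to an image range before being fed to a pretrained 2D CNN.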
-
Mobile Robot Path Planning in Static Environments using Particle Swarm Optimization
Authors:
M. Shahab Alam,
M. Usman Rafique,
M. Umer Khan
Abstract:
Motion planning is a key element of robotics since it enables a robot to navigate autonomously. Particle Swarm Optimization is a simple yet very powerful optimization technique that has been used effectively in many complex multi-dimensional optimization problems. This paper proposes a path planning algorithm based on particle swarm optimization for computing a shortest collision-free path for a mobile robot in environments populated with static convex obstacles. The proposed algorithm finds the optimal path by performing random sampling on grid lines generated between the robot's start and goal positions. The functionality of the proposed algorithm is illustrated via simulation results for different scenarios.
Submitted 23 August, 2020;
originally announced August 2020.
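The abstract above describes PSO over random samples on grid lines between start and goal. The sketch below is one possible reading under stated assumptions, not the paper's algorithm: the grid lines are taken to be evenly spaced lines perpendicular to the start-goal segment, each particle encodes one offset per line, obstacles are circles (a convex special case), and collisions are handled with a fixed penalty. It uses the standard inertia-weight PSO update; all names and parameter values (`pso_plan`, `w`, `c1`, `c2`, `bound`, `penalty`) are hypothetical.

```python
import numpy as np

def seg_point_dist(p, q, c):
    """Shortest distance from point c to the segment pq."""
    d = q - p
    t = np.clip(np.dot(c - p, d) / max(np.dot(d, d), 1e-12), 0.0, 1.0)
    return np.linalg.norm(p + t * d - c)

def make_path(offsets, start, goal):
    """Place one waypoint on each line perpendicular to start->goal, at the given offset."""
    direction = goal - start
    normal = np.array([-direction[1], direction[0]]) / np.linalg.norm(direction)
    fractions = np.arange(1, len(offsets) + 1) / (len(offsets) + 1)
    waypoints = start + fractions[:, None] * direction + offsets[:, None] * normal
    return np.vstack([start, waypoints, goal])

def path_cost(offsets, start, goal, obstacles, penalty=1e3):
    """Path length plus a large penalty for every segment that enters a circular obstacle."""
    path = make_path(offsets, start, goal)
    length = np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))
    hits = sum(seg_point_dist(path[i], path[i + 1], np.asarray(c, float)) < r
               for i in range(len(path) - 1) for c, r in obstacles)
    return length + penalty * hits

def pso_plan(start, goal, obstacles, n_lines=5, n_particles=40, iters=200,
             w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    rng = np.random.default_rng(seed)
    start, goal = np.asarray(start, float), np.asarray(goal, float)

    def cost(o):
        return path_cost(o, start, goal, obstacles)

    x = rng.uniform(-bound, bound, size=(n_particles, n_lines))  # random samples on the lines
    v = np.zeros_like(x)
    pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
    g = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Inertia-weight PSO: pull each particle toward its own best and the swarm best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, -bound, bound)
        costs = np.array([cost(p) for p in x])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], costs[improved]
        g = pbest[np.argmin(pbest_cost)].copy()
    return make_path(g, start, goal), pbest_cost.min()

# Hypothetical scenario: a circular obstacle blocks the straight line from start to goal.
path, best = pso_plan((0, 0), (10, 0), [((5, 0), 1.5)])
```

Parameterizing each waypoint by a single scalar offset keeps the search space one-dimensional per line, which is one reason sampling on lines between start and goal is attractive for PSO.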
-
Single Image Cloud Detection via Multi-Image Fusion
Authors:
Scott Workman,
M. Usman Rafique,
Hunter Blanton,
Connor Greenwell,
Nathan Jacobs
Abstract:
Artifacts in imagery captured by remote sensing, such as clouds, snow, and shadows, present challenges for various tasks, including semantic segmentation and object detection. A primary challenge in developing algorithms for identifying such artifacts is the cost of collecting annotated training data. In this work, we explore how recent advances in multi-image fusion can be leveraged to bootstrap single image cloud detection. We demonstrate that a network optimized to estimate image quality also implicitly learns to detect clouds. To support the training and evaluation of our approach, we collect a large dataset of Sentinel-2 images along with a per-pixel semantic labelling for land cover. Through various experiments, we demonstrate that our method reduces the need for annotated training data and improves cloud detection performance.
Submitted 29 July, 2020;
originally announced July 2020.
-
A Weakly Supervised Approach for Estimating Spatial Density Functions from High-Resolution Satellite Imagery
Authors:
Nathan Jacobs,
Adam Kraft,
Muhammad Usman Rafique,
Ranti Dev Sharma
Abstract:
We propose a neural network component, the regional aggregation layer, that makes it possible to train a pixel-level density estimator using only coarse-grained density aggregates, which reflect the number of objects in an image region. Our approach is simple to use and does not require domain-specific assumptions about the nature of the density function. We evaluate our approach on several synthetic datasets. In addition, we use this approach to learn to estimate high-resolution population and housing density from satellite imagery. In all cases, we find that our approach results in better density estimates than a commonly used baseline. We also show how our housing density estimator can be used to classify buildings as residential or non-residential.
Submitted 22 October, 2018;
originally announced October 2018.
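The regional aggregation layer in the abstract above sums a pixel-level density map within each region so that it can be compared with coarse counts. The sketch below illustrates that idea in plain NumPy under our own assumptions: regions are given as an integer id map, the loss is mean squared error, and the density map itself is optimized directly instead of being the output of a network. `regional_aggregation`, `aggregate_loss`, and `aggregate_loss_grad` are hypothetical names.

```python
import numpy as np

def regional_aggregation(density, region_ids, n_regions):
    """Sum a per-pixel density (H, W) within each region; region_ids (H, W) holds ints in [0, n_regions)."""
    return np.bincount(region_ids.ravel(), weights=density.ravel(), minlength=n_regions)

def aggregate_loss(density, region_ids, region_counts):
    """Mean squared error between aggregated predictions and coarse per-region counts."""
    pred = regional_aggregation(density, region_ids, len(region_counts))
    return float(np.mean((pred - region_counts) ** 2))

def aggregate_loss_grad(density, region_ids, region_counts):
    """Gradient of aggregate_loss w.r.t. each pixel: every pixel receives its region's residual."""
    pred = regional_aggregation(density, region_ids, len(region_counts))
    residual = 2.0 * (pred - region_counts) / len(region_counts)
    return residual[region_ids]

# Hypothetical toy data: a 4x4 image split into four 2x2 regions with known counts.
region_ids = np.kron(np.arange(4).reshape(2, 2), np.ones((2, 2), dtype=int))
region_counts = np.array([3.0, 0.0, 1.0, 2.0])
density = np.zeros((4, 4))
for _ in range(200):
    density -= 0.1 * aggregate_loss_grad(density, region_ids, region_counts)
```

Because aggregation is a sum, its gradient simply routes each region's error back to every pixel in that region. Optimized directly like this, the density spreads uniformly within each region; in a real estimator the density would come from an image-conditioned network, which is what allows mass to be placed non-uniformly inside a region.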