Sal: Multi-modal Verification of Replicated Data Types

Authors: Pranav Ramesh, Vimala Soundarapandian, KC Sivaramakrishnan

Abstract: Designing correct replicated data types (RDTs) is challenging because replicas evolve independently and must be merged while preserving application intent. A promising approach is correct-by-construction development in a proof-oriented programming language such as F*, Dafny and Lean, where desired correctness guarantees are specified and checked as the RDTs are implemented. Recent work Neem propos… ▽ More Designing correct replicated data types (RDTs) is challenging because replicas evolve independently and must be merged while preserving application intent. A promising approach is correct-by-construction development in a proof-oriented programming language such as F*, Dafny and Lean, where desired correctness guarantees are specified and checked as the RDTs are implemented. Recent work Neem proposes the use of replication-aware linearizability (RA linearizability) as the correctness condition for state-based CRDTs and mergeable replicated data types (MRDTs), with automation in the SMT-aided, proof-oriented programming language F*. However, SMT-centric workflows can be opaque when automation fails to discharge a verification condition (VC), and they enlarge the trusted computing base (TCB). We present Sal, a multi-modal workflow to design and verify state-based CRDTs and MRDTs in Lean. Sal combines (i) kernel-checkable automation with proof reconstruction, (ii) SMT-aided automation when needed, and (iii) AI-assisted interactive theorem proving for remaining proof obligations. When a verification condition is shown to be invalid, we leverage Lean's property-based testing to automatically generate and visualize counterexamples, helping developers debug incorrect specifications or implementations. We report on our experience verifying a suite of 13 CRDTs and MRDTs with Sal: 69% of verification conditions are discharged by kernel-verified automation without SMT, and counterexamples automatically expose subtle bugs such as the well-known enable-wins flag anomaly. The codebase for Sal is open-sourced, and is available at \href{https://github.com/fplaunchpad/sal}{https://github.com/fplaunchpad/sal} △ Less

Submitted 28 March, 2026; originally announced March 2026.

arXiv:2603.14397 [pdf, ps, other]

eNavi: Event-based Imitation Policies for Low-Light Indoor Mobile Robot Navigation

Authors: Prithvi Jai Ramesh, Kaustav Chanda, Krishna Vinod, Joseph Raj Vishal, Yezhou Yang, Bharatesh Chakravarthi

Abstract: Event cameras provide high dynamic range and microsecond-level temporal resolution, making them well-suited for indoor robot navigation, where conventional RGB cameras degrade under fast motion or low-light conditions. Despite advances in event-based perception spanning detection, SLAM, and pose estimation, there remains limited research on end-to-end control policies that exploit the asynchronous… ▽ More Event cameras provide high dynamic range and microsecond-level temporal resolution, making them well-suited for indoor robot navigation, where conventional RGB cameras degrade under fast motion or low-light conditions. Despite advances in event-based perception spanning detection, SLAM, and pose estimation, there remains limited research on end-to-end control policies that exploit the asynchronous nature of event streams. To address this gap, we introduce a real-world indoor person-following dataset collected using a TurtleBot 2 robot, featuring synchronized raw event streams, RGB frames, and expert control actions across multiple indoor maps, trajectories under both normal and low-light conditions. We further build a multimodal data preprocessing pipeline that temporally aligns event and RGB observations while reconstructing ground-truth actions from odometry to support high-quality imitation learning. Building on this dataset, we propose a late-fusion RGB-Event navigation policy that combines dual MobileNet encoders with a transformer-based fusion module trained via behavioral cloning. A systematic evaluation of RGB-only, Event-only, and RGB-Event fusion models across 12 training variations ranging from single-path imitation to general multi-path imitation shows that policies incorporating event data, particularly the fusion model, achieve improved robustness and lower action prediction error, especially in unseen low-light conditions where RGB-only models fail. We release the dataset, synchronization pipeline, and trained models at https://eventbasedvision.github.io/eNavi/ △ Less

Submitted 15 March, 2026; originally announced March 2026.

arXiv:2603.13467 [pdf, ps, other]

Resolving Interference (RI): Disentangling Models for Improved Model Merging

Authors: Pratik Ramesh, George Stoica, Arun Iyer, Leshem Choshen, Judy Hoffman

Abstract: Model merging has shown that multitask models can be created by directly combining the parameters of different models that are each specialized on tasks of interest. However, models trained independently on distinct tasks often exhibit interference that degrades the merged model's performance. To solve this problem, we formally define the notion of Cross-Task Interference as the drift in the repre… ▽ More Model merging has shown that multitask models can be created by directly combining the parameters of different models that are each specialized on tasks of interest. However, models trained independently on distinct tasks often exhibit interference that degrades the merged model's performance. To solve this problem, we formally define the notion of Cross-Task Interference as the drift in the representation of the merged model relative to its constituent models. Reducing cross-task interference is key to improving merging performance. To address this issue, we propose our method, Resolving Interference (RI), a light-weight adaptation framework which disentangles expert models to be functionally orthogonal to the space of other tasks, thereby reducing cross-task interference. RI does this whilst using only unlabeled auxiliary data as input (i.e., no task-data is needed), allowing it to be applied in data-scarce scenarios. RI consistently improves the performance of state-of-the-art merging methods by up to 3.8% and generalization to unseen domains by up to 2.3%. We also find RI to be robust to the source of auxiliary input while being significantly less sensitive to tuning of merging hyperparameters. Our codebase is available at: https://github.com/pramesh39/resolving_interference △ Less

Submitted 13 March, 2026; originally announced March 2026.

arXiv:2602.21137 [pdf, ps, other]

UDVideoQA: A Traffic Video Question Answering Dataset for Multi-Object Spatio-Temporal Reasoning in Urban Dynamics

Authors: Joseph Raj Vishal, Nagasiri Poluri, Katha Naik, Rutuja Patil, Kashyap Hegde Kota, Krishna Vinod, Prithvi Jai Ramesh, Mohammad Farhadi, Yezhou Yang, Bharatesh Chakravarthi

Abstract: Understanding the complex, multi-agent dynamics of urban traffic remains a fundamental challenge for video language models. This paper introduces Urban Dynamics VideoQA, a benchmark dataset that captures the unscripted real-world behavior of dynamic urban scenes. UDVideoQA is curated from 16 hours of traffic footage recorded at multiple city intersections under diverse traffic, weather, and lighti… ▽ More Understanding the complex, multi-agent dynamics of urban traffic remains a fundamental challenge for video language models. This paper introduces Urban Dynamics VideoQA, a benchmark dataset that captures the unscripted real-world behavior of dynamic urban scenes. UDVideoQA is curated from 16 hours of traffic footage recorded at multiple city intersections under diverse traffic, weather, and lighting conditions. It employs an event-driven dynamic blur technique to ensure privacy preservation without compromising scene fidelity. Using a unified annotation pipeline, the dataset contains 28K question-answer pairs generated across 8 hours of densely annotated video, averaging one question per second. Its taxonomy follows a hierarchical reasoning level, spanning basic understanding and attribution to event reasoning, reverse reasoning, and counterfactual inference, enabling systematic evaluation of both visual grounding and causal reasoning. Comprehensive experiments benchmark 10 SOTA VideoLMs on UDVideoQA and 8 models on a complementary video question generation benchmark. Results reveal a persistent perception-reasoning gap, showing models that excel in abstract inference often fail with fundamental visual grounding. While models like Gemini Pro achieve the highest zero-shot accuracy, fine-tuning the smaller Qwen2.5-VL 7B model on UDVideoQA bridges this gap, achieving performance comparable to proprietary systems. In VideoQGen, Gemini 2.5 Pro, and Qwen3 Max generate the most relevant and complex questions, though all models exhibit limited linguistic diversity, underscoring the need for human-centric evaluation. The UDVideoQA suite, including the dataset, annotation tools, and benchmarks for both VideoQA and VideoQGen, provides a foundation for advancing robust, privacy-aware, and real-world multimodal reasoning. UDVideoQA is available at https://ud-videoqa.github.io/UD-VideoQA/UD-VideoQA/. △ Less

Submitted 24 February, 2026; originally announced February 2026.

arXiv:2601.06647 [pdf, ps, other]

eSkiTB: A Synthetic Event-based Dataset for Tracking Skiers

Authors: Krishna Vinod, Joseph Raj Vishal, Kaustav Chanda, Prithvi Jai Ramesh, Yezhou Yang, Bharatesh Chakravarthi

Abstract: Tracking skiers in RGB broadcast footage is challenging due to motion blur, static overlays, and clutter that obscure the fast-moving athlete. Event cameras, with their asynchronous contrast sensing, offer natural robustness to such artifacts, yet a controlled benchmark for winter-sport tracking has been missing. We introduce event SkiTB (eSkiTB), a synthetic event-based ski tracking dataset gener… ▽ More Tracking skiers in RGB broadcast footage is challenging due to motion blur, static overlays, and clutter that obscure the fast-moving athlete. Event cameras, with their asynchronous contrast sensing, offer natural robustness to such artifacts, yet a controlled benchmark for winter-sport tracking has been missing. We introduce event SkiTB (eSkiTB), a synthetic event-based ski tracking dataset generated from SkiTB using direct video-to-event conversion without neural interpolation, enabling an iso-informational comparison between RGB and event modalities. Benchmarking SDTrack (spiking transformer) against STARK (RGB transformer), we find that event-based tracking is substantially resilient to broadcast clutter in scenes dominated by static overlays, achieving 0.685 IoU, outperforming RGB by +20.0 points. Across the dataset, SDTrack attains a mean IoU of 0.711, demonstrating that temporal contrast is a reliable cue for tracking ballistic motion in visually congested environments. eSkiTB establishes the first controlled setting for event-based tracking in winter sports and highlights the promise of event cameras for ski tracking. The dataset and code will be released at https://github.com/eventbasedvision/eSkiTB. △ Less

Submitted 10 January, 2026; originally announced January 2026.

arXiv:2511.05215 [pdf, ps, other]

NeuroFlex: Column-Exact ANN-SNN Co-Execution Accelerator with Cost-Guided Scheduling

Authors: Varun Manjunath, Pranav Ramesh, Gopalakrishnan Srinivasan

Abstract: NeuroFlex is a column-level accelerator that co-executes artificial and spiking neural networks to minimize energy-delay product on sparse edge workloads with competitive accuracy. The design extends integer-exact QCFS ANN-SNN conversion from layers to independent columns. It unifies INT8 storage with on-the-fly spike generation using an offline cost model to assign columns to ANN or SNN cores and… ▽ More NeuroFlex is a column-level accelerator that co-executes artificial and spiking neural networks to minimize energy-delay product on sparse edge workloads with competitive accuracy. The design extends integer-exact QCFS ANN-SNN conversion from layers to independent columns. It unifies INT8 storage with on-the-fly spike generation using an offline cost model to assign columns to ANN or SNN cores and pack work across processing elements with deterministic runtime. Our cost-guided scheduling algorithm improves throughput by 16-19% over random mapping and lowers EDP by 57-67% versus a strong ANN-only baseline across VGG-16, ResNet-34, GoogLeNet, and BERT models. NeuroFlex also delivers up to 2.5x speedup over LoAS and 2.51x energy reduction over SparTen. These results indicate that fine-grained and integer-exact hybridization outperforms single-mode designs on energy and latency without sacrificing accuracy. △ Less

Submitted 7 November, 2025; originally announced November 2025.

arXiv:2510.11018 [pdf, ps, other]

The Easy Path to Robustness: Coreset Selection using Sample Hardness

Authors: Pranav Ramesh, Arjun Roy, Deepak Ravikumar, Kaushik Roy, Gopalakrishnan Srinivasan

Abstract: Designing adversarially robust models from a data-centric perspective requires understanding which input samples are most crucial for learning resilient features. While coreset selection provides a mechanism for efficient training on data subsets, current algorithms are designed for clean accuracy and fall short in preserving robustness. To address this, we propose a framework linking a sample's a… ▽ More Designing adversarially robust models from a data-centric perspective requires understanding which input samples are most crucial for learning resilient features. While coreset selection provides a mechanism for efficient training on data subsets, current algorithms are designed for clean accuracy and fall short in preserving robustness. To address this, we propose a framework linking a sample's adversarial vulnerability to its \textit{hardness}, which we quantify using the average input gradient norm (AIGN) over training. We demonstrate that \textit{easy} samples (with low AIGN) are less vulnerable and occupy regions further from the decision boundary. Leveraging this insight, we present EasyCore, a coreset selection algorithm that retains only the samples with low AIGN for training. We empirically show that models trained on EasyCore-selected data achieve significantly higher adversarial accuracy than those trained with competing coreset methods under both standard and adversarial training. As AIGN is a model-agnostic dataset property, EasyCore is an efficient and widely applicable data-centric method for improving adversarial robustness. We show that EasyCore achieves up to 7\% and 5\% improvement in adversarial accuracy under standard training and TRADES adversarial training, respectively, compared to existing coreset methods. △ Less

Submitted 13 October, 2025; originally announced October 2025.

arXiv:2508.17643 [pdf, ps, other]

SEBVS: Synthetic Event-based Visual Servoing for Robot Navigation and Manipulation

Authors: Krishna Vinod, Prithvi Jai Ramesh, Pavan Kumar B N, Bharatesh Chakravarthi

Abstract: Event cameras offer microsecond latency, high dynamic range, and low power consumption, making them ideal for real-time robotic perception under challenging conditions such as motion blur, occlusion, and illumination changes. However, despite their advantages, synthetic event-based vision remains largely unexplored in mainstream robotics simulators. This lack of simulation setup hinders the evalua… ▽ More Event cameras offer microsecond latency, high dynamic range, and low power consumption, making them ideal for real-time robotic perception under challenging conditions such as motion blur, occlusion, and illumination changes. However, despite their advantages, synthetic event-based vision remains largely unexplored in mainstream robotics simulators. This lack of simulation setup hinders the evaluation of event-driven approaches for robotic manipulation and navigation tasks. This work presents an open-source, user-friendly v2e robotics operating system (ROS) package for Gazebo simulation that enables seamless event stream generation from RGB camera feeds. The package is used to investigate event-based robotic policies (ERP) for real-time navigation and manipulation. Two representative scenarios are evaluated: (1) object following with a mobile robot and (2) object detection and grasping with a robotic manipulator. Transformer-based ERPs are trained by behavior cloning and compared to RGB-based counterparts under various operating conditions. Experimental results show that event-guided policies consistently deliver competitive advantages. The results highlight the potential of event-driven perception to improve real-time robotic navigation and manipulation, providing a foundation for broader integration of event cameras into robotic policy learning. The GitHub repo for the dataset and code: https://eventbasedvision.github.io/SEBVS/ △ Less

Submitted 25 August, 2025; originally announced August 2025.

arXiv:2505.01730 [pdf, ps, other]

PASCAL: Precise and Efficient ANN- SNN Conversion using Spike Accumulation and Adaptive Layerwise Activation

Authors: Pranav Ramesh, Gopalakrishnan Srinivasan

Abstract: Spiking Neural Networks (SNNs) have been put forward as an energy-efficient alternative to Artificial Neural Networks (ANNs) since they perform sparse Accumulate operations instead of the power-hungry Multiply-and-Accumulate operations. ANN-SNN conversion is a widely used method to realize deep SNNs with accuracy comparable to that of ANNs.~\citeauthor{bu2023optimal} recently proposed the Quantiza… ▽ More Spiking Neural Networks (SNNs) have been put forward as an energy-efficient alternative to Artificial Neural Networks (ANNs) since they perform sparse Accumulate operations instead of the power-hungry Multiply-and-Accumulate operations. ANN-SNN conversion is a widely used method to realize deep SNNs with accuracy comparable to that of ANNs.~\citeauthor{bu2023optimal} recently proposed the Quantization-Clip-Floor-Shift (QCFS) activation as an alternative to ReLU to minimize the accuracy loss during ANN-SNN conversion. Nevertheless, SNN inferencing requires a large number of timesteps to match the accuracy of the source ANN for real-world datasets. In this work, we propose PASCAL, which performs ANN-SNN conversion in such a way that the resulting SNN is mathematically equivalent to an ANN with QCFS-activation, thereby yielding similar accuracy as the source ANN with minimal inference timesteps. In addition, we propose a systematic method to configure the quantization step of QCFS activation in a layerwise manner, which effectively determines the optimal number of timesteps per layer for the converted SNN. Our results show that the ResNet-34 SNN obtained using PASCAL achieves an accuracy of $\approx$74\% on ImageNet with a 64$\times$ reduction in the number of inference timesteps compared to existing approaches. △ Less

Submitted 11 December, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

Journal ref: Transactions on Machine Learning Research, 2025

arXiv:2504.15578 [pdf, other]

Real-Time Optimal Design of Experiment for Parameter Identification of Li-Ion Cell Electrochemical Model

Authors: Ian Mikesell, Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Prashanth Ramesh, Marcello Canova

Abstract: Accurately identifying the parameters of electrochemical models of li-ion battery (LiB) cells is a critical task for enhancing the fidelity and predictive ability. Traditional parameter identification methods often require extensive data collection experiments and lack adaptability in dynamic environments. This paper describes a Reinforcement Learning (RL) based approach that dynamically tailors t… ▽ More Accurately identifying the parameters of electrochemical models of li-ion battery (LiB) cells is a critical task for enhancing the fidelity and predictive ability. Traditional parameter identification methods often require extensive data collection experiments and lack adaptability in dynamic environments. This paper describes a Reinforcement Learning (RL) based approach that dynamically tailors the current profile applied to a LiB cell to optimize the parameters identifiability of the electrochemical model. The proposed framework is implemented in real-time using a Hardware-in-the-Loop (HIL) setup, which serves as a reliable testbed for evaluating the RL-based design strategy. The HIL validation confirms that the RL-based experimental design outperforms conventional test protocols used for parameter identification in terms of both reducing the modeling errors on a verification test and minimizing the duration of the experiment used for parameter identification. △ Less

Submitted 22 April, 2025; originally announced April 2025.

arXiv:2411.12935 [pdf, other]

Improving Low-Fidelity Models of Li-ion Batteries via Hybrid Sparse Identification of Nonlinear Dynamics

Authors: Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Prashanth Ramesh, Marcello Canova

Abstract: Accurate modeling of lithium ion (li-ion) batteries is essential for enhancing the safety, and efficiency of electric vehicles and renewable energy systems. This paper presents a data-inspired approach for improving the fidelity of reduced-order li-ion battery models. The proposed method combines a Genetic Algorithm with Sequentially Thresholded Ridge Regression (GA-STRidge) to identify and compen… ▽ More Accurate modeling of lithium ion (li-ion) batteries is essential for enhancing the safety, and efficiency of electric vehicles and renewable energy systems. This paper presents a data-inspired approach for improving the fidelity of reduced-order li-ion battery models. The proposed method combines a Genetic Algorithm with Sequentially Thresholded Ridge Regression (GA-STRidge) to identify and compensate for discrepancies between a low-fidelity model (LFM) and data generated either from testing or a high-fidelity model (HFM). The hybrid model, combining physics-based and data-driven methods, is tested across different driving cycles to demonstrate the ability to significantly reduce the voltage prediction error compared to the baseline LFM, while preserving computational efficiency. The model robustness is also evaluated under various operating conditions, showing low prediction errors and high Pearson correlation coefficients for terminal voltage in unseen environments. △ Less

Submitted 19 November, 2024; originally announced November 2024.

Comments: 6 pages

arXiv:2410.19735 [pdf, other]

Model merging with SVD to tie the Knots

Authors: George Stoica, Pratik Ramesh, Boglarka Ecsedi, Leshem Choshen, Judy Hoffman

Abstract: Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging LoRA finetuned models. We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared… ▽ More Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging LoRA finetuned models. We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts. We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied. In addition, we introduce a new benchmark that explicitly evaluates whether merged models are general models. Notably, KnOTS consistently improves LoRA merging by up to 4.3% across several vision and language benchmarks, including our new setting. We release our code at: https://github.com/gstoica27/KnOTS. △ Less

Submitted 25 October, 2024; originally announced October 2024.

arXiv:2408.05912 [pdf, other]

Correct Wrong Path

Authors: Bhargav Reddy Godala, Sankara Prasad Ramesh, Krishnam Tibrewala, Chrysanthos Pepi, Gino Chacon, Svilen Kanev, Gilles A. Pokam, Daniel A. Jiménez, Paul V. Gratz, David I. August

Abstract: Modern OOO CPUs have very deep pipelines with large branch misprediction recovery penalties. Speculatively executed instructions on the wrong path can significantly change cache state, depending on speculation levels. Architects often employ trace-driven simulation models in the design exploration stage, which sacrifice precision for speed. Trace-driven simulators are orders of magnitude faster th… ▽ More Modern OOO CPUs have very deep pipelines with large branch misprediction recovery penalties. Speculatively executed instructions on the wrong path can significantly change cache state, depending on speculation levels. Architects often employ trace-driven simulation models in the design exploration stage, which sacrifice precision for speed. Trace-driven simulators are orders of magnitude faster than execution-driven models, reducing the often hundreds of thousands of simulation hours needed to explore new micro-architectural ideas. Despite this strong benefit of trace-driven simulation, these often fail to adequately model the consequences of wrong path because obtaining them is nontrivial. Prior works consider either a positive or negative impact of wrong path but not both. Here, we examine wrong path execution in simulation results and design a set of infrastructure for enabling wrong-path execution in a trace driven simulator. Our analysis shows the wrong path affects structures on both the instruction and data sides extensively, resulting in performance variations ranging from $-3.05$\% to $20.9$\% when ignoring wrong path. To benefit the research community and enhance the accuracy of simulators, we opened our traces and tracing utility in the hopes that industry can provide wrong-path traces generated by their internal simulators, enabling academic simulation without exposing industry IP. △ Less

Submitted 11 August, 2024; originally announced August 2024.

Comments: 5 pages, 7 Figures, Submited to Computer Architecture Letters

arXiv:2309.17430 [pdf, other]

FACTS: First Amplify Correlations and Then Slice to Discover Bias

Authors: Sriram Yenamandra, Pratik Ramesh, Viraj Prabhu, Judy Hoffman

Abstract: Computer vision datasets frequently contain spurious correlations between task-relevant labels and (easy to learn) latent task-irrelevant attributes (e.g. context). Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold. In this work, we study the problem of identifying such slices to inform downstream bias mitigati… ▽ More Computer vision datasets frequently contain spurious correlations between task-relevant labels and (easy to learn) latent task-irrelevant attributes (e.g. context). Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold. In this work, we study the problem of identifying such slices to inform downstream bias mitigation strategies. We propose First Amplify Correlations and Then Slice to Discover Bias (FACTS), wherein we first amplify correlations to fit a simple bias-aligned hypothesis via strongly regularized empirical risk minimization. Next, we perform correlation-aware slicing via mixture modeling in bias-aligned feature space to discover underperforming data slices that capture distinct correlations. Despite its simplicity, our method considerably improves over prior work (by as much as 35% precision@10) in correlation bias identification across a range of diverse evaluation settings. Our code is available at: https://github.com/yvsriram/FACTS. △ Less

Submitted 29 September, 2023; originally announced September 2023.

Comments: Accepted to ICCV 2023

arXiv:2305.03053 [pdf, other]

ZipIt! Merging Models from Different Tasks without Training

Authors: George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman

Abstract: Typical deep visual recognition models are capable of performing the one task they were trained on. In this paper, we tackle the extremely difficult problem of combining distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then averages them t… ▽ More Typical deep visual recognition models are capable of performing the one task they were trained on. In this paper, we tackle the extremely difficult problem of combining distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then averages them together. While this works for models trained on the same task, we find that this fails to account for the differences in models trained on disjoint tasks. Thus, we introduce "ZipIt!", a general method for merging two arbitrary models of the same architecture that incorporates two simple strategies. First, in order to account for features that aren't shared between models, we expand the model merging problem to allow for merging features within each model by defining a general "zip" operation. Second, we add support for partially zipping the models up until a specified layer, naturally creating a multi-head model. We find that these two changes combined account for 20-60% improvement over prior work, making it more feasible to merge models trained on disjoint tasks without retraining. △ Less

Submitted 12 March, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2203.06481 [pdf, other]

GATSBI: Generative Adversarial Training for Simulation-Based Inference

Authors: Poornima Ramesh, Jan-Matthis Lueckmann, Jan Boelts, Álvaro Tejero-Cantero, David S. Greenberg, Pedro J. Gonçalves, Jakob H. Macke

Abstract: Simulation-based inference (SBI) refers to statistical inference on stochastic models for which we can generate samples, but not compute likelihoods. Like SBI algorithms, generative adversarial networks (GANs) do not require explicit likelihoods. We study the relationship between SBI and GANs, and introduce GATSBI, an adversarial approach to SBI. GATSBI reformulates the variational objective in an… ▽ More Simulation-based inference (SBI) refers to statistical inference on stochastic models for which we can generate samples, but not compute likelihoods. Like SBI algorithms, generative adversarial networks (GANs) do not require explicit likelihoods. We study the relationship between SBI and GANs, and introduce GATSBI, an adversarial approach to SBI. GATSBI reformulates the variational objective in an adversarial setting to learn implicit posterior distributions. Inference with GATSBI is amortised across observations, works in high-dimensional posterior spaces and supports implicit priors. We evaluate GATSBI on two SBI benchmark problems and on two high-dimensional simulators. On a model for wave propagation on the surface of a shallow water body, we show that GATSBI can return well-calibrated posterior estimates even in high dimensions. On a model of camera optics, it infers a high-dimensional posterior given an implicit prior, and performs better than a state-of-the-art SBI approach. We also show how GATSBI can be extended to perform sequential posterior estimation to focus on individual observations. Overall, GATSBI opens up opportunities for leveraging advances in GANs to perform Bayesian inference on high-dimensional simulation-based models. △ Less

Submitted 12 March, 2022; originally announced March 2022.

arXiv:2109.13711 [pdf, other]

One to rule them all: Towards Joint Indic Language Hate Speech Detection

Authors: Mehar Bhatia, Tenzin Singhay Bhotia, Akshat Agarwal, Prakash Ramesh, Shubham Gupta, Kumar Shridhar, Felix Laumann, Ayushman Dash

Abstract: This paper is a contribution to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) 2021 shared task. Social media today is a hotbed of toxic and hateful conversations, in various languages. Recent news reports have shown that current models struggle to automatically identify hate posted in minority languages. Therefore, efficiently curbing hate speech is a crit… ▽ More This paper is a contribution to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) 2021 shared task. Social media today is a hotbed of toxic and hateful conversations, in various languages. Recent news reports have shown that current models struggle to automatically identify hate posted in minority languages. Therefore, efficiently curbing hate speech is a critical challenge and problem of interest. We present a multilingual architecture using state-of-the-art transformer language models to jointly learn hate and offensive speech detection across three languages namely, English, Hindi, and Marathi. On the provided testing corpora, we achieve Macro F1 scores of 0.7996, 0.7748, 0.8651 for sub-task 1A and 0.6268, 0.5603 during the fine-grained classification of sub-task 1B. These results show the efficacy of exploiting a multilingual training scheme. △ Less

Submitted 28 September, 2021; originally announced September 2021.

Comments: submitted to FIRE 2021 in the HASOC-FIRE shared task on hate speech and offensive language detection

arXiv:1707.01961 [pdf, ps, other]

Long-Term Memory Networks for Question Answering

Authors: Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao

Abstract: Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task. Several deep neural network architectures have been developed recently, which employ memory and inference components to memorize and reason over text information, and generate answers to questions. However,… ▽ More Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task. Several deep neural network architectures have been developed recently, which employ memory and inference components to memorize and reason over text information, and generate answers to questions. However, a major drawback of many such models is that they are capable of only generating single-word answers. In addition, they require large amount of training data to generate accurate answers. In this paper, we introduce the Long-Term Memory Network (LTMN), which incorporates both an external memory module and a Long Short-Term Memory (LSTM) module to comprehend the input data and generate multi-word answers. The LTMN model can be trained end-to-end using back-propagation and requires minimal supervision. We test our model on two synthetic data sets (based on Facebook's bAbI data set) and the real-world Stanford question answering data set, and show that it can achieve state-of-the-art performance. △ Less

Submitted 6 July, 2017; originally announced July 2017.

arXiv:1704.03152 [pdf, other]

Deep Multimodal Representation Learning from Temporal Data

Authors: Xitong Yang, Palghat Ramesh, Radha Chitta, Sriganesh Madhvanath, Edgar A. Bernal, Jiebo Luo

Abstract: In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. In this paper, we propose the Correlat… ▽ More In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. In this paper, we propose the Correlational Recurrent Neural Network (CorrRNN), a novel temporal fusion model for fusing multiple input modalities that are inherently temporal in nature. Key features of our proposed model include: (i) simultaneous learning of the joint representation and temporal dependencies between modalities, (ii) use of multiple loss terms in the objective function, including a maximum correlation loss term to enhance learning of cross-modal information, and (iii) the use of an attention model to dynamically adjust the contribution of different input modalities to the joint representation. We validate our model via experimentation on two different tasks: video- and sensor-based activity classification, and audio-visual speech recognition. We empirically analyze the contributions of different components of the proposed CorrRNN model, and demonstrate its robustness, effectiveness and state-of-the-art performance on multiple datasets. △ Less

Submitted 11 April, 2017; originally announced April 2017.

Comments: To appear in CVPR 2017

arXiv:1404.4107 [pdf]

Invisibility System Using Image Processing and Optical Camouflage Technology

Authors: Vasireddy Srikanth, Pillem Ramesh

Abstract: Invisible persons are seen in fiction stories only, but in the real world it is proved that invisibility is possible. This paper describes the creation of invisibility with the help of technologies like Optical camouflage; Image based rendering and Retro reflective projection. The object that needs to be made transparent or invisible is painted or covered with retro reflective material. Then a pro… ▽ More Invisible persons are seen in fiction stories only, but in the real world it is proved that invisibility is possible. This paper describes the creation of invisibility with the help of technologies like Optical camouflage; Image based rendering and Retro reflective projection. The object that needs to be made transparent or invisible is painted or covered with retro reflective material. Then a projector projects the background image on it making the masking object virtually transparent. Capturing the background image requires a video camera, which sits behind the person wearing the cloak. The video from the camera must be in a digital format so it can be sent to a computer for image processing using image based rendering technical. There are some useful applications for this simple but astonishing technology. △ Less

Submitted 8 February, 2014; originally announced April 2014.

Comments: IJETT, 2013

Showing 1–20 of 20 results for author: Ramesh, P